HDInsight

Welcome! You can use this site to tell the Microsoft HDInsight team what features you would like to see.

Remember that this site is for feature suggestions and ideas…

If you have technical questions, please visit our forums.
If you are looking for tutorials and documentation, please visit our getting started page.

  1. Support the Control-M client on an HDInsight Interactive Query cluster

    I see that Microsoft does not recommend installing Control-M on an HDInsight cluster. Please consider this suggestion and support this feature. Thanks.

    2 votes
    0 comments  ·  Workload
  2. Low Priority (Spot) VM support in HDInsight

    Hey HDInsight team, I was excited to see that Azure announced spot pricing for VMs recently.

    https://azure.microsoft.com/en-us/pricing/spot/

    It would be great to also have this type of low-priority VM pricing available for HDInsight worker nodes. It would greatly reduce my costs and allow me to move more workloads to Azure (AWS EMR already supports spot pricing, which makes it more cost-competitive).

    Thanks!

    1 vote
    0 comments  ·  Workload
  3. Apache Atlas

    Having Apache Atlas in HDInsight for data catalog and lineage would be a great feature. Are there any plans for this on the roadmap?

    79 votes
    2 comments  ·  Workload
  4. HDInsight support for backend DBs for Hive/Oozie/Ranger

    HDP supports MySQL and MariaDB as backend DBs for Hive/Oozie/Ranger, but HDI only supports MS-SQL. MS-SQL is not officially supported as a backend DB for the HDP stack, though. Since the Azure portal offers Azure DB for MariaDB and Azure DB for MySQL, these options should be offered for HDI as well. Is it on the roadmap to give users these options and a supported environment?

    1 vote
    0 comments  ·  Workload
  5. Externalize Grafana. Let us connect an external Grafana to the HDInsight (HBase) Grafana data source

    We have a centralized Grafana server for monitoring Azure services and infrastructure components. We would like to integrate HDInsight Grafana with the central server. Can you provide the data source for HDInsight so that we can achieve this?

    5 votes
    0 comments  ·  Workload
  6. Update Kafka version

    The current version is two years old. Latest is greatest.

    5 votes
    0 comments  ·  Workload
  7. Add persistent storage (ADL/Blob) as backend storage for HDInsight Kafka clusters

    Like other HDInsight cluster types (e.g. Hadoop, HBase), Kafka clusters should also have the option to use an Azure storage account or Data Lake as backend storage.

    This would help users restore the Kafka logs if the cluster crashes.

    9 votes
    1 comment  ·  Workload
  8. Remove ":22" from copyable SSH command

    In the SSH + Cluster login section, applications for an HDInsight cluster have a copyable command of ssh sshuser@<hdinsight-hostname>:22.

    On Linux and macOS this is not a valid SSH command. The valid form is ssh sshuser@<hdinsight-hostname>, without the ":22".

    Please remove the ":22" suffix for HDInsight applications.

    4 votes
    0 comments  ·  Workload
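
To illustrate the point in idea 8, here is a minimal Python sketch of how a copyable command could be generated so that it is valid for the OpenSSH client, which takes a non-default port through its -p option rather than a ":port" suffix on the host. The function names and the host name in the usage note are hypothetical, and the sketch assumes a plain host name (not an IPv6 literal); it is not how the portal actually builds the command.

```python
import shlex


def ssh_command(user, host, port=22):
    """Build an OpenSSH command line for user@host.

    The port is passed with -p and only when it differs from the
    default of 22; it is never appended to the host as ':port'.
    """
    args = ["ssh"]
    if port != 22:
        args += ["-p", str(port)]
    args.append(f"{user}@{host}")
    return shlex.join(args)


def fix_copied_target(target):
    """Turn a 'user@host:port' string into a valid OpenSSH command.

    Assumes 'host' is a plain host name without colons.
    """
    user, _, host_port = target.partition("@")
    host, sep, port = host_port.rpartition(":")
    if not sep:  # no ':port' suffix present
        return ssh_command(user, host_port)
    return ssh_command(user, host, int(port))
```

For example, fix_copied_target("sshuser@mycluster-ssh.example.net:22") yields the command without the suffix, while a non-default port such as 2222 is emitted as "-p 2222" before the target.
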
  9. Enable dynamic allocation for Spark executors by default

    I would like the default executor allocation in Spark to be dynamic instead of static as it is now.

    12 votes
    1 comment  ·  Workload
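
Until something like idea 9 is the default, dynamic allocation can usually be requested per job through standard Spark properties. The sketch below only assembles those properties into spark-submit style "--conf" arguments. The helper names and the executor bounds are illustrative assumptions, and whether the external shuffle service is available on a given cluster is also an assumption (on YARN, dynamic allocation has traditionally required it).

```python
def dynamic_allocation_conf(min_executors=1, max_executors=10):
    """Return Spark properties that request dynamic executor allocation.

    The bounds are illustrative defaults, not HDInsight recommendations.
    """
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumption: the external shuffle service is running on the cluster.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf):
    """Flatten a property dict into ['--conf', 'key=value', ...] pairs."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

The resulting list can be spliced into a spark-submit invocation, or the same dictionary can be passed to a session builder one property at a time.
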
  10. Support reading Azure Data Lake data from Apache Spark on HDInsight

    Currently, many open source applications (e.g. Apache Hive) are supported (https://azure.microsoft.com/en-gb/documentation/articles/data-lake-store-compatible-oss-other-applications/). It would be great to have support for Apache Spark running in HDInsight clusters, too.

    60 votes
    4 comments  ·  Workload
  11. Add support for AppendBlob in HDInsight

    According to https://social.msdn.microsoft.com/Forums/sqlserver/en-US/3001af0c-7f0b-440a-ae65-08d563a5823f/azure-append-blob-storage-does-not-support-spark-textfile-api?forum=hdinsight

    HDInsight only supports block blobs.

    Append blobs are ideal for archiving data in time slices, but they can't be consumed by Spark on HDInsight, etc.

    60 votes
    3 comments  ·  Workload
  12. Add support for the Seaborn data visualization Python library

    Spark code that runs on the PySpark kernel on HDInsight cannot use the Seaborn library for visualization.

    15 votes
    0 comments  ·  Workload
  13. Provide support for reading Azure Table Storage data from Apache Spark

    Currently Azure Tables are not supported. Only Azure blobs support the HDFS interface required by Hadoop & Spark.

    77 votes
    1 comment  ·  Workload
  14. 29 votes
    1 comment  ·  Workload
  15. Provide several industry-standard data mining algorithms designed to run on a MapReduce Hadoop cluster, complete with visualization

    Consider the data mining in Analysis Services, along with its visualization. Provide these same algorithms (maybe more), but instead of running them on a data source view, run them in a MapReduce fashion against data in HDFS, whereby data selection and algorithm processing is distributed, collected, and re-distributed until a logical regression limit is met. Then assemble the results and provide great visualizations.

    34 votes
    1 comment  ·  Workload
    under review  ·  matt winkler responded

    Dave,

    Thanks for the feedback. We’re looking into enabling scenarios like this. I would be curious to learn which types of algorithms you’d like to see here.

    —matt

  16. Support MySQL / PostgreSQL metastores

    HDInsight supports only Azure SQL for metastores, at least through the portal. Through Ambari, there are further options for other database providers such as MySQL or PostgreSQL. Can these be enabled through the portal, please?

    1 vote
    0 comments  ·  Workload
  17. Please make Azure Cloud Shell support WinSCP

    As a Windows user, I chose Azure because it is a Microsoft product, so I expected the two to work well together. However, I have to use an SSH client for Hadoop cluster operations, and most of your SSH clients do not seem to work well on Windows machines. For example, the scp command does not work well for copying local files from a Windows machine to a cluster, while WinSCP does it well. So why not make your SSH client support WinSCP?

    1 vote
    0 comments  ·  Workload
  18. Specify which version of HDP to use during cluster creation

    HDP 2.6.5 supports Spark 2.3.0, which has additional functionality for working with Pandas UDFs.

    https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html

    What would it take to upgrade HDP? Would I have to follow this guide to upgrade or is there a parameter I can pass during cluster creation that allows me to specify the HDP version?

    https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bkambari-upgrade/content/ambariupgrade_guide.html

    3 votes
    0 comments  ·  Workload
  19. "Vertex Failure" errors from Hive LLAP when running complex queries on an HDP 2.6 cluster with Azure Blob as the default FS

    We get repeated, seemingly random "Vertex Failure" errors from Hive LLAP when running complex queries on an HDP 2.6 cluster. We are using Azure Blob as the default FS and have the appropriate keys in place. Sometimes the queries run, and other times they do not.

    The exact error is below, and I have attached the error log report as well:

    Vertex Failure: Container * in account .blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.

    Please assist us in resolving this (or you can direct us to a place)…

    1 vote
    1 comment  ·  Workload
  20. Better support for additional big data components on HDInsight

    Please provide support for the big data components that Microsoft allows to be installed on HDI using automated scripts. For example, installing Solr is allowed, but it only installs on local file systems. There is no support, and a lack of guidance, on how to deploy Solr to write index collection data to ADL (through HDFS).

    1 vote
    0 comments  ·  Workload