HDInsight
Welcome! You can use this site to tell the Microsoft HDInsight team what features you would like to see.
Remember that this site is for feature suggestions and ideas…
If you have technical questions, please visit our forums.
If you are looking for tutorials and documentation, please visit our getting started page.
-
Support Control-M client on an HDInsight Interactive Query Cluster
I see that Microsoft does not recommend installing Control-M on an HDInsight cluster. Please consider this a suggestion and support this feature. Thanks.
2 votes -
Low Priority (Spot) VM support in HDInsight
Hey HDInsight team, I was excited to see that Azure announced spot pricing for VMs recently.
https://azure.microsoft.com/en-us/pricing/spot/
It would be great to also have this type of low-priority VM pricing available for HDInsight worker nodes. Would greatly reduce my cost and allow me to move more workload to Azure (AWS EMR currently does support spot pricing which makes it more cost-competitive).
Thanks!
1 vote -
Apache Atlas
Having Apache Atlas with HDInsight for data catalog and lineage would be a great feature. Are there any plans for this on the roadmap?
82 votes -
HDInsight support for backend DBs for Hive/Oozie/Ranger
HDP supports MySQL and MariaDB as backend DBs for Hive/Oozie/Ranger, but HDI only supports MS-SQL. MS-SQL is not officially supported as a backend DB for the HDP stack, though. Since the Azure portal offers Azure DB for MariaDB and Azure DB for MySQL, these options should be offered to HDI as well. Is it on the roadmap to give users these options and a supported environment?
1 vote -
Externalize Grafana. Let us connect an external Grafana to the HDInsight (HBase) Grafana data source
We have a centralized Grafana server for monitoring Azure services and infrastructure components. We would like to integrate HDInsight Grafana with the central server. Can you provide the data source for HDInsight so that we can achieve this?
5 votes -
Update Kafka version
Current version is two years old... Latest is greatest.
8 votes -
Add persistent storage (ADL/Blob) as a backend storage for HDInsight Kafka Cluster
Like other HDInsight cluster types (e.g. Hadoop, HBase), a Kafka cluster should also have the option to use an Azure storage account or Data Lake as backend storage.
It would help users restore the Kafka logs if the cluster crashes.
9 votes -
Remove ":22" from copyable SSH command
In the SSH + Cluster login section, applications for an HDInsight cluster have this copyable command:
ssh sshuser@<hdinsight-hostname>:22
On Linux and macOS this is not a valid SSH command. Without the ":22",
ssh sshuser@<hdinsight-hostname>
is a valid SSH command. Please remove the ":22" suffix for HDInsight applications.
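As a rough illustration of the request above, the sketch below turns a "user@host:port" string into a command that OpenSSH accepts, passing any non-default port through the -p flag. The function name ssh_command and the hostname are hypothetical and are not part of the HDInsight portal; IPv6 literals are not handled.

```python
import shlex


def ssh_command(target):
    """Turn 'user@host[:port]' into an OpenSSH command line.

    OpenSSH takes the port through -p; a ':port' suffix on user@host
    is treated as part of the hostname rather than as a port.
    """
    host_part, sep, port = target.rpartition(":")
    if sep and port.isdigit():
        target = host_part  # drop the ':port' suffix
    else:
        port = "22"  # no suffix present, assume the default SSH port
    args = ["ssh"]
    if port != "22":
        args += ["-p", port]  # only a non-default port needs the flag
    args.append(target)
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical hostname, used only for illustration.
print(ssh_command("sshuser@mycluster-ssh.example.net:22"))
```

With the default port the suffix is simply dropped; a different port would be emitted as `ssh -p <port> user@host`.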
4 votes -
Enable dynamic allocation for Spark executors by default
I would like the default executor allocation in Spark to be dynamic instead of static as it is now.
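Until such a default exists, dynamic allocation can usually be requested per job. The sketch below builds spark-submit arguments from the standard Spark property names; the executor bounds and the script name my_job.py are illustrative assumptions, and whether the external shuffle service is available depends on how the cluster is configured.

```python
# Standard Spark property names; the numeric bounds are illustrative assumptions.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # The external shuffle service is typically required on YARN so that
    # executors can be removed without losing their shuffle files.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
}


def to_submit_args(conf):
    """Expand a property dict into repeated '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", key + "=" + value]
    return args


# my_job.py is a hypothetical application script.
command = ["spark-submit"] + to_submit_args(DYNAMIC_ALLOCATION_CONF) + ["my_job.py"]
print(" ".join(command))
```

The same key/value pairs could instead be placed in spark-defaults.conf to change the behavior for every job on a cluster.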
12 votes -
Support reading Azure Data Lake data from Apache Spark on HDInsight
Currently many open source applications (e.g. Apache Hive) are supported (https://azure.microsoft.com/en-gb/documentation/articles/data-lake-store-compatible-oss-other-applications/). It would be great to have support for Apache Spark running in HDInsight clusters, too.
60 votes -
Add support for AppendBlob in HDInsight
HDInsight only supports block blobs.
Append blobs are ideal for archiving data in time slices, but they can't be consumed by Spark on HDInsight and similar tools.
60 votes -
Add support for the Seaborn data visualization Python library
Spark code that runs with the PySpark kernel on HDInsight cannot use the Seaborn library for visualization.
15 votes -
Provide support for reading Azure Table Storage data from Apache Spark
Currently Azure Tables are not supported. Only Azure blobs support the HDFS interface required by Hadoop & Spark.
77 votes -
Provide several industry-standard data mining algorithms designed to be processed in a MapReduce Hadoop cluster, complete with visualization
I am looking at data mining in Analysis Services along with its visualization. Provide these same algorithms (maybe more) to be processed, instead of on a data source view, in a MapReduce fashion against data in HDFS, whereby data selection and algorithm processing is distributed, collected, and re-distributed until a logical regression limit is met; then assemble the results and provide great visualizations.
34 votes
Dave,
Thanks for the feedback, we’re looking into enabling scenarios like this. I would be curious to learn the type of algorithms you’d like to see here.
—matt
-
Support MySQL / PostgreSQL metastores
HDInsight supports only Azure SQL for metastores, at least through the portal. Through Ambari there are further options for other database providers such as MySQL or PostgreSQL. Can these be enabled through the portal, please?
1 vote -
Please make Azure Cloud Shell support WinSCP
As a Windows user, I chose Azure because it is a Microsoft product, so I expected the two to work well together. However, I have to use an SSH client to operate the Hadoop cluster, and most of your SSH clients do not seem to work well on Windows machines. For example, the scp command does not work well for copying a local file from a Windows machine to the cluster, while WinSCP does it well. Why not make your SSH client support WinSCP?
1 vote -
Specify which version of HDP to use during cluster creation
HDP 2.6.5 supports Spark 2.3.0 which has additional functionality to work with Pandas UDFs.
https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
What would it take to upgrade HDP? Would I have to follow this guide to upgrade or is there a parameter I can pass during cluster creation that allows me to specify the HDP version?
3 votes -
"Vertex Failure" issue, arising out of Hive LLAP, when running complex queries in HDP 2.6 cluster. We are using Azure Blob as default FS.
Repeated, random "Vertex Failure" errors arise from Hive LLAP when running complex queries in an HDP 2.6 cluster. We are using Azure Blob as the default FS and have the appropriate keys in place. Sometimes the queries run and other times they do not.
Please find the exact error below; I have attached the error log report as well:
Vertex Failure: Container * in account .blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
Please assist us in resolving this (or you can direct us to a place)…
1 vote -
Better support of additional Big Data components on HDInsight
Please provide support for the Big Data components that MS allows to be installed on HDI using automated scripts. For example, a Solr install is allowed, but it only installs on local file systems. There is no support, and a lack of guidance, on how to deploy Solr to write index collection data to ADL (through HDFS).
1 vote