HDInsight
Welcome! You can use this site to tell the Microsoft HDInsight team what features you would like to see.
Remember that this site is for feature suggestions and ideas…
If you have technical questions, please visit our forums.
If you are looking for tutorials and documentation, please visit our getting started page.
Start/stop an HDInsight cluster
Add the ability to start and stop a cluster. Currently the only option is to delete the cluster, and I do not want to incur unnecessary charges when I don't use the cluster for several days.
1,226 votes
[Update] Thanks for your continued feedback on this capability! Rest assured that we are tracking this request closely, along with several other platform capabilities our customers have requested. In the meantime, you can use the cluster scaling capability to adjust HDInsight cluster size to your varying compute needs. Azure Data Factory is another option you can explore for scheduling jobs with automatic creation and deletion of clusters: https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-transformation-activities/
Adnan Ijaz
Program Manager
Microsoft Azure HDInsight
Add a feature to "shut down" an HDInsight cluster instead of deleting it when not in use.
With HDInsight clusters being promoted as something you can disable or turn off when not in use (for cost reasons), I would like to suggest a way to simply "shut down" or "deallocate" a cluster when it is not in use, to avoid charges. This could work much the same way it does for VMs. Users would expect to be billed for the SQL and/or storage parts while the cluster is disabled.
137 votes
Thanks. This is a common ask from our customers and something we are seriously considering. In the meantime, you can use Azure Data Factory to “delete” the cluster, together with a persistent metastore on Azure SQL and a persistent store such as Azure Data Lake Store or Azure Blob storage, which makes the cluster appear to be “shut down”. Thanks for your feedback.
Rashim Gupta (HDInsight Engineering team)
HDInsight Security insight and integration with Active Directory documentation
Document how security is implemented with AD integration in an Enterprise HDInsight multi-node cluster.
131 votes
We are working on making Active Directory integration available with HDInsight. We’ll provide further updates on this thread as we get closer to general availability.
Adnan Ijaz
Program Manager
Microsoft Azure HDInsight
HDInsight AutoScale
Please provide an autoscale option in HDInsight that scales the cluster up and down based on usage and running queries.
103 votes
In private preview now.
87 votes
Thanks for the feedback! This is a common request, and we are investigating jointly with the Azure Storage team how to bring this support to HDInsight.
Adnan Ijaz
Program Manager
Microsoft Azure HDInsight
Apache Atlas
Having Apache Atlas with HDInsight for data cataloging and lineage would be a great feature. Are there any plans for this on the roadmap?
79 votes
Provide support for reading Azure Table Storage data from Apache Spark
Currently Azure Tables are not supported. Only Azure blobs support the HDFS interface required by Hadoop & Spark.
75 votes
HDInsight on private vNet network
HDInsight deployment configures the cluster with public IPs and makes it accessible from the internet. Please provide an option to set up the cluster so that it can only be accessed through its private IP in a vNet. The vNet can then have VPN or ExpressRoute connectivity to on-premises networks, and all access to the cluster should be limited to that path.
74 votes
Support reading Azure Data Lake data from Apache Spark on HDInsight
Currently many open source applications (e.g., Apache Hive) are supported (https://azure.microsoft.com/en-gb/documentation/articles/data-lake-store-compatible-oss-other-applications/). It would be great to have support for Apache Spark running in HDInsight clusters, too.
60 votes
Add support for AppendBlob in HDInsight
HDInsight only supports block blobs.
Append blobs are ideal for archiving data in time slices, but they can't be consumed by Spark on HDInsight, among others.
60 votes
Supported JSON SerDe for Hive in HDInsight
In our setup we deal with data that has complex schemas, so we use a custom-built JSON SerDe for Hive, downloaded from https://github.com/rcongiu/Hive-JSON-Serde. Each time HDInsight is updated to a newer version, we run into issues related to this SerDe. It would be nice if Microsoft could provide a SerDe that is tested and supported whenever a new HDInsight distribution is released.
54 votes
Define NSG Rules for Restricting Outbound Internet Access
The documentation states clearly that if you add an HDInsight cluster to a VNet, then you cannot apply outbound NSG rules. Having unrestricted outbound internet access is a significant risk. Are there any other mitigating controls in place to detect data leakage?
51 votes
Create a developer 'sandbox' option
As an alternative to the emulator, create a low-cost 'sandbox' option that runs on a single server for developers, data scientists, etc., similar to the Hortonworks/Cloudera VM downloads.
50 votes
Support Mobius out of the box in HDInsight Spark cluster
Several Mobius[1] customers have asked about the support in HDInsight Spark. Currently the experience is not smooth[2]. It would be nice to make Mobius work out of the box in HDInsight Spark and possibly even make the end-to-end experience building and deploying Spark jobs in .NET richer.
[1] Mobius: .NET API for Spark - https://github.com/Microsoft/Mobius
[2] Using Mobius in HDInsight - https://github.com/Microsoft/Mobius/blob/master/notes/running-mobius-app.md#mobius-in-azure-hdinsight-spark-cluster
46 votes
Currently custom DNS is not supported in HDInsight.
"Currently Custom Dns is not supported in HDInsight."
We tested the configuration (HDInsight clusters on Windows/Linux, Hadoop, HDInsight 3.2 and 3.4) in the new portal and got this error.
However, if we use the classic portal to create a classic virtual network with the custom DNS server registered, and then specify that virtual network while provisioning a Windows-based HDInsight cluster, provisioning appears to start.
But we use Linux-based Hadoop, and we cannot provision a Linux-based Hadoop cluster with custom DNS in a virtual network, because the old classic portal does not support it.
Is there any suggestion…
42 votes
Show only supported VM sizes when creating an HDInsight cluster in the Azure portal
When we create an HDInsight cluster in the Azure portal, we can choose any VM size, even one that isn't supported in the selected region. For example, Japan West doesn't support v2 sizes, but we can still choose one for worker nodes. The deployment then fails with the error "DeploymentDocument 'CsmDocuemtn20' failed the validation. Error: 'VM size xxxxx_V2' provided in the CSM document is invalid or not supported for role 'workdernode''". Moreover, the HDInsight resource still appears in the Azure portal even though the deployment failed, so users have to delete it manually.
It would be better if unsupported VM sizes wouldn't…
37 votes
Support Selecting a Certain Node When Scaling In
The existing scale-out/scale-in feature in HDInsight has a serious drawback when scaling in: any pending or running jobs inevitably fail. It would be nice to be able to select specific nodes when scaling in, so that the cluster can be shrunk safely without losing active jobs.
37 votes
Provide several industry standard data mining algorithms designed to be processed in a mapreduce hadoop cluster; complete with visualization
Consider the data mining features in Analysis Services, along with their visualizations. Provide these same algorithms (and perhaps more), processed not against a data source view but in MapReduce fashion against data in HDFS: data selection and algorithm processing are distributed, collected, and redistributed until a logical regression limit is met, and then the results are assembled and presented with great visualizations.
34 votes
Dave,
Thanks for the feedback; we’re looking into enabling scenarios like this. I would be curious to learn which types of algorithms you’d like to see here.
—matt
Allow resizing of the gateway node
Users love Hive, and, users being users, they like to dump large amounts of data out through their Hive connections. At the moment, the gateway built into the HDI cluster seems to limit this capability.
Please allow us to scale up the gateway node to a larger spec to support more concurrent connections and larger datasets.
31 votes