HDInsight
Welcome! You can use this site to tell the Microsoft HDInsight team what features you would like to see.
Remember that this site is for feature suggestions and ideas…
If you have technical questions, please visit our forums.
If you are looking for tutorials and documentation, please visit our getting started page.
-
87 votes
Thanks for the feedback! This is a common request, and we are investigating jointly with the Azure Storage team how to bring this support to HDInsight.
Adnan Ijaz
Program Manager
Microsoft Azure HDInsight -
Apache Atlas
Having Apache Atlas with HDInsight for data catalog and lineage would be a great feature. Are there any plans for this on the roadmap?
82 votes -
Provide support for reading Azure Table Storage data from Apache Spark
Currently Azure Tables are not supported. Only Azure blobs support the HDFS interface required by Hadoop & Spark.
77 votes -
Support reading Azure Data Lake data from Apache Spark on HDInsight
Currently many open source applications (e.g. Apache Hive) are supported (https://azure.microsoft.com/en-gb/documentation/articles/data-lake-store-compatible-oss-other-applications/). It would be great to have support for Apache Spark running in HDInsight clusters, too.
60 votes -
Add support for AppendBlob in HDInsight
HDInsight only supports block blobs.
Append blobs are ideal for archiving data in time slices, but they can't be consumed by Spark or other engines on HDInsight.
60 votes -
Supported JSON SerDe for Hive in HDInsight
In our setup we're dealing with data with complex schemas, so we're using a custom-built JSON SerDe downloaded from https://github.com/rcongiu/Hive-JSON-Serde with Hive. Each time HDInsight is updated to a newer version we run into issues related to this SerDe. It would be nice if MS could provide a SerDe that is tested and supported whenever a new HDInsight distribution is released.
54 votes -
Support Mobius out of the box in HDInsight Spark cluster
Several Mobius[1] customers have asked about the support in HDInsight Spark. Currently the experience is not smooth[2]. It would be nice to make Mobius work out of the box in HDInsight Spark and possibly even make the end-to-end experience building and deploying Spark jobs in .NET richer.
[1] Mobius: .NET API for Spark - https://github.com/Microsoft/Mobius
[2] Using Mobius in HDInsight - https://github.com/Microsoft/Mobius/blob/master/notes/running-mobius-app.md#mobius-in-azure-hdinsight-spark-cluster
46 votes -
Provide several industry standard data mining algorithms designed to be processed in a MapReduce Hadoop cluster, complete with visualization
Consider the data mining algorithms in Analysis Services and their visualizations. Provide these same algorithms (maybe more), but instead of processing them on a data source view, process them in a MapReduce fashion against data in HDFS, whereby data selection and algorithm processing is distributed, collected, and re-distributed until a logical regression limit is met; then assemble the results and provide great visualizations.
34 votes
Dave,
Thanks for the feedback, we’re looking into enabling scenarios like this. I would be curious to learn the type of algorithms you’d like to see here.
—matt
-
29 votes
-
Provide the %pyspark interpreter for Zeppelin
Other distributions of the Zeppelin notebook include the %pyspark interpreter. The one on HDInsight has only %spark, %sql, %dep, and %md. It would be really nice to have %pyspark.
24 votes -
Support Spark SQL job submission using .NET Client Library
Currently it's not possible to submit Spark SQL jobs to a Spark cluster using Livy (https://issues.cloudera.org/browse/LIVY-19). As there are many teams who would want to convert their Hive code to Spark SQL and benefit from the interactivity of Spark, it would be very nice if Microsoft would create a .NET library that allows submission of Spark SQL jobs to the HDInsight cluster (or at least an implementation of the LIVY-19 ticket would be nice).
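Until something like this exists, one possible workaround is to wrap the query in Scala code and send it to Livy as an ordinary interactive statement. The sketch below only builds the JSON request bodies; the helper names are hypothetical, and it assumes the Livy session exposes a `sqlContext` variable and that the query is simple enough for a JSON-escaped string to also be a valid Scala string literal.

```python
import json


def livy_session_body() -> str:
    """JSON body for creating an interactive Scala Spark session in Livy."""
    return json.dumps({"kind": "spark"})


def livy_sql_statement(sql: str) -> str:
    """Wrap a Spark SQL query in Scala code and return a Livy statement body.

    Assumption: the session exposes `sqlContext`, and the query is simple
    enough that its JSON-escaped form is also a valid Scala string literal.
    """
    # json.dumps produces a double-quoted, escaped literal for the query text.
    scala_code = "sqlContext.sql(%s).show()" % json.dumps(sql)
    return json.dumps({"code": scala_code})


if __name__ == "__main__":
    print(livy_session_body())
    print(livy_sql_statement("SELECT COUNT(*) FROM hivesampletable"))
```

The first body would be sent with POST to the Livy `/sessions` endpoint and the second with POST to `/sessions/{id}/statements`; the table name in the usage example is only a placeholder.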
24 votes -
Install Microsoft R Open on non-premium Spark clusters
The non-premium Spark clusters include support for SparkR, but the nodes don't have R installed, which SparkR requires.
Please update the HDInsight clusters to include the R binaries (with CRAN R or MRO).
24 votes -
Add support for the Seaborn data visualization Python library
Spark code that runs using the PySpark kernel on HDInsight cannot use the Seaborn library for visualization.
15 votes -
Enable dynamic allocation for Spark executors by default
I would like the default executor allocation in Spark to be dynamic instead of static as it is now.
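For anyone who wants this today, dynamic allocation can usually be switched on per cluster or per job through standard Spark properties. The following is a minimal sketch that assembles those properties; the helper names and the executor bounds are illustrative assumptions, not HDInsight defaults.

```python
def dynamic_allocation_conf(min_executors: int = 1, max_executors: int = 10) -> dict:
    """Return Spark properties that enable dynamic executor allocation.

    The bounds are illustrative assumptions; tune them for the cluster size.
    """
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Dynamic allocation on YARN relies on the external shuffle service.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_spark_defaults(conf: dict) -> str:
    """Render the properties as lines in spark-defaults.conf format."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(to_spark_defaults(dynamic_allocation_conf(2, 8)))
```

The same keys can be passed as `--conf key=value` pairs to `spark-submit` instead of being written to the defaults file.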
12 votes -
Use Apache Spark for reading data from a U-SQL Catalog.
Implement a Spark package for reading data in a U-SQL Catalog.
This would be similar to the DataStax Cassandra Spark driver: a package that knows the internals of the U-SQL Catalog and hence can read structured data efficiently.
12 votes -
Spark 2.1 support for BI Connector
Can you please support the BI Connector in Spark 2.1 HDI 3.6?
Thanks!
9 votes -
Add devtools package to HDInsight R Server edge node by default
devtools (https://github.com/hadley/devtools) is a very popular package for package management in R. It is also quite large, and has many dependencies, so it can take a long time to install. It would be very convenient if this was installed by default.
9 votes -
Add persistent storage (ADL/Blob) as a backend storage for HDInsight Kafka Cluster
Like other HDInsight cluster types (e.g. Hadoop, HBase), a Kafka cluster should also have the option to use an Azure storage account or Data Lake as backend storage.
It would help users restore the Kafka logs if the cluster crashes.
9 votes -
Update kafka version
Current version is two years old... Latest is greatest.
8 votes -
Storm - ability to use .NET 4.6+ for spouts and bolts via SCP.Net SDK
.NET 4.6 has been in production for almost a year, so we'd like to leverage the latest Microsoft framework within the Storm infrastructure on the SCP.Net SDK (C#) as well.
7 votes