HDInsight

Welcome! You can use this site to tell the Microsoft HDInsight team what features you would like to see.

Remember that this site is for feature suggestions and ideas…

If you have technical questions, please visit our forums.
If you are looking for tutorials and documentation, please visit our getting started page.

How can we improve HDInsight?

Enter your idea and we'll search to see if someone has already suggested it.

If a similar idea already exists, you can support and comment on it.

If it doesn't exist, you can post your idea so others can support it.

  1. Start/Stop cluster HDInsight

    The ability to start and stop a cluster. Currently the only option is to delete the cluster, and I do not want to be charged unnecessarily when I don't use the cluster for several days.

    579 votes
      17 comments

      [Update] Thanks for your continued feedback on this capability! Rest assured that we are tracking this request closely, along with several other platform capabilities our customers have requested. In the meantime, you can use the cluster scaling capability to adjust HDInsight cluster size according to your varying compute needs. Azure Data Factory is another option you can explore for scheduling jobs with automatic creation and deletion of clusters: https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-transformation-activities/

      Adnan Ijaz
      Program Manager
      Microsoft Azure HDInsight

    • HDInsight Security insight and integration with Active Directory documentation

      Document how security is implemented with AD integration in an Enterprise HDInsight multi-node cluster.

      97 votes
        5 comments
      • 78 votes
          2 comments
        • Add a feature to "shut down" an HDInsight cluster instead of deleting it when not in use.

          With HDInsight clusters being promoted as something that one can disable or turn off when not in use (cost concerns), I would like to suggest a way to just "shut down" or "deallocate" a cluster when not in use to avoid charges. This could work much the same way as it does for VMs. Users would expect to be billed for the SQL and/or storage parts while the cluster is disabled.

          56 votes
            1 comment

            Thanks. This is a common ask from our customers and something we are seriously thinking about. In the meantime, you can use Azure Data Factory to “delete” the cluster, together with a persistent metastore in Azure SQL and a persistent store such as Azure Data Lake Store or Azure Blob storage, which will make it seem like the cluster is “shut down”. Thanks for your feedback. Rashim Gupta (HDInsight Engineering team)

          • Supported JSON SerDe for Hive in HDInsight

            In our setup we're dealing with data that has complex schemas, so we're using a custom-built JSON SerDe, downloaded from https://github.com/rcongiu/Hive-JSON-Serde, together with Hive. Each time HDInsight is updated to a newer version, we run into issues related to this SerDe. It would be nice if MS could provide a SerDe that is tested and supported whenever a new HDInsight distribution is released.

            50 votes
              under review  ·  0 comments
            • Provide support for reading Azure Table Storage data from Apache Spark

              Currently Azure Tables are not supported. Only Azure blobs support the HDFS interface required by Hadoop & Spark.

              43 votes
                1 comment
              • Create a developer 'sandbox' option

                As an alternative to the emulator, create a low-cost 'sandbox' option that runs on a single server for developers, data scientists, etc. to use, similar to the VM downloads from Hortonworks and Cloudera.

                43 votes
                  under review  ·  1 comment
                • Define NSG Rules for Restricting Outbound Internet Access

                  The documentation states clearly that if you add an HDInsight cluster to a VNet, then you cannot apply outbound NSG rules. Having unrestricted outbound internet access is a significant risk. Are there any other mitigating controls in place to detect data leakage?

                  42 votes
                    2 comments
                  • Support reading Azure Data Lake data from Apache Spark on HDInsight

                    Currently many open source applications (e.g. Apache Hive) are supported (https://azure.microsoft.com/en-gb/documentation/articles/data-lake-store-compatible-oss-other-applications/). It would be great to have support for Apache Spark running in HDInsight clusters, too.

                    37 votes
                      2 comments
                    • Provide several industry standard data mining algorithms designed to be processed in a mapreduce hadoop cluster; complete with visualization

                      Looking at data mining in analysis services along with its visualization. Provide these same algorithms (maybe more) to be processed instead of on a data source view, in a mapreduce fashion against data in HDFS, whereby data selection and algorithm processing is distributed, collected, re-distributed, until a logical regression limit is met, then assemble the results and provide great visualizations.

                      34 votes
                        0 comments
                      • Currently Custom Dns is not supported in HDInsight.

                        "Currently Custom Dns is not supported in HDInsight."
                        We tested the configuration (HDInsight cluster with Windows/Linux, Hadoop and HDInsight 3.2 & 3.4) on the new portal and got this error.
                        However, if we use the classic portal, create a classic virtual network with the custom DNS server registered, and then specify that virtual network when provisioning a Windows HDInsight cluster, provisioning appears to start.
                        But we use Hadoop on Linux, and we cannot provision a Linux cluster with custom DNS in a virtual network, because that is not supported in the old classic portal.
                        Is there any suggestion…

                        33 votes
                          0 comments
                        • Support Mobius out of the box in HDInsight Spark cluster

                          Several Mobius[1] customers have asked about the support in HDInsight Spark. Currently the experience is not smooth[2]. It would be nice to make Mobius work out of the box in HDInsight Spark and possibly even make the end-to-end experience building and deploying Spark jobs in .NET richer.

                          [1] Mobius: .NET API for Spark - https://github.com/Microsoft/Mobius

                          [2] Using Mobius in HDInsight - https://github.com/Microsoft/Mobius/blob/master/notes/running-mobius-app.md#mobius-in-azure-hdinsight-spark-cluster

                          33 votes
                            0 comments
                          • Make the HDInsight emulator into a full-fledged multi-cluster environment instead of a single cluster.

                            Purchasing an online Azure membership for a multi-cluster HDInsight cloud service is too costly for a C# developer like me. I want to be able to install the HDInsight emulator on my local desktop machines and set up a local cluster of my own. Right now the only thing I can do is use Hadoop and Java. But as a C# developer I would love to have HDInsight locally to play around with. Thanks.

                            27 votes
                              2 comments
                            • Add support for AppendBlob in HDInsight

                              According to https://social.msdn.microsoft.com/Forums/sqlserver/en-US/3001af0c-7f0b-440a-ae65-08d563a5823f/azure-append-blob-storage-does-not-support-spark-textfile-api?forum=hdinsight

                              HDInsight only supports block blobs.

                              An append blob is ideal for archiving data in time slices, but it can't be consumed by Spark on HDInsight and similar tools.

                              25 votes
                                0 comments
                              • HDInsight on private vNet network

                                The deployment of HDInsight configures the cluster with public IPs and makes it accessible from the internet. Please provide an option to set up the cluster so that it can only be accessed from a private IP in a vNet. The vNet can then have VPN or ExpressRoute connectivity to on-premises networks, and all access to the cluster should be limited to this.

                                24 votes
                                  1 comment
                                • Install Microsoft R Open on non-premium Spark clusters

                                  The non-premium Spark clusters include support for SparkR, but the nodes don't have R installed - which SparkR requires to be used.

                                  Please update the HDInsight clusters to include the R binaries (with CRAN R or MRO).

                                  22 votes
                                    2 comments
                                  • Provide the %pyspark interpreter for Zeppelin

                                    Other distributions of the Zeppelin notebook include the %pyspark interpreter. The one on HDInsight has only %spark, %sql, %dep, and %md. It would be really nice to have %pyspark.

                                    21 votes
                                      1 comment
                                    • 19 votes
                                        1 comment
                                      • Support attaching to a read-only storage account

                                        It is common to attach two storage accounts to an HDInsight cluster. The secondary storage account is the source and is intended to be read-only. We create an external table on top of the source files and then insert to tables living in the default storage account which is read-write.

                                        Please add support for HDInsight to attach to a storage account as read-only, so we can be 100% sure we don't write to the source account.

                                        I don't know if this would require blob storage adding a read-only storage account key or if this would be a flag in HDInsight or…

                                        16 votes
                                          1 comment

                                          HDInsight supports creating read-only clusters using SAS tokens. We are currently updating our documentation to provide a sample and more details. I’ll update this thread with a link to the sample and documentation by the end of this week.

                                          -Adnan | PM | HDInsight

                                          As a follow up to Adnan’s comment we do have a blog post below which shows how to do this:

                                          https://gist.github.com/mwinkle/a0b16be59b4e00de3bba

                                          The config to use, as provided in the link, is fs.azure.sas…blob.core.windows.net in core-site.xml.
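
The reply above elides the middle of the property name. As a rough sketch only, and assuming the elided part is the container name followed by the storage account name (an assumption based on how the Hadoop Azure connector names its SAS settings, not something this thread confirms), a core-site.xml entry could be generated like this. The container, account, and token values are hypothetical placeholders.

```python
from xml.sax.saxutils import escape


def sas_property_name(container: str, account: str) -> str:
    """Build the core-site.xml property name for a container-level SAS.

    ASSUMPTION: the pattern fs.azure.sas.<container>.<account>.blob.core.windows.net
    is inferred; the thread only shows "fs.azure.sas…blob.core.windows.net".
    """
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


def core_site_property(container: str, account: str, sas_token: str) -> str:
    """Render one <property> element suitable for pasting into core-site.xml."""
    name = sas_property_name(container, account)
    # Real SAS tokens contain '&', which must be escaped inside XML text.
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(sas_token)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    print(core_site_property("sourcedata", "examplestore", "PLACEHOLDER_SAS_TOKEN"))
```

A SAS token that grants only read and list permissions is what would make the attached container effectively read-only; the linked gist remains the authoritative walkthrough.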

                                        • Support Spark SQL job submission using .NET Client Library

                                          Currently it's not possible to submit Spark SQL jobs to a Spark cluster using Livy (https://issues.cloudera.org/browse/LIVY-19). As there are many teams who would want to convert their Hive code to Spark SQL and benefit from the interactivity of Spark, it would be very nice if Microsoft would create a .NET library that allows submission of Spark SQL jobs to the HDInsight cluster (or at least an implementation of the LIVY-19 ticket would be nice).

                                          16 votes
                                            2 comments