HDInsight

Welcome! You can use this site to tell the Microsoft HDInsight team what features you would like to see.

Remember that this site is for feature suggestions and ideas…

If you have technical questions, please visit our forums.
If you are looking for tutorials and documentation, please visit our getting started page.

How can we improve HDInsight?

Enter your idea and we'll search to see if someone has already suggested it.

If a similar idea already exists, you can support and comment on it.

If it doesn't exist, you can post your idea so others can support it.

  1. Start/Stop cluster HDInsight

    Add the ability to start and stop a cluster. Currently the only option is to delete the cluster, and I do not want to be charged unnecessarily when I don't use the cluster for several days.

    296 votes
    9 comments

      [Update] Thanks for your continued feedback on this capability! Rest assured that we are tracking this request closely, along with several other platform capabilities our customers have requested. In the meantime, you can use the cluster scaling capability to adjust the HDInsight cluster size according to your varying compute needs. Azure Data Factory is another option you can explore for scheduling jobs with automatic creation and deletion of clusters: https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-transformation-activities/

      Adnan Ijaz
      Program Manager
      Microsoft Azure HDInsight

    • HDInsight Security insight and integration with Active Directory documentation

      Document how security is implemented with AD integration in an Enterprise HDInsight multi-node cluster.

      46 votes
      2 comments
      • Supported JSON SerDe for Hive in HDInsight

        In our setup we're dealing with data that has complex schemas, so we're using a custom-built JSON SerDe, downloaded from https://github.com/rcongiu/Hive-JSON-Serde, together with Hive. Each time HDInsight is updated to a newer version, we run into issues related to this SerDe. It would be nice if Microsoft could provide a SerDe that is tested and supported whenever a new HDInsight distribution is released.

        37 votes
        under review  ·  0 comments
        • Provide several industry-standard data mining algorithms designed to be processed in a MapReduce Hadoop cluster, complete with visualization

          Look at data mining in Analysis Services, along with its visualization. Provide these same algorithms (maybe more), but instead of processing a data source view, process data in HDFS in a MapReduce fashion, whereby data selection and algorithm processing are distributed, collected, and redistributed until a logical regression limit is met. Then assemble the results and provide great visualizations.

          34 votes
          0 comments
          • 33 votes
            1 comment
            • Add a feature to "shut down" an HDInsight cluster instead of deleting it when not in use.

              With HDInsight clusters being promoted as something that one can disable or turn off when not in use (cost concerns), I would like to suggest a way to just "shut down" or "deallocate" a cluster when not in use to avoid charges. This could work much the same way as it does for VMs. Users would expect to be billed for the SQL and/or storage parts while the cluster is disabled.

              32 votes
              0 comments

                Thanks. This is a common ask from our customers and something we are seriously thinking about. In the meantime, you can use Azure Data Factory to “delete” the cluster, and you can use a persistent metastore in Azure SQL together with a persistent store such as Azure Data Lake Store or Azure Blob storage, which will make it seem as if the cluster is “shut down”. Thanks for your feedback. Rashim Gupta (HDInsight Engineering team)

              • Create a developer 'sandbox' option

                As an alternative to the emulator, create a low-cost 'sandbox' option that runs on a single server for developers, data scientists, etc. to use, similar to the VM downloads from Hortonworks and Cloudera.

                31 votes
                under review  ·  1 comment
                • Make the HDInsight emulator into a full-fledged multi-cluster environment instead of a single cluster.

                  Purchasing an online Azure membership for a multi-cluster HDInsight cloud service is too costly for a C# developer like me. I want to be able to install the HDInsight emulator on my local desktop machines and set up a local cluster of my own. Right now the only thing I can do is use Hadoop and Java. But being a C# developer, I would love to have HDInsight locally to play around with. Thanks.

                  21 votes
                  1 comment
                  • Provide support for reading Azure Table Storage data from Apache Spark

                    Currently Azure Tables are not supported. Only Azure blobs support the HDFS interface required by Hadoop & Spark.

                    17 votes
                    0 comments
                    • Provide the %pyspark interpreter for Zeppelin

                      Other distributions of the Zeppelin notebook include the %pyspark interpreter. The one on HDInsight has only %spark, %sql, %dep, and %md. It would be really nice to have %pyspark.

                      15 votes
                      1 comment
                      • 13 votes
                        0 comments
                        • Do not automatically start charging for HDInsight when a new cluster is created

                          I create a cluster, and while I'm waiting for it to finish being set up I move on to other things, only to return a day or two later and find I've already been billed over a hundred dollars. This has happened twice now.

                          8 votes
                          1 comment

                            Thanks for the feedback, Badrul. I recommend using Azure Data Factory, which can bring up the cluster on demand and delete it when it is not in use. You can also consider using Azure Data Lake Analytics, which bills you only for the time your jobs are running. In the meantime, we will brainstorm how we can bring this feature to HDInsight.

                          • Support attaching to a read-only storage account

                            It is common to attach two storage accounts to an HDInsight cluster. The secondary storage account is the source and is intended to be read-only. We create an external table on top of the source files and then insert to tables living in the default storage account which is read-write.

                            Please add support for HDInsight to attach to a storage account as read-only, so we can be 100% sure we don't write to the source account.

                            I don't know if this would require blob storage adding a read-only storage account key or if this would be a flag in HDInsight or…

                            7 votes
                            0 comments

                              HDInsight supports creating read-only clusters using SAS tokens. We are currently updating our documentation to provide a sample and more details. I’ll update this thread with a link to the sample and documentation by the end of this week.

                              -Adnan | PM | HDInsight

                              As a follow-up to Adnan’s comment, we do have a blog post below that shows how to do this:

                              https://gist.github.com/mwinkle/a0b16be59b4e00de3bba

                              The configuration property to use, as given in the link, is fs.azure.sas…blob.core.windows.net in core-site.xml.
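
A minimal sketch of what such a core-site.xml entry could look like, for readers who want to see the shape of the setting. It assumes the elided middle of the property name is the container name followed by the storage account name, which is the convention used by the Hadoop Azure Blob Storage connector; the thread itself does not spell this out. The container name, account name, and SAS token below are hypothetical placeholders, and the helper functions are illustrative only, not part of any HDInsight tooling.

```python
import xml.etree.ElementTree as ET


def sas_property_name(container: str, account: str) -> str:
    """Build the per-container SAS property name.

    Assumption: the key has the form
    fs.azure.sas.<container>.<account>.blob.core.windows.net
    """
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


def sas_property_xml(container: str, account: str, sas_token: str) -> str:
    """Render a <property> element suitable for pasting into core-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = sas_property_name(container, account)
    # ElementTree escapes '&' in the token, so the result stays well-formed XML.
    ET.SubElement(prop, "value").text = sas_token
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values: a read-only SAS would carry only read/list permissions.
    print(sas_property_xml("sourcedata", "examplestore", "sv=PLACEHOLDER&sp=rl&sig=PLACEHOLDER"))
```

Whether the storage is effectively read-only depends on the permissions granted in the SAS token itself, not on this property.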

                            • Create an Eclipse plugin to connect to HDInsight and deploy jobs directly

                              Create an Eclipse plugin with an HDInsight perspective, so that we can create MapReduce applications in Java and deploy the jar directly to the HDInsight server.

                              7 votes
                              2 comments
                              • Java Gateway bug with PySpark on Azure HDInsight

                                A previously working Jupyter Notebook fails with the exception "Java gateway process exited before sending the driver its port number".
                                At that point, the pyspark source contains the comment "In Windows, ensure the Java child processes do not linger after Python has exited.".
                                Even restarting the HDInsight instance doesn't fix the issue.

                                4 votes
                                0 comments
                                • Flume support

                                  Allow Flume to stream data directly to HDInsight.

                                  4 votes
                                  0 comments
                                  • HBase stop and start, or some sort of archival option

                                    At the moment, to use HBase and related services, you have to spin up a new DB, do your stuff, and then delete it when you're done, unless you want to pay $2 and change an hour at the minimum while you're not using it. There's no option for archival of data, or even saving the creation options.

                                    The HDInsight stuff is really cool, but the billing structure makes it untenable and aggravating for the small developer. Please implement some kind of archival process, or some way to temporarily turn off an HBase cluster without deleting the whole thing.
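
To put the billing complaint above in numbers, here is a rough back-of-the-envelope sketch. The only figure taken from the post is the "$2 and change an hour" minimum; it is used here as a lower bound and is not an official price. The function name and the idle periods are made up for illustration.

```python
def idle_cost_usd(hourly_rate_usd: float, idle_days: float) -> float:
    """Cost of leaving a cluster running but unused for the given number of days."""
    hours_per_day = 24
    return hourly_rate_usd * hours_per_day * idle_days


if __name__ == "__main__":
    # Assumed lower-bound rate of $2.00/hour, taken from the post above.
    print(idle_cost_usd(2.0, 7))   # one idle week
    print(idle_cost_usd(2.0, 30))  # one idle 30-day month
```

At that assumed rate, an idle week comes to at least $336 and an idle 30-day month to at least $1,440, which is why a stop or archive option matters to small developers.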

                                    3 votes
                                    0 comments
                                    • HDInsight tools for Visual Studio should let you run multiple Hive queries and see results

                                      Currently, if you Submit query1.hql, it pops up the Hive Job Summary pane, which I can monitor to see whether that query succeeded and to see the results.

                                      If in the meantime I Submit query2.hql, it replaces query1 in the Hive Job Summary pane. As far as I can see, there's no way to get back to query1 job summary.

                                      I wish the Hive Job Summary pane were attached to the bottom of the HQL window like it is with most other SQL query tools in Visual Studio. Then we could have one results pane per .hql file.

                                      3 votes
                                      3 comments
                                      • 1 vote
                                        0 comments
                                        • Grrr

                                          The fact that you're assuming I want to talk about HDInsight is part of the problem. I created an Azure account to try DocumentDB, and on the management page it is nowhere to be found. This is not rocket science: if your ad says you sell some service, then I should be able to add that service to my account. If I can't for some reason, then it should still show up and explain why not. Making simple things hard is going to make me spend more time at Amazon. Grrr

                                          1 vote
                                          1 comment