HDInsight

Welcome! You can use this site to tell the Microsoft HDInsight team what features you would like to see.

Remember that this site is for feature suggestions and ideas…

If you have technical questions, please visit our forums.
If you are looking for tutorials and documentation, please visit our getting started page.

  1. Start/Stop cluster HDInsight

    Add the ability to start and stop a cluster. Currently the only option is to delete the cluster, and I do not want to be charged unnecessarily when I don't use the cluster for several days.

    1,184 votes
    46 comments  ·  Platform

    [Update] Thanks for your continued feedback on this capability! Rest assured that we are tracking this request closely along with several other platform capabilities our customers have requested. In the meantime, you can use the cluster scaling capability to adjust HDInsight cluster size according to your varying compute needs. Azure Data Factory is another option you can explore for scheduling jobs with automatic creation and deletion of clusters: https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-transformation-activities/

    Adnan Ijaz
    Program Manager
    Microsoft Azure HDInsight

  2. Need HDInsight attach/detach edge node capability

    Need to be able to attach/detach edge nodes from an HDInsight cluster. This is not currently supported.

    201 votes
    19 comments  ·  Platform
  3. Add a feature to "shut down" an HDInsight cluster instead of deleting it when not in use.

    With HDInsight clusters being promoted as something that one can disable or turn off when not in use (cost concerns), I would like to suggest a way to just "shut down" or "deallocate" a cluster when not in use to avoid charges. This could work much the same way as it does for VMs. Users would expect to be billed for the SQL and/or storage parts while the cluster is disabled.

    112 votes
    5 comments  ·  Platform

    Thanks. This is a common ask from our customers and something we are seriously considering. In the meantime, you can use Azure Data Factory to “delete” the cluster, together with a persistent metastore in Azure SQL and a persistent store such as Azure Data Lake Store or Azure Blob storage, which will make it seem as if the cluster is “shut down”. Thanks for your feedback. Rashim Gupta (HDInsight Engineering team)

  4. HDInsight AutoScale

    Please provide an autoscale option in HDInsight that scales the cluster up and down based on usage or the queries running.

    102 votes
    5 comments  ·  Platform
  5. Create a developer 'sandbox' option

    As an alternative to the emulator, create a low-cost 'sandbox' option that runs on a single server for developers, data scientists, etc. to use, similar to the Hortonworks/Cloudera VM downloads.

    50 votes
    under review  ·  2 comments  ·  Platform
  6. Support Selecting a Certain Node When Scaling In

    The existing scale-out/scale-in feature in HDInsight has a serious drawback when scaling in: any pending or running jobs inevitably fail. It would be nice to have the ability to select specific nodes when scaling in, in order to safely shrink the cluster without losing active jobs.

    37 votes
    3 comments  ·  Platform
  7. Show only supported VM sizes when we create a HDInsight cluster on Azure portal

    When we create an HDInsight cluster on the Azure portal, we can choose any VM size even if it isn't supported in the specified region. For example, Japan West doesn't support v2 sizes, but we can still choose one for worker nodes. The deployment then fails with the error "DeploymentDocument 'CsmDocuemtn_2_0' failed the validation. Error: 'VM size xxxxx_V2' provided in the CSM document is invalid or not supported for role 'workdernode''". Moreover, the HDInsight resource still appears on the Azure portal although the deployment failed, so users have to delete it manually.

    It would be better if unsupported VM sizes weren't shown…

    36 votes
    0 comments  ·  Platform
  8. Make the HDInsight emulator into a full-fledged multi-cluster environment instead of a single cluster.

    Purchasing an online Azure membership for a multi-cluster HDInsight cloud service is too costly for a C# developer like me. I want to be able to install the HDInsight emulator on my local desktop machines and set up a local cluster of my own. Right now the only thing I can do is use Hadoop and Java. But as a C# developer I would love to have HDInsight locally to play around with. Thanks.

    27 votes
    2 comments  ·  Platform
  9. Support attaching to a read-only storage account

    It is common to attach two storage accounts to an HDInsight cluster. The secondary storage account is the source and is intended to be read-only. We create an external table on top of the source files and then insert to tables living in the default storage account which is read-write.

    Please add support for HDInsight to attach to a storage account as read-only so we can be 100% sure we don't write to the source account.

    I don't know if this would require blob storage adding a read-only storage account key or if this would be a flag in HDInsight or…

    19 votes
    1 comment  ·  Platform

    HDInsight supports creating read-only clusters using SAS tokens. We are currently updating our documentation to provide a sample and more details. I’ll update this thread with a link to the sample and documentation by the end of this week.

    -Adnan | PM | HDInsight

    As a follow-up to Adnan’s comment, we do have a blog post below which shows how to do this:

    https://gist.github.com/mwinkle/a0b16be59b4e00de3bba

    The config to use, as provided in the link, is fs.azure.sas…blob.core.windows.net in core-site.xml.

  10. Integrated On-Premises + Cloud HDInsight Clusters

    Customers (banks and retailers) find it difficult to trust the cloud with their business data, and because data can be stored in any location in the cloud, they are not adopting HDInsight at the expected rate.

    It would be great if the HDInsight 'namenode'/'head' were able to create nodes in on-premises data centers so that 'vital' data stays within customer premises and other 'non-sensitive' data resides in cloud storage.
    Benefits: customers can run jobs across locations and only need to move limited data to the cloud (cost), so they can rest assured.

    the headache of managing the DR…

    15 votes
    0 comments  ·  Platform
  11. Do not automatically start charging for HDInsight when a new cluster is created

    I find that I create a cluster and, while waiting for it to finish being set up, move on to other things, only to return a day or two later and find I've already been billed over a hundred dollars. This has happened twice now.

    9 votes
    1 comment  ·  Platform

    Thanks for the feedback, Badrul. I recommend using Azure Data Factory, which can bring up the cluster and delete it when not in use. You can also consider using Azure Data Lake Analytics, which only bills you for the time your jobs are running. In the meantime, we will brainstorm how we can bring this feature to HDInsight.

  12. Cluster provisioning takes too long

    It takes at least 20 minutes to provision an HDInsight cluster with Standard A3 VMs. Multiple customers have repeatedly reported that provisioning takes longer on Azure than on AWS or Google.

    9 votes
    0 comments  ·  Platform
  13. Create Eclipse plugin to connect to HDInsight and deploy jobs directly

    Create an Eclipse plugin with an HDInsight perspective for creating MapReduce applications in Java and deploying the JAR directly to the HDInsight server.

    8 votes
    2 comments  ·  Platform
    under review  ·  matt winkler responded

    Thanks for the suggestion. We’re currently evaluating a number of potential integration points within Eclipse. Would you prefer to see the Azure Eclipse tooling provide help here, or would you prefer to see the Hadoop Developer Tooling project offer support for HDInsight [ http://hdt.incubator.apache.org/ ]?

  14. Capture Log Information for all queries

    The HDInsight portal should provide a way to capture all logs associated with a query, such as the Tez, Hive, and Templeton logs.

    6 votes
    0 comments  ·  Platform
  15. Provide a Unified Cluster Experience for HDInsight

    Instead of HDInsight separating the core functionality into disparate packages, provide the ability to run multiple roles on the same cluster.

    This would enable multi-step processing use cases to be performed using a single cluster (instead of multiple).

    For example: Kafka Stream -> Spark -> HBase

    In the current topology, this scenario requires 3 clusters which is expensive and complicated.

    4 votes
    0 comments  ·  Platform
  16. Change VM size of Worker, Head and Zookeeper

    Allow changing the VM size of worker, head, and ZooKeeper nodes from the Azure portal. Today these options are locked; if I need to change the VM size, I have to deploy a new cluster.

    4 votes
    1 comment  ·  Platform
  17. Ability to resize HDInsight nodes without deleting cluster

    Would like the ability to resize the head, worker, zookeeper, and edge nodes of an HDInsight cluster without having to delete and recreate the cluster.

    4 votes
    0 comments  ·  Platform
  18. Add Cluster Performance alerts

    We often struggle with the lack of alerts when CPU and memory reach a certain threshold. When this happens, our jobs suffer, processes hang, and so on. We would therefore like configurable alerts with thresholds on key resources such as CPU, memory, and disk. The system should be able to send emails to the group once resources reach a given threshold. I am sure everyone will love this feature.
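
    To make the request concrete, here is a minimal sketch of the kind of configurable threshold check being asked for. It is not an HDInsight API: the `Threshold` class, the `breached` and `alert_message` helpers, the resource names, and the sample usage figures are all hypothetical, and actual email delivery is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A hypothetical alert rule: fire when usage reaches limit_percent."""
    resource: str          # e.g. "cpu", "memory", "disk" (illustrative names)
    limit_percent: float


def breached(thresholds, usage):
    """Return the thresholds whose current usage meets or exceeds the limit.

    Resources missing from `usage` are treated as 0% and never fire.
    """
    return [t for t in thresholds if usage.get(t.resource, 0.0) >= t.limit_percent]


def alert_message(breaches, usage):
    """Build the plain-text body that an email alert could carry."""
    lines = [
        f"{t.resource}: {usage[t.resource]:.0f}% (threshold {t.limit_percent:.0f}%)"
        for t in breaches
    ]
    return "Cluster resource alert\n" + "\n".join(lines)


# Sample configuration and made-up readings, for illustration only.
thresholds = [Threshold("cpu", 90.0), Threshold("memory", 85.0), Threshold("disk", 80.0)]
usage = {"cpu": 95.0, "memory": 60.0, "disk": 80.0}

hits = breached(thresholds, usage)
if hits:
    print(alert_message(hits, usage))
```

    With the sample readings, the CPU and disk rules fire and the memory rule does not; in a real system the message would go to the group's mailing list instead of standard output.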

    4 votes
    0 comments  ·  Platform
  19. Please create all resources belonging to an HDInsight cluster in the same Resource Group.

    Please create all resources belonging to an HDInsight cluster in one Resource Group.
    When you create an HDInsight cluster in a VNet, all network resources such as NICs, load balancers, and public IP addresses are created within the resource group that contains the VNet. Only the HDInsight object itself is created in the original resource group. This is fine for testing, but in multi-subscription environments with shared VNets across resource groups, it is not ideal.

    3 votes
    0 comments  ·  Platform
  20. Ability to add Azure Data Lake store after the HDI cluster has been created

    Currently the only way to add ADLS as additional storage is at cluster creation time. If there were a way to add ADLS after the HDI cluster is created, customers could access files on ADLS for further analysis.

    3 votes
    1 comment  ·  Platform