Azure Databricks

Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

We would love to hear any feedback you have for Azure Databricks.
For more details about Azure Databricks, try our documentation page.

  1. STOP the nonsense of making Resource Groups for these services if you really want us to use them!! Completely annoying.

    Totally insane. Databricks is the WORST offender of this, but Network Watcher does it as well. I won't allow RGs to be created unless they are NAMED and TAGGED according to OUR rules, so people cannot use this service. Period.

    14 votes  ·  2 comments  ·  Strong Feedback

    Thanks for the valid suggestion. Your feedback is now open for the user community to upvote & comment on. This allows us to effectively prioritize your request against our existing feature backlog and also gives us insight into the potential impact of implementing the suggested feature.

  2. Add Azure DevOps as a Git repository for Azure Databricks

    I wish to set Azure DevOps as the Git repository.

    12 votes  ·  2 comments
  3. Enhance Databricks for CI/CD: ARM support for obtaining a Databricks PAT from an AAD identity and for linking Key Vault as a Databricks secret scope

    ARM templates only create the Databricks workspace. Adding support for AAD identities obtaining a Databricks PAT, and for linking to Key Vault, would really help with cluster deployment.

    At Build there was an announcement that scripts would soon be included in ARM templates, so updating the Databricks API to support these actions would probably allow this.
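As a rough illustration of what such a pipeline step could build on (a hedged sketch, not a confirmed flow -- all service principal values are placeholders), the AAD client-credentials token request for the Databricks resource looks like this; the GUID is the well-known AzureDatabricks application ID:

```python
# Form data a CI/CD pipeline could POST to
# https://login.microsoftonline.com/<tenant-id>/oauth2/token
# to obtain an AAD token for the Azure Databricks resource.
# <sp-app-id> and <sp-secret> are placeholders for a real service principal.
token_request = {
    "grant_type": "client_credentials",
    "client_id": "<sp-app-id>",
    "client_secret": "<sp-secret>",
    # Well-known AzureDatabricks application/resource ID
    "resource": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d",
}
# The returned access_token could then be presented as a Bearer token to the
# Databricks Token API (POST /api/2.0/token/create) to mint a PAT.
```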

    11 votes  ·  0 comments
  4. Extract/write feature weights of Linear Regression model

    Business case: Build and deploy a Market Mix Model to measure the incremental sales and ROI from Media, TradePromo and other marketing components

    The most basic method for solving this is to run a multilinear regression with all marketing variables and calculate the incremental sales using the coefficient estimates (or feature weights) of the trained regression model.
    Currently, we can only view the feature weights through the visualize option; it is not possible to save these weights as a table/dataset. Unless we can access these weights as a table, further calculation is not possible, and therefore tangible insights cannot be derived.

    The only…
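In a notebook, the fitted weights are in principle available programmatically (e.g. `model.coefficients` and `model.intercept` on a fitted `pyspark.ml.regression.LinearRegressionModel`) and could be written out as a table. The underlying calculation is a plain least-squares fit; a minimal sketch with numpy, using hypothetical marketing data (not from the original post):

```python
import numpy as np

# Hypothetical spend data: columns = media spend, trade-promo spend
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
# Sales generated here with known weights (2.0, 0.5) and a base of 1.0
y = X @ np.array([2.0, 0.5]) + 1.0

# Multilinear regression: append an intercept column, solve least squares
A = np.column_stack([X, np.ones(len(X))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

# Feature weights as a plain table for downstream incremental-sales math
weight_table = dict(zip(["media", "trade_promo", "base"], weights))
# Incremental sales attributable to media = coefficient * total media spend
incremental_media = weight_table["media"] * X[:, 0].sum()
```

Once the weights sit in a plain table like this, the incremental-sales and ROI calculations described above become straightforward arithmetic.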

    11 votes  ·  1 comment
  5. Support reading Avro files from Azure Blob Storage

    I use the Event Hubs capture-to-storage-account feature and want to load its Avro files from Azure Databricks. It works perfectly on DBFS, but when I try to load the files directly from Blob Storage (based on this article: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html), it fails with the following error:

    ```
    shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container events in account pilfleetml.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.

    Py4JJavaError                             Traceback (most recent call last)
    <command-486363724788735> in <module>()
    ----> 1 avroDf = spark.read.format("com.databricks.spark.avro").load("wasbs://events@pilfleetml.blob.core.windows.net/pil-fleet-eh/pil-fleet-ml/0/2018/01/25/14/55/*")

    /databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
        157         self.options(**options)
        158         if isinstance(path,
    ```
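The error says no credentials were found for the storage account. A common workaround (a hedged notebook sketch, assuming the account key is stored in a secret scope; the scope and key names below are placeholders) is to register the account key in the Spark conf before reading:

```python
# Databricks notebook fragment: `spark` and `dbutils` are notebook globals.
# Register the storage account key so the WASB driver has credentials
# for the container (scope/key names are placeholders).
spark.conf.set(
    "fs.azure.account.key.pilfleetml.blob.core.windows.net",
    dbutils.secrets.get(scope="storage", key="pilfleetml-key"))

avroDf = (spark.read
          .format("com.databricks.spark.avro")
          .load("wasbs://events@pilfleetml.blob.core.windows.net/"
                "pil-fleet-eh/pil-fleet-ml/0/2018/01/25/14/55/*"))
```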
    11 votes  ·  2 comments
  6. Support Data Lake Gen2 `abfss://` URI in Azure SQL Data Warehouse connector for `tempdir` option

    Support for abfss:// URIs would allow the use of Data Lake Storage Gen2 in the Azure SQL Data Warehouse connector's tempdir option. The connector currently supports only wasbs:// URIs:

    ```
    com.databricks.spark.sqldw.SqlDWConnectorException: Exception encountered in SQL DW connector code.

    The temp data location (option 'tempdir') must be a URI of the form
    "wasbs://containerName@storageAccountName.blob.core.windows.net/somePath".
    Right now, only Azure BlobStore locations are supported.
    ```
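For reference, the connector call that triggers this error only accepts a wasbs:// `tempDir` today; a hedged notebook sketch (server, database, container, and account names are all placeholders):

```python
# Databricks notebook fragment: `spark` is a notebook global.
# Current behavior: tempDir must be a wasbs:// Blob Storage URI; an
# abfss:// Data Lake Gen2 URI raises the SqlDWConnectorException above.
df = (spark.read
      .format("com.databricks.spark.sqldw")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;"
                     "database=<dw>;encrypt=true")
      .option("tempDir", "wasbs://<container>@<account>.blob.core.windows.net/tmp")
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("query", "SELECT TOP 10 * FROM dbo.SomeTable")
      .load())
```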

    9 votes  ·  0 comments
  7. Replicate Jupyter Shift+Tab functionality

    In Jupyter notebooks, you can press 'Shift+Tab' within a method call to see the arguments and help text of the method - this is one of the nicest features of Jupyter and it would be nice to see similar functionality in Databricks notebooks.

    9 votes  ·  0 comments
  8. High availability for driver nodes

    Currently Azure Databricks clusters contain a single driver node, which creates a single point of failure should a process fail. This can cause clusters to become unresponsive during jobs -- affecting streaming jobs greatly.

    I would propose that a second driver node be made available (when desired) to support automatic failover (HA) should a driver become unresponsive.

    8 votes  ·  1 comment
  9. Launching Databricks WorkSpace from Azure Portal

    In order to launch the Databricks workspace, the user needs to be an Owner/Contributor at the Databricks resource level in the Azure portal, which is annoying for any enterprise users who are planning to roll out to larger audiences.

    Providing the direct workspace backend URL to end users manually is not ideal, since there are few users now but there will be hundreds in the future.

    Permissions are set at the workspace and cluster level. When a user launches the workspace from the Azure portal, whatever API calls Databricks should validate the existing…

    8 votes  ·  1 comment  ·  Strong Feedback
  10. Key Vault integration

    When Databricks creates a Key Vault-backed secret scope (https://docs.azuredatabricks.net/user-guide/secrets/secret-scopes.html#create-an-azure-key-vault-backed-secret-scope), it grants too many permissions over Keys in the Key Vault access policy to the AzureDatabricks principal. Currently, the process grants nearly all possible permissions to Keys:
    Display Name: AzureDatabricks (2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)
    Permissions to Keys: wrapKey, decrypt, list, purge, create, recover, restore, verify, encrypt, unwrapKey, import, delete, backup, All, sign, get, update
    Permissions to Secrets: get, list

    Ideally, it should not grant any permissions to Key Vault Keys, since only Secrets are used by the Databricks secret scope.
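For context, the access policy in question is created when the scope itself is created; a hedged sketch of the REST payload behind that step (subscription, resource group, and vault names are placeholders):

```python
import json

# Payload for POST {workspace-url}/api/2.0/secrets/scopes/create
# (sent with a Bearer auth header), which creates the Key Vault-backed
# scope and, in the process, grants the AzureDatabricks principal the
# overly broad Key permissions described above.
payload = {
    "scope": "kv-backed-scope",
    "scope_backend_type": "AZURE_KEYVAULT",
    "backend_azure_keyvault": {
        "resource_id": "/subscriptions/<sub>/resourceGroups/<rg>"
                       "/providers/Microsoft.KeyVault/vaults/<vault>",
        "dns_name": "https://<vault>.vault.azure.net/",
    },
}
body = json.dumps(payload)
```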

    7 votes  ·  0 comments
  11. Allow to pause service when not in use

    Similar to SQL DW and VMs, can we get the option to pause the service when not in use to minimize costs?

    7 votes  ·  2 comments
  12. Show the rowcount for rows returned in Notebooks

    The notebook shows the data returned but not the number of rows - otherwise I have to rerun the command with a select count(*) on it.

    This should be easy to implement.

    6 votes  ·  0 comments
  13. Cluster initialization time is too long when running a Databricks job

    A simple job run in Databricks, even for a "print hello_world" program, takes a minimum fixed lag of 10-12 seconds for Spark initialization, which is quite significant latency. This lag should be made as small as possible; certain other cloud providers, such as Google, already do this.

    6 votes  ·  0 comments  ·  Strong Feedback
  14. Databricks control plane IP & webapp IP as service tags in NSG/Azure Firewall

    It would be nice if there was a ServiceTag for the Databricks Control Plane and Webapp IP ranges.
    Thanks.

    6 votes  ·  0 comments
  15. Support Single-Sign On with custom identity providers

    Databricks on AWS already supports multiple identity providers for SSO. Check https://docs.databricks.com/administration-guide/users-groups/single-sign-on/index.html.

    There is no reason why Azure Databricks should be limited only to AAD for SSO.

    5 votes  ·  0 comments
  16. Please add feature that can use Table access control when use "R"

    At present, table access control can be used only with Python and SQL.
    Please add a feature so that table access control can also be used with R.

    5 votes  ·  0 comments
  17. Support changing the VNet Address space or subnet CIDR of an existing Azure Databricks workspace

    Modification of the CIDR is quite common, especially when a proof of concept (POC) is a success and you want to go further and connect it to a corporate network.

    RFC 1918 addresses are a real challenge to maintain, and when you perform a POC you cannot quickly obtain a /16 or /24, as requested by the Databricks virtual network injection feature.

    For more information: I had missed the URL below saying this was not supported, and the impact I saw was that the Spark commands no longer worked (dbutils still did).
    https://docs.microsoft.com/en-us/azure/databricks/kb/cloud/azure-vnet-jobs-not-progressing

    4 votes  ·  0 comments  ·  Strong Feedback
  18. Azure Databricks should be FedRAMP compliant across all US regions

    Azure Databricks seems to have every compliance certification other than FedRAMP. I would think it would not be an issue to become FedRAMP compliant, which would allow government data to be transformed in the platform.

    4 votes  ·  0 comments
  19. Implement access token auto refresh when using credential passthrough

    When a cluster is configured with credential passthrough, we get an access denied error after 1 hour of running a notebook, due to AD access token expiration. Because of that, it would be nice to have an access token auto-refresh feature, with no need for an Azure Active Directory admin to increase the AccessTokenLifetime for users.

    This feature is also cited in a comment here: https://feedback.azure.com/forums/909463-azure-databricks/suggestions/36879865-enable-azure-ad-credential-passthrough-to-adls-gen

    4 votes  ·  0 comments  ·  Strong Feedback
  20. Azure Diagnostic logs are collected with up to a 24-hour delay, so alerts cannot be used

    As the doc says:
    On any given day, Azure Databricks delivers at least 99% of diagnostic logs within the first 24 hours, and the remaining 1% in no more than 72 hours.
    Refer to: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/account-settings/azure-diagnostic-logs#diagnostic-log-delivery

    In this case, if logs are sent to Log Analytics, log search alerts cannot be used to monitor those logs due to the unpredictable delay. This has been reported by multiple customers; we hope this can be enhanced.

    4 votes  ·  0 comments  ·  Strong Feedback
