Enable Azure AD credential passthrough to ADLS Gen2
Add the ability to pass the AAD credentials of the user working on an Azure Databricks cluster through to Azure Data Lake Storage Gen2 file systems, so that secure, enterprise-grade data lake analytics can be built on ADLS Gen2 with Databricks. This feature should not be limited to High Concurrency clusters: these clusters lack many features (including Scala support), and a typical enterprise advanced analytics scenario is a dedicated cluster for a small group of departmental analysts (Standard clusters are the most popular).
I have connected Power BI to Databricks, but credential passthrough to ADLS Gen2 is not working. Is it possible to make this work?
Michał Pawlikowski commented
@Kristina correct, it's very confusing.
To everybody who wants this feature: IT'S IMPLEMENTED and has been available since version 5.3 of Databricks!
Just use abfss instead of adl in the path to your file:
spark.read.csv("abfss://<my-file-system-name>@<my-storage-account-name>.dfs.core.windows.net/MyData.csv").collect()
Unfortunately, it's still only available on High Concurrency clusters (there are technical reasons and limitations behind that :| so I don't think it will ever change).
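To make the path format above easier to reuse, here is a minimal sketch in Python. It assumes a notebook attached to a cluster that already has credential passthrough enabled, where Databricks provides the `spark` session. The helper names `abfss_uri` and `read_csv_with_passthrough`, and the example file system and storage account names, are hypothetical and only for illustration.

```python
def abfss_uri(file_system: str, storage_account: str, path: str) -> str:
    """Build an abfss:// URI for a file in an ADLS Gen2 file system."""
    # Strip a leading slash so the URI never contains a double slash after the host.
    clean_path = path.lstrip("/")
    return f"abfss://{file_system}@{storage_account}.dfs.core.windows.net/{clean_path}"


def read_csv_with_passthrough(spark, file_system: str, storage_account: str, path: str):
    """Read a CSV through the abfss scheme (hypothetical helper).

    `spark` is the SparkSession that a Databricks notebook provides. The read
    only succeeds if the signed-in user has access to the file and the
    cluster has credential passthrough enabled.
    """
    return spark.read.csv(abfss_uri(file_system, storage_account, path)).collect()


# Placeholder names; replace them with your own file system and storage account.
print(abfss_uri("my-file-system", "mystorageaccount", "MyData.csv"))
```

In a notebook, the call would look like `read_csv_with_passthrough(spark, "my-file-system", "mystorageaccount", "MyData.csv")`. Keeping the URI construction in its own function means the path format can be checked without a running cluster.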
Kristina Florea commented
The documentation is confusing. Looking at https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake-gen2.html it seems that this should already work, but when you navigate deeper you find information only on ADLS Gen1. And the Databricks documentation at https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html does not even mention AAD passthrough...
Another huge plus would be if the AAD token auto-refreshed for users logged into Databricks, without an Azure Active Directory admin having to increase the AccessTokenLifetime for users. I will likely be dealing with strict corporate constraints in this area that won't allow this setting to be adjusted.