I am searching for a feature in Data Factory for the Databricks activity. Suppose there is a pipeline containing multiple Databricks activities. As of now I can use a new job cluster to execute all the Databricks activities, but spinning up and terminating a cluster for each activity takes a lot of time. I would like functionality where I can create a cluster at the beginning of the pipeline, have all activities use that existing cluster, and terminate it at the end of the pipeline. There is another option to reuse an existing cluster, but being able to start, use, and stop the cluster within the pipeline would be a great added advantage. 70 votes
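A minimal sketch, assuming an Azure Databricks linked service of type "AzureDatabricks", of the two cluster options involved; the JSON bodies are written as Python dicts, and the domain, Key Vault reference, secret name, and cluster id are placeholders. With the `newCluster*` properties each activity provisions and tears down its own job cluster, while `existingClusterId` points every activity at one long-running cluster that ADF does not start or stop for you.

```python
# Illustrative linked-service bodies (placeholder names and ids).

# Option 1: new job cluster -- each Databricks activity spins up and
# terminates its own cluster, which is what makes multi-activity pipelines slow.
new_job_cluster_ls = {
    "name": "DatabricksNewJobCluster",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
            "accessToken": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "KeyVaultLS", "type": "LinkedServiceReference"},
                "secretName": "databricks-token",
            },
            "newClusterVersion": "13.3.x-scala2.12",
            "newClusterNumOfWorker": "2",
            "newClusterNodeType": "Standard_DS3_v2",
        },
    },
}

# Option 2: existing cluster -- all activities share one cluster, but ADF does
# not create it at pipeline start or terminate it at the end, which is the gap
# this request describes.
existing_cluster_ls = {
    "name": "DatabricksExistingCluster",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
            "accessToken": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "KeyVaultLS", "type": "LinkedServiceReference"},
                "secretName": "databricks-token",
            },
            "existingClusterId": "0123-456789-abcde123",
        },
    },
}
```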
When a Web Activity calls an API that returns a JSON array as the response we get an error that says "Response Content is not a valid JObject". Please support JSON arrays as the top level of the response.68 votes
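For illustration, a small sketch of the response shapes involved, written as Python literals with hypothetical payloads: the Web Activity accepts a JSON object at the top level but rejects a bare array.

```python
# Hypothetical API responses, shown as Python literals.

# Top-level JSON array: the Web Activity currently fails with
# "Response Content is not a valid JObject".
rejected_response = [{"id": 1}, {"id": 2}]

# Top-level JSON object: accepted. A common workaround is to have the API
# (or an intermediate layer) wrap the array in an object like this.
accepted_response = {"value": [{"id": 1}, {"id": 2}]}
```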
Sometimes mistakes are made, like deleting a pipeline. I should be able to restore the data factory or the pipeline. I am not finding any documentation on how to do this, so I assume it isn't available. 67 votes
ADF must support Elastic database transactions towards Azure SQL Database.
This is equivalent to the on-premises scenario, where SSIS transactions use MSDTC against SQL Server.
Currently, if you set TransactionOption=Required on a data flow and use an OLE DB connection to an Azure SQL Database, you receive an error like:
"The SSIS runtime has failed to enlist the OLE DB connection in a distributed transaction with error 0x80070057 "The parameter is incorrect"." 63 votes
In Azure Data Factory, pipeline-level alerts are required. A pipeline may have many activities (since activity-level alerts are available now, the mailbox gets filled with alert emails), so a single alert email should be sent once the pipeline has executed. 60 votes
An ADLS (gen 1) Linked Service is authenticated with a Managed Identity (MSI) or a Service Principal. When authenticating with MSI, we can't use Mapping Data Flows. Will this functionality be added?59 votes
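For context, a minimal sketch (Python dicts, placeholder URIs and ids) of the two ADLS Gen1 linked-service shapes: with service principal authentication the credentials are explicit, while the managed identity variant simply omits them, and it is that variant the Mapping Data Flows limitation applies to.

```python
# Illustrative ADLS Gen1 linked-service bodies (placeholder values).

# Service principal authentication.
adls_sp_ls = {
    "name": "AdlsGen1ServicePrincipal",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://mydatalake.azuredatalakestore.net/webhdfs/v1",
            "servicePrincipalId": "<application-id>",
            "servicePrincipalKey": {"type": "SecureString", "value": "<application-key>"},
            "tenant": "<tenant-id>",
        },
    },
}

# Managed identity (MSI) authentication: the service principal fields are
# omitted and the factory's own identity is used -- the variant this request
# wants supported in Mapping Data Flows.
adls_msi_ls = {
    "name": "AdlsGen1ManagedIdentity",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://mydatalake.azuredatalakestore.net/webhdfs/v1",
        },
    },
}
```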
It would be great if Azure Data Factory jobs could be executed/run via webhooks as well as on their default schedule.
The current limitation of re-running via PowerShell or the Azure portal is not graceful for a production environment or for automation.
Ideally, being able to run a job on an HTTP POST to a webhook would be great and would resolve many automation challenges.
Potentially this could be integrated with Azure Logic Apps. 59 votes
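As a point of reference, pipelines can already be started with an HTTP POST against the ARM REST API (createRun); a minimal sketch assuming the requests library and a pre-acquired AAD bearer token, with the subscription, resource group, factory, and pipeline names as placeholders. The request above is essentially for this to be exposed as a plain webhook that tools such as Logic Apps could call without the ARM authentication dance.

```python
import requests

# Placeholders -- substitute your own identifiers and a valid AAD access token.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<factory-name>"
pipeline_name = "<pipeline-name>"
bearer_token = "<aad-access-token>"

# ARM endpoint that triggers a pipeline run.
url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
    f"/factories/{factory_name}/pipelines/{pipeline_name}/createRun"
    "?api-version=2018-06-01"
)

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {bearer_token}"},
    json={},  # optional pipeline parameters go here
)
response.raise_for_status()
print(response.json()["runId"])  # id of the newly created pipeline run
```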
The copy performance of the ADF Copy Data Activity going from a file system source to a Blob FileSystem or Blob sink is quite slow and CPU-intensive relative to other copy mechanisms available when copying a large number (tens of thousands to millions) of small files (<1 MB).
Both AzCopy & Azure Storage Explorer are able to complete the copy operations from the same source to the same sink approximately 3-5x faster while using less CPU than the ADF Copy Activity.
At a minimum, we would like to see performance parity with AzCopy / Azure Storage Explorer.58 votes
It is currently not possible to pull down docker images and run those as tasks through Data Factory, even though this is already possible through Batch itself.57 votes
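For comparison, a rough sketch of what Batch itself allows today, using the azure-batch Python SDK to run a container image as a task; the registry, image, job id, and command line are placeholders, and the pool would also need to be created with a container-enabled VM configuration. The request is to be able to drive this from a Data Factory activity.

```python
import azure.batch.models as batchmodels

# Hypothetical task definition: Batch can pull a Docker image and run a task
# inside it, which the Data Factory Custom Activity cannot yet express.
container_task = batchmodels.TaskAddParameter(
    id="run-etl-container",
    command_line="python /app/run.py",  # runs inside the container
    container_settings=batchmodels.TaskContainerSettings(
        image_name="myregistry.azurecr.io/etl-job:latest",
    ),
)
# batch_client.task.add(job_id="my-container-job", task=container_task)
```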
Please provide support in the connector for the sink owner field. 57 votes
Currently, Testing Connection for the linked services in ADF is only possible from the ADF GUI.
Being able to perform this test programmatically is essential to be able to build a proper automated CI/CD pipeline for ADF and include automated connection tests.
Therefore, Test Connection should be available via:
- SDKs like Python, .NET, etc.
- REST API
- Other
55 votes
Currently, the only way to rename Linked Services and other components is to delete and recreate them. Doing this then requires each associated dataset to be updated manually.
Functionality to rename these within the GUI tool would add value, allowing components to be renamed with confidence that nothing will break.
Whilst it is possible to edit the JSON by hand, when I tried this and uploaded it back into the Git repository, it broke the connections. The behind-the-scenes magic does not seem able to handle it. 54 votes
For the wall-clock trigger schedule, there should be a property to control whether a new run of the pipeline is allowed if a previous run is already in progress. 54 votes
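For reference, a minimal sketch (Python dict, illustrative names) of the pipeline-level concurrency property that exists today: it caps concurrent runs and queues further ones, whereas the request is for trigger-level control over whether a new run is allowed at all while a previous one is in progress.

```python
# Illustrative pipeline body: "concurrency": 1 limits the pipeline to a single
# active run; extra runs are queued rather than skipped, which is why a
# trigger-level option is still being asked for.
pipeline_body = {
    "name": "HourlyLoad",
    "properties": {
        "concurrency": 1,
        "activities": [
            # ... activities ...
        ],
    },
}
```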
Talk to the O365 OneRM team and get a copy of their O365DataTransfer program -- it does a lot of things DF needs to do.
O365 OneRM team is solving the same problems that you are, and has a mature platform that does many of the things that Azure DF will need to do. Talk to Zach, Naveen, and Karthik in building 2. Also talk to Pramod. It'll accelerate you in terms of battle-hardened user needs and things pipeline automation has needed to do in the field. You'll want to get a copy of the bits/code.53 votes
Automatic mapping of field names should be case-insensitive in the Azure SQL connector.
In the Azure SQL Data Warehouse connector, fields with identical names but different casing (upper-/lowercase) are mapped smoothly.
Not so in the Azure SQL connector, where everything must be done manually. Every refresh voids the mappings, which is rather painful. 52 votes
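To illustrate the manual step being complained about, a sketch (Python dict, hypothetical column names) of the explicit column mapping a Copy activity needs when automatic mapping does not match columns that differ only in case.

```python
# Hypothetical explicit schema mapping on a Copy activity: each source column
# must be mapped to its differently-cased sink column by hand, and the block
# has to be redone whenever the mapping is refreshed.
copy_activity_translator = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "CustomerID"}, "sink": {"name": "customerid"}},
        {"source": {"name": "OrderDate"}, "sink": {"name": "orderdate"}},
    ],
}
```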
It would be great and very useful in my opinion if there was a Google Sheets connector.
Thanks in advance.51 votes
When you set up Key Vault to periodically rotate the storage account key, it stores the key not in a secret but under a URI similar to https://<keyvault>.vault.azure.net/storage/<storageaccountname>
The setup instructions for this automatic key rotation are here:
Please enhance Azure Data Factory so that the storage account key used in a linked service can be pulled from this location in Azure Key Vault. Currently ADF only supports pulling from secrets, not from managed storage account keys in Key Vault. 50 votes
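For context, a minimal sketch (Python dict, placeholder names) of the Key Vault reference ADF supports today: a linked service can resolve a value from a Key Vault secret, but there is no equivalent reference type for the managed storage account keys stored under the /storage/ URI above.

```python
# Illustrative storage linked service pulling its connection string from a
# Key Vault *secret* -- the only Key Vault reference type ADF supports today.
storage_ls = {
    "name": "BlobStorageFromKeyVault",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "KeyVaultLS", "type": "LinkedServiceReference"},
                "secretName": "storage-connection-string",
            },
        },
    },
}
```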
Currently, if you make a call to a web API and the JObject returned is greater than 1 MB in size, the activity fails with the error:
"The length of execution ouput is over limit (around 1M currently). "
This is a significant limitation; it would be great if it were removed or the limit increased. 48 votes
When output is ready, it should be possible to save the files to an FTP folder, i.e. FTP becomes an output dataset as well. 48 votes
Integrate Azure Data Factory Managed Identity with the Databricks service, like you did for Key Vault, storage, etc. 46 votes