I am searching for a feature in Data Factory for the Databricks activity. Suppose there is a pipeline containing multiple Databricks activities. As of now I can use a new job cluster to execute all the Databricks activities, but spinning up and terminating a cluster for each activity takes a lot of time. I would like functionality where I can create a cluster at the beginning of the pipeline, have all activities use that existing cluster, and terminate it at the end of the pipeline. There is another option to reuse an existing cluster, but being able to start, use, and stop the cluster within the pipeline would be a great added advantage. 70 votes
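A minimal sketch, assuming an Azure Databricks linked service of type "AzureDatabricks", of the two cluster options involved; the JSON bodies are written as Python dicts, and the domain, Key Vault reference, secret name, and cluster id are placeholders. With the `newCluster*` properties each activity provisions and tears down its own job cluster, while `existingClusterId` points every activity at one long-running cluster that ADF does not start or stop for you.

```python
# Illustrative linked-service bodies (placeholder names and ids).

# Option 1: new job cluster -- each Databricks activity spins up and
# terminates its own cluster, which is what makes multi-activity pipelines slow.
new_job_cluster_ls = {
    "name": "DatabricksNewJobCluster",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
            "accessToken": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "KeyVaultLS", "type": "LinkedServiceReference"},
                "secretName": "databricks-token",
            },
            "newClusterVersion": "13.3.x-scala2.12",
            "newClusterNumOfWorker": "2",
            "newClusterNodeType": "Standard_DS3_v2",
        },
    },
}

# Option 2: existing cluster -- all activities share one cluster, but ADF does
# not create it at pipeline start or terminate it at the end, which is the gap
# this request describes.
existing_cluster_ls = {
    "name": "DatabricksExistingCluster",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
            "accessToken": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "KeyVaultLS", "type": "LinkedServiceReference"},
                "secretName": "databricks-token",
            },
            "existingClusterId": "0123-456789-abcde123",
        },
    },
}
```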
When a Web Activity calls an API that returns a JSON array as the response we get an error that says "Response Content is not a valid JObject". Please support JSON arrays as the top level of the response.68 votes
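For illustration, a small sketch of the response shapes involved, written as Python literals with hypothetical payloads: the Web Activity accepts a JSON object at the top level but rejects a bare array.

```python
# Hypothetical API responses, shown as Python literals.

# Top-level JSON array: the Web Activity currently fails with
# "Response Content is not a valid JObject".
rejected_response = [{"id": 1}, {"id": 2}]

# Top-level JSON object: accepted. A common workaround is to have the API
# (or an intermediate layer) wrap the array in an object like this.
accepted_response = {"value": [{"id": 1}, {"id": 2}]}
```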
Sometimes mistakes are made, like deleting a pipeline. I should be able to restore the data factory or the pipeline. I am not finding any documentation on how to do this, so I assume it isn't available. 67 votes
ADF must support Elastic database transactions towards Azure SQL Database.
This is equivalent to the on-premises scenario, where SSIS transactions use MSDTC against SQL Server.
Currently, if you set TransactionOption=Required on a data flow and use an OLE DB connection to an Azure SQL Database, you receive an error like:
"The SSIS runtime has failed to enlist the OLE DB connection in a distributed transaction with error 0x80070057 "The parameter is incorrect"." 63 votes
In Azure Data Factory, pipeline-level alerts are required. A pipeline may have many activities (since activity-level alerts are available now, the mailbox gets filled with alert emails), so a single alert email should be sent once the pipeline has executed. 60 votes
An ADLS (gen 1) Linked Service is authenticated with a Managed Identity (MSI) or a Service Principal. When authenticating with MSI, we can't use Mapping Data Flows. Will this functionality be added?59 votes
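For context, a minimal sketch (Python dicts, placeholder URIs and ids) of the two ADLS Gen1 linked-service shapes: with service principal authentication the credentials are explicit, while the managed identity variant simply omits them, and it is that variant the Mapping Data Flows limitation applies to.

```python
# Illustrative ADLS Gen1 linked-service bodies (placeholder values).

# Service principal authentication.
adls_sp_ls = {
    "name": "AdlsGen1ServicePrincipal",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://mydatalake.azuredatalakestore.net/webhdfs/v1",
            "servicePrincipalId": "<application-id>",
            "servicePrincipalKey": {"type": "SecureString", "value": "<application-key>"},
            "tenant": "<tenant-id>",
        },
    },
}

# Managed identity (MSI) authentication: the service principal fields are
# omitted and the factory's own identity is used -- the variant this request
# wants supported in Mapping Data Flows.
adls_msi_ls = {
    "name": "AdlsGen1ManagedIdentity",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://mydatalake.azuredatalakestore.net/webhdfs/v1",
        },
    },
}
```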
It would be great if Azure Data Factory jobs could be executed/run via webhooks as well as on their default schedule.
The current limitation of re-running via PowerShell or the Azure portal is not graceful for a production environment or for automation.
Ideally, being able to run a job on an HTTP POST to a webhook would be great and would resolve many automation challenges.
Potentially this could be integrated with Azure Logic Apps. 59 votes
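As a point of reference, pipelines can already be started with an HTTP POST against the ARM REST API (createRun); a minimal sketch assuming the requests library and a pre-acquired AAD bearer token, with the subscription, resource group, factory, and pipeline names as placeholders. The request above is essentially for this to be exposed as a plain webhook that tools such as Logic Apps could call without the ARM authentication dance.

```python
import requests

# Placeholders -- substitute your own identifiers and a valid AAD access token.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<factory-name>"
pipeline_name = "<pipeline-name>"
bearer_token = "<aad-access-token>"

# ARM endpoint that triggers a pipeline run.
url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
    f"/factories/{factory_name}/pipelines/{pipeline_name}/createRun"
    "?api-version=2018-06-01"
)

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {bearer_token}"},
    json={},  # optional pipeline parameters go here
)
response.raise_for_status()
print(response.json()["runId"])  # id of the newly created pipeline run
```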
The copy performance of the ADF Copy Data Activity going from a file system source to a Blob FileSystem or Blob sink is quite slow and CPU-intensive relative to other copy mechanisms available when copying a large number (tens of thousands to millions) of small files (<1 MB).
Both AzCopy & Azure Storage Explorer are able to complete the copy operations from the same source to the same sink approximately 3-5x faster while using less CPU than the ADF Copy Activity.
At a minimum, we would like to see performance parity with AzCopy / Azure Storage Explorer.58 votes
It is currently not possible to pull down docker images and run those as tasks through Data Factory, even though this is already possible through Batch itself.57 votes
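For comparison, a rough sketch of what Batch itself allows today, using the azure-batch Python SDK to run a container image as a task; the registry, image, job id, and command line are placeholders, and the pool would also need to be created with a container-enabled VM configuration. The request is to be able to drive this from a Data Factory activity.

```python
import azure.batch.models as batchmodels

# Hypothetical task definition: Batch can pull a Docker image and run a task
# inside it, which the Data Factory Custom Activity cannot yet express.
container_task = batchmodels.TaskAddParameter(
    id="run-etl-container",
    command_line="python /app/run.py",  # runs inside the container
    container_settings=batchmodels.TaskContainerSettings(
        image_name="myregistry.azurecr.io/etl-job:latest",
    ),
)
# batch_client.task.add(job_id="my-container-job", task=container_task)
```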
Please provide support in the connector for the sink owner field. 57 votes
Currently, Testing Connection for the linked services in ADF is only possible from the ADF GUI.
Being able to perform this test programmatically is essential to be able to build a proper automated CI/CD pipeline for ADF and include automated connection tests.
Therefore, Test Connection should be available via:
- SDKs like Python, .NET, etc.
- REST API
- Other
55 votes
Currently, the only way to rename Linked Services and other components is to delete and recreate them. Doing this then requires each associated dataset to be updated manually.
Functionality to rename these within the GUI tool would add value, allowing components to be renamed with confidence that nothing will break.
Whilst it is possible to edit the JSON by hand, when I tried this and uploaded it back into the Git repository, it broke the connections. The behind-the-scenes magic does not seem able to handle it. 54 votes
For the wall-clock trigger schedule, there should be a property to control whether a new run of the pipeline is allowed if a previous run is already in progress. 54 votes
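For reference, a minimal sketch (Python dict, illustrative names) of the pipeline-level concurrency property that exists today: it caps concurrent runs and queues further ones, whereas the request is for trigger-level control over whether a new run is allowed at all while a previous one is in progress.

```python
# Illustrative pipeline body: "concurrency": 1 limits the pipeline to a single
# active run; extra runs are queued rather than skipped, which is why a
# trigger-level option is still being asked for.
pipeline_body = {
    "name": "HourlyLoad",
    "properties": {
        "concurrency": 1,
        "activities": [
            # ... activities ...
        ],
    },
}
```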
Talk to the O365 OneRM team and get a copy of their O365DataTransfer program -- it does a lot of things DF needs to do.
O365 OneRM team is solving the same problems that you are, and has a mature platform that does many of the things that Azure DF will need to do. Talk to Zach, Naveen, and Karthik in building 2. Also talk to Pramod. It'll accelerate you in terms of battle-hardened user needs and things pipeline automation has needed to do in the field. You'll want to get a copy of the bits/code.53 votes
Automatic mapping of field names should be case-insensitive in the Azure SQL connector.
In the Azure SQL Data Warehouse connector, fields with identical names but different casing (upper-/lowercase) are mapped smoothly.
Not so in the Azure SQL connector, where everything must be done manually. Every refresh voids the mappings, which is rather painful. 52 votes
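To illustrate the manual step being complained about, a sketch (Python dict, hypothetical column names) of the explicit column mapping a Copy activity needs when automatic mapping does not match columns that differ only in case.

```python
# Hypothetical explicit schema mapping on a Copy activity: each source column
# must be mapped to its differently-cased sink column by hand, and the block
# has to be redone whenever the mapping is refreshed.
copy_activity_translator = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "CustomerID"}, "sink": {"name": "customerid"}},
        {"source": {"name": "OrderDate"}, "sink": {"name": "orderdate"}},
    ],
}
```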
It would be great and very useful in my opinion if there was a Google Sheets connector.
Thanks in advance.51 votes
When you set up Key Vault to periodically rotate the storage account key, it stores the key not in a secret but under a URI similar to https://<keyvault>.vault.azure.net/storage/<storageaccountname>
The setup instructions for this automatic key rotation are here:
Please enhance Azure Data Factory so that the storage account key used in a linked service can be pulled from this location in Azure Key Vault. Currently ADF only supports pulling from secrets, not from managed storage account keys in Key Vault. 50 votes
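For context, a minimal sketch (Python dict, placeholder names) of the Key Vault reference ADF supports today: a linked service can resolve a value from a Key Vault secret, but there is no equivalent reference type for the managed storage account keys stored under the /storage/ URI above.

```python
# Illustrative storage linked service pulling its connection string from a
# Key Vault *secret* -- the only Key Vault reference type ADF supports today.
storage_ls = {
    "name": "BlobStorageFromKeyVault",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "KeyVaultLS", "type": "LinkedServiceReference"},
                "secretName": "storage-connection-string",
            },
        },
    },
}
```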
Currently, if you make a call to a web API and the JObject returned is greater than 1 MB in size, the activity fails with the error:
"The length of execution ouput is over limit (around 1M currently). "
This is a significant limitation; it would be great if it were removed or the limit increased. 48 votes
When output is ready, it should be possible to save the files to an FTP folder, i.e. FTP becomes an output dataset as well. 48 votes
Integrate Azure Data Factory Managed Identity with the Databricks service, like you did for Key Vault, storage, etc. 46 votes