Data Factory
Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or with custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.
Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.
-
Bitbucket Integration
We need to use Bitbucket for a project, so we are mirroring our Azure DevOps repo with the pipelines to Bitbucket. It would be easier if there were native integration with Bitbucket.
106 votes -
Azure Data Factory - Google Analytics Connector
Some customers need to extract information from Google Analytics in order to build a data lake or SQL DW and gather marketing insights by mixing it with other kinds of data.
Today we rely on custom SSIS packages that are paid, or on developing custom code.
If it is not possible in Azure Data Factory, there could be another way to extract this data with a native connector elsewhere in Azure … maybe Logic Apps.
59 votes -
Allow choosing logical AND or logical OR in activity dependencies
We have activity dependencies today, but they are always logical AND. If we have Activity1 -> Activity2 -> Activity3 and we want to say "if any of these activities fail, run Activity4", it isn't straightforward. In SSIS, we can choose an expression and choose whether one or all conditions must be true when there are multiple constraints. We need similar functionality here. It can be achieved with a bit of creativity (repeat the failure activity as the single failure path after each of the original activities, or use the If Condition to write logic that would…
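A rough sketch of that workaround, expressed as a Python dict mirroring the pipeline JSON (the activity names and types here are placeholders, not from the original post): each original activity gets its own copy of the failure handler wired with a "Failed" dependency condition, because dependencies on multiple upstream activities are combined with logical AND.

```python
# Minimal sketch of the "OR on failure" workaround, assuming placeholder
# Wait activities; the key part is the dependsOn / dependencyConditions wiring.
import json

def on_failure_copy(upstream: str) -> dict:
    """One copy of the failure handler, wired to a single upstream activity."""
    return {
        "name": f"OnFailure_{upstream}",
        "type": "Wait",  # placeholder; any activity type would do
        "typeProperties": {"waitTimeInSeconds": 1},
        "dependsOn": [
            {"activity": upstream, "dependencyConditions": ["Failed"]}
        ],
    }

pipeline = {
    "name": "OrFailureWorkaround",
    "properties": {
        "activities": [
            {"name": "Activity1", "type": "Wait", "typeProperties": {"waitTimeInSeconds": 1}},
            {"name": "Activity2", "type": "Wait", "typeProperties": {"waitTimeInSeconds": 1},
             "dependsOn": [{"activity": "Activity1", "dependencyConditions": ["Succeeded"]}]},
            {"name": "Activity3", "type": "Wait", "typeProperties": {"waitTimeInSeconds": 1},
             "dependsOn": [{"activity": "Activity2", "dependencyConditions": ["Succeeded"]}]},
            # The "OR" effect: one failure handler per upstream activity.
            on_failure_copy("Activity1"),
            on_failure_copy("Activity2"),
            on_failure_copy("Activity3"),
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```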
52 votes -
Increase activity name length limit
In ETL/ELT scenarios we generate ADF pipelines per entity that we ingest, and we give each pipeline the name of the entity it ingests. In some cases this can be a long name. We also generate an orchestration pipeline that executes all the individual pipelines per table. However, activity names have a relatively short length limit (55 characters), which complicates generating that pipeline. Please increase the allowed length of the activity name.
25 votes -
RESIZE ACTIVITY
Allow resizing of Activities in the designer, or add an Annotation element so we can mark up the workflow.
The current display does not show enough characters to differentiate similarly named Activities. Many teams use prefixes or suffixes to name things, and the designer needs to let us resize Activities.
13 votes -
Please add support to specify longer timeout for Web Activity
Data Factory version 2 currently supports Web Activities with a default timeout of 1 minute:
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-web-activity
"REST endpoints that the web activity invokes must return a response of type JSON. The activity will timeout at 1 minute with an error if it does not receive a response from the endpoint."
Please add ability to specify a longer timeout period for complex tasks.
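For context, other execution activities already take a policy block like the hedged sketch below (a Python dict mirroring the pipeline JSON; the values are purely illustrative). The request is for the Web Activity's one-minute response timeout to become configurable in a similar way.

```python
# Sketch of the generic activity "policy" block (timeout uses the d.hh:mm:ss format).
# The Web Activity's 1-minute HTTP response timeout is currently not governed by this
# policy -- that is exactly what this request asks for.
activity_policy = {
    "policy": {
        "timeout": "0.02:00:00",        # allow the activity up to 2 hours overall
        "retry": 3,                      # retry up to 3 times on failure
        "retryIntervalInSeconds": 300,   # wait at least 5 minutes between retries
    }
}
```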
73 votes -
Azure Stack blob storage support
Add Azure Stack support for blob storage transfers/integration. At the moment only Azure integrations are possible, even when trying SAS connection strings.
13 votes -
Allow ForEach activity to work at full batchCount while inner activities are waiting to retry
Currently, I can run a ForEach activity with batchCount 3 to allow processing 3 inner activities in parallel. However, when one of the inner activities fails and needs to, for example, wait 10 minutes before retrying, only 2 inner activities run in parallel.
It would be great if one could simply put the activities waiting to retry "to the side" and allow all 3 "threads" to work during the retry interval. As I understand it, retryIntervalInSeconds is only an "at the earliest" value anyway.
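A hedged sketch of the setup being described (a Python dict mirroring the pipeline JSON; the parameter, dataset, and activity names are placeholders, and dataset references are omitted for brevity):

```python
# batchCount controls the parallelism of the ForEach loop; the inner activity's retry
# policy is what currently holds a parallel slot while it waits to retry.
foreach_activity = {
    "name": "ProcessFilesInParallel",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": False,
        "batchCount": 3,  # up to 3 inner activities in parallel
        "items": {"value": "@pipeline().parameters.files", "type": "Expression"},
        "activities": [
            {
                "name": "CopyOneFile",
                "type": "Copy",
                # When this retries, its slot is held for the retry interval today;
                # the request is to free the slot for other iterations while it waits.
                "policy": {"retry": 3, "retryIntervalInSeconds": 600},
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ],
    },
}
```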
15 votes -
Add Google Analytics connector to use in a data pipeline
Hi, inside ADF it would be very useful to have a Google Analytics connector to use as a data source for a data pipeline (e.g. for the copy task). Thanks
95 votes -
Run containers through Data Factory custom activity
It is currently not possible to pull down Docker images and run them as tasks through Data Factory, even though this is already possible through Azure Batch itself.
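For reference, a rough sketch of what the Batch "Add task" request body already accepts (shown as a Python dict; the registry, image, and command line are placeholders). The ask is to be able to drive the same thing from a Data Factory custom activity.

```python
# Hedged sketch: the Batch task body already supports container settings.
# Image name and command line below are placeholders, not a real workload.
batch_task = {
    "id": "run-my-image",
    "commandLine": "python /app/etl.py",
    "containerSettings": {
        "imageName": "myregistry.azurecr.io/etl-job:latest",
        "containerRunOptions": "--rm",
    },
}
```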
23 votes -
Identify IP Address of Data Factory
It is not currently possible to identify the IP address of the Data Factory, which you need for firewall rules, including the Azure SQL Server firewall....
612 votes -
Support ADF Projects in Visual Studio 2017
Currently Visual Studio 2017 does not support Azure Data Factory projects.
Despite the Azure SDK now being included in VS 2017 along with all the other services, ADF project files aren't.
Can you please include this feature so developers can upgrade from VS2015?
Thanks
1,920 votes -
ADF v2 - Rerun from the point of failure OR Ability to rerun activities
When we chain multiple child pipelines using a master pipeline and the master pipeline fails, there should be an option to restart the master pipeline from the point of failure.
Example: master pipeline 1 - 2 - 3 - 4, with 4 Execute Pipeline activities chained in the master pipeline; for some reason 3 fails. Assume the issue is fixed. When we rerun from the Monitor portal, the pipeline should execute from 3 and not from 1.
80 votes -
Parameterize PostgreSQL Linked Services with Dynamic Content
Parameterize PostgreSQL linked services with dynamic content, as is already possible for SQL Server and Oracle [1].
We need to address 50 PostgreSQL servers that have the same data structure. How can we dynamically choose the server?
[1] https://docs.microsoft.com/en-us/azure/data-factory/parameterize-linked-services
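For context, a parameterized linked service of the kind already documented in [1] looks roughly like the sketch below (a Python dict mirroring the linked service JSON; the server and database names are placeholders). The request is for PostgreSQL linked services to accept the same linkedService() parameters.

```python
# Hedged sketch of a parameterized Azure SQL Database linked service; the connection
# string is resolved per run from the linkedService() parameters declared below.
parameterized_linked_service = {
    "name": "ParameterizedSqlServer",
    "properties": {
        "type": "AzureSqlDatabase",
        "parameters": {
            "serverName": {"type": "String"},
            "databaseName": {"type": "String"},
        },
        "typeProperties": {
            "connectionString": (
                "Server=tcp:@{linkedService().serverName}.database.windows.net,1433;"
                "Database=@{linkedService().databaseName};"
                "Integrated Security=False;Encrypt=True;"
            )
        },
    },
}
```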
18 votes -
Support extracting contents from TAR file
My source gives me a file that is compressed and packaged as .tar.gz. Currently Azure Data Factory can only handle the decompression step, not unpacking the TAR file. I think I will have to write a custom activity to handle this for now.
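A custom activity along those lines could be as small as this standard-library sketch (the paths are placeholders; in practice the archive would be downloaded from, and the output written back to, blob storage):

```python
import tarfile
from pathlib import Path

def extract_tar_gz(archive_path: str, output_dir: str) -> list[str]:
    """Unpack a .tar.gz archive and return the extracted member names."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        # Trusted source assumed; validate member paths for untrusted archives.
        tar.extractall(path=output_dir)
        return tar.getnames()

if __name__ == "__main__":
    # Placeholder paths; a real custom activity would read these from its inputs.
    print(extract_tar_gz("incoming/source_drop.tar.gz", "staging/unpacked"))
```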
59 votes -
Azure Data Factory Self-hosted IR in Service Fabric clusters
Support running the ADF Self-hosted IR in Service Fabric clusters. This will require support for hosting modes other than a Windows Service. Being able to run the Self-hosted IR from the command line is all that is required.
7 votes -
Web Activity should support JSON array response
When a Web Activity calls an API that returns a JSON array as the response, we get an error that says "Response Content is not a valid JObject". Please support JSON arrays at the top level of the response.
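Until that is supported, one workaround (an assumption on my part, not an official recommendation) is to put a thin wrapper in front of the API that returns the array inside an object. A minimal Azure Functions sketch with a placeholder upstream URL:

```python
import json
import urllib.request

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Call the upstream API that returns a bare JSON array (placeholder URL).
    with urllib.request.urlopen("https://example.com/api/items") as resp:
        items = json.load(resp)
    # Wrap the array so the Web Activity receives a JSON object at the top level.
    return func.HttpResponse(json.dumps({"value": items}),
                             mimetype="application/json")
```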
66 votes -
Allow copying subset of columns with implicit mapping
A copy activity will fail if my source has more columns than my destination. I would like to use implicit mapping (let Data Factory match on column name) but have it not fail if a source column has no matching destination column. For example, if I am copying from a text file in ADLS to a table in Azure SQL DB and my source file has 200 columns but I only need 20, I don't want to have to bring in all 200 fields. I also don't want to have to map them all. Instead of failing, ADF should…
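For reference, the explicit mapping we are trying to avoid looks roughly like this today (a Python dict mirroring the copy activity's translator; the column names are placeholders), with one entry per column we actually want:

```python
# Hedged sketch of an explicit TabularTranslator mapping; the other ~180 source
# columns would simply be left out of the list.
copy_translator = {
    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {"source": {"name": "customer_id"}, "sink": {"name": "CustomerId"}},
            {"source": {"name": "order_date"}, "sink": {"name": "OrderDate"}},
        ],
    }
}
```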
16 votes -
Ability to parameterize the URL property on an HTTP linked service
Currently there does not seem to be any way to do this. The service I want to connect to has different versions deployed to different environments.
5 votes -
Deactivate activities within a pipeline
Sometimes it makes sense to deactivate specific activities in an ADF pipeline, e.g. after a run has failed and you do not want to start it over from the beginning but from the failed step. A second use case I am thinking of would be a pipeline that usually runs in delta mode but needs to perform some cleanup steps for a historic full load. These could be deactivated for the daily processing and activated for the full load. In SSIS, in contrast, we use this feature quite often due to the fact that you do not need additional condition…
2 votes