Data Factory
Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that will process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.
Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.
Sample query to check physical partition
The query "Sample query to check physical partition" is not working.
1 vote
Allow Ignoring Missing Columns in Copy Activity
Please add the ability (it could be an on/off toggle) to allow inserts to a SQL target when columns that exist in the source are missing from the destination, when using dynamic mapping.
I have seen this fail for JSON, SQL, and Parquet sources, where nothing is inserted if the source schema has columns that don't exist in the target, and I would like to be able to choose whether the activity fails. A sketch of the requested behavior follows.
1 vote
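A minimal Python sketch of the requested toggle, outside ADF: it shows the intended semantics only, and the names (project_to_target, source_rows, target_columns) are hypothetical, not ADF APIs.

```python
# Hypothetical sketch: with ignore_missing on, source fields that have no
# matching destination column are dropped instead of failing the copy.
def project_to_target(source_rows, target_columns, ignore_missing=True):
    """Keep only the fields that exist in the target schema."""
    for row in source_rows:
        extra = set(row) - set(target_columns)
        if extra and not ignore_missing:
            # today's behavior: the whole activity fails
            raise ValueError(f"Source columns missing from target: {extra}")
        yield {k: v for k, v in row.items() if k in target_columns}

rows = [{"id": 1, "name": "a", "new_col": "x"}]  # "new_col" not in target
print(list(project_to_target(rows, {"id", "name"})))
# [{'id': 1, 'name': 'a'}]
```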
Add Azure Active Directory Authentication for SQL Servers
Add Azure AD as an authentication method for SQL Servers (on-prem, Azure SQL, Managed Instance), specifically "Azure Active Directory - Password" as in SSMS.
Our company uses AD users for groups and individuals to secure SQL access and would like to move away from SQL users.
Currently ADF only supports Windows Auth and SQL users.
1 vote
Can Tumbling Window Triggers also have a Daylight Saving feature?
This is the most important thing in a scheduler; please enable a daylight saving feature in tumbling window triggers. The sketch below shows the drift problem.
1 vote
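To illustrate the drift this request is about, here is a small Python sketch (not an ADF API; the dates and time zone are example values): a trigger firing at fixed 24-hour UTC intervals slips by an hour in local time when daylight saving starts.

```python
# A fixed 24 h tumbling window drifts across a DST transition: a job meant
# for 06:00 local time starts firing at 07:00 once daylight saving begins.
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")                      # example zone
start = datetime(2021, 3, 13, 11, 0, tzinfo=timezone.utc)   # 06:00 EST

for day in range(3):                    # DST begins 2021-03-14 in this zone
    fire = start + timedelta(days=day)  # fixed 24 h tumbling interval
    print(fire.astimezone(eastern).strftime("%Y-%m-%d %H:%M %Z"))
# 2021-03-13 06:00 EST
# 2021-03-14 07:00 EDT  <- drifted; a DST-aware trigger would keep 06:00
# 2021-03-15 07:00 EDT
```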
Support the ability to set optional dynamic properties
Often we need a way to define optional dynamic properties. For example, in a copy activity I may have some tables that don't have a pre-copy script.
The current approach, which doesn't work in all cases, is to set the value to something like select 1;. A better approach would be that if the dynamic property value came through as null or an empty string, it should be treated as "unset" (sketched below). In this case, if the pre-copy script was a blank string, nothing would run; currently in this scenario you get an invalid syntax error. This would…
1 vote
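A short Python sketch of the proposed semantics (a hypothetical helper, not ADF behavior today): a dynamic property that resolves to null or an empty string is treated as unset and skipped.

```python
# Proposed semantics: a null/empty dynamic property value means "unset".
def resolve_pre_copy_script(value):
    """Return the script to run, or None when the property is 'unset'."""
    if value is None or value.strip() == "":
        return None  # requested: skip instead of raising a syntax error
    return value

for raw in ["TRUNCATE TABLE stg.orders", "", None, "select 1"]:
    script = resolve_pre_copy_script(raw)
    print("skip" if script is None else f"run: {script}")
```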
Please increase the web page response limit in the Web activity of Azure Data Factory. Currently the maximum supported output response is 4 MB.
Hello Team,
We are using the Web activity in Azure Data Factory. The maximum web page response we can receive is 4 MB, and if it exceeds this the pipeline fails.
So we are doing web pagination and adjusting the count to keep each web page response at 4 MB (sketched below).
For example: let's say we have a total of 100 records (20 MB of data) in one of the REST API tables, and the problem is that all the records can't be fetched at once, as that would exceed the 4 MB limit.
So by doing pagination we are taking 25…
1 vote
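A minimal Python sketch of the pagination workaround, assuming a hypothetical REST endpoint that accepts offset/limit parameters and returns a JSON list; the URL and page size below are placeholders.

```python
# Fetch in batches small enough that each response stays under the
# Web activity's 4 MB cap, then combine the batches.
import requests

URL = "https://example.com/api/records"  # placeholder endpoint
PAGE_SIZE = 25  # tuned so each response stays below 4 MB

def fetch_all():
    records, offset = [], 0
    while True:
        resp = requests.get(URL, params={"offset": offset, "limit": PAGE_SIZE})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        records.extend(batch)
        offset += len(batch)
    return records
```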
Branch
Allow setting the branch of the Databricks code executed with Data Factory (for example, to be able to execute dev code instead of prod code without having to duplicate a notebook when developing new features).
1 vote
I would like to use it from an iPad.
Currently I can't use it from an iPad: I can't see the strings in dropdown boxes.
1 vote
Debug Dataflows
Being able to debug each activity on its own, just like rerunning each cell in a Databricks notebook, would speed up the process of creating complex pipelines without the need to rerun the whole pipeline. I understand that there is the data preview, but somehow, even with the preview, errors happen while debugging.
1 vote
Open dropdown upwards for row selection
I often struggle when I have to choose a row because the selection dropdown opens at the bottom of the page. I cannot scroll further to pull it up, nor can I make the window bigger, as it keeps the last row at the bottom, so not even one complete choice is visible. A simple workaround is to add a column at the end and then delete it again, but this is tedious. This is using Edge.
1 vote
Web Activity should support Azure AD Auth
If I protect my web app with Azure AD Authentication, the web activity in Azure Data Factory should still have a way to post to its API.
1 vote
We need to choose a table even when we don't want a table
This seems to be a UI bug:
When using a data source for a database, such as Synapse Analytics or other databases, we need to create the linked service and the dataset, and we need to specify a table on the dataset. Only after this can we, on the pipeline activity, specify that we would like to use a query instead of the table.
The table on the dataset has no use, and I end up choosing an arbitrary table unrelated to the data factory work.
The UI should allow us to create a dataset based on a query, not only specify the…
1 vote
REST service doesn't support cookies
The REST service doesn't support cookies when retrieving data.
This creates a problem: when we need to make a web request that only works with cookies, we can't use the REST service directly (the sketch below shows the cookie handling an ordinary HTTP client provides).
I built a workaround for these situations, but I shouldn't need all this work:
I use the Web activity, because the Web activity supports cookies.
In order to save the result, I use a Copy activity.
The source of the Copy activity needs to point to a valid folder with an empty JSON file. I include in the Copy activity, as an…
1 vote
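For contrast, here is what cookie support looks like in an ordinary HTTP client, in Python (the login and data URLs are placeholders); this is the behavior the REST connector is missing.

```python
import requests

# A Session persists cookies across calls automatically.
with requests.Session() as session:
    session.post("https://example.com/login",            # placeholder
                 data={"user": "u", "password": "p"})
    # The cookie set by the login response is sent along here:
    data = session.get("https://example.com/api/data").json()
```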
Top Level Rerun From Failed Activity
Rerun from failed activity should be available as a top-level button on the pipeline monitoring view. This would save me 10 minutes a day. We run close to 25,000 activities a day, and due to Databricks instability we have transient failures on roughly 15-20 of them. Clicking into the pipeline view to rerun from a failed activity is an unnecessary time waster.
1 vote
Fix and improve the UI for monitoring pipeline execution
When a pipeline executes a large number of activities (in my example, more than 5K due to foreach executions), the monitoring UI fails to show precise information.
When checking the monitoring UI for this pipeline, it doesn't show all activity executions every time. Each refresh can bring a different number of activity executions, sometimes more, sometimes less, leaving many activity executions missing.
Besides this bug, it would also be useful to have a better filter for the activities on this screen.
1 vote
Sources for Direct Copy to Snowflake
Add support for direct copy from any supported ADF data source to Snowflake. We will likely abandon ADF altogether for this reason.
1 vote
Response Headers from failed Web Activity
We should be able to get output.ADFWebActivityResponseHeaders from a web activity that fails.
When a web activity is successful, we can get the response header information using an expression like @activity('PrevWebActivity').output.ADFWebActivityResponseHeaders['Content-Length']. But when the web activity fails, it appears that the property ADFWebActivityResponseHeaders does not exist. However, in some cases the response headers of failed requests contain valuable information.
For example, https://discovery.crmreplication.azure.net/crm/exporter/aad/challenge returns 401 Unauthorized, but we don't care, because we need the value of the "WWW-Authenticate" header (see the sketch below).
I see no reason not to include ADFWebActivityResponseHeaders in the output of a failed web activity.
1 vote
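A small Python sketch showing that the headers of a failed request are readily available to an ordinary HTTP client, using the endpoint quoted above:

```python
import requests

# requests does not raise on a 401, so the headers are easy to inspect.
resp = requests.get(
    "https://discovery.crmreplication.azure.net/crm/exporter/aad/challenge")
print(resp.status_code)                      # e.g. 401
print(resp.headers.get("WWW-Authenticate"))  # the value the pipeline needs
```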
Unable to connect Azure SQL Managed Instance to ADF through the AutoResolve IR using a private endpoint
In ADF data flows, we can only use sources with AutoResolve IR connectivity.
But we are unable to connect to Azure SQL Managed Instance through the AutoResolve IR using a private endpoint.
We can't use a public endpoint, as the data would be exposed to the outside network, which is not acceptable for security reasons.
Also, your new Managed Private Endpoints feature in ADF has not launched yet.
Please suggest a solution for how to connect Azure SQL to ADF using a private endpoint through the default IR.
1 vote
Provide a way to clear debug output history to improve UI performance
The ADF portal becomes extremely slow and unresponsive after a debug run has completed. This is only the case when the Output tab of the debug mode is showing. If there were a way to clear the debug Output history (clearing that cache), it would improve browser performance.
1 vote
Allow wildcards in ADF Event based triggers
I'd like to be able to use wildcards in ADF event triggers. For example, if I have blobs that are being dropped with a filename of
DatasetName_Date_Guid.zip
and I want to filter by DatasetName, I would like to be able to put
Orders_*.zip
in the "Blob path ends with" field (glob matching sketched below).
1 vote
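The requested filter is ordinary glob matching; a minimal Python sketch of the intended semantics (the blob names are made-up examples):

```python
from fnmatch import fnmatch

blobs = ["Orders_20210101_ab12.zip", "Invoices_20210101_cd34.zip"]
matches = [b for b in blobs if fnmatch(b, "Orders_*.zip")]
print(matches)  # ['Orders_20210101_ab12.zip']
```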