Data Factory

Azure Data Factory lets you manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Fault Tolerance - Skip Incompatible Columns

    I am loading data dynamically from multiple SQL Server databases. All of the databases share the same table structure. However, there may be some exceptions where one of the databases has a missing column in one of the tables.

    When such a situation occurs, I would like to be able to set Fault Tolerance to skip the incompatible column. That is, instead of the activity failing or skipping entire rows, it should skip just the single missing column.

    That way, instead of losing whole rows, I lose only one column, which carries no data anyway since it does not exist in the source.

    96 votes  ·  0 comments
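
    For context, the closest existing setting works at row level: a Copy activity can skip and log incompatible rows, as in the minimal sketch below (the linked-service name and path are placeholders). The idea above asks for an analogous column-level option.

        {
          "name": "CopySqlWithFaultTolerance",
          "type": "Copy",
          "typeProperties": {
            "source": { "type": "SqlSource" },
            "sink": { "type": "SqlSink" },
            "enableSkipIncompatibleRow": true,
            "redirectIncompatibleRowSettings": {
              "linkedServiceName": { "referenceName": "ErrorLogStorage", "type": "LinkedServiceReference" },
              "path": "copyerrors/logs"
            }
          }
        }

    With this setting, a row that fails conversion is logged and skipped rather than failing the run; the request is for the same tolerance applied to a single missing column.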
  2. Support more complex types in Avro format, like Dictionaries and Arrays

    When trying to integrate a more complex scenario using the Event Hubs Archive feature, I wasn't able to process the messages because the Data Factory Copy activity didn't support Dictionaries. When trying to use Stream Analytics to write to Avro format, it didn't work because of the Arrays. More complex end-to-end scenarios should be supported.

    93 votes  ·  planned  ·  6 comments
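
    To illustrate, a hypothetical Avro schema like the one below uses the map (dictionary) and array complex types that could not be read end to end:

        {
          "type": "record",
          "name": "EventPayload",
          "fields": [
            { "name": "id", "type": "string" },
            { "name": "properties", "type": { "type": "map", "values": "string" } },
            { "name": "readings", "type": { "type": "array", "items": "double" } }
          ]
        }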
  3. Ability to name activities in Data Factory dynamically

    When copying data from e.g. Azure SQL DB to Azure SQL DW, you may want to use the FOREACH iterator, similar to the pattern described in the tutorial at https://docs.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy.
    The downside of this approach is that the logging in the monitoring window is of little use: you cannot see which activity failed, because they are all named the same (but with different RunIds, of course).

    It would be much better if the activity name could be set at runtime, e.g. CopyTableA, CopyTableB, CopyTableC instead of CopyTable, CopyTable, CopyTable.

    92 votes  ·  12 comments
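
    To make the problem concrete, here is a minimal sketch in the style of the bulk-copy tutorial (dataset and parameter names are placeholders); every iteration appears in the monitor as "CopyTable", distinguishable only by RunId:

        {
          "name": "IterateAndCopy",
          "type": "ForEach",
          "typeProperties": {
            "items": { "value": "@pipeline().parameters.tableList", "type": "Expression" },
            "activities": [
              {
                "name": "CopyTable",
                "type": "Copy",
                "inputs": [ { "referenceName": "SourceTable", "type": "DatasetReference",
                              "parameters": { "tableName": "@item()" } } ],
                "outputs": [ { "referenceName": "SinkTable", "type": "DatasetReference",
                               "parameters": { "tableName": "@item()" } } ],
                "typeProperties": { "source": { "type": "SqlSource" }, "sink": { "type": "SqlDWSink" } }
              }
            ]
          }
        }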
  4. Allow linking one factory to another

    I have been using the Walkthrough sample and successfully completed the exercise. This seems fairly straightforward, and the entire experience of building a network of dependencies between pipelines is great. It is very similar to SSIS but lets me perform data integration at scale with hybrid capabilities. My scenario is that we have a few different teams within our organization and we need separate billing for each of them. I believe separating subscriptions is currently the only option in Azure for separate billing. But we would like to allow one department to use the data of…

    92 votes  ·  planned  ·  5 comments
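
    One possible workaround (a sketch, not an official pattern) is to have a pipeline in one factory start a pipeline in another through the Data Factory REST API, authenticating with the calling factory's managed identity; everything in angle brackets is a placeholder:

        {
          "name": "RunPipelineInOtherFactory",
          "type": "WebActivity",
          "typeProperties": {
            "url": "https://management.azure.com/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.DataFactory/factories/<otherFactory>/pipelines/<pipelineName>/createRun?api-version=2018-06-01",
            "method": "POST",
            "body": {},
            "authentication": { "type": "MSI", "resource": "https://management.azure.com/" }
          }
        }

    The calling factory's identity would need sufficient rights on the target factory (e.g. the Data Factory Contributor role), and billing still follows each factory's own subscription.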
  5. No functionality for scheduling a trigger based on a holiday calendar

    There is no functionality to schedule a trigger that runs a pipeline on, say, every 2nd working day of the month, and no option to include the current year's holiday list in the scheduler.

    91 votes  ·  0 comments
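
    Until something native exists, one hedged workaround is a daily trigger plus an If Condition that compares the run date against a pipeline parameter holding the holiday list (the holidayDates parameter name is hypothetical):

        {
          "name": "SkipHolidays",
          "type": "IfCondition",
          "typeProperties": {
            "expression": {
              "value": "@not(contains(pipeline().parameters.holidayDates, formatDateTime(utcnow(), 'yyyy-MM-dd')))",
              "type": "Expression"
            },
            "ifTrueActivities": [ ]
          }
        }

    The real work goes inside ifTrueActivities, with holidayDates supplied as an array such as ["2019-12-25", "2019-12-26"]. Rules like "every 2nd working day of the month" still can't be expressed this way, which is exactly why native calendar support is requested.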
  6. Get Metadata for Multiple Files Matching Wildcard

    The Get Metadata activity is not useful when there is a wildcard in the dataset file path. Could the Get Metadata activity be expanded to return the number of files found in the dataset and an array of per-file metadata values? We want to use this information to decide whether to continue with the remainder of the pipeline, based on whether any files satisfy the wildcard.

    89 votes  ·  2 comments
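
    A partial workaround today (a sketch; the dataset name is a placeholder) is to point Get Metadata at the folder, request childItems, and emulate the wildcard with a Filter activity:

        "activities": [
          {
            "name": "ListFolder",
            "type": "GetMetadata",
            "typeProperties": {
              "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
              "fieldList": [ "childItems" ]
            }
          },
          {
            "name": "MatchWildcard",
            "type": "Filter",
            "dependsOn": [ { "activity": "ListFolder", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
              "items": { "value": "@activity('ListFolder').output.childItems", "type": "Expression" },
              "condition": { "value": "@endswith(item().name, '.csv')", "type": "Expression" }
            }
          }
        ]

    @length(activity('MatchWildcard').output.value) can then drive an If Condition that decides whether the rest of the pipeline runs.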
  7. Add Support for Maintaining Identity Column Values When Copying From/To SQL DBs

    When moving data from one SQL database to another (on-premises or Azure), if there is an identity column in the source table that has a gap (e.g. the IDs are 1, 2, 4, 5) and the destination table is empty with the same structure, the values in the destination table after the copy will be 1, 2, 3, 4 rather than the original values. This can cause issues when the identity column is referenced as a foreign key.

    It would be nice to see an option to keep identity values intact, even if it means that tables for which this…

    88 votes  ·  9 comments
  8. Support extracting contents from TAR file

    My source gives me a file that is compressed and packaged as .tar.gz. Currently Azure Data Factory can only handle the decompression step, not unpacking the tar file. I think I will have to write a custom activity to handle this for now.

    87 votes  ·  2 comments
  9. Amazon S3 sink

    We really need the ability to write to S3 rather than just read from it.

    Many larger clients (big groups with multiple IT departments) often have both Azure and Amazon, and ADF is getting disqualified in benchmarks against Talend Online and Matillion because it won't push data to other cloud services...

    Please help ^^

    87 votes  ·  4 comments
  10. Refer to Azure Key Vault secrets in dynamic content

    If I need to crawl a RESTful API that is protected with an API key, the only way to set that key is by injecting an additional header at the dataset level. This key is stored in clear text, which is poor security.

    To make matters worse, if git integration is enabled, that key is even committed into version control.

    There should be a way to fetch values from Azure Key Vault elsewhere than just for setting up linked services. Alternatively, the REST linked service should support authentication with an API key.

    87 votes  ·  2 comments
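
    Until first-class support arrives, a common workaround (sketched below; vault and secret names are placeholders) is a Web activity that fetches the secret at runtime with the factory's managed identity, so a later activity can reference @activity('GetApiKey').output.value instead of a clear-text key:

        {
          "name": "GetApiKey",
          "type": "WebActivity",
          "typeProperties": {
            "url": "https://<vaultName>.vault.azure.net/secrets/<secretName>?api-version=7.0",
            "method": "GET",
            "authentication": { "type": "MSI", "resource": "https://vault.azure.net" }
          }
        }

    The secret value still shows up in the activity's run output in monitoring, so this is a mitigation rather than a substitute for native Key Vault references in dynamic content.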
  11. Elasticsearch

    Support Elasticsearch as both a source and a sink.

    86 votes  ·  3 comments
  12. Disable and enable data factory triggers for DevOps release pipeline

    When using DevOps release pipelines for continuous deployment of a data factory, you currently have to stop and start the triggers in the target data factory manually. The PowerShell solution provided in the official docs doesn't work (anymore?). The triggers should be updated automatically on deployment: https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment#update-active-triggers

    85 votes  ·  1 comment
  13. Rename objects in the portal

    Provide the ability to rename all objects and update their associated scripts. Right now, deleting a dataset removes its slice history, which can be very problematic.

    The ability to update a dataset's name and availability without having to recreate it would be very useful.

    83 votes  ·  5 comments
  14. Support SQL Database Always Encrypted sources or destinations

    With the recent increase in privacy and security concerns, namely GDPR, the need for using Always Encrypted on SQL Server or Azure SQL Database is also increasing. The problem is that the moment we enable these security features in SQL, we can no longer use ADF for data flow orchestration. Without this feature, more secure enterprise scenarios are being left out.

    82 votes  ·  3 comments
  15. Clear errors and "unused" data slices

    There should be an option to clear old errors.
    When there is no pipeline that produces or consumes a data slice and that slice has errors, the counter still shows them as "current" errors, which is not the case. I would like to remove these unused slices and their errors.

    82 votes  ·  0 comments
  16. Support Azure app service API

    Can Data Factory consume data from, or push data to, an Azure App Service API that exposes a Swagger definition?

    79 votes  ·  0 comments
  17. Create a Databricks cluster once and reuse it across multiple Databricks activities

    Hi,

    I am looking for a Data Factory feature for Databricks activities. Suppose a pipeline contains multiple Databricks activities. As of now I can use a new job cluster to execute all of them, but spinning up and terminating a cluster for each activity takes a lot of time. I would like to be able to create a cluster at the beginning of the pipeline, have all activities reuse that existing cluster, and terminate the cluster at the end…

    74 votes  ·  3 comments
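
    Partial relief exists already: the Azure Databricks linked service can point at an existing interactive cluster via existingClusterId instead of spinning up a new job cluster per activity (a sketch; angle-bracket values are placeholders, and the token should come from Key Vault in practice):

        {
          "name": "SharedDatabricksCluster",
          "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
              "domain": "https://<region>.azuredatabricks.net",
              "accessToken": { "type": "SecureString", "value": "<access-token>" },
              "existingClusterId": "<interactive-cluster-id>"
            }
          }
        }

    This avoids the per-activity spin-up, though starting and terminating the cluster around the pipeline run is still manual, which is what this idea asks to automate.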
  18. Support PATCH method in Web Activity

    Some Azure REST APIs and other third-party APIs use the PATCH method.

    Please add support for this method or make the method parameter a string so that we can use any method.

    73 votes  ·  3 comments
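
    In pipeline JSON terms, the ask is that the Web activity accept PATCH where today only GET, POST, PUT, and DELETE are allowed; a hypothetical example of the desired (currently rejected) configuration, with a made-up URL and body:

        {
          "name": "PatchItem",
          "type": "WebActivity",
          "typeProperties": {
            "url": "https://example.com/api/items/42",
            "method": "PATCH",
            "headers": { "Content-Type": "application/json" },
            "body": { "status": "processed" }
          }
        }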
  19. Restore a Data Factory

    Sometimes mistakes are made, like deleting a pipeline. I should be able to restore the data factory or the pipeline. I am not finding any documentation on how to do this, so I am assuming it isn't available.

    73 votes  ·  3 comments
  20. Web Activity should support JSON array response

    When a Web Activity calls an API that returns a JSON array as the response, we get an error that says "Response Content is not a valid JObject". Please support JSON arrays as the top level of the response.

    68 votes  ·  2 comments
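
    For clarity, this is the kind of top-level response shape that currently triggers the error, even though it is valid JSON:

        [
          { "id": 1, "name": "first" },
          { "id": 2, "name": "second" }
        ]

    A response wrapped in an object, e.g. { "items": [ ... ] }, parses fine; only a bare array at the top level fails.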