Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or with custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Move Activity

    An activity that copies data and then deletes the source.

    1,164 votes  ·  49 comments
  2. Schedule pipelines as jobs / run on pipelines on demand

    Rather than the time-slice idea, allow us to schedule pipelines as jobs, the same way I would schedule an agent job to run SSIS packages. Setting availability for datasets is a very awkward way to go about this. A scheduler would be 10 times easier and more intuitive.

    Also allow users to "run" a pipeline on demand; this would make testing a lot easier.

    748 votes  ·  30 comments

    Thanks so much for your feedback! We have made great enhancements in ADF v2 to make the control flow much more flexible. Please refer to these documentation links on how to trigger a pipeline on demand (https://docs.microsoft.com/en-us/azure/data-factory/delete-activity) and how to create a schedule trigger, a tumbling window trigger, or an event-based trigger (https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers#triggers).
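
    For reference, an ADF v2 schedule trigger that runs a pipeline every hour might be defined roughly like this (a sketch only; the trigger name, pipeline name, and start time are placeholders):

    ```json
    {
      "name": "RunEveryHour",
      "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
          "recurrence": {
            "frequency": "Hour",
            "interval": 1,
            "startTime": "2019-01-01T00:00:00Z",
            "timeZone": "UTC"
          }
        },
        "pipelines": [
          {
            "pipelineReference": {
              "referenceName": "MyPipeline",
              "type": "PipelineReference"
            }
          }
        ]
      }
    }
    ```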

  3. WYSIWYG UI

    The JSON editor is OK, but it is still a barrier to entry. A WYSIWYG UI along the lines of SSIS or Machine Learning Studio would really make this easier to use.

    358 votes  ·  10 comments
  4. Integrate with Functions

    It'd make it much easier to adopt Data Factory if it were possible to add Azure Functions activities to a pipeline.

    You can already store a blob and have an Azure Function trigger on it, but having the functions directly in the pipeline definition would make Data Factory management easier, not to mention the clarity it would give about what the factory actually does.

    306 votes  ·  13 comments
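
    ADF v2 did eventually gain an Azure Function activity of this kind. A pipeline activity using it can be sketched roughly as follows (the linked service name, function name, and body are placeholders):

    ```json
    {
      "name": "CallMyFunction",
      "type": "AzureFunctionActivity",
      "linkedServiceName": {
        "referenceName": "MyAzureFunctionLinkedService",
        "type": "LinkedServiceReference"
      },
      "typeProperties": {
        "functionName": "ProcessSlice",
        "method": "POST",
        "body": {
          "sliceDate": "@{trigger().scheduledTime}"
        }
      }
    }
    ```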
  5. Use partitionedBy in fileFilter and fileName

    At the moment you can only use * and ? in the file filter. It would be very helpful if the partitionedBy section, which can already be used for the folderPath, could also be used in the fileFilter or the fileName.

    This would allow scenarios where you need files like myName-2015-07-01.txt, where the slice date and time is part of the file name.

    268 votes  ·  7 comments

    Thank you for your feedback. This can be accomplished in ADF V2 by passing the value of a trigger variable in via a pipeline parameter and a dataset parameter. You can then construct a parameterized folder path and/or file name, e.g. myName-yyyy-MM-dd.txt in your example.

    Please refer to this article for more details: https://docs.microsoft.com/en-us/azure/data-factory/how-to-read-write-partitioned-data
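
    The approach described in the response can be sketched as a blob dataset whose fileName is an expression over a dataset parameter (names are placeholders; the pipeline would pass something like the trigger's scheduled time into windowStart):

    ```json
    {
      "name": "DailyOutputFile",
      "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
          "referenceName": "MyBlobStore",
          "type": "LinkedServiceReference"
        },
        "parameters": {
          "windowStart": { "type": "String" }
        },
        "typeProperties": {
          "folderPath": "output",
          "fileName": {
            "value": "@concat('myName-', formatDateTime(dataset().windowStart, 'yyyy-MM-dd'), '.txt')",
            "type": "Expression"
          }
        }
      }
    }
    ```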

  6. AAD Authentication Support for Azure SQL Database and Azure SQL Data Warehouse

    Currently, ADF supports only SQL Server authentication for Azure SQL Database and Azure SQL Data Warehouse data sources. Since both Azure SQL Database and Azure SQL Data Warehouse provide AAD authentication, ADF should start supporting this.

    229 votes  ·  4 comments
  7. Integrate with SAP using Azure Data Factory

    We need to integrate with SAP, but there is no linked service option for SAP.

    191 votes  ·  23 comments
  8. Add JSON format to linked storage service (blob storage)

    Stream Analytics can write line-separated JSON to blob storage, but that output can't be used later in Data Factory. This is a big miss!

    190 votes  ·  5 comments
  9. Schedule trigger: Add concurrency flag to prevent overlapping runs.

    We have a job that needs to run throughout the day. Sometimes the job runs long, sometimes it runs short, and we need to start it again as soon as it finishes. We can't use a tumbling window trigger because we disable the job at night, and when we re-enable a tumbling window it wants to run all the windows it missed.
    Please add a concurrency flag to the schedule trigger, or add a scheduling component to the tumbling window trigger so it can be disabled during certain times.

    158 votes  ·  2 comments
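
    Until such a flag exists in the trigger itself, the requested behavior can be approximated inside the job's own entry point. A minimal Python sketch of a skip-if-already-running guard (illustrative only; ConcurrencyGuard is not an ADF API):

    ```python
    import threading

    class ConcurrencyGuard:
        """Allows at most one run of a job at a time; overlapping starts are skipped."""
        def __init__(self):
            self._lock = threading.Lock()

        def try_run(self, job):
            # Non-blocking acquire: if a run is already in flight, skip this one
            # instead of queueing it up (the behavior the idea asks for).
            if not self._lock.acquire(blocking=False):
                return False  # overlapping run prevented
            try:
                job()
                return True
            finally:
                self._lock.release()

    guard = ConcurrencyGuard()
    results = []
    guard.try_run(lambda: results.append("ran"))  # first run executes normally
    ```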
  10. Provide method to programmatically execute an ADF pipeline on-demand

    Now that we have Azure functions available that can help make batch data processing more real-time, it would be great to be able to programmatically invoke the workflow component of ADF, for immediate execution of the pipeline.

    157 votes  ·  5 comments
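
    ADF v2 exposes this through the management REST API's createRun operation on a pipeline. A small Python sketch that builds the request URL (the subscription, resource group, factory, and pipeline names below are placeholders):

    ```python
    # POST this URL with an Azure AD bearer token (e.g. via the requests
    # library); the response body contains the runId of the new pipeline run.
    def create_run_url(subscription_id, resource_group, factory, pipeline,
                       api_version="2018-06-01"):
        """Build the URL for POST .../pipelines/{pipeline}/createRun."""
        return (
            "https://management.azure.com"
            f"/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            "/providers/Microsoft.DataFactory"
            f"/factories/{factory}"
            f"/pipelines/{pipeline}/createRun"
            f"?api-version={api_version}"
        )

    url = create_run_url("00000000-0000-0000-0000-000000000000",
                         "my-rg", "my-factory", "MyPipeline")
    ```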
  11. 157 votes  ·  10 comments

    Thanks for your feedback. You can now use ADF to copy data from FTP/S into various data stores; you are invited to give it a try.
    You can use the Copy Wizard to easily author the copy pipeline, and the documentation on the FTP/S connector can be found at https://azure.microsoft.com/en-us/documentation/articles/data-factory-ftp-connector/.
    Note that SFTP is not covered by this connector; it will be addressed later. Writing to FTP is also not currently supported.

  12. Add configurable REST and SOAP Web Service sources, so it can ingest data from other cloud services.

    Many cloud applications expose data via a SOAP or REST API. Customers should be able to configure generic REST and SOAP data sources for use in Azure Data Factory. Other ELT and ETL tools, such as Dell Boomi, Informatica, SSIS, and Talend, have this functionality.

    148 votes  ·  8 comments
  13. Data Management Gateway for multiple data factories on a single machine

    A Data Management Gateway on a machine is limited to a single data factory; it cannot be shared among different factories. We have to move to another machine if we want to use the same gateway for another data factory, or create a new gateway to connect to a different on-premises SQL Server.

    137 votes  ·  11 comments
  14. PostgreSQL as sink

    Now that Azure Database for PostgreSQL is GA and available as an ADF source, we'd really like to have it as a sink as well to fulfil our data-loading requirements.

    134 votes  ·  6 comments
  15. 126 votes  ·  8 comments
  16. Provide a folder option to manage multiple datasets and pipelines, with the ADF diagram also organized by folder/area

    We have around 100 datasets and 10 different pipelines. This will grow to 2,000 datasets and 150 pipelines in the future, based on business functionality, data categorization, and dependencies. We already see a problem managing this: we are not able to arrange items into collapsible folders, and the diagram becomes difficult to explain. Functionality to keep all datasets and pipelines related to one area in a specific folder, with a diagram view scoped to that folder, would simplify things a lot. See the Before and After attachments for more clarity.

    118 votes  ·  9 comments

    Thank you everyone for your feedback in this area! We have enabled the ability to create folders in ADF authoring UI: launch “Author & Monitor” from factory blade → from left nav of resource explorer → click on “+” sign to create folder for Pipelines, Datasets, Data Flows, and Templates.

    Additionally, with the rich parameterization support in ADF V2, you can do a dynamic lookup and pass an array of values into a parameterized dataset, which drastically reduces the need to create or maintain a large number of hard-coded datasets or pipelines. Please refer to this concrete example of using Lookup + ForEach + Copy to load from a large set of tables: https://docs.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal
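
    The Lookup + ForEach + Copy pattern mentioned in the response has roughly this pipeline shape (a heavily trimmed sketch; the "..." placeholders stand for the real source and sink configuration):

    ```json
    {
      "name": "BulkCopyPipeline",
      "properties": {
        "activities": [
          {
            "name": "GetTableList",
            "type": "Lookup",
            "typeProperties": { "...": "query returning the list of tables" }
          },
          {
            "name": "CopyEachTable",
            "type": "ForEach",
            "dependsOn": [
              { "activity": "GetTableList", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
              "items": {
                "value": "@activity('GetTableList').output.value",
                "type": "Expression"
              },
              "activities": [
                {
                  "name": "CopyTable",
                  "type": "Copy",
                  "typeProperties": { "...": "parameterized source and sink" }
                }
              ]
            }
          }
        ]
      }
    }
    ```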

  17. Support MySQL as sink

    Support MySQL as a destination (sink) data store.

    93 votes  ·  8 comments
  18. Add Slowly Changing Dimension or Merge functionality

    Alongside the Copy activity, it would be massively helpful to have a Slowly Changing Dimension pipeline capability, or something similar to Merge functionality, where the pipeline can perform data validation before inserting. This is one of the great features of SSIS, and it would be great to have it in ADF.

    92 votes  ·  2 comments
  19. Support pulling connection strings and credentials from an Azure Key Vault

    We're using Azure Key Vault to manage all of our secrets and credentials for consumption by client applications. It would be nice if we could add Key Vault as a source of connection strings or other variables in our JSON, populated by the Data Factory service at deploy/run time.

    90 votes  ·  2 comments
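
    This is now supported in ADF v2: a linked service can reference a secret stored in Key Vault rather than embedding the connection string. A sketch (the linked service and secret names are placeholders):

    ```json
    {
      "name": "MySqlDbLinkedService",
      "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
          "connectionString": {
            "type": "AzureKeyVaultSecret",
            "store": {
              "referenceName": "MyKeyVaultLinkedService",
              "type": "LinkedServiceReference"
            },
            "secretName": "SqlConnectionString"
          }
        }
      }
    }
    ```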
  20. Add support for SFTP

    Currently only FTPS is supported. Please add support for SFTP, both as source and sink.

    86 votes  ·  5 comments