Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem, using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure database, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or with custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Move Activity

    An activity that copies data and then deletes it from the source.

    1,164 votes · 49 comments
  2. Support SFTP as sink

    Support pushing data to an SFTP server in the Copy activity.

    919 votes · 94 comments
  3. Schedule pipelines as jobs / run pipelines on demand

    Rather than the time-slice model, allow us to schedule pipelines as jobs, the same way I would schedule an agent job to run SSIS packages. Setting availability for datasets is a very awkward way to go about this; a scheduler would be 10 times easier and more intuitive.

    Also allow users to "run" a pipeline on demand; this would make testing a lot easier.

    748 votes · 30 comments

    Thanks so much for your feedback! We have made great enhancements in ADF V2 to make the control flow much more flexible. Please refer to these documentation links on how to trigger a pipeline on demand: https://docs.microsoft.com/en-us/azure/data-factory/delete-activity, and on how to create a schedule trigger, a tumbling window trigger, and an event-based trigger: https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers#triggers
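
    As a minimal sketch of what the triggers article describes (the trigger and pipeline names below are hypothetical; the article above has the authoritative schema), a V2 schedule trigger that runs a pipeline every hour looks roughly like this:

      {
          "name": "RunEveryHourTrigger",
          "properties": {
              "type": "ScheduleTrigger",
              "typeProperties": {
                  "recurrence": {
                      "frequency": "Hour",
                      "interval": 1,
                      "startTime": "2019-01-01T00:00:00Z",
                      "timeZone": "UTC"
                  }
              },
              "pipelines": [
                  {
                      "pipelineReference": {
                          "referenceName": "MyCopyPipeline",
                          "type": "PipelineReference"
                      },
                      "parameters": {}
                  }
              ]
          }
      }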

  4. WYSIWYG UI

    The JSON editor is OK but is still a barrier to entry. A WYSIWYG UI along the lines of SSIS or Machine Learning Studio would really make this easier to use.

    358 votes · 10 comments
  5. Allow static-value columns in addition to the columns available in source files

    We have a requirement to delete existing data in Azure SQL based on some criteria, but we don't have a way of assigning a global variable/parameter and passing its value across activities.

    We have different folders to pick up data from, and the two folders never have files at the same time. The data flow and transformation of the data are the same, yet for the same kind of work we need to execute separate data flows (multiple datasets and pipelines/activities).

    How about allowing a static value to be defined for a column in a dataset/pipeline?
    Example:

    Folder 1 data flow -> if
    321 votes · 14 comments
  6. Integrate with Functions

    It'd make it much easier to adopt Data Factory if it were possible to add Azure Functions activities to a pipeline.

    You can already store a blob and have an Azure Function trigger on it, but having functions directly in the pipeline definition would make Data Factory management easier, not to mention the clarity it would add to the Data Factory's functionality.

    306 votes · 13 comments
  7. Use partitionedBy in fileFilter and fileName

    At the moment you can only use * and ? in the file filter. It would be very helpful if the partitionedBy section, which you can already use for folderPath, could be used in fileFilter or fileName as well.

    This would allow scenarios where you need files like myName-2015-07-01.txt, where the slice date and time are part of the file name.

    268 votes · 7 comments

    Thank you for your feedback. This can be accomplished in ADF V2 by passing in the value of a trigger variable using a pipeline parameter and a dataset parameter. You can then construct a parameterized folder path and/or file name, e.g. myName-yyyy-MM-dd.txt in your example.

    Please refer to this article for more details: https://docs.microsoft.com/en-us/azure/data-factory/how-to-read-write-partitioned-data
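
    For illustration only (all names here are hypothetical), a V2 dataset can declare a sliceDate parameter and use it in an expression for the file name; a schedule-triggered pipeline would then pass something like @formatDateTime(trigger().scheduledTime, 'yyyy-MM-dd') into that parameter:

      {
          "name": "PartitionedBlobDataset",
          "properties": {
              "type": "AzureBlob",
              "linkedServiceName": {
                  "referenceName": "MyBlobStore",
                  "type": "LinkedServiceReference"
              },
              "parameters": {
                  "sliceDate": { "type": "String" }
              },
              "typeProperties": {
                  "folderPath": "incoming",
                  "fileName": {
                      "value": "@concat('myName-', dataset().sliceDate, '.txt')",
                      "type": "Expression"
                  }
              }
          }
      }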

  8. AAD Authentication Support for Azure SQL Database and Azure SQL Data Warehouse

    Currently, ADF supports only SQL Server authentication for Azure SQL Database and Azure SQL Data Warehouse data sources. Since both services provide AAD authentication, ADF should start supporting it.

    229 votes · 4 comments
  9. Integrate with SAP using Azure Data Factory

    We need to integrate with SAP, but there is no linked service option for SAP.

    191 votes · 24 comments
  10. Add JSON format to the linked storage service (Blob storage)

    Stream Analytics can write line-separated JSON to blob storage, but that output can't be used later in Data Factory. This is a big miss!

    190 votes · 5 comments
  11. Schedule trigger: add a concurrency flag to prevent overlapping runs

    We have a job that needs to run throughout the day. Sometimes it runs long, sometimes short, and we need to start it again as soon as it finishes. We can't use a tumbling window trigger because we disable the job at night, and when we re-enable a tumbling window trigger, it wants to run all the windows it missed.
    Please add a concurrency flag to the schedule trigger, or add a scheduling component to the tumbling window trigger so it can be disabled during certain times.

    158 votes · 2 comments
  12. Provide method to programmatically execute an ADF pipeline on-demand

    Now that Azure Functions can help make batch data processing more real-time, it would be great to be able to programmatically invoke the workflow component of ADF for immediate execution of a pipeline.

    157 votes · 5 comments
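
    Note: ADF V2 exposes this through the REST API (also surfaced in the .NET, Python, and PowerShell SDKs). As a sketch, with placeholder names in the URL and a hypothetical pipeline parameter in the body, a single call starts a run, and the response returns a runId:

      POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelines/{pipelineName}/createRun?api-version=2018-06-01

      {
          "myPipelineParameter": "folder1"
      }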
  13. Support FTP/S as a source

    157 votes · 10 comments

    Thanks for your feedback. You can now use ADF to copy data from FTP/S into various data stores; you are invited to give it a try.
    You can use the Copy Wizard to easily author the copy pipeline, and the documentation on the FTP/S connector can be found at https://azure.microsoft.com/en-us/documentation/articles/data-factory-ftp-connector/.
    Note that SFTP is not covered by this connector; it will be worked on later. Writing to FTP is also not covered at this time.

  14. Add configurable REST and SOAP web service sources, so ADF can ingest data from other cloud services

    There are many cloud applications that expose data via a SOAP or REST API. Customers should be able to configure generic REST and SOAP data sources for use in Azure Data Factory. Other ELT and ETL tools, such as Dell Boomi, Informatica, SSIS, and Talend, have this functionality.

    148 votes · 9 comments
  15. Data Management Gateway for multiple data factories on a single machine

    There is a limitation that a Data Management Gateway on a machine can serve only a single data factory; it cannot be shared among different factories. We need to move to another machine if we want to use the same gateway for another data factory, or create a new gateway to connect to a different on-premises SQL Server.

    137 votes · 11 comments
  16. PostgreSQL as sink

    Now that Azure Database for PostgreSQL is generally available and supported as an ADF source, we would really like to have it as a sink as well to fulfil our data-loading requirements.

    134 votes · 10 comments
  17. Provide a folder option to manage multiple datasets and pipelines, with the ADF diagram also organized by folder/area

    We have around 100 datasets and 10 different pipelines. This will grow to 2,000 datasets and 150 pipelines in the future, based on business functionality, data categorization, and dependencies. We already see a problem in managing them, as we are not able to arrange them in collapsible folders, and the diagram becomes difficult to explain. Functionality to keep all datasets/pipelines related to one area in a specific folder, with a diagram view at that folder level, would simplify this a lot. See the Before and After attachments for more clarity.

    118 votes · 9 comments

    Thank you everyone for your feedback in this area! We have enabled the ability to create folders in the ADF authoring UI: launch “Author & Monitor” from the factory blade → from the left nav of the resource explorer → click the “+” sign to create folders for Pipelines, Datasets, Data Flows, and Templates.

    Additionally, with the rich parameterization support in ADF V2, you can do a dynamic lookup and pass an array of values into a parameterized dataset, which drastically reduces the need to create or maintain a large number of hard-coded datasets or pipelines. Please refer to this concrete example of using Lookup + ForEach + Copy to load from a large set of tables: https://docs.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal
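
    As a compressed sketch of the Lookup + ForEach + Copy pattern from that tutorial (activity and dataset names here are hypothetical, and the sink dataset is assumed to expose a tableName parameter):

      {
          "name": "BulkCopyPipeline",
          "properties": {
              "activities": [
                  {
                      "name": "LookupTableList",
                      "type": "Lookup",
                      "typeProperties": {
                          "source": {
                              "type": "SqlSource",
                              "sqlReaderQuery": "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'"
                          },
                          "dataset": {
                              "referenceName": "SourceSqlDataset",
                              "type": "DatasetReference"
                          },
                          "firstRowOnly": false
                      }
                  },
                  {
                      "name": "CopyEachTable",
                      "type": "ForEach",
                      "dependsOn": [
                          { "activity": "LookupTableList", "dependencyConditions": [ "Succeeded" ] }
                      ],
                      "typeProperties": {
                          "items": {
                              "value": "@activity('LookupTableList').output.value",
                              "type": "Expression"
                          },
                          "activities": [
                              {
                                  "name": "CopyOneTable",
                                  "type": "Copy",
                                  "typeProperties": {
                                      "source": {
                                          "type": "SqlSource",
                                          "sqlReaderQuery": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
                                      },
                                      "sink": { "type": "SqlSink" }
                                  },
                                  "inputs": [
                                      { "referenceName": "SourceSqlDataset", "type": "DatasetReference" }
                                  ],
                                  "outputs": [
                                      {
                                          "referenceName": "SinkSqlDataset",
                                          "type": "DatasetReference",
                                          "parameters": { "tableName": "@item().TABLE_NAME" }
                                      }
                                  ]
                              }
                          ]
                      }
                  }
              ]
          }
      }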

  18. Support MySQL as sink

    MySQL as a destination data store.

    93 votes · 8 comments
  19. Add Slowly Changing Dimension or Merge functionality

    Alongside the Copy activity, it would be massively helpful to have a pipeline type with Slowly Changing Dimension capability, or something similar to Merge functionality, where the pipeline can perform data validation before inserting. This is one of the great features of SSIS, and it would be great to have it in ADF.

    92 votes · 2 comments
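
    A partial workaround exists today: the Copy activity's SQL sink can invoke a stored procedure instead of doing a plain bulk insert, and that procedure can implement MERGE/SCD logic in the database. A minimal sketch of the sink section of a Copy activity's typeProperties (the stored procedure, table type, and parameter names are hypothetical, and the procedure body with the actual MERGE statement lives in SQL):

      "sink": {
          "type": "SqlSink",
          "sqlWriterStoredProcedureName": "spUpsertCustomers",
          "sqlWriterTableType": "CustomerTableType",
          "storedProcedureParameters": {
              "validateOnly": { "value": "false", "type": "Boolean" }
          }
      }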