Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that will process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.
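For illustration, a pipeline is authored as a JSON document that describes its activities. The sketch below is a minimal, hypothetical example of a pipeline containing a single HDInsight Hive activity; the pipeline, linked service, and script names are placeholders, and the exact schema varies between service versions, so consult the documentation rather than treating this as a reference definition.

    {
      "name": "TransformLogsPipeline",
      "properties": {
        "activities": [
          {
            "name": "RunHiveScript",
            "type": "HDInsightHive",
            "linkedServiceName": {
              "referenceName": "MyHDInsightLinkedService",
              "type": "LinkedServiceReference"
            },
            "typeProperties": {
              "scriptPath": "adfscripts/transform-logs.hql",
              "scriptLinkedService": {
                "referenceName": "MyStorageLinkedService",
                "type": "LinkedServiceReference"
              }
            }
          }
        ]
      }
    }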

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Move Activity

    An activity that copies the data and then deletes it from the source.

    1,164 votes · 49 comments
  2. Support SFTP as sink

    Support pushing data to SFTP as a sink in the Copy activity.

    919 votes · 96 comments
  3. Add Excel as source

    Add Excel files as a source.

    855 votes · 39 comments
  4. Schedule pipelines as jobs / run pipelines on demand

    Rather than the time slice idea, allow us to schedule pipelines as jobs, the same way I would schedule an agent job to run SSIS packages. Setting availability for datasets is a very awkward way to go about this. A scheduler would be 10 times easier and more intuitive.

    Also, allow users to "run" a pipeline on demand; this would make testing a lot easier.

    748 votes · 30 comments

    Thanks so much for your feedback! We have made great enhancements in ADF v2 to make the control flow much more flexible. Please refer to this document on how to trigger a pipeline on demand: https://docs.microsoft.com/en-us/azure/data-factory/delete-activity and this one on how to create a schedule trigger, a tumbling window trigger, or an event-based trigger: https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers#triggers
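
    As a rough illustration of the schedule trigger described in those documents, a trigger that starts a pipeline once a day is defined with JSON along these lines (the trigger name, pipeline name, and start time below are hypothetical placeholders; check the linked documentation for the exact schema):

    {
      "name": "RunEveryMorning",
      "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
          "recurrence": {
            "frequency": "Day",
            "interval": 1,
            "startTime": "2019-01-01T06:00:00Z",
            "timeZone": "UTC"
          }
        },
        "pipelines": [
          {
            "pipelineReference": {
              "referenceName": "MyCopyPipeline",
              "type": "PipelineReference"
            }
          }
        ]
      }
    }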

  5. XML file type in Copy activity, along with XML schema validation

    Can we have a Copy activity for XML files, along with validation of the XML file against an XSD schema? If schema validation succeeds, copy the file; otherwise fail the activity. This would be useful for the scenarios below:
    1. Blob to Blob
    2. Blob to SQL
    3. SQL to Blob

    If all of the above can work with a specified schema, that would be great.

    691 votes · 35 comments
  6. Snowflake connector as both source and sink

    Provide the capability to copy data from Blob to a Snowflake data warehouse.

    445 votes · 18 comments
  7. WYSIWYG UI

    The JSON editor is OK but is still a barrier to entry. A WYSIWYG UI, in the style of SSIS or Machine Learning Studio, would really make this easier to use.

    358 votes · 10 comments
  8. Allow static value columns in addition to columns available in source files

    We have a requirement to delete the existing data in Azure SQL based on some criteria, but we don't have a way of assigning a global variable/parameter and passing its value across activities.

    We have different folders to pick up data from. Both folders will never have files at the same time. The data flow and transformation of the data are the same, but for the same kind of work we need to execute separate data flows (multiple datasets and pipelines/activities).

    How about allowing a static value to be defined for a column in a dataset/pipeline?
    Example:

    Folder 1 data flow -> if
    321 votes · 15 comments
  9. Integrate with Functions

    It'd make it much easier to adopt Data Factory if it were possible to add Azure Functions activities to a pipeline.

    You can already write a blob and have an Azure Function trigger on it, but having functions directly in the pipeline definition would make Data Factory management easier, not to mention the clarity it would give about the data factory's functionality.

    306 votes · 13 comments
  10. Use partitionedBy in fileFilter and fileName

    At the moment you can only use * and ? in the file filter. It would be very helpful if the partitionedBy section, which you can already use for the folderPath, could be used for the fileFilter or the fileName as well.

    This would allow scenarios where you need files like myName-2015-07-01.txt where the slice date and time is part of the filename.

    268 votes · 7 comments

    Thank you for your feedback. This can be accomplished in ADF v2 by passing the value of a trigger variable into a pipeline parameter and then into a dataset parameter. You can then construct a parameterized folder path and/or file name, e.g. myName-yyyy-MM-dd.txt in your example.

    Please refer to this article for more details: https://docs.microsoft.com/en-us/azure/data-factory/how-to-read-write-partitioned-data
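
    As a rough sketch of that approach (all names below are hypothetical): the trigger passes its scheduled time into a pipeline parameter, for example "windowStart": "@trigger().scheduledTime" on the trigger's pipeline reference; the pipeline forwards that value to a dataset parameter; and the dataset builds the file name from it with an expression:

    {
      "name": "DailyInputDataset",
      "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
          "referenceName": "MyStorageLinkedService",
          "type": "LinkedServiceReference"
        },
        "parameters": {
          "windowStart": { "type": "String" }
        },
        "typeProperties": {
          "folderPath": "input",
          "fileName": {
            "value": "@concat('myName-', formatDateTime(dataset().windowStart, 'yyyy-MM-dd'), '.txt')",
            "type": "Expression"
          },
          "format": { "type": "TextFormat" }
        }
      }
    }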

  11. AAD Authentication Support for Azure SQL Database and Azure SQL Data Warehouse

    Currently, ADF supports only SQL Server authentication for Azure SQL Database and Azure SQL Data Warehouse data sources. Since both Azure SQL Database and Azure SQL Data Warehouse provide AAD authentication, ADF should start supporting this.

    229 votes · 4 comments
  12. Azure Data Factory - Restart an entire pipeline

    Currently in Azure Data Factory, there is no functionality to restart an entire pipeline. If we need to refresh a dataset in Azure, all associated activities in the pipeline have to be selected and run separately. Can we have an option to run the entire pipeline if required?

    195 votes · 5 comments
  13. Integrating with SAP using Azure Data Factory

    We need to integrate with SAP, but there is no linked service option for SAP.

    191 votes · 24 comments
  14. Add JSON format to the linked storage service (Blob storage)

    Stream Analytics can write line-separated JSON to blob storage, but that output can't be used later in Data Factory. This is a big miss!

    190 votes · 5 comments
  15. Schedule trigger: Add concurrency flag to prevent overlapping runs.

    We have a job that needs to be run throughout the day. Sometimes the job runs long, sometimes it runs short, and we need to start the next run as soon as the previous one finishes. We can't use a tumbling window trigger because we disable the job at night, and when we re-enable a tumbling window trigger it wants to run all the windows it missed.
    Please add a concurrency flag to the schedule trigger, or add a scheduling component to the tumbling window trigger so it can be disabled during certain times.

    158 votes · 2 comments
  16. Provide method to programmatically execute an ADF pipeline on-demand

    Now that we have Azure Functions available to help make batch data processing more real-time, it would be great to be able to programmatically invoke the workflow component of ADF for immediate execution of a pipeline.

    157 votes · 5 comments
  17. 157 votes · 10 comments

    Thanks for your feedback. You can now use ADF to copy data from FTP/S into various data stores; you are invited to give it a try.
    You can use the Copy wizard to easily author the copy pipeline, and the documentation for the FTP/S connector can be found at https://azure.microsoft.com/en-us/documentation/articles/data-factory-ftp-connector/.
    Note that SFTP is not covered by this connector; it will be worked on later. Writing to FTP is also not covered at this time.
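
    For orientation, an FTP linked service is defined with JSON roughly like the following; the host, credentials, and property values are placeholders, and the connector article linked above has the authoritative schema:

    {
      "name": "FtpLinkedService",
      "properties": {
        "type": "FtpServer",
        "typeProperties": {
          "host": "ftp.example.com",
          "port": 21,
          "authenticationType": "Basic",
          "username": "myuser",
          "password": "<password>",
          "enableSsl": true,
          "enableServerCertificateValidation": true
        }
      }
    }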

  18. Add configurable REST and SOAP web service sources, so ADF can ingest data from other cloud services

    There are many cloud applications that expose data via a SOAP or REST API. Customers should be able to configure generic REST and SOAP data sources for use in Azure Data Factory. Other ELT and ETL tools, such as Dell Boomi, Informatica, SSIS, and Talend, have this functionality.

    148 votes · 9 comments
  19. Data Management Gateway for multiple data factories on a single machine

    There is a limitation that a DMG on a machine can be used by only a single data factory; it cannot be shared among different data factories. We need to move to another machine if we want to use the same gateway for another data factory, or create another gateway to connect to a different on-premises SQL Server.

    137 votes · 11 comments
  20. PostgreSQL as sink

    Now that Azure Database for PostgreSQL is GA and available as an ADF source, we would really like to have it as a sink as well to fulfill our data loading requirements.

    134 votes · 10 comments