Data Factory

Azure Data Factory lets you manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem, using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Add a new email activity with the ability to send attachments as part of the workflow.

    There are numerous instances when an output (statistics) or error file has to be mailed to administrators. An email activity would make this functionality straightforward to implement.
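
    Until such an activity exists, a common workaround is to have a Web Activity call a Logic App that sends the mail. A minimal sketch, assuming a Logic App with an HTTP trigger at a placeholder URL; the body field names are whatever your Logic App expects:

        {
            "name": "MailErrorReport",
            "type": "WebActivity",
            "typeProperties": {
                "url": "https://<your-logic-app-http-trigger-url>",
                "method": "POST",
                "headers": { "Content-Type": "application/json" },
                "body": {
                    "to": "admins@example.com",
                    "subject": "ADF pipeline error report",
                    "attachmentBlobUrl": "https://<account>.blob.core.windows.net/logs/errors.csv"
                }
            }
        }

    The Logic App reads the blob at attachmentBlobUrl and attaches it to the outgoing mail, which keeps attachment handling out of the pipeline itself.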

    156 votes · 1 comment
  2. Allow adding additional columns to a data source and defining their values programmatically

    Case study:
    While implementing ETL packages for a database, I need to bulk-load data from a CSV file into a database table, and I need to add a data-import timestamp (DateTime) column (derived from the file name or the current date, for example). I cannot leave the column values empty, as the column is part of a composite primary key (which requires it to be NOT NULL). I also cannot set a default value, because that is not supported.

    Suggestions (see the sketch below):
    - Allow adding additional columns to a data source
    - Allow column values to be defined programmatically (using JavaScript)
    - Add a default-value property for columns
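
    A sketch of what such support could look like on a copy-activity source; the additionalColumns property, the column names, and the $$FILEPATH token for the source file path are illustrative, not a confirmed API:

        "source": {
            "type": "DelimitedTextSource",
            "additionalColumns": [
                { "name": "ImportTimestamp", "value": "@utcnow()" },
                { "name": "SourceFile", "value": "$$FILEPATH" }
            ]
        }

    Each entry would add a column to the copied rowset, so the import timestamp could satisfy the NOT NULL key column described above.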

    150 votes · 3 comments
  3. Please add support for specifying a longer timeout for Web Activity

    Data Factory version 2 currently runs Web Activities with a fixed 1-minute response timeout:

    https://docs.microsoft.com/en-us/azure/data-factory/control-flow-web-activity

    "REST endpoints that the web activity invokes must return a response of type JSON. The activity will timeout at 1 minute with an error if it does not receive a response from the endpoint."

    Please add the ability to specify a longer timeout period for complex tasks.
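
    A sketch of what a configurable timeout could look like; httpRequestTimeout is a hypothetical property here, separate from the existing policy.timeout, which bounds the activity as a whole rather than the individual HTTP call:

        {
            "name": "CallLongRunningEndpoint",
            "type": "WebActivity",
            "policy": { "timeout": "0.00:30:00", "retry": 1 },
            "typeProperties": {
                "url": "https://api.example.com/long-task",
                "method": "POST",
                "body": {},
                "httpRequestTimeout": "00:10:00"
            }
        }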

    140 votes · 3 comments
  4. Event Hub

    Support Azure Event Hubs as both a source and a sink.

    136 votes · 6 comments
  5. Allow static-value columns in addition to the columns available in source files

    We have a requirement to delete existing data in SQL Azure based on some criteria, but we don't have a way of assigning a global variable/parameter and passing its value across activities.

    We have different folders to pick up data from, and the two folders never have files at the same time. The data flow and transformation are the same, yet for the same kind of work we have to build separate data flows (multiple datasets and pipelines/activities).

    How about allowing a static value to be defined for a column in a Dataset/Pipeline?
    Example:
    Folder 1 data flow -> if…
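
    The folder-routing half of this scenario can already be sketched with the If Condition activity; the static column value is the missing piece. A fragment, with the activity lists elided and a hypothetical sourceFolder pipeline parameter assumed:

        {
            "name": "RouteByFolder",
            "type": "IfCondition",
            "typeProperties": {
                "expression": {
                    "value": "@equals(pipeline().parameters.sourceFolder, 'Folder1')",
                    "type": "Expression"
                },
                "ifTrueActivities": [ ],
                "ifFalseActivities": [ ]
            }
        }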

    135 votes · 9 comments
  6. Change Data Capture feature for MySQL, Oracle and PostgreSQL

    Could CDC features be added to ADF for these sources, please?

    125 votes · 8 comments
  7. 118 votes · 1 comment
  8. Add Error Handling activity

    While orchestrating control flow, there are situations where the parent or entire pipeline should be failed based on a certain error. ADF should either add this behavior to each activity, as SSIS does, or add a separate ErrorHandling activity that can fail the parent or the pipeline itself.
    For example, in the ForEach activity there is no way to terminate the loop when one iteration fails.
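
    A sketch of what such an activity could look like, wired to run only when a prior activity fails. The Fail-style activity type and its properties are the hypothetical part; the dependencyConditions mechanism already exists:

        {
            "name": "FailPipelineOnError",
            "type": "Fail",
            "dependsOn": [
                { "activity": "CopyCustomerData", "dependencyConditions": [ "Failed" ] }
            ],
            "typeProperties": {
                "message": "@activity('CopyCustomerData').error.message",
                "errorCode": "COPY_FAILED"
            }
        }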

    109 votes · 3 comments
  9. Data Factory should be able to use a VNet without resorting to self-hosted runtimes

    Self-hosted makes a lot of sense when integrating on-premises data, but it's a shame to have to maintain a self-hosted integration runtime VM just to get the extra security of a VNet, i.e. firewalled storage accounts etc.

    Ideally, the Azure-managed integration runtimes would be able to join a VNet on demand.
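
    A sketch of what a VNet-capable Azure integration runtime definition might look like; the managedVirtualNetwork block is the speculative part:

        {
            "name": "ManagedVnetIR",
            "properties": {
                "type": "Managed",
                "typeProperties": {
                    "computeProperties": { "location": "AutoResolve" }
                },
                "managedVirtualNetwork": {
                    "referenceName": "default",
                    "type": "ManagedVirtualNetworkReference"
                }
            }
        }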

    108 votes · 6 comments
  10. Support for Daylight Savings Time for Trigger Schedules

    When setting up the timing of a Trigger, you need to know how far away from UTC you are so you can specify the right time, and that value changes for those of us who observe Daylight Savings Time.
    The dialog box for setting up a Trigger Schedule should instead have the following three inputs:
    1) the LOCAL time you want it to run
    2) the Time Zone
    3) whether to adjust for DST

    That is the information people actually have at their disposal.
    As it stands, to adjust for DST I must EDIT all my Triggers manually to ensure they run at the right hour of the day…
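
    A sketch of what a DST-aware schedule could look like: the recurrence carries a named time zone (Windows time-zone IDs already encode DST rules), so the service does the UTC conversion instead of the user. Support for anything other than UTC in timeZone is the assumed part:

        {
            "name": "DailyLoadTrigger",
            "properties": {
                "type": "ScheduleTrigger",
                "typeProperties": {
                    "recurrence": {
                        "frequency": "Day",
                        "interval": 1,
                        "startTime": "2019-11-01T06:00:00",
                        "timeZone": "Central Standard Time",
                        "schedule": { "hours": [ 6 ], "minutes": [ 0 ] }
                    }
                }
            }
        }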

    107 votes · 1 comment
  11. Richer variable support

    Allow me to have custom variables at the pipeline and factory level which can be refreshed on a specified schedule from a dataset -- the closest analogue would be SSIS variables.

    One use case: store a set of UTC offsets in a SQL table, one per data source, and query this table at pipeline runtime to retrieve the correct offset for each source. The offset could then be stored in a variable for each pipeline.
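
    Part of this can be approximated today with a Lookup activity feeding Set Variable; what it lacks is factory-level scope and scheduled refresh. A minimal sketch, assuming a dbo.SourceConfig table, an OffsetConfig dataset over it, and a pipeline-level string variable named utcOffset:

        {
            "name": "LookupOffset",
            "type": "Lookup",
            "typeProperties": {
                "source": {
                    "type": "AzureSqlSource",
                    "sqlReaderQuery": "SELECT UtcOffset FROM dbo.SourceConfig WHERE SourceName = 'crm'"
                },
                "dataset": { "referenceName": "OffsetConfig", "type": "DatasetReference" },
                "firstRowOnly": true
            }
        },
        {
            "name": "SetOffset",
            "type": "SetVariable",
            "dependsOn": [ { "activity": "LookupOffset", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
                "variableName": "utcOffset",
                "value": "@{activity('LookupOffset').output.firstRow.UtcOffset}"
            }
        }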

    104 votes · 4 comments
  12. NetSuite connector

    It would be great if there were a NetSuite connector.

    98 votes · 0 comments
  13. Pause/Start Azure SQL Data Warehouse from ADF

    Provide the ability to pause and start (resume) Azure SQL Data Warehouse directly from an ADF pipeline.
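
    Pending a native activity, this can be sketched as a Web Activity calling the Azure management REST API under the factory's managed identity; the subscription, resource group, server, and database names are placeholders, and the api-version shown may differ:

        {
            "name": "PauseSqlDw",
            "type": "WebActivity",
            "typeProperties": {
                "url": "https://management.azure.com/subscriptions/<subId>/resourceGroups/<rg>/providers/Microsoft.Sql/servers/<server>/databases/<dw>/pause?api-version=2017-10-01-preview",
                "method": "POST",
                "body": {},
                "authentication": { "type": "MSI", "resource": "https://management.azure.com/" }
            }
        }

    The factory's managed identity needs an appropriate role (e.g. SQL DB Contributor) on the data warehouse for the call to succeed.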

    96 votes · 4 comments
  14. Allow linking one factory to another

    I have been using the Walkthrough sample and successfully completed the exercise. It is fairly straightforward, and the experience of building a network of dependencies between pipelines is great: very similar to SSIS, but it lets me perform data integration at scale with hybrid capabilities. My scenario is that we have a few different teams within our organization and we need separate billing for each of them. I believe separating subscriptions is currently the only option in Azure for separate billing, but we would like to allow one department to use the data of…
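
    Until first-class linking exists, one factory can start a pipeline in another by calling the Data Factory REST API from a Web Activity (managed identity; names are placeholders; the caller must poll the returned run ID for completion):

        {
            "name": "RunPipelineInOtherFactory",
            "type": "WebActivity",
            "typeProperties": {
                "url": "https://management.azure.com/subscriptions/<subId>/resourceGroups/<rg>/providers/Microsoft.DataFactory/factories/<otherFactory>/pipelines/<pipelineName>/createRun?api-version=2018-06-01",
                "method": "POST",
                "body": {},
                "authentication": { "type": "MSI", "resource": "https://management.azure.com/" }
            }
        }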

    planned · 88 votes · 5 comments
  15. Post-copy script in Copy Activity

    The copy activity has a pre-copy script feature. A similar post-copy script feature would make it possible to execute code from the same activity once the copy operation completes.

    Traditionally, when data is copied from a source SQL database to a destination, the data is copied incrementally into temporary/staging tables (or in-memory tables) in the destination, and after the copy a merge script runs to merge the data into the target table.

    A post-copy script option in the copy activity would let the merge code be called from the copy activity itself instead of requiring a separate activity such as Execute Stored Procedure.
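
    A sketch of the requested sink option alongside the existing preCopyScript; postCopyScript is the hypothetical property:

        "sink": {
            "type": "AzureSqlSink",
            "preCopyScript": "TRUNCATE TABLE stage.Customer",
            "postCopyScript": "EXEC dbo.MergeCustomerFromStage"
        }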

    87 votes · 5 comments
  16. Capture file name as a variable

    ADF should provide the ability to capture the input file name and other file-related parameters as a variable and pass them as input to other activities such as the Stored Procedure activity or a custom .NET activity. For example, on successful completion of a pipeline I would like to store the file name in my on-premises (or Azure) SQL DB; today the Stored Procedure activity cannot take the file name as an input parameter.
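
    For a single known file, part of this can be assembled today: a Get Metadata activity reads the item name, and a Stored Procedure activity passes it on. A minimal sketch, assuming an InputFile dataset, a SqlDb linked service, and a dbo.LogFileName procedure:

        {
            "name": "GetFileMeta",
            "type": "GetMetadata",
            "typeProperties": {
                "dataset": { "referenceName": "InputFile", "type": "DatasetReference" },
                "fieldList": [ "itemName" ]
            }
        },
        {
            "name": "LogFileName",
            "type": "SqlServerStoredProcedure",
            "dependsOn": [ { "activity": "GetFileMeta", "dependencyConditions": [ "Succeeded" ] } ],
            "linkedServiceName": { "referenceName": "SqlDb", "type": "LinkedServiceReference" },
            "typeProperties": {
                "storedProcedureName": "dbo.LogFileName",
                "storedProcedureParameters": {
                    "FileName": { "value": "@activity('GetFileMeta').output.itemName", "type": "String" }
                }
            }
        }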

    82 votes · 1 comment
  17. Add the ability to restart an activity from within a pipeline within a master pipeline in ADFv2

    If a pipeline structure is a master pipeline containing child pipelines, with the activities held within those children, it is not possible to restart a child pipeline and have the parent recognise when the child completes. Add functionality to allow an activity in the child pipeline to be restarted, with control passed back to the parent pipeline when it completes successfully.
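
    For reference, this is how a parent currently waits on a child via the Execute Pipeline activity; what is missing is a way to restart the child run and have this same handle pick it back up:

        {
            "name": "RunChildLoad",
            "type": "ExecutePipeline",
            "typeProperties": {
                "pipeline": { "referenceName": "ChildLoadPipeline", "type": "PipelineReference" },
                "waitOnCompletion": true
            }
        }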

    82 votes · 4 comments
  18. Clear errors and "unused" data slices

    There should be an option to clear old errors.
    When no pipeline produces or consumes a data slice but that slice has errors, the counter still reports them as "current" errors, which is misleading. I would like to remove these unused slices and their errors.

    82 votes · 0 comments
  19. Add Support for Maintaining Identity Column Values When Copying From/To SQL DBs

    When moving data from one SQL database to another (on-premises or Azure), if there is an identity column in the source table that has a gap (e.g. the IDs are 1, 2, 4, 5) and the destination table is empty with the same structure, the values in the destination table after the copy will be 1, 2, 3, 4 rather than being preserved. This can cause issues when the identity column is referenced as a foreign key.

    It would be nice to see an option to keep identity values intact, even if it means that tables for which this…
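
    A sketch of what such an option might look like on a SQL sink; keepIdentity is a hypothetical property, named by analogy with SqlBulkCopyOptions.KeepIdentity and bcp's -E flag:

        "sink": {
            "type": "AzureSqlSink",
            "keepIdentity": true
        }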

    80 votes · 5 comments
  20. Copy Blob with Properties and Metadata

    The copy activity does not appear to copy blob properties or custom metadata.

    Various HTTP properties like content-type, content-encoding, cache-control, etc. are crucial, and I'd suggest custom metadata is too if you've added it.

    Allow the blob-to-blob copy activity to copy the entire blob, including properties and metadata - not just the binary file content.
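
    A sketch of what an opt-in could look like on a blob-to-blob copy; the preserve list is the hypothetical part:

        {
            "name": "CopyBlobWithMetadata",
            "type": "Copy",
            "typeProperties": {
                "source": { "type": "BinarySource" },
                "sink": { "type": "BinarySink" },
                "preserve": [ "Properties", "Metadata" ]
            }
        }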

    79 votes · 4 comments