Data Factory

Azure Data Factory lets you manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem, using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Web Activity should support JSON array response

    When a Web Activity calls an API that returns a JSON array as the response, the activity fails with the error "Response Content is not a valid JObject". Please support JSON arrays at the top level of the response. A workaround sketch follows below.
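
    As a workaround until this is supported, a small proxy can wrap the array in an object before ADF sees it. A minimal sketch, assuming a hypothetical Azure Function in Python that forwards to the target API (the "target" query parameter and function layout are illustrative, not part of ADF):

    import json

    import azure.functions as func
    import requests

    def main(req: func.HttpRequest) -> func.HttpResponse:
        # Hypothetical passthrough: the URL of the API that returns a bare JSON array.
        target = req.params.get("target")
        resp = requests.get(target)
        resp.raise_for_status()

        # Wrap the top-level array in an object so the Web Activity receives
        # a valid JObject instead of a bare JArray.
        wrapped = {"items": resp.json()}
        return func.HttpResponse(json.dumps(wrapped), mimetype="application/json")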

    68 votes · 2 comments
  2. Add the ability for the Get Metadata activity to read files and folders recursively

    I want to move files recursively using "Move files by chaining the Copy activity and the Delete activity", but the Get Metadata activity does not work recursively.
    Please add a way for the Get Metadata activity to read files and folders recursively; a workaround sketch follows below.

    ■Move files by chaining the Copy activity and the Delete activity
    https://docs.microsoft.com/en-us/azure/data-factory/delete-activity#move-files-by-chaining-the-copy-activity-and-the-delete-activity
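
    As an interim workaround, the recursive listing can be done outside Get Metadata, e.g. in a Custom Activity or Azure Function. A minimal sketch using the azure-storage-blob Python SDK (connection string, container, and prefix are placeholders); a flat listing with a name prefix is inherently recursive:

    from azure.storage.blob import BlobServiceClient

    # Placeholders: point these at your own storage account and "folder".
    service = BlobServiceClient.from_connection_string("<connection-string>")
    container = service.get_container_client("source-container")

    # Unlike Get Metadata's childItems (one level deep), list_blobs with a
    # name prefix returns every blob under that prefix, at any depth.
    for blob in container.list_blobs(name_starts_with="input/"):
        print(blob.name)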

    66 votes · 0 comments
  3. Allow MSI authentication for AzureDataLakeStore in Mapping Data Flow

    An ADLS (gen 1) Linked Service is authenticated with a Managed Identity (MSI) or a Service Principal. When authenticating with MSI, we can't use Mapping Data Flows. Will this functionality be added?

    64 votes · 1 comment
  4. Support for Elastic database transactions

    ADF must support elastic database transactions against Azure SQL Database.

    This is equivalent to the on-premises scenario, where SSIS transactions use MSDTC against SQL Server.

    Currently, if you set TransactionOption=Required on a data flow and use an OLE DB connection to an Azure SQL Database, you receive an error like:
    "The SSIS runtime has failed to enlist the OLE DB connection in a distributed transaction with error 0x80070057 "The parameter is incorrect"."

    63 votes · 3 comments
  5. Test Connection Programmatically

    Currently, testing the connection for linked services in ADF is only possible from the ADF GUI.
    Being able to perform this test programmatically is essential for building a proper automated CI/CD pipeline for ADF that includes automated connection tests.
    Test Connection should therefore be available via:
    - SDKs (Python, .NET, etc.)
    - the REST API
    - other channels
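
    Until such an API exists, one workaround in CI is to probe the linked service's target directly. A sketch, assuming an Azure SQL linked service and the pyodbc driver (server, database, and credentials are placeholders that would come from secure CI variables):

    import pyodbc

    # Placeholder connection details for the linked service's target.
    conn_str = (
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=tcp:myserver.database.windows.net,1433;"
        "Database=mydb;Uid=ci_user;Pwd=<password>;"
        "Encrypt=yes;Connection Timeout=10;"
    )

    try:
        # A successful connect-and-close stands in for the GUI's Test Connection.
        pyodbc.connect(conn_str).close()
        print("Connection test passed")
    except pyodbc.Error as exc:
        raise SystemExit(f"Connection test failed: {exc}")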

    63 votes · 7 comments
  6. Improve performance of Copy Data Activity when dealing with a large number of small files

    The copy performance of the ADF Copy Data activity going from a file system source to a Blob FileSystem or Blob sink is quite slow and CPU-intensive, relative to other copy mechanisms available, when copying a large number (tens of thousands to millions) of small files (< 1 MB).

    Both AzCopy and Azure Storage Explorer are able to complete the copy from the same source to the same sink approximately 3-5x faster, while using less CPU, than the ADF Copy activity.

    At a minimum, we would like to see performance parity with AzCopy / Azure Storage Explorer.

    63 votes · 1 comment
  7. Run containers through Data Factory custom activity

    It is currently not possible to pull down Docker images and run them as tasks through a Data Factory custom activity, even though this is already possible through Batch itself; a sketch of the Batch capability follows below.

    https://github.com/MicrosoftDocs/azure-docs/issues/16473
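
    For reference, this is roughly what Batch itself already allows and what the custom activity would need to expose. A sketch using the azure-batch Python SDK (account, job, and image names are placeholders; the exact client constructor varies between SDK versions):

    from azure.batch import BatchServiceClient
    from azure.batch.batch_auth import SharedKeyCredentials
    from azure.batch.models import TaskAddParameter, TaskContainerSettings

    # Placeholder Batch account details.
    credentials = SharedKeyCredentials("mybatchaccount", "<account-key>")
    client = BatchServiceClient(
        credentials, batch_url="https://mybatchaccount.westeurope.batch.azure.com"
    )

    # Batch runs the command line inside the container image; the ask is for
    # ADF's custom activity to surface these containerSettings as well.
    task = TaskAddParameter(
        id="containerized-etl",
        command_line="python /app/etl.py",
        container_settings=TaskContainerSettings(
            image_name="myregistry.azurecr.io/etl:latest"
        ),
    )
    client.task.add(job_id="adf-custom-activity-job", task=task)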

    62 votes · 2 comments
  8. Allow a scheduled pipeline run to be skipped if the pipeline is already running (ADF V2)

    Please add a feature to skip a scheduled run if the previous run is still executing.

    For example, I have a pipeline scheduled every minute; if a run is still executing when the next one is due, the new run starts anyway, which causes overlapping pipeline executions.

    Right now I'm updating records in a SQL table, which takes time; before it finishes, the next scheduled run starts and updates the same records again, because the previous execution has not completed. A workaround sketch follows below.
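
    A workaround sketch: have the first activity (or an external scheduler) query the factory for in-progress runs of the same pipeline and bail out if one exists. Shown with the azure-mgmt-datafactory Python SDK (subscription, resource group, factory, and pipeline names are placeholders):

    from datetime import datetime, timedelta

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Look for runs of this pipeline that are still in progress.
    filters = RunFilterParameters(
        last_updated_after=datetime.utcnow() - timedelta(days=1),
        last_updated_before=datetime.utcnow(),
        filters=[
            RunQueryFilter(operand="PipelineName", operator="Equals", values=["MyPipeline"]),
            RunQueryFilter(operand="Status", operator="Equals", values=["InProgress"]),
        ],
    )
    runs = client.pipeline_runs.query_by_factory("my-rg", "my-factory", filters)
    if runs.value:
        print("Previous run still in progress; skipping this schedule.")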

    61 votes · 2 comments
  9. A new activity for cancelling the pipeline execution

    Many times during the execution of a pipeline, I want the pipeline to fail based on some condition (such as a variable's true or false value).

    Currently, I achieve this using an If Condition activity (where the variable's value is checked), followed by a Web Activity that calls the REST API to cancel the pipeline run (sketched below):

    https://docs.microsoft.com/en-us/rest/api/datafactory/pipelineruns/cancel

    It would be great if, similar to "Execute Pipeline", there were an activity to kill/terminate/cancel the pipeline's run.
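
    For context, the workaround's REST call looks like this (a sketch with requests and azure-identity; the subscription, resource group, factory, and run ID are placeholders, and the endpoint is the one documented above):

    import requests
    from azure.identity import DefaultAzureCredential

    sub, rg, factory, run_id = "<sub-id>", "<rg>", "<factory>", "<run-id>"
    token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

    # Cancel the given pipeline run via the documented management endpoint.
    url = (
        f"https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
        f"/providers/Microsoft.DataFactory/factories/{factory}"
        f"/pipelineruns/{run_id}/cancel?api-version=2018-06-01"
    )
    requests.post(url, headers={"Authorization": f"Bearer {token}"}).raise_for_status()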

    61 votes · 7 comments
  10. Pipeline-level alerts are required in Azure Data Factory: one alert email per pipeline execution

    Pipeline-level alerts are required in Azure Data Factory. A pipeline may have many activities, and since only activity-level alerts are available today, the mailbox fills up with alert emails. A single alert email should be sent once the pipeline has executed. A workaround sketch follows below.
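
    A common workaround, not a substitute for the requested feature: end the pipeline with a single Web Activity (or Azure Function) that sends one summary notification. A sketch using SendGrid's v3 mail send API (the API key and addresses are placeholders):

    import requests

    # Placeholder SendGrid key and addresses; one call sends one summary email.
    resp = requests.post(
        "https://api.sendgrid.com/v3/mail/send",
        headers={"Authorization": "Bearer <sendgrid-api-key>"},
        json={
            "personalizations": [{"to": [{"email": "ops@example.com"}]}],
            "from": {"email": "adf-alerts@example.com"},
            "subject": "Pipeline MyPipeline finished",
            "content": [{"type": "text/plain", "value": "One alert per pipeline run."}],
        },
    )
    resp.raise_for_status()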

    61 votes · 1 comment
  11. Google Sheets connector

    Hello,

    It would be great and very useful, in my opinion, if there were a Google Sheets connector. A workaround sketch follows below.

    Thanks in advance.
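
    Until a native connector exists, the sheet can be pulled from an Azure Function or Custom Activity. A sketch against the Google Sheets API v4 values endpoint (the spreadsheet ID, range, and API key are placeholders; an API key only works for publicly shared sheets, otherwise OAuth is needed):

    import requests

    # Placeholders: spreadsheet ID, cell range, and API key.
    sheet_id = "<spreadsheet-id>"
    url = (
        f"https://sheets.googleapis.com/v4/spreadsheets/{sheet_id}"
        "/values/Sheet1!A1:D100?key=<api-key>"
    )

    # The response's "values" field is a list of rows (lists of cell strings).
    rows = requests.get(url).json().get("values", [])
    for row in rows:
        print(row)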

    61 votes · 6 comments
  12. ForEach activity - Allow break

    Allow breaking out of a ForEach activity, the way break works in most languages. Currently ForEach iterates over all items to the end, even when we don't want it to.

    If one of the items produces an error, I may want to break out of the ForEach, stop iterating, and throw that error.

    For now, I have to use a flag variable and If Condition activities to stop the ForEach from invoking the remaining activities.

    60 votes · 1 comment
  13. Data Factory pipelines should support webhook execution

    It would be great if Azure Data Factory jobs could be executed by webhooks in addition to their default schedule.
    The current options of re-running via PowerShell or the Azure portal are not graceful for a production environment or for automation.
    Ideally, the job would run on an HTTP POST to a webhook; this would resolve many automation challenges.
    Potentially this could be integrated with Azure Logic Apps. A sketch follows below.
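
    The management API's existing createRun operation could back such a webhook today, e.g. from a Logic App or a small function. A sketch (the subscription, resource group, factory, and pipeline names are placeholders):

    import requests
    from azure.identity import DefaultAzureCredential

    sub, rg, factory, pipeline = "<sub-id>", "<rg>", "<factory>", "MyPipeline"
    token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

    # Trigger a pipeline run on demand; the response contains the new runId.
    url = (
        f"https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
        f"/providers/Microsoft.DataFactory/factories/{factory}"
        f"/pipelines/{pipeline}/createRun?api-version=2018-06-01"
    )
    run = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json={})
    run.raise_for_status()
    print("Started run", run.json()["runId"])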

    59 votes · 2 comments
  14. Mapping of column names should be case-insensitive in the Azure SQL connector of ADF

    Automatic mapping of field names should be case-insensitive in the Azure SQL connector.
    In the Azure SQL Data Warehouse connector, fields with identical names but different case (upper-/lowercase) characters are mapped smoothly.
    Not so in the Azure SQL connector: everything must be done manually, and every refresh voids the mappings, which is rather painful.

    58 votes · 3 comments
  15. Configure for singleton Pipeline Run

    For the wall-clock trigger schedule, there should be a property that controls whether a new run of the pipeline is allowed while a previous run is still in progress.

    54 votes · 3 comments
  16. Support pulling storage account key from Azure Key Vault (not from a secret)

    When you set up Key Vault to periodically rotate the storage account key, it stores the key not in a secret but under a URI similar to https://<keyvault>.vault.azure.net/storage/<storageaccountname>

    The setup instructions for this automatic key rotation are here:
    https://docs.microsoft.com/en-us/azure/key-vault/key-vault-ovw-storage-keys#manage-storage-account-keys

    Please enhance Azure Data Factory so that a linked service can pull the storage account key from this location in Azure Key Vault. Currently ADF only supports pulling from secrets, not from storage keys in Key Vault.

    53 votes · 0 comments
  17. Talk to the O365 OneRM team and get a copy of their O365DataTransfer program -- it does a lot of things DF needs to do.

    The O365 OneRM team is solving the same problems that you are, and has a mature platform that does many of the things Azure DF will need to do. Talk to Zach, Naveen, and Karthik in Building 2; also talk to Pramod. It'll accelerate you in terms of battle-hardened user needs and things pipeline automation has needed to do in the field. You'll want to get a copy of the bits/code.

    53 votes · 0 comments
  18. Remove output limitations on Web and Azure Function activities

    Currently, if you make a call to a web API and the JObject returned is greater than 1 MB in size, the activity fails with the error:

    "The length of execution ouput is over limit (around 1M currently)."

    This is a big limitation; it would be great if the limit were removed or increased. A workaround sketch follows below.
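
    A workaround sketch while the limit stands: have the function or API stage the large payload in blob storage and return only a small pointer, keeping the activity output under the cap (the API URL, connection string, and blob names are placeholders):

    import json

    import requests
    from azure.storage.blob import BlobClient

    # Placeholder API returning a large JSON payload.
    payload = requests.get("https://api.example.com/big-report").json()

    # Stage the payload in blob storage instead of returning it to ADF directly.
    blob = BlobClient.from_connection_string(
        "<connection-string>", container_name="staging", blob_name="big-report.json"
    )
    blob.upload_blob(json.dumps(payload), overwrite=True)

    # Return only a small reference; downstream activities read the blob.
    print(json.dumps({"blobPath": "staging/big-report.json"}))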

    52 votes · 4 comments
  19. 50 votes · planned · 7 comments
  20. Publish Azure Data Factory ARM templates to a custom folder in the publish branch

    Provide the ability to publish Azure Data Factory ARM templates to a custom folder in the publish branch. An additional property could be added to the publish_config.json file to cater for this, e.g.:

    {
      "publishBranch": "release/adf_publish",
      "publishFolder": "Deployment/ARM/"
    }

    https://docs.microsoft.com/en-us/azure/data-factory/source-control#configure-publishing-settings

    50 votes · 2 comments