Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure SQL Database, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Create a Databricks cluster once and reuse it across multiple Databricks activities

    Hi,

    I am looking for a feature in Data Factory for Databricks activities. Suppose there is a pipeline with multiple Databricks activities. As of now, I can use a new job cluster to execute all of them, but spinning up and terminating a cluster for each activity takes a lot of time. I would like functionality where I can create a cluster at the beginning of the pipeline, have all activities use that existing cluster, and terminate it at the end.…

    70 votes · 2 comments
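    A possible interim pattern for this request, sketched below with the Databricks REST API 2.0 (the workspace URL, token, and cluster settings are placeholders): create one cluster before the Databricks activities run, point the linked service at its cluster ID as an existing interactive cluster, and delete it when the pipeline finishes.

    ```python
    # Create one shared cluster up front; all Databricks activities in the
    # pipeline reuse it via the linked service's existing-cluster option.
    import requests

    WORKSPACE_URL = "https://<region>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"                       # placeholder
    headers = {"Authorization": f"Bearer {TOKEN}"}

    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.0/clusters/create",
        headers=headers,
        json={
            "cluster_name": "adf-shared-cluster",
            "spark_version": "7.3.x-scala2.12",   # placeholder runtime
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
    )
    cluster_id = resp.json()["cluster_id"]
    print("Reference this cluster_id in the ADF linked service:", cluster_id)

    # ... the pipeline runs its Databricks activities on the shared cluster ...

    # Tear the cluster down once the pipeline completes.
    requests.post(f"{WORKSPACE_URL}/api/2.0/clusters/delete",
                  headers=headers, json={"cluster_id": cluster_id})
    ```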
  2. Web Activity should support JSON array response

    When a Web Activity calls an API that returns a JSON array as the response, we get an error that says "Response Content is not a valid JObject". Please support JSON arrays at the top level of the response.

    68 votes · 2 comments
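    Until arrays are supported, one workaround is a thin proxy that wraps the array in an object. A minimal sketch as an Azure Function (Python v1 programming model); UPSTREAM_URL is a placeholder for the API that returns the bare array.

    ```python
    # Wrap a top-level JSON array in an object so the Web Activity
    # receives a valid JObject.
    import json

    import requests
    import azure.functions as func

    UPSTREAM_URL = "https://example.com/api/items"  # placeholder

    def main(req: func.HttpRequest) -> func.HttpResponse:
        items = requests.get(UPSTREAM_URL).json()   # e.g. [{...}, {...}]
        wrapped = {"value": items}                  # now a top-level object
        return func.HttpResponse(json.dumps(wrapped),
                                 mimetype="application/json")
    ```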
  3. Restore a Data Factory

    Sometimes mistakes are made, like deleting a pipeline. I should be able to restore the data factory or the pipeline. I am not finding any documentation on how to do this, so I assume it isn't available.

    67 votes · 3 comments
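    Until a restore capability exists, the usual mitigations are Git integration (so deleted resources survive in source control) or periodic snapshots of the factory's definitions. A preventive sketch with the azure-mgmt-datafactory and azure-identity packages; the subscription, resource group, and factory names are placeholders.

    ```python
    # Snapshot every pipeline's JSON so a deleted pipeline can be recreated.
    import json

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    client = DataFactoryManagementClient(DefaultAzureCredential(),
                                         "<subscription-id>")  # placeholder

    for p in client.pipelines.list_by_factory("<resource-group>", "<factory>"):
        with open(f"backup/{p.name}.json", "w") as f:
            json.dump(p.as_dict(), f, indent=2)  # restorable definition
    ```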
  4. Support for Elastic database transactions

    ADF must support elastic database transactions against Azure SQL Database.

    This is equivalent to the on-premises scenario, where SSIS transactions use MSDTC against SQL Server.

    Currently, if you set TransactionOption=Required on a data flow and use an OLE DB connection to an Azure SQL Database, you receive an error like:
    "The SSIS runtime has failed to enlist the OLE DB connection in a distributed transaction with error 0x80070057 'The parameter is incorrect'."

    63 votes · 3 comments
  5. Pipeline-level alerts: a single alert email per pipeline run

    Pipeline-level alerts are required in Azure Data Factory. A pipeline may have many activities, and since only activity-level alerts are available now, the mailbox fills up with alert emails. A single alert email should be sent once the pipeline has executed.

    60 votes · 1 comment
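    In the meantime, one pattern is to end the pipeline with a single Web activity that posts one summary per run to a Logic App HTTP trigger, which sends the one email. A sketch of the equivalent call; the Logic App callback URL is a placeholder.

    ```python
    # One POST per pipeline run, regardless of how many activities ran.
    import requests

    LOGIC_APP_URL = "https://prod-00.westeurope.logic.azure.com/..."  # placeholder

    def send_pipeline_alert(pipeline_name: str, run_id: str, status: str) -> None:
        requests.post(LOGIC_APP_URL, json={
            "pipeline": pipeline_name,
            "runId": run_id,
            "status": status,
        })

    send_pipeline_alert("nightly-load", "<run-id>", "Succeeded")
    ```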
  6. Allow MSI authentication for AzureDataLakeStore in Mapping Data Flow

    An ADLS (gen 1) Linked Service is authenticated with a Managed Identity (MSI) or a Service Principal. When authenticating with MSI, we can't use Mapping Data Flows. Will this functionality be added?

    59 votes · 1 comment
  7. Data factory Pipeline to have webhook execution

    It would be great if Azure Data Factory pipelines could be executed by webhooks in addition to their default schedule.
    The current limitation of re-running via PowerShell or the Azure portal is not graceful for a production environment or for automation.
    Ideally, the pipeline would run on an HTTP POST to the webhook; that would resolve many automation challenges.
    Potentially this could be integrated with Azure Logic Apps.

    59 votes · 2 comments
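    The management REST API already exposes an HTTP entry point that a webhook receiver can forward to. A minimal sketch using the createRun endpoint; the subscription, resource group, factory, and pipeline names are placeholders.

    ```python
    # Trigger a pipeline run over HTTP via the ADF REST API.
    import requests
    from azure.identity import DefaultAzureCredential

    token = DefaultAzureCredential().get_token(
        "https://management.azure.com/.default")
    url = ("https://management.azure.com/subscriptions/<sub-id>"
           "/resourceGroups/<rg>/providers/Microsoft.DataFactory"
           "/factories/<factory>/pipelines/<pipeline>/createRun"
           "?api-version=2018-06-01")

    resp = requests.post(url,
                         headers={"Authorization": f"Bearer {token.token}"},
                         json={})  # optional pipeline parameters go here
    print(resp.json()["runId"])
    ```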
  8. Improve performance of Copy Data Activity when dealing with a large number of small files

    The copy performance of the ADF Copy Data activity going from a file system source to a Blob File System or Blob sink is quite slow and CPU-intensive relative to other copy mechanisms when copying a large number (tens of thousands to millions) of small files (<1 MB).

    Both AzCopy and Azure Storage Explorer complete the same copy, from the same source to the same sink, approximately 3-5x faster while using less CPU than the ADF Copy activity.

    At a minimum, we would like to see performance parity with AzCopy / Azure Storage Explorer.

    58 votes · 1 comment
  9. Run containers through Data Factory custom activity

    It is currently not possible to pull down Docker images and run them as tasks through Data Factory, even though this is already possible through Batch itself.

    https://github.com/MicrosoftDocs/azure-docs/issues/16473

    57 votes · 2 comments
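    For reference, this is roughly what "already possible through Batch itself" looks like with the azure-batch SDK: a container task submitted to a pool whose configuration enables containers. The account, registry image, and job names are placeholders.

    ```python
    # Submit a container task directly to Azure Batch.
    from azure.batch import BatchServiceClient
    from azure.batch import models as batchmodels
    from azure.batch.batch_auth import SharedKeyCredentials

    creds = SharedKeyCredentials("<account-name>", "<account-key>")  # placeholders
    client = BatchServiceClient(
        creds, batch_url="https://<account>.<region>.batch.azure.com")

    task = batchmodels.TaskAddParameter(
        id="containerized-step",
        command_line="python /app/main.py",  # runs inside the container
        container_settings=batchmodels.TaskContainerSettings(
            image_name="myregistry.azurecr.io/etl-step:latest"),  # placeholder
    )
    client.task.add(job_id="adf-custom-activity-job", task=task)
    ```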
  10. Azure Data Factory Dynamics 365 connector/dataset sink support Owner field

    Please add support in the Dynamics 365 connector/dataset sink for writing to the Owner field.

    57 votes · 4 comments
  11. Test Connection Programmatically

    Currently, testing the connection for linked services in ADF is only possible from the ADF GUI.
    Being able to perform this test programmatically is essential for building a proper automated CI/CD pipeline for ADF that includes automated connection tests.
    Therefore, Test Connection should be available via:
    - SDKs (Python, .NET, etc.)
    - REST API
    - Other

    55 votes · 6 comments
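    Until such an API exists, a CI/CD stand-in is to probe the endpoints the linked services point at directly. A sketch for an Azure SQL linked service using pyodbc; the connection string is a placeholder.

    ```python
    # Smoke-test the target of a linked service from a CI/CD pipeline.
    import pyodbc

    def test_sql_connection(conn_str: str) -> bool:
        """Return True if a connection opens and a trivial query runs."""
        try:
            conn = pyodbc.connect(conn_str, timeout=10)
            conn.cursor().execute("SELECT 1")
            conn.close()
            return True
        except pyodbc.Error:
            return False

    conn_str = ("Driver={ODBC Driver 17 for SQL Server};"
                "Server=<server>.database.windows.net;Database=<db>;"
                "Uid=<user>;Pwd=<password>")  # placeholder
    assert test_sql_connection(conn_str), "linked service target unreachable"
    ```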
  12. Implement functionality to rename Linked services (and possibly other components)

    Currently, the only way to rename linked services and other components is to delete and recreate them. Doing this then requires each associated dataset to be updated manually.
    Functionality to rename these within the GUI tool would add value, allowing components to be renamed with confidence that nothing will break.

    Whilst it is possible to edit the JSON by hand, when I tried this and uploaded it back into the Git repository, it broke the connections. The behind-the-scenes magic does not seem able to handle it.

    54 votes · 1 comment
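    A scripted version of the manual rename, applied in the ADF Git repository so the change lands as one commit: rename the linked service file and rewrite every JSON document that references the old name. The repository path and names are placeholders, and the reference check is deliberately crude.

    ```python
    # Rename a linked service across an ADF Git repository.
    from pathlib import Path

    OLD, NEW = "LS_SqlOld", "LS_SqlNew"   # placeholder names
    repo = Path("<adf-repo-root>")        # placeholder path

    # Rename the linked service definition file itself.
    (repo / "linkedService" / f"{OLD}.json").rename(
        repo / "linkedService" / f"{NEW}.json")

    # Rewrite every dataset/pipeline/trigger that references the old name.
    for path in repo.rglob("*.json"):
        text = path.read_text()
        if f'"{OLD}"' in text:
            path.write_text(text.replace(f'"{OLD}"', f'"{NEW}"'))
    ```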
  13. Configure for singleton Pipeline Run

    For a wall-clock trigger schedule, there should be a property that controls whether to allow a new run of the pipeline if a previous run is still in progress.

    54 votes · 3 comments
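    A guard-rail sketch for singleton behavior with the azure-mgmt-datafactory package: query for in-progress runs of the pipeline and only trigger a new one if none is found. The subscription, resource group, factory, and pipeline names are placeholders.

    ```python
    # Skip triggering when a run of the same pipeline is still in flight.
    from datetime import datetime, timedelta

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        RunFilterParameters, RunQueryFilter,
        RunQueryFilterOperand, RunQueryFilterOperator)

    RG, FACTORY, PIPELINE = "<rg>", "<factory>", "<pipeline>"  # placeholders
    client = DataFactoryManagementClient(DefaultAzureCredential(), "<sub-id>")

    now = datetime.utcnow()
    runs = client.pipeline_runs.query_by_factory(RG, FACTORY, RunFilterParameters(
        last_updated_after=now - timedelta(days=1),
        last_updated_before=now,
        filters=[
            RunQueryFilter(operand=RunQueryFilterOperand.PIPELINE_NAME,
                           operator=RunQueryFilterOperator.EQUALS,
                           values=[PIPELINE]),
            RunQueryFilter(operand=RunQueryFilterOperand.STATUS,
                           operator=RunQueryFilterOperator.EQUALS,
                           values=["InProgress"]),
        ]))

    if not runs.value:  # nothing in flight: safe to start a new run
        client.pipelines.create_run(RG, FACTORY, PIPELINE)
    ```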
  14. Talk to the O365 OneRM team and get a copy of their O365DataTransfer program -- it does a lot of things DF needs to do.

    The O365 OneRM team is solving the same problems that you are, and has a mature platform that does many of the things that Azure DF will need to do. Talk to Zach, Naveen, and Karthik in building 2. Also talk to Pramod. It will accelerate you on battle-hardened user needs and the things pipeline automation has had to do in the field. You'll want to get a copy of the bits/code.

    53 votes · 0 comments
  15. Mapping of column names should be case-insensitive in SQL Azure Connector of ADF

    Automatic mapping of field names should be case-insensitive in the SQL Azure connector.
    In the Azure SQL Data Warehouse connector, fields with identical names but different case (upper/lowercase) are mapped smoothly.
    Not so in the Azure SQL connector: everything must be done manually, and every refresh voids the mappings, which is rather painful.

    52 votes · 3 comments
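    Until the connector matches names case-insensitively, the explicit mapping can be generated rather than typed by hand. A sketch that pairs source and sink columns ignoring case and emits the shape used by a copy activity's TabularTranslator; the column lists are examples.

    ```python
    # Build an explicit column mapping by case-insensitive name matching.
    source_cols = ["CustomerID", "FirstName", "LASTNAME"]   # example columns
    sink_cols   = ["customerid", "firstname", "LastName"]

    sink_by_lower = {c.lower(): c for c in sink_cols}
    mappings = [
        {"source": {"name": s}, "sink": {"name": sink_by_lower[s.lower()]}}
        for s in source_cols if s.lower() in sink_by_lower
    ]
    # Use as {"type": "TabularTranslator", "mappings": mappings}
    # in the copy activity's translator property.
    print(mappings)
    ```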
  16. Google Sheets connector

    Hello,

    It would be great and, in my opinion, very useful if there were a Google Sheets connector.

    Thanks in advance.

    51 votes · 3 comments
  17. Support pulling storage account key from Azure Key Vault (not from a secret)

    When you set up Key Vault to periodically rotate the storage account key, it stores the key not in a secret but under a URI similar to https://<keyvault>.vault.azure.net/storage/<storageaccountname>

    The setup instructions for this automatic key rotation are here:
    https://docs.microsoft.com/en-us/azure/key-vault/key-vault-ovw-storage-keys#manage-storage-account-keys

    Please enhance Azure Data Factory so that a linked service can pull the storage account key from this location in Azure Key Vault. Currently, ADF only supports pulling from secrets, not from managed storage keys in Key Vault.

    50 votes · 0 comments
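    For contrast, the path ADF supports today: the key sits in an ordinary secret, fetched by name. A sketch with the azure-keyvault-secrets package; the vault and secret names are placeholders. The managed storage-key URI above is a different Key Vault object type, which is exactly what this request asks ADF to read.

    ```python
    # Fetch a storage key stored as a regular Key Vault secret
    # (the only source ADF linked services support today).
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(vault_url="https://<keyvault>.vault.azure.net",
                          credential=DefaultAzureCredential())
    storage_key = client.get_secret("<storage-key-secret-name>").value
    ```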
  18. Remove output limitations on Web and Azure Function activities

    Currently, if you make a call to a web API and the returned JObject is greater than 1 MB in size, the activity fails with the error:

    "The length of execution ouput is over limit (around 1M currently)."

    This is a big limitation, and it would be great if it were removed or the limit increased.

    48 votes · 4 comments
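    A common workaround is to have the API (or an intermediate Azure Function) stage the large payload in Blob storage and return only a small pointer, keeping the activity output under the limit. A sketch with the azure-storage-blob package; the connection string and container name are placeholders.

    ```python
    # Stage a large response in Blob storage; return a tiny JObject pointer.
    import json
    import uuid

    from azure.storage.blob import BlobClient

    def stage_large_payload(payload: dict, conn_str: str) -> dict:
        blob_name = f"web-activity-output/{uuid.uuid4()}.json"
        blob = BlobClient.from_connection_string(
            conn_str, container_name="staging", blob_name=blob_name)
        blob.upload_blob(json.dumps(payload), overwrite=True)
        # Downstream activities read the blob instead of the activity output.
        return {"blobContainer": "staging", "blobName": blob_name}
    ```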
  19. Output files to FTP Server

    When output is ready, it should be possible to save the files to an FTP folder, i.e. FTP becomes an output dataset as well.

    48 votes · 5 comments
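    What an FTP sink could do today as a custom activity, using the standard library's ftplib. The host, credentials, remote folder, and local output directory are placeholders.

    ```python
    # Upload pipeline output files to an FTP folder.
    from ftplib import FTP
    from pathlib import Path

    with FTP("ftp.example.com") as ftp:             # placeholder host
        ftp.login(user="<user>", passwd="<password>")
        ftp.cwd("/outbound")
        for path in Path("output").glob("*.csv"):   # files produced upstream
            with open(path, "rb") as f:
                ftp.storbinary(f"STOR {path.name}", f)
    ```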
  20. Allow Data Factory Managed identity to run Databricks notebooks

    Integrate the Azure Data Factory managed identity with the Databricks service, like you did for Key Vault, Storage, etc.

    46 votes · 3 comments · planned