Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure database, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or with custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Support pulling storage account key from Azure Key Vault (not from a secret)

    When you set up Key Vault to rotate the storage account key periodically, it stores the key not in a secret but under a URI similar to https://<keyvault>.vault.azure.net/storage/<storageaccountname>

    The setup instructions for this automatic key rotation are here:
    https://docs.microsoft.com/en-us/azure/key-vault/key-vault-ovw-storage-keys#manage-storage-account-keys

    Please enhance Azure Data Factory so that a linked service can pull the storage account key from this location in Azure Key Vault. Currently ADF only supports pulling from secrets, not from managed storage account keys in Key Vault (see the sketch of the currently supported pattern below).

    47 votes · 0 comments
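    For reference, the pattern ADF supports today is a Key Vault secret reference inside the linked service definition. Below is a minimal sketch of that shape, written as a Python dict mirroring the linked-service JSON; all names are hypothetical.

    ```python
    import json

    # Minimal sketch of the currently supported pattern: the linked service pulls
    # its connection string from a Key Vault secret. The request above is to also
    # accept the managed storage-account-key URI form
    # (https://<keyvault>.vault.azure.net/storage/<storageaccountname>).
    blob_linked_service = {
        "name": "AzureBlobStorageLS",  # hypothetical linked service name
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                "connectionString": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": "AzureKeyVaultLS",  # hypothetical Key Vault linked service
                        "type": "LinkedServiceReference",
                    },
                    "secretName": "storage-connection-string",  # hypothetical secret name
                }
            },
        },
    }

    print(json.dumps(blob_linked_service, indent=2))
    ```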
  2. Allow MSI authentication for Azure SQL Database in Mapping Data Flow

    An Azure SQL Database linked service can be authenticated with a managed identity (MSI) or a service principal. When authenticating with MSI, we can't use Mapping Data Flows; we get the error "AzureSqlDatabase does not support MSI authentication in Data Flow." Will this functionality be added? (A sketch of the affected linked service shape follows below.)

    46 votes · 1 comment
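    For context, here is a hedged sketch of an Azure SQL Database linked service that authenticates with the factory's managed identity (the connection string carries no credential); it is this kind of linked service that Mapping Data Flows currently rejects. Names are hypothetical.

    ```python
    # Sketch of an Azure SQL Database linked service using the data factory's
    # managed identity (MSI): no user name or password in the connection string.
    # Copy activities accept this today, but Mapping Data Flows report
    # "AzureSqlDatabase does not support MSI authentication in Data Flow".
    sql_linked_service = {
        "name": "AzureSqlDatabaseLS",  # hypothetical name
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                "connectionString": "Data Source=tcp:myserver.database.windows.net,1433;Initial Catalog=mydb;"
            },
        },
    }
    ```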
  3. 44 votes · 5 comments
  4. Make "retryIntervalInSeconds" parameter able to accept dynamic values

    Currently the "retryIntervalInSeconds" parameter only accepts literal integer values, not pipeline variables or expressions that evaluate to integers (see the sketch below).

    43 votes · 2 comments
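    To illustrate, retry settings live in an activity's policy block. A hedged sketch with a hypothetical activity and variable name: the integer literal is accepted today, while the commented-out expression is what this request asks for.

    ```python
    # Sketch of an activity policy block. "retry" and "retryIntervalInSeconds"
    # currently accept only integer literals; the commented-out line shows the
    # kind of dynamic value (a hypothetical pipeline variable) being requested.
    copy_activity = {
        "name": "CopyFromBlob",  # hypothetical activity name
        "type": "Copy",
        "policy": {
            "retry": 3,
            "retryIntervalInSeconds": 30,  # literal integer: works today
            # "retryIntervalInSeconds": "@variables('retryIntervalSeconds')",  # requested
            "timeout": "0.12:00:00",
        },
    }
    ```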
  5. Add a feature to copy always encrypted column data to an always encrypted column of another Azure SQL database

    Add a feature to copy always encrypted column data to always encrypted column of another Azure SQL database

    41 votes · 0 comments
  6. Support more complex types in Avro format, like Dictionaries and Arrays

    When trying to integrate a more complex scenario using the Event Hub archive feature, I wasn't able to process these messages because the Data Factory copy activity didn't support dictionaries. When trying to use Stream Analytics to write to Avro format, it didn't work because of the arrays. More complex end-to-end scenarios should be supported.

    40 votes · 4 comments
  7. Allow a pipeline schedule to skip a run if the pipeline is already running (ADF V2)

    Please add a feature to skip a scheduled run if the previous run is still in progress.

    For example, I have a pipeline scheduled every minute; if the pipeline is still running, the next scheduled run starts anyway, which causes overlapping pipeline executions.

    Right now I'm updating some records in a SQL table, which takes long enough that the next scheduled run starts and updates the same records again because the previous execution hasn't completed. (A note on the existing pipeline "concurrency" property follows below.)

    38 votes · 0 comments
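    A related knob that exists today is the pipeline-level "concurrency" property, sketched below with hypothetical names. Note it only caps simultaneous runs; extra triggered runs are queued rather than skipped, which is why a true "skip if already running" option is still being requested.

    ```python
    # Sketch (hypothetical names): capping a pipeline to one run at a time.
    # With "concurrency": 1, overlapping executions are prevented, but additional
    # scheduled runs queue up instead of being skipped.
    pipeline = {
        "name": "UpdateSqlRecordsPipeline",
        "properties": {
            "concurrency": 1,
            "activities": [
                # ... activities that update the SQL table ...
            ],
        },
    }
    ```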
  8. Persist global temporary tables between activities

    It is currently not possible to access a global temporary table created by one activity from a subsequent activity.

    If this were possible, you could create a pipeline with a Copy activity chained to a Stored Procedure activity, with both accessing the same global temporary table. The benefit is that operations against database-scoped temporary tables are minimally logged, so you can load millions of records in seconds.

    37 votes · 1 comment
  9. Run containers through Data Factory custom activity

    It is currently not possible to pull down Docker images and run them as tasks through a Data Factory custom activity, even though this is already possible through Batch itself.

    https://github.com/MicrosoftDocs/azure-docs/issues/16473

    37 votes · 2 comments
  10. Azure Data Factory Dynamics 365 connector/dataset complex types

    1. How can a Dynamics 365 alternate key be nominated for use with the “Upsert” sink? E.g. Account.Accountnumber.
    2. Using the sink to set “Lookup” types (the ability to set CRM “EntityReference” types) – when will this be available? This is an urgent requirement for all CRM integrations.
    3. Using the sink to set “Owner” – when will this be available? Technically this is the same as “Lookup”.

    37 votes · 3 comments
  11. Data Catalog integration

    If Data Catalog (ADC) is to be our metadata store for helping users explore data sets, there ought to be some sort of integration with ADF so that new data sets appear automatically and their refresh status is available, letting end users know the data is up to date. I also notice there is no specific feedback category for ADC.
    ADF should also be able to consume data sets in ADC by populating the appropriate linked services and tables.

    36 votes · 1 comment
  12. Reading XML and XLS directly using ADF components for ADF V2

    Reading XML and XLS directly using ADF components for ADF V2

    34 votes · 3 comments
  13. Create a Databricks cluster once and reuse that single cluster across multiple Databricks activities

    Hi,

    I am looking for a Data Factory feature for Databricks activities. Suppose a pipeline contains multiple Databricks activities: as of now I can use a new job cluster to execute all of them, but spinning up and terminating a cluster for each activity takes a lot of time. I would like to be able to create a cluster at the beginning of the pipeline, have all activities use that existing cluster, and terminate it at the end.… (A sketch of the existing interactive-cluster workaround follows below.)

    34 votes · 2 comments
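    A partial workaround available today is to point the Azure Databricks linked service at an existing interactive cluster via "existingClusterId" instead of a new job cluster, so all activities share that cluster. A hedged sketch with hypothetical names and IDs; starting and terminating the interactive cluster still has to happen outside the pipeline, which is what this request would automate.

    ```python
    # Sketch (hypothetical names/IDs): a Databricks linked service that reuses an
    # existing interactive cluster rather than spinning up a job cluster per activity.
    databricks_linked_service = {
        "name": "AzureDatabricksLS",
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
                "accessToken": {
                    "type": "AzureKeyVaultSecret",
                    "store": {"referenceName": "AzureKeyVaultLS", "type": "LinkedServiceReference"},
                    "secretName": "databricks-token",  # hypothetical secret
                },
                "existingClusterId": "0123-456789-abcde012",  # hypothetical cluster ID
            },
        },
    }
    ```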
  14. Web Activity and Rest Connector OAuth support

    The usefulness of the Web Activity and the REST connector is hamstrung without OAuth support for authentication. Many third-party services require it. (A sketch of the manual token-fetch workaround follows below.)

    33 votes · 0 comments
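    A common interim pattern is to fetch the token explicitly with one Web Activity and pass it to the next call through an expression. The sketch below uses hypothetical endpoints and names; native OAuth support would remove this boilerplate from every pipeline.

    ```python
    # Sketch (hypothetical endpoints/names): one Web activity requests an OAuth
    # token with the client-credentials grant, and a second activity injects it
    # into the Authorization header via an expression.
    get_token_activity = {
        "name": "GetOAuthToken",
        "type": "WebActivity",
        "typeProperties": {
            "url": "https://login.example.com/oauth2/token",
            "method": "POST",
            "headers": {"Content-Type": "application/x-www-form-urlencoded"},
            "body": "grant_type=client_credentials&client_id=...&client_secret=...",
        },
    }

    call_api_activity = {
        "name": "CallProtectedApi",
        "type": "WebActivity",
        "dependsOn": [{"activity": "GetOAuthToken", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "url": "https://api.example.com/v1/data",
            "method": "GET",
            "headers": {
                "Authorization": {
                    "value": "@concat('Bearer ', activity('GetOAuthToken').output.access_token)",
                    "type": "Expression",
                }
            },
        },
    }
    ```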
  15. Add retry policy to webhook activity

    Right now it is not possible to retry a Webhook activity. Sometimes these activities fail due to a 'bad request' or other transient issues that could easily be resolved by re-running them, but today that re-run is manual.

    33 votes · 0 comments
  16. Add a Transform component with embedded Python/U-SQL support for simple conversions

    ADF should allow "in pipeline" components that convert data from one format to another. This component would be similar to "Refactor" components in other dataflow tools.

    Similar to Spark's "selectExpr", ADF should allow data to flow through a Transform component where Python/U-SQL code supplied by the developer converts input names/formats/structure.

    This component should allow any number of output columns (up to a reasonable maximum) and any format supported by the implementation.

    This component should provide a lightweight compilation / syntax-validation and very basic simulation functions to enable the developer to see the component operating on specified or…

    33 votes · 0 comments
  17. Restore a Data Factory

    Sometimes mistakes are made – like deleting a pipeline. I should be able to restore the data factory or the pipeline. I can't find any documentation on how to do this, so I assume it isn't available.

    33 votes · 1 comment
  18. Handle Cosmos DB 429 Errors Within Cosmos DB Connector

    In our use case we are bulk loading data to Cosmos DB and have a requirement to scale each collection up at the beginning of a load and down at the end.

    The scaling is performed by an Azure Function, and we have seen issues where Cosmos DB returns a 429 error on the metadata requests the copy activity makes after the Azure Function runs. This occurs frequently when running multiple pipelines in parallel. When a 429 error is received on a metadata request, the error bubbles up and causes the pipeline to fail completely.

    Ideally…

    32 votes · 0 comments
  19. Output files to FTP Server

    When the output is ready, it should be possible to save the files to an FTP folder, i.e. an FTP location should be usable as an output dataset as well.

    32 votes · 2 comments
  20. Allow for scheduling & running Azure Batch with Docker Containers through Azure Data Factory

    Currently it isn't possible to schedule or trigger Azure Batch with Docker containers from Azure Data Factory (it's only possible if you use VMs on Azure Batch).
    Azure Data Factory would be a stronger product if it supported this, as currently one needs to set up a separate scheduler (e.g., Apache Airflow) to trigger Azure Batch running Docker containers.

    Forum link: https://github.com/MicrosoftDocs/azure-docs/issues/16473

    32 votes · 0 comments
