Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Allow pipeline schedule to skip if it is already running (ADF V2)

    Please add a feature to skip a scheduled trigger if the previous scheduled run is still in progress.

    For example, I have a pipeline scheduled to run every minute. If a run is still in progress, the next scheduled run starts anyway, which causes overlapping pipeline executions.

    Right now I'm updating records in a SQL table, which takes time; before that completes, the next scheduled run starts and updates the same records, because the previous run has not finished.
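As a workaround sketch (hedged; this is not a built-in feature), a first activity could call the documented Query Pipeline Runs REST endpoint (`POST {factory}/queryPipelineRuns?api-version=2018-06-01`) and skip the run when a previous run is still active. The snippet below only shows the request body and the check on the response; authentication, the time-window values, and the HTTP call itself are placeholders or omitted:

```python
# Hedged sketch: request body for the "Query Pipeline Runs" REST API,
# filtered to runs that are still executing. Timestamps are placeholders.
query_body = {
    "lastUpdatedAfter": "2020-01-01T00:00:00Z",   # placeholder window start
    "lastUpdatedBefore": "2020-01-01T00:10:00Z",  # placeholder window end
    "filters": [
        {"operand": "Status", "operator": "In", "values": ["InProgress", "Queued"]}
    ],
}

def has_active_run(runs, pipeline_name):
    """Given the 'value' array from the query response, report whether
    another run of this pipeline is still executing."""
    return any(
        r.get("pipelineName") == pipeline_name
        and r.get("status") in ("InProgress", "Queued")
        for r in runs
    )
```

If `has_active_run` returns True, the pipeline could end early instead of repeating the update on the same records.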

    45 votes
    0 comments
  2. Allow for scheduling & running Azure Batch with Docker Containers through Azure Data Factory

    Currently it isn't possible to schedule or trigger Azure Batch with Docker containers from Azure Data Factory (it is only possible with VMs on Azure Batch).
    Azure Data Factory would be a stronger product if it supported this, as one currently needs to set up a separate scheduler (e.g., Apache Airflow) to trigger Azure Batch running Docker containers.

    Forum link: https://github.com/MicrosoftDocs/azure-docs/issues/16473

    44 votes
    0 comments
  3. Make "retryIntervalInSeconds" parameter able to accept dynamic values

    Currently the "retryIntervalInSeconds" parameter only accepts literal integer values, not pipeline variables that hold integer values.
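For context, a sketch of the relevant activity policy fragment as it must be written today (values are illustrative); the ask is for "retryIntervalInSeconds" to also accept an expression such as a pipeline variable rather than only a literal:

```json
"policy": {
    "retry": 3,
    "retryIntervalInSeconds": 30
}
```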

    43 votes
    2 comments
  4. Publish Azure Data Factory ARM templates to a custom folder in the publish branch

    Provide the ability to publish Azure Data Factory ARM templates to a custom folder in the publish branch. An additional property could be added to the publish_config.json file to cater for this, e.g.:

    {
      "publishBranch": "release/adf_publish",
      "publishFolder": "Deployment/ARM/"
    }

    https://docs.microsoft.com/en-us/azure/data-factory/source-control#configure-publishing-settings

    43 votes
    2 comments
  5. Add support for MongoDB v4.0 for the MongoDB connector

    The last version currently supported is 3.6

    42 votes
    3 comments
  6. A new activity for Cancelling the pipeline execution

    Often, during the execution of a pipeline, a condition (such as a variable's true/false value) means we want the pipeline to "fail".

    Currently, I achieve this using an If Condition activity (where the variable's value is checked), followed by a Web activity that calls the REST API to cancel the pipeline run.

    https://docs.microsoft.com/en-us/rest/api/datafactory/pipelineruns/cancel

    It would be great if, similar to "Execute Pipeline", there were an activity to kill/terminate/cancel a pipeline's run.
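The Web Activity workaround targets the documented cancel endpoint. A hedged sketch of the URL it needs to POST to (all parameter values are placeholders supplied by the caller):

```python
# Hedged sketch: builds the URL for the documented "Pipeline Runs - Cancel"
# REST call, the same endpoint the Web Activity workaround uses.
def cancel_run_url(subscription_id, resource_group, factory, run_id):
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/pipelineruns/{run_id}/cancel"
        "?api-version=2018-06-01"
    )
```

The Web Activity would POST to this URL with an empty body, authenticating e.g. via the factory's managed identity; a dedicated Cancel activity would remove this boilerplate.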

    41 votes
    5 comments
  7. Support web linking (RFC 5988) with REST API pagination

    REST API pagination needs to support RFC 5988 style links in the header.

    Examples are ServiceNow and Greenhouse.

    See: https://tools.ietf.org/html/rfc5988#page-6 for RFC
    See: https://stackoverflow.com/questions/54589413/azure-data-factory-rest-api-to-service-now-pagination-issue for a related stack overflow question

    Greenhouse link header example:

    Link: <https://harvest.greenhouse.io/v1/applications?page=2&perpage=100>; rel="next", <https://harvest.greenhouse.io/v1/applications?page=129&perpage=100>; rel="last"

    We need to grab the 'next' URL, which is not currently possible with the existing pagination support:
    https://docs.microsoft.com/en-us/azure/data-factory/connector-rest#pagination-support

    The only way around this seems to be to go outside Data Factory to fetch the data (e.g., Databricks Python), which defeats the purpose.
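A minimal parser for such headers (a sketch of what the pagination support would need to do; it handles only the common `<url>; rel="name"` form shown above):

```python
import re

def parse_link_header(header):
    """Parse an RFC 5988 style Link header into a {rel: url} dict,
    e.g. to extract the rel="next" URL for the following page."""
    links = {}
    for part in header.split(","):
        m = re.search(r'<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if m:
            links[m.group(2)] = m.group(1)
    return links
```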

    41 votes
    3 comments
  8. 41 votes
    planned · 7 comments
  9. Add support for Apache Kafka

    Add support for the Apache Kafka Producer, Consumer, Streams, and KSQL APIs in Azure Data Factory.

    40 votes
    1 comment
  10. Add datasets to Data Flows that can support all the connectors found in Azure Data Factory

    I have a use case where the source is SAP and the sink is ODBC-based, and I don't want to land the data in any intermediate staging storage; just let it flow through. To date, Data Flows support only five types of datasets when defining a source or a sink (Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Data Warehouse, and Azure SQL Database).

    40 votes
    under review · 4 comments
  11. Add retry policy to webhook activity

    Right now it is not possible to retry a Webhook activity. Sometimes these activities fail due to a 'bad request' or other issues that can easily be resolved by re-running them; however, so far this retry is manual.

    39 votes
    0 comments
  12. Data Catalog integration

    If Data Catalog (ADC) is to be our metadata store for helping users explore data sets, it occurs to me that there ought to be some sort of integration with ADF, so that new data sets appear automatically and their refresh status is available, letting end users know the data is up to date. I also notice there is no specific feedback category for ADC.
    ADF should also be able to consume data sets in ADC by populating the appropriate linked services and tables.

    39 votes
    1 comment
  13. Web Activity and Rest Connector OAuth support

    The usefulness of the Web Activity and the REST connector is hamstrung without OAuth support for authentication; many third-party services require it.

    38 votes
    0 comments
  14. Provide better billing statistics

    Provide better cost analysis possibilities (either in the Azure portal or in ADF). Right now it is impossible to see costs by pipeline or activity; you can only see the overall cost of the whole data factory instance, which is not very useful.
    Please add billing per pipeline (Logic Apps is a good example, where you can track costs for each logic app).

    38 votes
    2 comments
  15. Access/Mapping the File Name during the copy process to a SQL Datatable

    I need a way to store the file name being copied into a mapped column of a SQL table. It would also be great to have access to other file properties like size, row count, etc., but the file name alone will help us build undo processes.

    38 votes
    2 comments
  16. SharePoint List Destination

    Would like a SharePoint list as a target destination.

    38 votes
    1 comment
  17. Handle Cosmos DB 429 Errors Within Cosmos DB Connector

    In our use case we are bulk loading data to Cosmos DB and have a requirement to scale each collection up at the beginning of a load and down at the end.

    The scaling is performed by an Azure Function, and we have seen issues where Cosmos DB returns a 429 error on the metadata requests that the copy activity following the Azure Function makes against Cosmos DB. This occurs frequently when running multiple pipelines in parallel. When a 429 error is received on a metadata request, the error bubbles up and causes the pipeline to fail completely.

    Ideally…
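A generic throttling-aware retry wrapper (a hedged sketch of the requested handling, not ADF's actual connector behavior) illustrates the idea:

```python
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying with exponential backoff whenever it raises an
    exception carrying a 429 status code (Cosmos DB throttling). Any other
    error, or exhausting the attempts, re-raises."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if getattr(exc, "status_code", None) != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # back off before the next try
```

Inside the connector itself, the analogous behavior would be to honor the retry-after hint Cosmos DB returns alongside the 429 response instead of failing the run outright.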

    37 votes
    0 comments
  18. ForEach activity - Allow break

    Allow breaking out of a ForEach activity, the way ForEach loops work in most languages. Currently ForEach iterates over all items to the end, even when we don't want it to.

    If I have an error in one of the items, I may want to break the ForEach, stop iterating, and throw that error.

    For now, I have to use a flag variable and If conditions to keep the ForEach from continuing to call all the activities.

    37 votes
    0 comments
  19. Parameter for Azure Function App URL on ARM Template

    When you export the ARM template, the Azure Function App URL is not included in the ARM parameters. This is required to make the ADF configurable and to move it between environments.

    37 votes
    4 comments
  20. Allow ORC to be used as a source/sink format in Data Flow

    We currently cannot use ORC as a source/sink type in Data Flow jobs. This requires an extra copy into Parquet format, which can cause issues because Parquet does not have as many data types as ORC. Allowing ORC would remove the need for this extra copy operation and the data type issues it can introduce.

    35 votes
    planned · 0 comments