Update: Microsoft is moving away from UserVoice sites on a product-by-product basis throughout the 2021 calendar year and will rely on first-party solutions for customer feedback.

Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Allow copying subset of columns with implicit mapping

    A copy activity will fail if my source has more columns than my destination. I would like to use implicit mapping (let Data Factory match on column name) but have it not fail when a source column has no matching destination. For example, if I am copying from a text file in ADLS to a table in Azure SQL DB and my source file has 200 columns but I only need 20, I don't want to have to bring in all 200 fields. I also don't want to have to map them all. Instead of failing, ADF should…

    37 votes  ·  3 comments
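
    Until implicit partial mapping is supported, the usual workaround is an explicit mapping in the copy activity's translator that lists only the columns you need. A minimal sketch of the relevant copy activity fragment (column names and source/sink types here are placeholders for this scenario):

        "typeProperties": {
            "source": { "type": "DelimitedTextSource" },
            "sink": { "type": "AzureSqlSink" },
            "translator": {
                "type": "TabularTranslator",
                "mappings": [
                    { "source": { "name": "CustomerId" }, "sink": { "name": "CustomerId" } },
                    { "source": { "name": "OrderDate" }, "sink": { "name": "OrderDate" } }
                ]
            }
        }

    Columns left out of "mappings" are simply not copied, but the list has to be maintained by hand, which is what this suggestion asks to avoid.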
  2. Parametrize Blob Storage Linked Services with Dynamic Contents

    We need to dynamically choose the destination blob storage in Data Factory. By parametrizing the "Secret name" field we could accomplish this.

    This has already been implemented for some linked services.

    37 votes  ·  0 comments
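
    For the linked service types that already support this, the pattern is a parameterized linked service whose Key Vault secret name is an expression. A sketch assuming a Key Vault linked service named AzureKeyVault1 and a string parameter secretName (both names are placeholders):

        {
            "name": "AzureBlobStorageParameterized",
            "properties": {
                "type": "AzureBlobStorage",
                "parameters": {
                    "secretName": { "type": "String" }
                },
                "typeProperties": {
                    "connectionString": {
                        "type": "AzureKeyVaultSecret",
                        "store": { "referenceName": "AzureKeyVault1", "type": "LinkedServiceReference" },
                        "secretName": "@{linkedService().secretName}"
                    }
                }
            }
        }

    Datasets and activities would then pass secretName as dynamic content, so one definition can point at different storage accounts; the ask is for the Blob Storage connector to accept dynamic content in the "Secret name" field in the same way.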
  3. Ability to delete datasets or pipelines when the resource group lock is on

    With a resource group lock on, you're not able to delete datasets or pipelines within ADF. We want to keep the resource group lock on so the resource group cannot be deleted by accident.

    36 votes  ·  5 comments
  4. 36 votes  ·  1 comment
  5. Parameter for Azure Function App URL on ARM Template

    When you export the ARM template, the Azure Function App URL is not exposed as an ARM parameter. This is required to make the data factory configurable and able to move between environments.

    36 votes  ·  4 comments
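
    A possible workaround today is a custom parameterization template: an arm-template-parameters-definition.json file in the root of the collaboration branch can promote the Azure Function linked service URL to an ARM parameter. A minimal sketch (the exact property path should be verified against your exported template; the AzureFunction type key is an assumption):

        {
            "Microsoft.DataFactory/factories/linkedServices": {
                "AzureFunction": {
                    "properties": {
                        "typeProperties": {
                            "functionAppUrl": "="
                        }
                    }
                }
            }
        }

    The "=" keeps the current value as the parameter's default, so the generated ARM template exposes functionAppUrl and it can be overridden per environment.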
  6. ADF GetMetaData Sort Functionality.

    We use Get Metadata to retrieve the list of blobs from storage. Currently, the Microsoft documentation does not state what sort order is applied. Please either add an option to sort on an attribute, or be consistent with the sorting done by the Azure Storage API, whose documentation states: "Blobs are listed in alphabetical order in the response body, with upper-case letters listed first."

    https://docs.microsoft.com/en-us/rest/api/storageservices/list-blobs

    36 votes  ·  0 comments
  7. Allow Self-hosted Integration Runtime to be installed on Virtual Machine Scale Set (VMSS)

    If we use the self-hosted integration runtime, our scheduled activity may need more compute than a single node provides, and I can't think of anything better than VMSS for this. Customers wouldn't have to worry about scaling: if the runtime needs more capacity, the scale set would add it automatically and bring it back down later.

    36 votes  ·  0 comments
  8. Retain GIT configuration when deploying Data Factory ARM template

    Currently, when we deploy our ARM template to Data Factory V2 from a VSTS release, the GIT configuration is reset and we have to configure it again after every deployment.

    We worked around the problem by disabling the ARM deployment task in our release.

    Retain GIT configuration when deploying Data Factory ARM template, or add the GIT configuration to ARM.

    Thanks!

    35 votes  ·  5 comments
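
    The second half of the ask (add the GIT configuration to ARM) is at least partly possible, because the factory resource itself carries a repoConfiguration block. A hedged sketch of a factory resource that re-applies an Azure DevOps Git configuration on deployment (organization, project, repository, and branch names are placeholders):

        {
            "type": "Microsoft.DataFactory/factories",
            "apiVersion": "2018-06-01",
            "name": "[parameters('factoryName')]",
            "location": "[resourceGroup().location]",
            "identity": { "type": "SystemAssigned" },
            "properties": {
                "repoConfiguration": {
                    "type": "FactoryVSTSConfiguration",
                    "accountName": "my-devops-org",
                    "projectName": "MyProject",
                    "repositoryName": "adf-repo",
                    "collaborationBranch": "main",
                    "rootFolder": "/"
                }
            }
        }

    Whether this survives the standard adf_publish deployment depends on whether the factory resource itself is included in the deployed template, which is why the suggestion still stands.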
  9. ADF CI/CD Incremental deployment - Just the entities not the whole ADF

    I am working on Azure DevOps, building CI/CD pipelines for Azure Data Factory. The approach described in "https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment" creates the whole ADF and all of its entities in whichever test/prod environment we deploy the ARM templates to, but we just want to deploy the changed entities, not the whole ADF.

    The approach I know is: configure the ADF with GIT ==> merge to master ==> publish to the adf_publish branch ==> set up a CI/CD pipeline that deploys the template and parameter JSONs to the respective test/prod environments.

    The ask is how to deploy just the changed ADF pipelines / datasets / linked services…

    35 votes  ·  3 comments
  10. Schedule Trigger vs Tumbling Window Trigger

    The difference between the schedule trigger and the tumbling window trigger is very subtle, and it is really hard to understand from the Data Factory documentation. I suggest Microsoft improve the documentation and include additional details with real-world use cases. Thanks.

    35 votes  ·  1 comment
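
    Until the documentation improves, the practical distinction is that a tumbling window trigger produces fixed-size, contiguous, retryable windows whose start and end times can be passed into the pipeline (including back-filling past windows), while a schedule trigger simply fires on a clock. A minimal tumbling window trigger sketch (pipeline and parameter names are placeholders):

        {
            "name": "HourlyWindowTrigger",
            "properties": {
                "type": "TumblingWindowTrigger",
                "typeProperties": {
                    "frequency": "Hour",
                    "interval": 1,
                    "startTime": "2021-01-01T00:00:00Z",
                    "maxConcurrency": 2,
                    "retryPolicy": { "count": 3, "intervalInSeconds": 300 }
                },
                "pipeline": {
                    "pipelineReference": { "referenceName": "LoadHourlySlice", "type": "PipelineReference" },
                    "parameters": {
                        "windowStart": "@trigger().outputs.windowStartTime",
                        "windowEnd": "@trigger().outputs.windowEndTime"
                    }
                }
            }
        }

    A schedule trigger, by contrast, has no window semantics, no retry policy, and no backfill of past intervals.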
  11. Add support for GPG file decryption

    I have a number of processes that will be sourced from GPG-encrypted files. Currently I have no way to natively decrypt the files, so I have to code a separate process.

    Please add an option in the copy activity to decrypt file types like GPG encrypted files.

    Keys and passphrases could be kept in my Azure KeyVault and passed in at runtime.

    35 votes  ·  2 comments
  12. XML at sink side

    Hi,

    Is there a way to store XML files in Blob storage via ADF (e.g. with the copy activity)? On the source side one can import XML files, but on the sink side it is not possible yet.
    That would be a great improvement. Thanks in advance.

    Regards,
    Lukas

    35 votes  ·  0 comments
  13. Allow ORC to be used as a source/sink format in DataFlow

    We currently cannot use ORC as a source/sink type in DataFlow jobs. This requires an extra copy into Parquet format, which can cause issues because Parquet does not have as many data types as ORC. Allowing ORC would remove the need for this extra copy operation and the data type issues it can cause.

    35 votes  ·  started  ·  0 comments
  14. Support overwriting data in sink

    Currently for copying data to a CosmosDB or Azure Table Storage or some other sink type, if a row/document already exists, an error is thrown. It would be very nice to be able to specify to overwrite the data in the sink with the incoming data from the source so that one can set up scheduled copy tasks from a source to a sink that always keeps the data in the sink in sync with the source, instead of failing.

    34 votes  ·  0 comments
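
    For the Cosmos DB (SQL API) sink specifically, the copy activity exposes a write behavior that avoids the duplicate-key failure. A sketch of just the sink side of a copy activity, assuming the rest of the activity is defined as usual:

        "sink": {
            "type": "CosmosDbSqlApiSink",
            "writeBehavior": "upsert",
            "writeBatchSize": 10000
        }

    With upsert, existing documents with the same id are replaced instead of raising an error; a comparable option across the other sink types mentioned here would close the gap.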
  15. Enable Conditional Access for Azure Data Factory

    At this time the Azure Data Factory app is open to the public internet. However, services like Azure Databricks support Conditional Access, which can restrict access to a VPN or a subnet. Please make this feature available for ADF.

    34 votes  ·  3 comments
  16. Dark mode

    Working with ADF is a good experience, but it would be more user-friendly if ADF supported a dark mode; the all-white UI is somewhat distracting.

    34 votes  ·  1 comment
  17. Copy SQL to DocumentDB with nested objects/arrays and JSON strings

    There are times when deeply structured data from a database is useful to place into DocumentDB documents. For example:

    select Id, Col1, Col2,
    (select * from Table2 where Table1.Id=Table2.Table1Id FOR JSON PATH) ArrayOfLinkedData,
    JSON_QUERY(Information,'$') Information -- a string storing JSON data
    from Table1

    shows nested data from a linked table, Table2, and some schemaless JSON stored in a varchar column called Information.

    At present, both the array and the JSON stored in a string are loaded into DocumentDB as escaped strings, not JSON entities. The only way we have found to handle this situation is first dropping the data…

    33 votes  ·  1 comment
  18. Add support for custom mail alerts

    It would be nice to have the ability to send custom emails from the Monitor & Manage portal.
    When a pipeline fails, I want to inform end users that their data might not be available yet, but I don't want them to end up with an all-technical email.

    32 votes  ·  1 comment
  19. Add additional error handling details to the PolyBase option of Azure Data Factory

    In the case of 116080514509754, while transferring the data from on-premises via PolyBase to SQL Data Warehouse, we got the following error: Database operation failed. Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text,},],'.
    [Problem start date and time]

    There is no mention of the column name involved in the error.

    32 votes  ·  0 comments
  20. ADF to kill the queued jobs

    When something is stuck, we want to start the pipelines freshly. It would be helpful to have an option to kill the already queued up jobs and have the capability to start the pipelines from scratch again. Currently we don’t have an option other than to just wait for the queued-up jobs to get completed.

    32 votes  ·  0 comments