Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Parameter for Azure Function App URL on ARM Template

    When you export the ARM template, the Azure Function App URL is not exposed as an ARM template parameter. This is required to make the data factory configurable and able to move between environments.
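
    For illustration, the mechanism this request points at today is the custom parameter definition file (arm-template-parameters-definition.json) used when ADF generates the ARM template. A minimal sketch, assuming an Azure Function linked service, that would surface functionAppUrl as a template parameter ("=" keeps the current value as the parameter default):

        {
            "Microsoft.DataFactory/factories/linkedServices": {
                "AzureFunction": {
                    "properties": {
                        "typeProperties": {
                            "functionAppUrl": "="
                        }
                    }
                }
            }
        }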

    35 votes · 4 comments
  2. Allow ORC to be used as a source/sink format in DataFlow

    We currently cannot use ORC as a source/sink type in DataFlow jobs. This requires an extra copy into Parquet format, which can cause issues because Parquet does not have as many data types as ORC. Allowing ORC would remove the need for this extra copy operation that could potentially cause data type issues.

    35 votes · started · 0 comments
  3. Allow parameterizing Azure key vault secret names

    I would like to be able to set the secret name as a parameter. The UI does let me "add dynamic content", but when I try to add an actual parameter to the Key Vault secret name, it does not give me the ability to do so. Is there a bug, or is this feature limited? At least, this happens when trying to parameterize ADF SSIS IR package parameters.
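
    For context, a sketch of the kind of definition being asked for: a linked service parameter fed into an AzureKeyVaultSecret reference through dynamic content. The linked service and parameter names here (MyKeyVault, secretName) are placeholders, and per the post this pattern does not currently carry through to SSIS IR package parameters:

        {
            "name": "AzureSqlDatabaseLS",
            "properties": {
                "type": "AzureSqlDatabase",
                "parameters": {
                    "secretName": { "type": "String" }
                },
                "typeProperties": {
                    "connectionString": {
                        "type": "AzureKeyVaultSecret",
                        "store": { "referenceName": "MyKeyVault", "type": "LinkedServiceReference" },
                        "secretName": "@{linkedService().secretName}"
                    }
                }
            }
        }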

    34 votes · 2 comments
  4. 33 votes · 1 comment
  5. Execute Pipeline activity automatic rerun

    Possibility to automatically rerun the related pipeline when a failure occurs.

    This is to help cases where a single activity rerun will not get the pipeline on track, for example, when data must be submitted again from the beginning. In these cases, it might be necessary to rerun the complete pipeline.

    As of today, the Execute Pipeline activity does not offer a way to specify the number of retries to attempt before the activity is marked as failed.

    The workaround to implement a solution involves several components and seems unnecessarily complex.

    The attached picture describes a linear pipeline including…

    33 votes · 0 comments
  6. Copy SQL to DocumentDB with nested objects/arrays and JSON strings

    There are times when it is useful to place more deeply structured data from a database into DocumentDB documents. For example:

    select Id, Col1, Col2,
    (select * from Table2 where Table1.Id=Table2.Table1Id FOR JSON PATH) ArrayOfLinkedData,
    JSON_QUERY(Information,'$') Information -- a string storing JSON data
    from Table1

    shows nested data from a linked table Table2, and some unschema'd JSON stored in a varchar column called Information.

    At present, both the array and the JSON stored in a string are loaded into DocumentDB as escaped strings, not as JSON entities. The only way we have found to handle this situation is first dropping the data…

    33 votes · 1 comment
  7. Copy data - Parquet files - Support file copying when table has white space in column name

    The documentation says that white space in column names is not supported for Parquet files, but I would like to suggest implementing this feature. When many tables are copied at once, it is difficult to handle them case by case because Data Factory does not support white space in column names for Parquet files.

    Documentation: https://docs.microsoft.com/en-us/azure/data-factory/format-parquet#

    Check attached file for details.

    Regards,
    Cristina

    32 votes · 1 comment
  8. Enable Conditional Access for Azure Data Factory

    At this time, the Azure Data Factory app is open to the public internet. However, services like Azure Databricks support Conditional Access, which can restrict access to the app to a VPN or a subnet. Make this feature available to ADF.

    31 votes · 3 comments
  9. Connector for IoTHub Device Registry

    Having the possibility to sync the device registry (State, ConnectionState, Thumbprints (for backup scenarios), DeviceTwin properties, etc.) would allow many interesting use cases and integration scenarios. For example, we would like to sync the LastActivityDate of all devices to our data warehouse once every minute.

    31 votes · 1 comment
  10. Add support for custom mail alerts

    It would be nice to have the ability to send custom emails from the Monitor & Manage portal.
    When a pipeline fails I want to inform end-users that their data might not be available yet, but I don't want them to end up with an all technical email.
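
    A common interim workaround is a Web activity on the pipeline's failure path that posts a friendly message to an HTTP-triggered Logic App (or similar), which then sends the mail. A rough sketch; the preceding activity name, URL, and body fields are placeholders:

        {
            "name": "NotifyUsersOnFailure",
            "type": "WebActivity",
            "dependsOn": [
                { "activity": "LoadSalesData", "dependencyConditions": [ "Failed" ] }
            ],
            "typeProperties": {
                "url": "https://<logic-app-http-trigger-url>",
                "method": "POST",
                "headers": { "Content-Type": "application/json" },
                "body": {
                    "subject": "Data refresh delayed",
                    "message": "Tonight's load did not complete; reports may show yesterday's data."
                }
            }
        }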

    31 votes · 1 comment
  11. Allow copying subset of columns with implicit mapping

    A copy activity will fail if my source has more columns than my destination. I would like to use implicit mapping (let Data Factory match on column name) but have it not fail if a source column has no matching destination. For example, if I am copying from a text file in ADLS to a table in Azure SQL DB and my source file has 200 columns but I only need 20, I don't want to have to bring in all 200 fields. I also don't want to have to map them all. Instead of failing, ADF should…
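
    Until implicit mapping can skip unmatched source columns, the usual workaround is an explicit mapping listing only the needed columns in the copy activity. A minimal sketch with hypothetical column names and source/sink types:

        {
            "type": "Copy",
            "typeProperties": {
                "source": { "type": "DelimitedTextSource" },
                "sink": { "type": "AzureSqlSink" },
                "translator": {
                    "type": "TabularTranslator",
                    "mappings": [
                        { "source": { "name": "CustomerId" }, "sink": { "name": "CustomerId" } },
                        { "source": { "name": "OrderDate" }, "sink": { "name": "OrderDate" } }
                    ]
                }
            }
        }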

    31 votes · 3 comments
  12. Data Factory v2 Alternative DevOps Git Credentials

    By default, Data Factory v2 Azure DevOps Git integration forces you to select an Azure AD organization to which your current (Azure portal) user has access.

    Integration with GitHub has no such limitation: you can enter a URL, and a new dialog appears to authenticate. It would be fantastic if alternative Git credentials could be provided for an alternative Azure DevOps repo.

    Our current workaround is to add the user that authenticates with the Azure portal as a guest in our Azure AD backed DevOps instance - this incurs a license cost, but also ignores the use case whereby Azure…

    31 votes · 1 comment
  13. Add additional error handling details to the PolyBase option of Azure Data Factory

    In the case of 116080514509754, while transferring data from on-premises via PolyBase to SQL Data Warehouse, we got the following error: Database operation failed. Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text,},],'.
    [Problem start date and time]

    There is no mention of the column name involved in the error.

    30 votes · 0 comments
  14. Allow Debugging of Individual Activities in ADFv2

    In the ADFv2 GUI, clicking debug runs the entire pipeline. I would like the option to run only one activity, or multiple selected activities in the pipeline via debug. In the meantime, I have been creating a test pipeline and copying activities there for debugging, but this work-around wastes time.

    30 votes · 2 comments
  15. Retain GIT configuration when deploying Data Factory ARM template

    Currently when we deploy our ARM template to Data Factory V2 from VSTS Release, the GIT configuration is reset, and we have to configure it again following every deploy.

    We worked around the problem by disabling the ARM deployment task in our release.

    Retain GIT configuration when deploying Data Factory ARM template, or add the GIT configuration to ARM.

    Thanks!

    30 votes · 4 comments
  16. Durable Function Activity

    Please provide a separate activity to run Durable Azure Functions.

    You can call HTTP-triggered functions, with the known limitation of a maximum runtime of 230 seconds. Durable Functions work around this limitation for runs of up to multiple hours.
    But they are not supported natively, only via manual development. See https://github.com/MicrosoftDocs/azure-docs/issues/30160

    Requiring multiple activities in ADF to run a Durable Azure Function undermines the usefulness of retry settings in Data Factory, because it is not a single activity that you would need to rerun.
    If implemented manually, you also always have issues with alert notifications, for example...

    It should not be too difficult to code.
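
    For reference, the manual workaround mentioned above typically starts the orchestration with an Azure Function activity and then polls the statusQueryGetUri it returns from inside an Until loop. A rough sketch, assuming a hypothetical HTTP-starter function HttpStart_MyOrchestrator and an AzureFunctionLS linked service; retry settings here only apply to the individual activities, which is exactly the gap this request describes:

        {
            "name": "RunDurableFunction",
            "properties": {
                "activities": [
                    {
                        "name": "StartOrchestration",
                        "type": "AzureFunctionActivity",
                        "linkedServiceName": { "referenceName": "AzureFunctionLS", "type": "LinkedServiceReference" },
                        "typeProperties": { "functionName": "HttpStart_MyOrchestrator", "method": "POST", "body": {} }
                    },
                    {
                        "name": "UntilDone",
                        "type": "Until",
                        "dependsOn": [ { "activity": "StartOrchestration", "dependencyConditions": [ "Succeeded" ] } ],
                        "typeProperties": {
                            "expression": {
                                "value": "@not(or(equals(activity('CheckStatus').output.runtimeStatus, 'Running'), equals(activity('CheckStatus').output.runtimeStatus, 'Pending')))",
                                "type": "Expression"
                            },
                            "activities": [
                                { "name": "Wait30s", "type": "Wait", "typeProperties": { "waitTimeInSeconds": 30 } },
                                {
                                    "name": "CheckStatus",
                                    "type": "WebActivity",
                                    "dependsOn": [ { "activity": "Wait30s", "dependencyConditions": [ "Succeeded" ] } ],
                                    "typeProperties": {
                                        "url": "@activity('StartOrchestration').output.statusQueryGetUri",
                                        "method": "GET"
                                    }
                                }
                            ]
                        }
                    }
                ]
            }
        }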

    29 votes · 1 comment
  17. Allow a timeout greater than 30 seconds for Azure Table Storage

    When I copy data from Azure Table Storage, a timeout sometimes occurs on the Azure Table Storage side.
    At the moment, the maximum timeout for Azure Table Storage is 30 seconds.
    Please make it possible to increase the timeout beyond 30 seconds for Azure Table Storage.

    29 votes · 0 comments
  18. A role prohibited from previewing data

    Please consider adding a role like the one below. The user is:
    - allowed to change pipelines, datasets, linked services, and other ADF objects
    - allowed to retrieve schema information from a data source such as SQL DB
    - prohibited from retrieving data from the data source with [Preview data] or other methods

    An easy workaround to achieve this is to use managed identity auth (or other non-key-based auth) and remove the SQL permission during the operation.
    But it would be better to be able to grant such permission to the operators in bulk.

    29 votes · 0 comments
  19. Lookup Activity - Support REST Dataset

    The Lookup activity should support the generic REST protocol dataset. This is absolutely essential for being able to consume values from a REST API and pass them as dynamic values in Azure Data Factory. Currently HTTP is supported, but it does not allow AAD service principal authentication.
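
    In the meantime, the closest workaround is a Web activity, which does support AAD authentication via the factory's managed identity, with its output consumed by downstream activities in place of a Lookup. A minimal sketch; the URL and resource are placeholders:

        {
            "name": "GetConfigFromApi",
            "type": "WebActivity",
            "typeProperties": {
                "url": "https://<your-api>/config",
                "method": "GET",
                "authentication": {
                    "type": "MSI",
                    "resource": "https://<your-api-aad-resource-id>"
                }
            }
        }

    Downstream, a value can then be referenced as @activity('GetConfigFromApi').output.<property>.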

    28 votes · 0 comments
  20. ADF CI/CD Incremental deployment - Just the entities not the whole ADF

    I am working on Azure DevOps and building CI/CD pipelines for Azure Data Factory. The approach provided in https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment creates the whole ADF and its entities in the test/prod environments wherever we deploy the ARM templates, but we just want to deploy the changed entities, not the whole ADF.

    The approach I know is: configure the ADF with Git ==> merge to master ==> publish to the adf_publish branch ==> set up a CI/CD pipeline to deploy the template and parameter JSONs to the respective test/prod environments.

    The ask is "how to deploy just the ADF pipelines / datasets / linked services…

    27 votes · 3 comments