Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or with custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Output files to FTP Server

    When output is ready, it should be possible to save the files to an FTP folder, i.e. FTP becomes an output dataset as well.

    32 votes · 2 comments
  2. Allow for scheduling & running Azure Batch with Docker Containers through Azure Data Factory

    Currently it isn't possible to schedule or trigger Azure Batch with Docker containers in Azure Data Factory (it's only possible if you use VMs on Azure Batch).
    Azure Data Factory would be a stronger product if it supported this, as one currently needs to set up another scheduler (e.g., Apache Airflow) to trigger Azure Batch running Docker containers.

    Forum link: https://github.com/MicrosoftDocs/azure-docs/issues/16473

    32 votes · 0 comments
  3. Access/Mapping the File Name during the copy process to a SQL Datatable

    I need a way to store the file name that is being copied into a mapped column of a SQL data table. It would be great to also have access to other file properties like size, row count, etc., but the file name alone would help us build undo processes. (A sketch of the kind of target table this assumes follows below this item.)

    32 votes · 1 comment
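
    For illustration only, a minimal sketch of the kind of target table this request assumes; the table and column names (dbo.StagedRows, FileName, FileSizeBytes, FileRowCount) are hypothetical:

    -- Hypothetical staging table; the copy process would populate FileName
    -- (and ideally FileSizeBytes / FileRowCount) alongside the copied data.
    CREATE TABLE dbo.StagedRows (
        Id            INT            NOT NULL,
        Col1          NVARCHAR(100)  NULL,
        FileName      NVARCHAR(400)  NULL,  -- source file the row came from
        FileSizeBytes BIGINT         NULL,
        FileRowCount  INT            NULL
    );

    With the file name captured per row, an undo can be as simple as DELETE FROM dbo.StagedRows WHERE FileName = 'file-to-undo.csv'.
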
  4. Connector for IoTHub Device Registry

    Having the possibility to sync the device registry (State, ConnectionState, Thumbprints (for backup scenarios), DeviceTwin properties, etc.) would enable many interesting use cases and integration scenarios. For example, we would like to sync the LastActivityDate of all devices to our data warehouse once every minute.

    31 votes · 1 comment
  5. Add support for custom mail alerts

    It would be nice to have the ability to send custom emails from the Monitor & Manage portal.
    When a pipeline fails I want to inform end users that their data might not be available yet, but I don't want them to end up with an entirely technical email.

    31 votes · 1 comment
  6. 31 votes · planned · 4 comments
  7. Support SQL Database Always Encrypted sources or destinations

    With the recent increase in privacy and security concerns, namely GDPR, the need for using Always Encrypted on SQL Server or Azure SQL Database is also increasing. The problem is that the moment we enable this security feature in SQL, we can no longer use ADF as the data flow orchestration. Without this feature, more secure enterprise scenarios are being left out. (An example of the kind of encrypted column involved is sketched below this item.)

    30 votes · 1 comment
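
    For context, a minimal sketch of the kind of column definition this request concerns; the table, columns, and the column encryption key name (CEK_Auto1) are hypothetical and assume the key has already been provisioned:

    CREATE TABLE dbo.Customers (
        CustomerId INT IDENTITY(1,1) PRIMARY KEY,
        -- Per this request, ADF currently cannot be used against a column
        -- protected with Always Encrypted like this one.
        SSN CHAR(11) COLLATE Latin1_General_BIN2
            ENCRYPTED WITH (
                COLUMN_ENCRYPTION_KEY = CEK_Auto1,
                ENCRYPTION_TYPE = DETERMINISTIC,
                ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
            ) NOT NULL
    );
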
  8. Azure Data Factory Dynamics 365 connector/dataset sink support Owner field

    Please provide support in the connector for sinking the Owner field.

    30 votes · 2 comments
  9. Copy SQL to DocumentDB with nested objects/arrays and JSON strings

    There are times when more deeply structured data from a database is useful to place into DocumentDB documents. For example:

    SELECT Id, Col1, Col2,
        (SELECT * FROM Table2 WHERE Table1.Id = Table2.Table1Id FOR JSON PATH) AS ArrayOfLinkedData,
        JSON_QUERY(Information, '$') AS Information  -- a string storing JSON data
    FROM Table1

    shows nested data from a linked table, Table2, and some schema-less JSON stored in a varchar column called Information.

    At present, both the array and the JSON stored in a string are loaded into DocumentDB as escaped strings, not JSON entities. The only way we have found to handle this situation is first dropping the data…

    30 votes · 1 comment
  10. Allow resuming a pipeline from the point of failure

    I created a master pipeline that executes other child pipelines. If there is an error in one of the child pipelines and I want to fix the issue and rerun the failed child pipeline (resume is not available), the parent pipeline doesn't resume. I have to rerun the parent from the very beginning, which forces me to reload all the data from source to stage and then from stage to the EDW.
    This is really ridiculous. At least show the activities in the monitor that were scheduled but didn't run due to a child pipeline failure and allow us to manually…

    29 votes · 0 comments
  11. Retain GIT configuration when deploying Data Factory ARM template

    Currently when we deploy our ARM template to Data Factory V2 from VSTS Release, the GIT configuration is reset, and we have to configure it again following every deploy.

    We worked around the problem by disabling the ARM deployment task in our release.

    Retain GIT configuration when deploying Data Factory ARM template, or add the GIT configuration to ARM.

    Thanks!

    29 votes · 3 comments
  12. Parametrize Blob Storage Linked Services with Dynamic Contents

    We need to dynamically choose the destination blob storage in Data Factory. By parametrizing the "Secret name" field we could accomplish this.

    This has already been implemented for some linked services.

    28 votes · 0 comments
  13. Handle Nulls in Copy Data Activity for Numeric datatype

    Currently, ADF fails when loading a flat file with a null value into a table column that has a numeric data type.

    It displays the detailed error message "Empty string can't be converted to DECIMAL"

    OR

    the message "Error converting data type VARCHAR to DECIMAL".

    However, if the destination data type is changed to string, it loads fine (a staging-table workaround based on this is sketched below this item).
    This was observed when loading latitude/longitude information into an Azure data warehouse table where the destination column was decimal.

    In summary, we should be able to load NULL into a numeric column just as easily as we can load NULL into a string…

    28 votes · 0 comments
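
    One staging-table workaround, sketched here with hypothetical table and column names: copy the flat file into varchar columns first, then convert empty strings to NULL while inserting into the typed table.

    -- Hypothetical workaround: stg.Locations holds every column as varchar;
    -- NULLIF turns empty strings into NULLs during the typed insert.
    INSERT INTO dbo.Locations (Id, Latitude, Longitude)
    SELECT
        CAST(Id AS INT),
        CAST(NULLIF(Latitude, '')  AS DECIMAL(9, 6)),
        CAST(NULLIF(Longitude, '') AS DECIMAL(9, 6))
    FROM stg.Locations;
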
  14. Data Factory v2 Alternative DevOps Git Credentials

    By default, Data Factory v2 Azure DevOps Git integration forces you to select an Azure AD organization to which your current (Azure portal) user has access.

    Integration with GitHub offers no such limitation: you can input a URL, and then a new dialogue appears to authenticate. It would be fantastic if alternative Git credentials could be provided for an alternative Azure DevOps repo.

    Our current workaround is to add the user that authenticates with the Azure portal as a guest in our Azure AD-backed DevOps instance; this incurs a license cost, but it also ignores the use case whereby Azure…

    28 votes · 1 comment
  15. 27 votes · 0 comments
  16. Durable Function Activity

    Please provide a separate activity to run Durable Azure Functions.

    You can call HTTP-triggered Functions, with the known limitation of a maximum runtime of 230 seconds. Durable Functions work around this limitation and can run for multiple hours.
    But they are not supported natively, only via manual development. See https://github.com/MicrosoftDocs/azure-docs/issues/30160

    Requiring multiple activities in ADF to run a Durable Azure Function undermines the usefulness of retry settings in Data Factory, because it's not a single activity that you would need to rerun.
    If manually implemented, you also run into issues with alert notifications, for example...

    Should not be too difficult to code.

    26 votes · 0 comments
  17. Allow MSI authentication for AzureDataLakeStore in Mapping Data Flow

    An ADLS (gen 1) Linked Service is authenticated with a Managed Identity (MSI) or a Service Principal. When authenticating with MSI, we can't use Mapping Data Flows. Will this functionality be added?

    26 votes · 0 comments
  18. Schema import capability from source to destination

    Schema import capability from a source (SQL, relational, or structured CSV) to a destination on the first run, especially when we are moving structured data over.
    We could specify a schema name and it would generate the source schema at the destination and write into it.
    This could potentially be just a checkbox option at the destination in the wizard, and it would save a lot of effort when doing schema-on-write type jobs in DF. (An illustration of the kind of DDL this could generate is sketched below this item.)

    26 votes · 0 comments
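
    For illustration, the kind of destination DDL such a first-run checkbox might generate; the schema, table, and columns here are hypothetical:

    -- Hypothetically generated from the source table or CSV header on the first run,
    -- so the copy can immediately write into it (schema on write).
    CREATE TABLE dest.Customer (
        CustomerId  INT            NOT NULL,
        FirstName   NVARCHAR(100)  NULL,
        LastName    NVARCHAR(100)  NULL,
        CreatedDate DATETIME2(3)   NULL
    );
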
  19. HDInsight with Azure Data Lake

    Today you can't use an on-demand or bring-your-own HDInsight cluster with Data Factory, as the cluster requires a blob storage linked service. We need the ability to use HDInsight clusters backed by Azure Data Lake in a Data Factory pipeline.

    25 votes · 1 comment
  20. Encrypted Zip file support

    It would be very helpful to have AES-256 encrypted zip file support to simplify pipelines, rather than needing Azure Batch or Azure Functions.

    24 votes · 2 comments