Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Allow for scheduling & running Azure Batch with Docker Containers through Azure Data Factory

    Currently it isn't possible to schedule or trigger Azure Batch with Docker containers in Azure Data Factory (it is only possible if you use VMs on Azure Batch).
    Azure Data Factory would be a stronger product if it supported this; currently one needs to set up another scheduler (e.g., Apache Airflow) to trigger Azure Batch jobs running Docker containers.

    Forum link: https://github.com/MicrosoftDocs/azure-docs/issues/16473

    35 votes · 0 comments
  2. 34 votes · planned · 5 comments
  3. Web Activity and Rest Connector OAuth support

    The usefulness of the Web Activity and the REST connector is hamstrung without OAuth support for authentication. Many third-party services require OAuth in order to be consumed.

    33 votes · 0 comments
  4. Copy SQL to DocumentDB with nested objects/arrays and JSON strings

    There are times when deeply structured data from a database is useful to place into DocumentDB documents. For example:

    SELECT Id, Col1, Col2,
        -- nested rows from the linked table, returned as a JSON array
        (SELECT * FROM Table2 WHERE Table1.Id = Table2.Table1Id FOR JSON PATH) AS ArrayOfLinkedData,
        -- a varchar column storing JSON data, passed through as-is
        JSON_QUERY(Information, '$') AS Information
    FROM Table1

    shows nested data from a linked table Table2, and some unschema'd JSON stored in a varchar column called Information.

    At present, both the array and the JSON stored in a string are loaded into DocumentDB as escaped strings, not JSON entities. The only way we have found to handle this situation is first dropping the data…

    33 votes · 1 comment
  5. Add a Transform component with embedded Python/U-SQL support for simple conversions

    ADF should allow "in pipeline" components that convert data from one format to another. This component would be similar to "Refactor" components in other dataflow tools.

    Similar to Spark's selectExpr, ADF should allow data to flow through a Transform component where Python/U-SQL code is supplied by the developer to convert input names/formats/structure.

    This component should allow any number of output columns (up to a reasonable maximum) and any format supported by the implementation.

    This component should provide a lightweight compilation / syntax-validation and very basic simulation functions to enable the developer to see the component operating on specified or…

    33 votes · 0 comments
  6. Handle Cosmos DB 429 Errors Within Cosmos DB Connector

    In our use case we are bulk loading data to Cosmos DB and have a requirement to scale each collection up at the beginning of a load and down at the end.

    The scaling is performed by an Azure Function, and we have seen Cosmos DB return 429 errors on the metadata requests that the copy activity (which follows the Azure Function) makes against Cosmos DB. This occurs frequently when running multiple pipelines in parallel. When a 429 error is received on a metadata request, the error bubbles up and causes the pipeline to fail completely.

    Ideally…

    32 votes · 0 comments
  7. Allow resuming a pipeline from the point of failure

    I created a master pipeline that executes other child pipelines. If there is an error in one of the child pipelines and I fix the issue and rerun the failed child pipeline (resume is not available), the parent pipeline doesn't resume; I have to rerun the parent from the very beginning, which forces me to reload all the data from source to stage and then from stage to the EDW.
    This is really ridiculous. At least show the activities in the monitor that were scheduled but didn't run due to a child pipeline failure, and allow us to manually…

    32 votes · 0 comments
  8. Output files to FTP Server

    When output is ready, it should be possible to save the files to an FTP folder, i.e., FTP becomes an output dataset as well.

    32 votes · 2 comments
  9. Access/map the file name during the copy process to a SQL table

    I need a way to store the name of the file being copied in a mapped column of a SQL table (see the sketch below). It would be great to have access to other file properties like size, row count, etc., but the file name alone would help us implement undo processes.

    32 votes · 1 comment
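
    A minimal sketch of the kind of target table this request implies, assuming hypothetical table and column names (dbo.StagingTable, SourceFileName); ADF would need to expose the source file name so the copy activity could map it into the audit column:

    -- Hypothetical target table: the copy activity maps file contents into the data
    -- columns and the source file name into SourceFileName for auditing and undo.
    CREATE TABLE dbo.StagingTable (
        Id             INT IDENTITY(1, 1) PRIMARY KEY,
        Col1           NVARCHAR(200) NULL,
        Col2           NVARCHAR(200) NULL,
        SourceFileName NVARCHAR(400) NOT NULL,      -- populated from the copied file's name
        LoadedAtUtc    DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Undoing a bad load then becomes a simple delete by file name, e.g.:
    -- DELETE FROM dbo.StagingTable WHERE SourceFileName = 'some_failed_file.csv';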
  10. Connector for IoTHub Device Registry

    Having the possibility to sync the device registry (State, ConnectionState, Thumbprints (for backup scenarios), DeviceTwin properties, etc.) would allow many interesting use cases and integration scenarios. For example, we would like to sync the LastActivityDate of all devices to our data warehouse once every minute.

    31 votes · 1 comment
  11. Add support for custom mail alerts

    It would be nice to have the ability to send custom emails from the Monitor & Manage portal.
    When a pipeline fails, I want to inform end users that their data might not be available yet, but I don't want them to end up with an all-technical email.

    31 votes · 1 comment
  12. Azure Data Factory Dynamics 365 connector/dataset sink: support the Owner field

    Please add support for writing the Owner field through the connector sink.

    30 votes · 2 comments
  13. 29 votes · 0 comments
  14. Retain Git configuration when deploying Data Factory ARM template

    Currently, when we deploy our ARM template to Data Factory V2 from a VSTS release, the Git configuration is reset and we have to configure it again after every deploy.

    We worked around the problem by disabling the ARM deployment task in our release.

    Please retain the Git configuration when deploying a Data Factory ARM template, or add the Git configuration to the ARM template.

    Thanks!

    29 votes · 4 comments
  15. Parameterize Blob Storage linked services with dynamic content

    We need to dynamically choose the destination blob storage in Data Factory. Parameterizing the "Secret name" field would let us accomplish this.

    This has already been implemented for some linked services.

    28 votes · 0 comments
  16. Handle nulls in the Copy Data activity for numeric data types

    Currently ADF fails when loading a flat file that contains null values into a table column with a numeric data type.

    It displays the detailed error message "Empty string can't be converted to DECIMAL" or "Error converting data type VARCHAR to DECIMAL".

    However, if the destination data type is changed to string, it loads fine.
    This was observed when loading latitude/longitude data into an Azure SQL Data Warehouse table where the destination column was decimal.

    So, in summary, we should be able to load NULL into a numeric column just as easily as we can load NULL into a string… (see the staging sketch below)

    28 votes · 0 comments
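
    A minimal sketch of a staging workaround consistent with the observation above, assuming hypothetical table names (dbo.GeoStaging, dbo.GeoFinal): land the flat file into varchar columns first, then convert empty strings to NULL when moving the rows into the typed table.

    -- Hypothetical staging table that accepts whatever the flat file contains.
    CREATE TABLE dbo.GeoStaging (
        Latitude  VARCHAR(50) NULL,
        Longitude VARCHAR(50) NULL
    );

    -- Move rows into the typed table, treating empty strings as NULL.
    -- TRY_CAST returns NULL instead of failing if a value still can't be converted.
    INSERT INTO dbo.GeoFinal (Latitude, Longitude)
    SELECT
        TRY_CAST(NULLIF(Latitude,  '') AS DECIMAL(9, 6)),
        TRY_CAST(NULLIF(Longitude, '') AS DECIMAL(9, 6))
    FROM dbo.GeoStaging;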
  17. Google Sheets connector

    Hello,

    It would be great and very useful in my opinion if there was a Google Sheets connector.

    Thanks in advance.

    28 votes · 2 comments
  18. Data Factory v2 Alternative DevOps Git Credentials

    By default, Data Factory v2 Azure DevOps Git integration forces you to select an Azure AD tenant to which your current (Azure portal) user has access.

    Integration with GitHub has no such limitation: you can enter a URL and a new dialog appears to authenticate. It would be fantastic if alternative Git credentials could be provided for an alternative Azure DevOps repo.

    Our current workaround is to add the user that authenticates with the Azure portal as a guest in our Azure AD-backed DevOps instance; this incurs a license cost, but also ignores the use case whereby Azure…

    28 votes · 1 comment
  19. Allow parameterizing Azure Key Vault secret names

    I would like to be able to set the secret name as a parameter. The UI does offer "Add dynamic content", but when I try to add an actual parameter to the Key Vault secret name it does not let me do so. Is this a bug, or is the feature limited? At least this happens when trying to parameterize ADF SSIS IR package parameters.

    28 votes · 1 comment
  20. Durable Function Activity

    Please provide a separate activity to run durable Azure Functions.

    You can call HTTP-triggered functions, with the known limitation of a maximum runtime of 230 seconds. Durable Functions work around this limitation and can run for multiple hours,
    but they are not supported natively, only via manual development. See https://github.com/MicrosoftDocs/azure-docs/issues/30160

    Requiring multiple activities in ADF to run a durable Azure Function undermines the usefulness of retry settings in Data Factory, because it is not a single activity that you would need to rerun.
    If implemented manually, you also run into issues with alert notifications, for example...

    It should not be too difficult to code.

    26 votes · 0 comments