Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or with custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Add multi-factor authentication for SFTP connector

    The existing SFTP connector in Azure Data Factory does not support multi-factor authentication; it supports either password-based or key-based authentication. Enterprises are moving toward multi-factor authentication requiring both a key and a password for SFTP. This is a must-have feature given the focus on information security. (A sketch of what key-plus-password SFTP authentication looks like follows this item.)

    48 votes  ·  1 comment
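
    The connector change itself is up to Microsoft, but to illustrate what the request means at the protocol level, here is a minimal sketch of key-plus-password SFTP authentication using paramiko. The host, account, password, and key path are placeholder assumptions, and the flow assumes the server answers the key with a partial success that asks for the second factor.

    ```python
    # Illustration only: what key-plus-password ("partial success") SFTP
    # authentication looks like at the SSH level, using paramiko. The ADF
    # SFTP connector would need to perform the equivalent handshake; the
    # host, credentials, and key path below are placeholders.
    import paramiko

    host, port = "sftp.example.com", 22           # placeholder server
    username = "etl_user"                         # placeholder account
    password = "s3cret"                           # placeholder password
    pkey = paramiko.RSAKey.from_private_key_file("/path/to/id_rsa")

    transport = paramiko.Transport((host, port))
    transport.start_client()

    # First factor: public key. A multi-factor server replies with a
    # partial success, returning the methods still required.
    remaining = transport.auth_publickey(username, pkey)
    if "password" in remaining:
        # Second factor: password.
        transport.auth_password(username, password)

    sftp = paramiko.SFTPClient.from_transport(transport)
    print(sftp.listdir("."))
    transport.close()
    ```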
  2. Please allow users to automate “Publish”.

    Currently, somebody must go to the development ADF instance, open the master branch, and click “Publish” for the changes to reach the “Adf_publish” branch.

    Automating “Publish” on the master branch is necessary for improving efficiency and saving time. (A programmatic alternative is sketched after this item.)

    164 votes  ·  started  ·  3 comments
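
    As a possible stopgap, pipeline definitions can be deployed to a factory programmatically, which bypasses the manual Publish button. The sketch below uses the azure-mgmt-datafactory Python SDK; it is an alternative to the built-in adf_publish ARM-template flow rather than an automation of it, and all resource names are placeholder assumptions.

    ```python
    # A sketch of one way to avoid the manual "Publish" step: deploy a
    # pipeline definition straight to the factory with the management SDK.
    # This is an alternative to the adf_publish ARM-template flow, not the
    # built-in Publish button; all names below are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import PipelineResource, WaitActivity

    client = DataFactoryManagementClient(
        credential=DefaultAzureCredential(),
        subscription_id="<subscription-id>",
    )

    # Minimal pipeline body; in a CI job this would be loaded from the
    # JSON files committed on the master branch.
    pipeline = PipelineResource(
        activities=[WaitActivity(name="Wait1", wait_time_in_seconds=5)]
    )

    client.pipelines.create_or_update(
        resource_group_name="<resource-group>",
        factory_name="<factory-name>",
        pipeline_name="SamplePipeline",
        pipeline=pipeline,
    )
    ```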
  3. HITRUST Compliance with Azure Data factory

    In the Azure compliance offerings sheet, I see that Data Factory is not compliant with HITRUST. Is there a roadmap to support it?

    506 votes  ·  10 comments
  4. Dark theme for Data Factory Web UI

    A dark theme for the Azure Data Factory web UI would be a nice addition for those of us who prefer dark themes in general. It would also be consistent with the Azure portal.

    256 votes  ·  4 comments
  5. Pipeline Dependencies in Azure Data Factory

    When using a hierarchical structure of pipelines (pipelines referring to other pipelines) in ADF, things can get messy and confusing quite fast. To get a good picture of how everything is put together, I would love the ability to show visual dependencies between pipelines – just like the query dependencies view in Power BI.

    I believe one should have the possibility to see dependencies between all the pipelines, but to minimize complexity and increase focus, a “drill-down” functionality should also be available if needed. I.e. one could have a dependency view per folder.

    Total view (All pipeline dependencies) -->…

    39 votes  ·  0 comments
  6. Self-hosted integration runtime security; store credentials locally without flowing the credentials through Azure backend

    If the owner of the data factory should not know the self-hosted integration runtime credentials, the client has to use New-AzDataFactoryV2LinkedServiceEncryptedCredential, which needs a connection to the Azure data factory; therefore the client must have credentials for the Azure account that hosts the data factory. The other option is to put the credentials in a key vault (managed by one party or the other) and, again, exchange Azure credentials for access.

    There is no way to break this chain: either the client receives too much security information about the server, or the server learns too much about the client's security.

    In the situation that the DF pipelines are exposed…

    24 votes  ·  0 comments
  7. 125 votes  ·  4 comments
  8. Add ability to customize output fields from Execute Pipeline Activity

    This request comes directly from a StackOverflow post, https://stackoverflow.com/questions/57749509/how-to-get-custom-output-from-an-executed-pipeline .
    Currently, the output from the Execute Pipeline activity is limited to the pipeline's name and the runId of the executed pipeline, making it difficult to pass any data or settings from the executed pipeline back to the parent pipeline – for instance, if a variable is set in the child pipeline, there is no built-in way to pass this variable back in the Azure Data Factory UI. A couple of workarounds exist, as detailed in the StackOverflow post above (one is sketched after this item), but adding this as a built-in feature would greatly enhance the ability…

    635 votes  ·  under review  ·  10 comments
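
    Until this is built in, one possible workaround is to read the child run's activity output after the fact through the run APIs. Below is a minimal sketch with the azure-mgmt-datafactory SDK; the resource names and the "Set MyVariable" activity name are placeholder assumptions.

    ```python
    # A sketch of one workaround: after an Execute Pipeline activity
    # finishes, look up the child run's activity output through the run
    # APIs instead of relying on the (name, runId)-only output. Resource
    # names and the "Set MyVariable" activity name are placeholders.
    from datetime import datetime, timedelta, timezone

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import RunFilterParameters

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    child_run_id = "<runId reported by the Execute Pipeline activity>"
    window = RunFilterParameters(
        last_updated_after=datetime.now(timezone.utc) - timedelta(days=1),
        last_updated_before=datetime.now(timezone.utc),
    )

    # Query the activity runs of the child pipeline and read the output of
    # the activity that produced the value we want to pass back.
    runs = client.activity_runs.query_by_pipeline_run(
        "<resource-group>", "<factory-name>", child_run_id, window
    )
    for activity_run in runs.value:
        if activity_run.activity_name == "Set MyVariable":
            print(activity_run.output)
    ```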
  9. Please add configurable, per activity default settings

    The default timeout for Copy activities in Azure Data Factory is 7 days.
    We are used to setting this property individually to a much lower value for every Copy activity we use.

    More generally, it would be great to be able to configure default values per activity type, to adjust those values to local procedures.

    This way, when developing new pipelines, not every property would need to be set individually every time, again and again.

    So please provide adjustable, per-activity default property values. (The sketch after this item shows the kind of boilerplate this would remove.)

    21 votes  ·  0 comments
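
    Azure Data Factory has no such per-activity defaults today. For teams that deploy pipelines programmatically, a helper like the sketch below (azure-mgmt-datafactory SDK; the two-hour timeout, single retry, and activity names are example assumptions) is roughly the boilerplate this feature would make unnecessary.

    ```python
    # There is no built-in per-activity default today; when pipelines are
    # deployed programmatically, one ends up writing a helper like this to
    # stamp the same policy onto every Copy activity. The timeout format is
    # "d.hh:mm:ss"; the two-hour/one-retry defaults and names are examples.
    from azure.mgmt.datafactory.models import (
        ActivityPolicy,
        BlobSink,
        BlobSource,
        CopyActivity,
    )

    DEFAULT_COPY_POLICY = ActivityPolicy(timeout="0.02:00:00", retry=1)

    def copy_activity(name: str, **kwargs) -> CopyActivity:
        """Build a Copy activity with the team-wide default policy applied."""
        kwargs.setdefault("policy", DEFAULT_COPY_POLICY)
        return CopyActivity(name=name, **kwargs)

    stage_orders = copy_activity(
        "StageOrders",
        source=BlobSource(),
        sink=BlobSink(),
    )
    print(stage_orders.policy.timeout)  # "0.02:00:00" instead of the 7-day default
    ```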
  10. GitLab Integration in Azure Data Factory

    It would be useful to have GitLab integration in Azure Data Factory alongside GitHub and Azure Repos, as it is one of the most popular tools.

    71 votes  ·  1 comment
  11. Fault Tolerance - Skip Incompatible Columns

    I am loading data dynamically from multiple SQL Server databases. All of the databases share the same table structure. However, there may be some exceptions where one of the databases has a missing column in one of the tables.

    When such a situation occurs, I would like to set Fault Tolerance to skip the incompatible column. Meaning, instead of the copy failing or entire rows being skipped, it should skip just that single column.

    That way, instead of losing all the columns, I lose only one column – one that is not used anyway, since it does not exist in that source.

    98 votes  ·  1 comment
  12. Add support for Cluster Mode in Databricks Linked Service

    Databricks has an option for creating clusters with Cluster Mode = Single Node, which means that all activities run on the same node. This should be supported in the ADF Databricks linked service too.

    30 votes  ·  0 comments
  13. Add an information button for Pipelines and Mapping Data Flow

    Development in a team involves many people and hence many pipelines and mapping data flows. It takes a lot of time to get up to speed when you continue developing a pipeline/data flow another team member has created.

    I believe it would save time if pipeline/data flows had an information button on the ribbon (see picture attached) - easy access to how the pipeline/data flow has been developed.

    An additional functionality of the information button that I think would be cool is to gather all the components used into buckets: one bucket for lookups, filters, derived columns, etc. Then you…

    31 votes  ·  0 comments
  14. Global parameters, not exposed to pipelines in some cases

    This seems like a bug more than anything. I am trying to pass global parameters through a pipeline into a linked service to dynamically define a dataset (the parameters will change for each environment). However, it appears this is no longer possible? I swear this was working at the start of August...

    According to this it should be possible: https://techcommunity.microsoft.com/t5/azure-data-factory/global-parameters-generally-available-in-azure-data-factory/bc-p/1705056#M306

    However in the comments myself and another person are having the same issue. Please fix!

    26 votes  ·  2 comments
  15. Ability to reuse one Databricks Job Cluster for multiple Databricks activities

    Job clusters are the preferred way to run Databricks notebooks.

    It would be very helpful if the same Databricks Job Cluster could be used for multiple Databricks activities.

    For example for all Databricks activities in one pipeline.

    13 votes  ·  0 comments
  16. ForEach activity - Allow break

    Allow breaking out of a ForEach activity, the way a break works in most languages. Currently, ForEach iterates over all items to the end, even if we don't want it to.

    If I hit an error in one of the items, I may want to break out of the ForEach, stop iterating, and throw that error.

    For now, I have to use a flag variable and If conditions to stop the ForEach from continuing to call all the activities.

    143 votes  ·  1 comment
  17. Support for Presto connector in mapping data flow

    Currently the Presto connector is not supported in mapping data flows. Can you please add support for it?

    31 votes  ·  0 comments
  18. Support for Dynamic Content with pagination rule in Rest source while Copy Data activity

    I'm using a pagination rule in a Copy Data activity that copies from a REST endpoint to Blob storage.

    I have applied the pagination rule with a dynamic value like this:

    AbsoluteUrl = @replace('$.nextLink','oldbaseurl','newbaseurl')

    But it still accesses the oldbaseurl that I'm getting in the response; the function does not replace it with the new one. (The sketch after this item spells out the intended behavior.)

    Error in data factory:

    ErrorCode=UserErrorFailToReadFromRestResource,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred while sending the request.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.Http.HttpRequestException,Message=An error occurred while sending the request.,Source=mscorlib,''Type=System.Net.WebException,Message=Unable to connect to the remote server,Source=System,''Type=System.Net.Sockets.SocketException,Message=A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has…

    45 votes  ·  1 comment
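
    To spell out the behavior the dynamic pagination rule is expected to produce, here is a minimal hand-rolled sketch using the requests library; the endpoint, the two base URLs, and the "value"/"nextLink" field names are placeholder assumptions.

    ```python
    # What the dynamic pagination rule is expected to do, written out by
    # hand: follow $.nextLink from each response, but swap the base URL
    # before requesting the next page. The endpoint, the two base URLs,
    # and the "value"/"nextLink" field names are placeholders.
    import requests

    OLD_BASE = "https://oldbaseurl.example.com"
    NEW_BASE = "https://newbaseurl.example.com"

    url = NEW_BASE + "/api/items"
    rows = []
    while url:
        payload = requests.get(url, timeout=30).json()
        rows.extend(payload.get("value", []))

        next_link = payload.get("nextLink")
        # This is the substitution the @replace(...) expression should apply.
        url = next_link.replace(OLD_BASE, NEW_BASE) if next_link else None

    print(f"fetched {len(rows)} rows")
    ```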
  19. Concurrency limit should avoid queueing new runs

    I have a pipeline with a basic time-based trigger that runs it hourly. I only want one instance of it running at a time so I set the concurrency property to 1. However, instead of doing nothing, ADF will queue new instances if the pipeline is already running. This is not the behavior that I want. The trigger is set up to ensure the pipeline runs hourly and that is sufficient for my purposes. Queueing new instances just causes a backlog and eventually pipeline failures.

    I know I can accomplish what I want with a tumbling window trigger (sketched after this item), but that…

    7 votes  ·  3 comments
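
    For reference, here is a minimal sketch of the tumbling window trigger workaround mentioned above, created through the azure-mgmt-datafactory SDK with max_concurrency set to 1; the resource names, start time, and hourly interval are placeholder assumptions.

    ```python
    # A sketch of the tumbling-window workaround mentioned above: an hourly
    # trigger with max_concurrency=1 does not pile up overlapping runs the
    # way a schedule trigger plus the pipeline concurrency setting does.
    # All resource names, the start time, and the interval are placeholders.
    from datetime import datetime, timezone

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        PipelineReference,
        TriggerPipelineReference,
        TriggerResource,
        TumblingWindowTrigger,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    trigger = TumblingWindowTrigger(
        pipeline=TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="HourlyPipeline")
        ),
        frequency="Hour",
        interval=1,
        start_time=datetime(2021, 1, 1, tzinfo=timezone.utc),
        max_concurrency=1,  # only one window runs at a time
    )

    client.triggers.create_or_update(
        "<resource-group>", "<factory-name>", "HourlyTumblingTrigger",
        TriggerResource(properties=trigger),
    )
    ```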
  20. ADF SharePoint List - Allow Multiple Values

    The current version of this connector treats any SharePoint list columns that allow multiple values as a "complex type". These are not supported in the connector, so the data in those columns is not available.

    7 votes  ·  1 comment