Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.
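For a concrete flavor of the authoring surface, here is a minimal sketch of defining a pipeline with the Python management SDK (azure-mgmt-datafactory). The subscription, resource group, factory, and dataset names are all placeholders, and the two datasets are assumed to already exist.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# One Copy activity moving data between two pre-existing blob datasets.
copy_step = CopyActivity(
    name="CopyRawToCurated",
    inputs=[DatasetReference(type="DatasetReference", reference_name="RawBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="CuratedBlobDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

client.pipelines.create_or_update(
    "my-resource-group",
    "my-data-factory",
    "RawToCuratedPipeline",
    PipelineResource(activities=[copy_step]),
)
```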

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Data Factory should be able to use a VNet without resorting to self-hosted IR

    A self-hosted runtime makes a lot of sense when integrating on-premises data; however, it's a shame to have to maintain a self-hosted integration runtime VM just to get the extra security of a VNet (e.g., firewalled storage accounts).

    Ideally, the Azure-managed integration runtimes would be able to join a VNet on demand.

    575 votes  ·  planned  ·  12 comments
  2. Refreshing Azure Analysis Services cubes

    Add an Azure Data Factory pipeline activity to refresh Azure Analysis Services cube partitions.
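    For reference, a common workaround today is to call the Azure Analysis Services REST API from a pipeline (e.g., via a Web activity). A rough sketch of that call in Python, where the region, server, model, and bearer token are placeholders:

    ```python
    import requests

    region = "westus"       # placeholder: your AAS server's region
    server = "myaasserver"  # placeholder server and model names
    model = "SalesModel"
    token = "<AAD bearer token for the https://*.asazure.windows.net resource>"

    resp = requests.post(
        f"https://{region}.asazure.windows.net/servers/{server}/models/{model}/refreshes",
        headers={"Authorization": f"Bearer {token}"},
        json={"Type": "Full", "MaxParallelism": 2},
    )
    resp.raise_for_status()
    # The service returns 202 Accepted; poll the Location header for refresh status.
    print(resp.headers.get("Location"))
    ```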

    314 votes  ·  8 comments
  3. Change Data Capture feature for RDBMS (Oracle, SQL Server, SAP HANA, etc.)

    Is it possible to have CDC features in ADF, please?
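    For context, this is roughly what consuming SQL Server's built-in CDC looks like by hand today; a native ADF source would wrap this. The connection string and the capture-instance name (dbo_Orders) are placeholders.

    ```python
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;"
        "UID=etl_user;PWD=<password>"
    )
    cur = conn.cursor()

    # Pull every change recorded for the dbo_Orders capture instance.
    cur.execute("""
        DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
        DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
        SELECT *
        FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
    """)
    for row in cur.fetchall():
        print(row)  # __$operation column indicates insert/update/delete
    ```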

    191 votes  ·  planned  ·  11 comments
  4. Support more complex types in Avro format, like Dictionaries and Arrays

    When trying to integrate a more complex scenario using the Event Hub Archive feature, I wasn't able to process the messages because the Data Factory Copy activity didn't support Dictionaries. When trying to use Stream Analytics to write to Avro format, it didn't work because of the Arrays. More complex end-to-end scenarios should be supported.
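    To make the scenario concrete, here is a small sketch (using the fastavro library; the schema and values are made up) of an Avro record carrying a map (dictionary) and an array, the shapes the Copy activity could not handle:

    ```python
    import io
    import fastavro

    schema = {
        "type": "record",
        "name": "Event",
        "fields": [
            {"name": "id", "type": "string"},
            {"name": "tags", "type": {"type": "map", "values": "string"}},       # dictionary
            {"name": "readings", "type": {"type": "array", "items": "double"}},  # array
        ],
    }

    buf = io.BytesIO()
    fastavro.writer(buf, schema, [{"id": "e1", "tags": {"env": "prod"}, "readings": [1.5, 2.5]}])
    buf.seek(0)
    for record in fastavro.reader(buf):
        print(record)  # {'id': 'e1', 'tags': {'env': 'prod'}, 'readings': [1.5, 2.5]}
    ```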

    93 votes  ·  planned  ·  6 comments
  5. Add ability to EASILY load SharePoint Online lists

    I found on the Internet that it is somehow possible to load SharePoint Online lists using Azure Data Factory via OData. I did not succeed, though, as there is no comprehensive description of how to achieve it.

    Please add SharePoint Online as a regular data source in Azure Data Factory.
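    For reference, the underlying call is SharePoint Online's OData REST endpoint, which is what a generic HTTP/OData linked service ends up hitting. A rough sketch in Python, where the site URL, list title, and bearer token are placeholders:

    ```python
    import requests

    site = "https://contoso.sharepoint.com/sites/teamsite"  # placeholder site
    resp = requests.get(
        f"{site}/_api/web/lists/getbytitle('Invoices')/items",  # placeholder list title
        headers={
            "Authorization": "Bearer <token from an AAD app registration>",
            "Accept": "application/json;odata=nometadata",
        },
    )
    resp.raise_for_status()
    for item in resp.json()["value"]:
        print(item["Id"], item.get("Title"))
    ```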

    89 votes  ·  planned  ·  3 comments
  6. Allow linking one factory to another

    I have been using the Walkthrough sample and successfully completed the exercise. This seems fairly straightforward, and the entire experience of building a network of dependencies between pipelines is great. It is very similar to SSIS but lets me perform data integration at scale with hybrid capabilities. My scenario is that we have a few different teams within our organization and we need separate billing for each of them. I believe separating subscriptions is the only option currently in Azure for separate billing. But we would like to allow one department to use the data of…

    89 votes  ·  planned  ·  5 comments
  7. Elasticsearch

    Support Elasticsearch as both a source and a sink.

    83 votes  ·  3 comments
  8. ADF connection to Azure Delta Lake

    Are there any plans to provide a connection between ADF v2/Mapping Data Flows and Azure Delta Lake? It would be a great new source and sink for ADF pipelines and Mapping Data Flows, providing full ETL/ELT CDC capabilities to simplify complex lambda data warehouse architecture requirements.
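    For context, this is the Delta Lake access such a connector would wrap, sketched here in PySpark as you would run it on Databricks; the lake paths are placeholders.

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read a Delta table from the lake, filter, and append to another Delta table.
    orders = spark.read.format("delta").load(
        "abfss://lake@myaccount.dfs.core.windows.net/tables/orders"  # placeholder path
    )
    orders.filter("status = 'open'").write.format("delta").mode("append").save(
        "abfss://lake@myaccount.dfs.core.windows.net/tables/open_orders"
    )
    ```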

    76 votes  ·  planned  ·  6 comments
  9. Allow Data Factory Managed identity to run Databricks notebooks

    Integrate Azure Data Factory Managed Identity with the Databricks service, like you did for Key Vault, storage, etc.
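    To illustrate what the linked service does underneath: it submits a run via the Databricks Jobs API, today authenticated with a personal access token; the request here is for the factory's managed identity to supply an AAD token instead. A sketch where the workspace URL, cluster ID, and notebook path are placeholders:

    ```python
    import requests

    workspace = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
    token = "<PAT today; a managed-identity AAD token is what's being requested>"

    resp = requests.post(
        f"{workspace}/api/2.0/jobs/runs/submit",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "run_name": "adf-triggered-run",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {"notebook_path": "/Shared/etl/transform"},
        },
    )
    resp.raise_for_status()
    print(resp.json()["run_id"])
    ```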

    43 votes  ·  planned  ·  3 comments
  10. 38 votes  ·  planned  ·  6 comments
  11. Allow ORC to be used as a source/sink format in Data Flow

    We currently cannot use ORC as a source/sink type in Data Flow jobs. This forces an extra copy into Parquet format, which can cause issues because Parquet does not have as many data types as ORC. Allowing ORC would remove the need for this extra copy operation and the data type issues it can introduce.
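    The extra hop described above looks roughly like this, sketched with pyarrow (paths are placeholders):

    ```python
    import pyarrow.orc as orc
    import pyarrow.parquet as pq

    # The workaround today: read ORC, rewrite as Parquet so Data Flow can consume it.
    table = orc.read_table("input/data.orc")       # placeholder path
    pq.write_table(table, "staging/data.parquet")  # types may not map 1:1
    ```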

    35 votes  ·  planned  ·  0 comments
  12. DocumentDB examples: shredding JSON documents to extract arrays as tables for inclusion in a SQL data warehouse

    JSON documents can contain objects and arrays, and can have many more nested levels than can easily be extracted using a DocumentDB query. Examples of how to leverage ADF to extract subsets of data from a collection of documents, for inclusion in a SQL database or as flat files, would be very helpful. Specific examples would include exporting the root key/id along with hierarchy key columns and flattened detail arrays.
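    A minimal sketch of the kind of shredding being asked for, with a made-up document: carry the root id into each flattened array row so the detail table can join back to its parent.

    ```python
    # A made-up order document with a nested detail array.
    doc = {
        "id": "order-1",
        "customer": {"name": "Contoso"},
        "lines": [
            {"sku": "A100", "qty": 2},
            {"sku": "B200", "qty": 1},
        ],
    }

    # One flat row per array element, carrying the root id along as the join key.
    rows = [
        {"order_id": doc["id"], "customer": doc["customer"]["name"], **line}
        for line in doc["lines"]
    ]
    print(rows)
    # [{'order_id': 'order-1', 'customer': 'Contoso', 'sku': 'A100', 'qty': 2},
    #  {'order_id': 'order-1', 'customer': 'Contoso', 'sku': 'B200', 'qty': 1}]
    ```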

    23 votes  ·  planned  ·  0 comments
  13. A better debugging UI that gives more details about the job ID: where did it fail? What file did it fail on?

    I'm new to Hive and I'm trying to figure out what I did wrong, but the only detail I get is below. It doesn't help much.

    Failed to submit Hive job: 1d625b0c-8e69-44ce-a0dc-21ba6b53db27. Error: An error occurred while sending the request..

    8 votes  ·  1 comment

    Admin response: Thanks for the feedback. This work is in progress, and you will be able to better debug your jobs. We will keep you folks updated when this feature is available in production.

  14. Source: Azure Blob; Target: IaaS SQL Server VM / Azure DB - Enable detailed mapping of SQL Server types.

    We have a SQL Server (2014) table with data types like nvarchar(5), datetime, etc. At the moment, loading data into a table with these data types fails: nvarchar(5) results in truncation errors, and datetime results in conversion errors.
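    Until detailed mapping exists, the validation has to happen before the load; a rough sketch of that hand-rolled check (column names and the datetime format are made up):

    ```python
    from datetime import datetime

    def check_nvarchar5(value: str) -> str:
        # Surface the truncation before the load instead of failing inside the copy.
        if len(value) > 5:
            raise ValueError(f"value would be truncated by nvarchar(5): {value!r}")
        return value

    def check_datetime(value: str) -> datetime:
        # Parse explicitly so conversion errors are caught per value, not per batch.
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")

    row = {"code": check_nvarchar5("AB123"), "loaded_at": check_datetime("2016-03-01 12:00:00")}
    print(row)
    ```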

    6 votes  ·  0 comments
  15. Data Flow Split Column should be supported

    When trying to split a column on a delimiter, we get an error that this is not yet supported.
    It would be a great feature, as it is frequently used in Power Query.
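    For clarity, the requested transformation is simply this, sketched in Python with made-up values:

    ```python
    full_names = ["Doe, John", "Smith, Jane"]

    # Split each value on the delimiter into two new columns.
    split_rows = [dict(zip(("last", "first"), name.split(", ", 1))) for name in full_names]
    print(split_rows)  # [{'last': 'Doe', 'first': 'John'}, {'last': 'Smith', 'first': 'Jane'}]
    ```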

    3 votes  ·  planned  ·  0 comments
  16. 3 votes  ·  planned  ·  0 comments
  17. Implement native error handling on ADF Data Flows

    Currently, ADF Data Flows don't handle errors natively. For example, if a source column in a file is a String and the sink column in an Azure SQL Database is an Int, no error occurs and a NULL value is loaded instead, which is misleading.

    The only way to deal with this is to implement our own logic to detect inconsistent values, which adds to the complexity of the data flow and to development time.

    The Copy activity in ADF itself has a fault-tolerance feature where we can skip and log incompatible rows. It would be good to see ADF Data…
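    The hand-rolled logic described above amounts to a per-row try/convert/route pattern, sketched here in Python with made-up rows:

    ```python
    rows = [{"id": "1"}, {"id": "oops"}, {"id": "3"}]

    good, bad = [], []
    for row in rows:
        try:
            good.append({"id": int(row["id"])})  # the intended String -> Int conversion
        except ValueError:
            bad.append(row)                      # route to an error output, don't load NULL
    print(good)  # [{'id': 1}, {'id': 3}]
    print(bad)   # [{'id': 'oops'}]
    ```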

    1 vote  ·  planned  ·  0 comments