Data Factory

Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure database, tables or blobs and create data pipelines that will process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.

Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.

  1. Static IP ranges for Data Factory and add ADF to list of Trusted Azure Services

    It is not currently possible to identify the IP Address of the DF, which you need for firewall rules, including Azure SQL Server firewall....

    1,926 votes
    88 comments

    Great news – static IP range for Azure Integration Runtime is now available in all ADF regions! You can whitelist specific IP ranges for ADF as part of firewall rules. The IPs are documented here: https://docs.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses#azure-integration-runtime-ip-addresses-specific-regions. Static IP ranges for gov cloud and China cloud will be published soon!

    Please refer to this blog post on how you can use various mechanisms including trusted Azure service and static IP to secure data access through ADF:
    https://techcommunity.microsoft.com/t5/azure-data-factory/azure-data-factory-now-supports-static-ip-address-ranges/ba-p/1117508

    Service tag support will be made available in the next few weeks. Please stay tuned!

    If your network security requirement calls for ADF support for VNet and cannot be met using Trusted Azure service (released in Oct 2019), static IP range (released in Jan 2020), or service tag (upcoming), please vote for VNet feature here: https://feedback.azure.com/forums/270578-data-factory/suggestions/37105363-data-factory-should-be-able-to-use-vnet-without-re
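
As a rough illustration of what a static IP range gives you on the firewall side, the sketch below checks whether a caller's address falls inside an allow-listed set of CIDR ranges. The ranges shown are hypothetical placeholders (documentation-only address blocks), not real ADF addresses; the actual ranges are the ones published in the documentation linked above. The helper name `is_allowed` is also an assumption made for this example.

```python
import ipaddress

# Hypothetical placeholder ranges (reserved documentation blocks), NOT real
# ADF addresses. Replace with the ranges published for your ADF region.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.16/28"]


def is_allowed(client_ip: str, ranges=ALLOWED_RANGES) -> bool:
    """Return True if client_ip falls inside any of the CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


if __name__ == "__main__":
    print(is_allowed("203.0.113.42"))   # inside the first range
    print(is_allowed("198.51.100.40"))  # outside both ranges
```

A real firewall rule performs this check for you; the point is only that a fixed, published range makes the rule possible to write at all.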

  2. Add support for Power Query / Power BI Data Catalog as Data Store / Linked Service

    Power Query is awesome! It would be a great feature to be able to output its result into either a SQL database or Azure (Storage or SQL).

    461 votes
    11 comments

    Please check out the new capability we recently unveiled, Wrangling Data Flows, now available in preview! Wrangling Data Flows let you discover and explore your data using the familiar Power Query Online mashup editor for data preparation, and then execute at scale using the Spark runtime.

    Sign up for preview access at: https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR9-OHbkcd7NIvtztVhbGIU9UNk5QM0dSWkFDSkFPUlowTFJMRVZUUUZGRi4u and check out more details at https://aka.ms/wranglingdfdocs

  3. Snowflake connector as both source and sink

    Provide the capability to copy data from Blob to Snowflake data warehouse

    407 votes
    15 comments
  4. Web and ODATA connectors need to support OAuth

    The Web and OData connectors need to add support for OAuth ASAP. Most other Microsoft services (Office 365, PWA, CRM, etc.), along with many other industry APIs, require the use of OAuth. Not having this closes the door to lots of integration scenarios.

    326 votes
    47 comments
  5. Provide better billing statistics

    Provide better cost analysis options (either in the Azure portal or in ADF). Right now it is impossible to see costs by pipeline or activity; you can only see the overall cost of the whole data factory instance, which is not very useful.
    Please add billing per pipeline (Logic Apps is a good example, where you can track costs per logic app).

    38 votes
    2 comments
  6. Allow string_agg in data flow aggregations

    Currently, it's only possible to do numerical aggregations (count, sum, etc) in data flow aggregations. Implementing something that works like SQL string_agg would be very helpful.

    16 votes
    started  ·  1 comment
  7. IoT Sample pipeline

    It would be nice to have a sample where we can use Data Factory in an IoT scenario to get started more quickly.

    I would really appreciate this!

    8 votes
    started  ·  2 comments
  8. Unstructured Data

    More file formats should be supported. Copy to Azure Blob does not appear to support PDF, Word, image, and other formats.

    It would be really great to have some process in place to read PDF, Word, and image files (unstructured data).

    7 votes
    0 comments

    You can currently copy any file format via the copy activity. Simply do not provide the structure element in the dataset. But we do want to surface this in a first-class manner.

  9. Multiple line queries with syntax highlighting in portal editor

    Currently, a pipeline query in the portal editor can only be one line, with no syntax highlighting.

    This makes it hard to read and edit, easy to introduce errors (particularly when escaping characters), and hard to spot them.

    Please add syntax highlighting, and allow the query to span multiple lines (even when in a Text.Format macro).

    6 votes
    1 comment

    Thanks for your feedback. We are working on an authoring experience that will support syntax highlighting. For queries spanning multiple lines, you can store your query in your storage account and reference its path in the ‘scriptpath’ parameter. This will allow your query to span multiple lines while using ‘Text.Format’.

  10. Azure Data Factory Visual Studio 2015 Deployment Rights

    At present you need co-admin rights to deploy. Businesses cannot give out these rights. As a subscription owner I should be able to deploy from VS as these rights give me access in the portal to create and delete!

    5 votes
    2 comments
  11. Add Collect_list feature in data flow

    When aggregating with the aggregation transformation, it is not possible to aggregate columns that contain text.
    Similar functionality is available in PySpark.

    For example, the PySpark code below cannot be reproduced in an Azure data flow.

    df.groupby(['customerid', 'month', 'year']).agg(F.concat_ws(", ", F.collect_list(df.text)).alias('aggdescr'), F.min('minbalance').alias('balance'),

    3 votes
    started  ·  0 comments
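
For readers unfamiliar with the aggregation requested above, here is a minimal plain-Python sketch of the semantics: group rows by key columns, collect the text values in each group, and join them with a separator. It is an illustration only, not PySpark and not a data flow expression; the `string_agg` helper and the sample rows are invented for this example.

```python
from collections import defaultdict


def string_agg(rows, key_fields, text_field, sep=", "):
    """Group rows by key_fields and join each group's text values with sep."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        groups[key].append(row[text_field])
    # Values are joined in input order within each group.
    return {key: sep.join(values) for key, values in groups.items()}


# Invented sample data, shaped like the columns in the PySpark snippet.
rows = [
    {"customerid": 1, "month": 1, "year": 2020, "text": "deposit"},
    {"customerid": 1, "month": 1, "year": 2020, "text": "withdrawal"},
    {"customerid": 2, "month": 1, "year": 2020, "text": "fee"},
]

result = string_agg(rows, ["customerid", "month", "year"], "text")
print(result)
```

Unlike this sketch, a distributed engine generally does not guarantee the order of collected values within a group unless the data is sorted explicitly.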
  12. Include cost estimates and functionality for controlling cost

    It's very hard to get a clear understanding of the cost of executing a pipeline or a data flow. Cost estimates based on the current configuration (runtime, etc.) would be very helpful. Furthermore, a projection of cost akin to that in the Azure platform would be very helpful. It would also help to see, for a given runtime, how many clusters are alive on it and to be able to shut them down individually.

    2 votes
    started  ·  0 comments