Data Lake

You can use this site to communicate with the Azure Data Lake team. We are eager to hear your ideas, suggestions, or any other feedback that would help us improve the service to best fit your needs.

If you have technical questions, please visit our forums.
If you are looking for tutorials and documentation, please visit http://aka.ms/AzureDataLake.

How can we improve Microsoft Azure Data Lake?


Enter your idea and we'll search to see if someone has already suggested it.

If a similar idea already exists, you can support and comment on it.

If it doesn't exist, you can post your idea so others can support it.


  1. Support 'dynamic' output file names in ADLA

    When reading files, it's easy to use wildcard characters and dynamic parts of the URI (see the Month variable):

    @mydata =
        EXTRACT Customer string,
                Active bool,
                Month string
        FROM @"/MySolution/Energy/InputFiles/{Month:*}.csv"
        USING new USQLExtractors.CustomExtractor();

    I also want this behavior on the output side: I want to split the output by customer (and I don't know these customers upfront):

    OUTPUT @mydata
    TO @"/MySolution/Energy/OutputFiles/{Customer:*}.csv"
    USING Outputters.Text();


    Please make this possible.

    217 votes  ·  29 comments
  2. Support Parquet in Azure Data Lake

    Parquet is (becoming) the standard format for storing columnar data in the Big Data community. Almost all open-source projects, like Spark, Hive, Drill, ..., support Parquet as a first-class citizen.
    It's the successor to ORC files.

    141 votes  ·  18 comments
  3. Read from and write to Optimized Row Columnar (ORC) format

    Please add the ability to read from and write to the ORC file format. This format is very popular due to the high compression and predicate push-down features.

    120 votes  ·  23 comments
  4. Generate Heading Rows using the built in ADLA Outputters

    The built-in outputters should allow generating a heading row so that the output can be opened easily in tools like Excel.

    60 votes  ·  8 comments
  5. Support GZip on OUTPUT as well

    Today EXTRACT provides the ability to auto-decompress .gz files. The same should be supported on OUTPUT: if the target file name ends in .gz, the resulting file should be compressed.
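    A minimal sketch of the requested behavior, mirroring today's EXTRACT-side decompression (hypothetical; OUTPUT does not compress today, and the path is illustrative):

    // Hypothetical: a target name ending in .gz would make the outputter
    // compress the result, just as EXTRACT auto-decompresses .gz input today.
    OUTPUT @result
    TO "/MySolution/Energy/OutputFiles/report.csv.gz"
    USING Outputters.Csv();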

    58 votes  ·  7 comments
  6. Add ANSI code page support for built-in Extractors/Outputters and custom extractors/outputters.

    Today many files that get created on Windows are created with the OS's ANSI code page (Windows-125x or Shift-JIS). Currently the U-SQL Extractors and Outputters only support ASCII, UTF-8, and UTF-16 encodings. Please add the common ANSI code pages and Shift-JIS code pages to the extractors/outputters.

    54 votes  ·  7 comments

    Please note Michael’s comments below. This is complete.

    The way to use it is one of the following ways. We support the Windows-125x and ISO code pages. Shift-JIS is currently not supported, since that encoding uses overlapping code points and can lead to wrong data if a parallelization split happens inside an encoding.

    // Using a code page number with a variable.
    DECLARE @windows1252 = System.Text.Encoding.GetEncoding(1252); … USING Extractors.Tsv(encoding : @windows1252);

    // Using the code page name directly, inlined into the invocation.
    USING Extractors.Tsv(encoding : System.Text.Encoding.GetEncoding("windows-1250"));

  7. Skip header in Extractor

    When you read a flat file, a header row is usually present. It would be nice if extractors had an option to skip the header automatically.
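    One shape such an option could take on the built-in extractors (the skipFirstNRows parameter name, columns, and path are assumptions of this sketch):

    // Skip the first row so the header is not parsed as data.
    @data =
        EXTRACT Customer string,
                Amount decimal
        FROM "/input/data.csv"
        USING Extractors.Csv(skipFirstNRows : 1);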

    35 votes  ·  2 comments
  8. Support running jobs against many small blobs (10,000+)

    One of our scenarios has data partitioned across multiple dimensions in Azure Blobs; the result is many small blobs that we would like to run analytics against. Currently, compilation seems to time out after 10 minutes when accessing too many blobs. We would like ADL to support this scenario efficiently, without compilation or execution problems.

    29 votes  ·  7 comments
  9. Provide NuGet packages for customization-related assemblies

    Use the industry-standard mechanism (NuGet.org) for publishing assemblies instead of forcing me to manually locate the assemblies in "C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\PublicAssemblies\" and then copy & register them in my project. This would also simplify updating to newer versions of the assemblies.

    23 votes  ·  2 comments
  10. Support functionality to handle file properties from U-SQL.

    I would like to hold image files (like JPEG) in ADLS alongside other semi-structured data. It would be good to have a way to read file properties, like date/time, from U-SQL.

    22 votes  ·  5 comments
  11. Allow me to define a schema that I can reuse in multiple EXTRACTs

    I have a TVF with many EXTRACTs, so I have to repeat the schema multiple times, once for each extract. I'd rather have a way of defining the schema in one place and then reusing it in the EXTRACTs.

    I see that there is a CREATE TYPE DDL that allows me to name a table type. Can I use this in an EXTRACT?

    NOTE: I can't merge these into a single EXTRACT with stream sets. The filenames are not organized in a way that makes that possible, so I must use multiple EXTRACTs.

    22 votes  ·  2 comments
  12. Add PIVOT/UNPIVOT to U-SQL

    Please provide PIVOT and UNPIVOT in U-SQL.

    Voters, please indicate if you are OK with a static PIVOT à la https://msdn.microsoft.com/en-us/library/ms177410(SQL.105).aspx.
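    A sketch of what a static PIVOT could look like in U-SQL, modeled on the T-SQL syntax linked above (the rowset, column names, and exact clause shape are illustrative assumptions):

    // Hypothetical static PIVOT: turn one row per (Customer, Month, Amount)
    // into one row per Customer with a column per month.
    @pivoted =
        SELECT *
        FROM @sales
        PIVOT (SUM(Amount) FOR Month IN ("Jan" AS Jan, "Feb" AS Feb)) AS p;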

    18 votes  ·  3 comments
  13. Add support for CSV files with quotes

    This input:
    1,2,"three, more"

    will result in 4 columns. Please parse single and double quotes correctly.
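    A sketch of quote-aware parsing with the built-in extractor (the quoting parameter name, columns, and path are assumptions of this sketch):

    // With quoting enabled, "three, more" is read as a single column,
    // so the input line above yields 3 columns instead of 4.
    @rows =
        EXTRACT a int,
                b int,
                c string
        FROM "/input/data.csv"
        USING Extractors.Csv(quoting : true);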

    12 votes  ·  5 comments
    completed  ·  matt winkler responded

    We’ve updated the management portal to include these capabilities.

  14. Obtain the user name of a job submitter inside a U-SQL job

    Something like System.Security.Principal.WindowsIdentity.GetCurrent().Name in Windows or USER_NAME() in T-SQL.

    The main objective is to get ahold of the user name and use it as a filter in a WHERE clause.

    8 votes  ·  1 comment
  15. Compilation directive (or environment variables)

    Having an #if directive to make code conditional on the execution environment would be nice.
    #if Emulated
    @source = "local source file"
    #else
    @source = "adl source"
    #endif

    Another way would be to declare U-SQL variables (@source in previous example) and set values based on execution environment (similar to configuration settings in Azure Tools).

    8 votes  ·  1 comment
  16. 7 votes  ·  0 comments
  17. 7 votes  ·  0 comments
    completed  ·  matt winkler responded

    This has been recently shipped into the Azure portal for monitoring jobs

  18. Infer the return schema on CREATE FUNCTION

    In my TVF I have to specify the schema in the result function syntax, but the query that produces the result already knows the schema.

    It would be awesomely convenient to have the return type be optional so that the schema could be derived from the rowset being returned.

    6 votes  ·  2 comments
  19. Expose ADLA job metadata through U-SQL variables

    Allow ADLA job metadata to be accessible directly from U-SQL scripts and procedures through variables such as @@jobid and @@jobsubmitter.

    Having this information available from within U-SQL scripts and procedures will help to support scenarios where processing activity needs to be logged to an ADLA table.

    This feature request is closely related to: https://feedback.azure.com/forums/327234-data-lake/suggestions/13701351-need-a-provision-to-capture-the-job-id-of-a-adla-j
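    A hypothetical illustration of how such variables could be used to log processing activity to an ADLA table (@@jobid and @@jobsubmitter do not exist today; the table and columns are illustrative):

    // Hypothetical: record which job and which submitter produced this run.
    INSERT INTO MyDb.dbo.ProcessingLog(JobId, Submitter, LoggedAt)
    SELECT @@jobid, @@jobsubmitter, DateTime.UtcNow
    FROM (VALUES(1)) AS t(x);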

    5 votes  ·  1 comment
  20. ALTER TABLE ADD COLUMN support

    Our entities allow adding columns, and we would like to pass this down to the managed table.

    5 votes  ·  2 comments