How can we improve Microsoft Azure Data Lake?

JSON support for Data Analytics in the Azure portal

JSON data hosted in Data Lake Store needs to be processed by Data Analytics jobs directly and easily. Lots of customers are waiting for this. There is no clear guidance on how to query JSON data with U-SQL, and we might be losing business, as most customers today have complex JSON data to query.

114 votes

Dharmesh Rathod shared this idea


  • Craig R commented

    One challenge with the sample JSON extractor is that it does not scale and only handles JSON arrays, while the majority of parsers also support JSON Lines, i.e. sets of newline-delimited JSON objects.

    The scaling issue is similar to early XML parsers that loaded the entire DOM into memory rather than streaming the content (e.g. SAX) for scalability.
    To expose the limitation, produce an array of JSON objects that is sufficiently large (100 GB works, but smaller probably does too): you'll see a single vertex allocated and your job will hang forever, with no indication of what it is doing :)

    ADLA does not handle binary data well at all, and in my opinion it really doesn't auto-scale on input and output. For example, a read generally gets one vertex/AU per file, and many of these datasets arrive as a single file.

    To work around this, by the way, the guidance is to use ADF to split the file into "many" small input files, after which ADLA will scale (or switch to Spark and hack it out).

  • Mike R commented

    Hi Dharmesh

    You can find an example JSON extractor at and some sample U-SQL queries using it at

    If you need a built-in JSON extractor, please look at this request:

  • Steven Mayer commented

    The JSON extractor has been updated to work with files larger than 1 GB. For files with one JSON document per line, you can combine Extractors.Text with a delimiter of \n (I believe this is the default) and the JSON function JsonTuple to parse each line of JSON.
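
    A minimal sketch of that approach, assuming the sample Microsoft.Analytics.Samples.Formats assembly (and Newtonsoft.Json) are already registered in the database, and using hypothetical input/output paths and field names:

    ```
    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

    // Read one JSON document per line as a raw string.
    @lines =
        EXTRACT jsonLine string
        FROM "/input/events.json"   // hypothetical path
        USING Extractors.Text(delimiter : '\n', quoting : false);

    // Parse each line with JsonTuple, which returns a map of field name to value.
    @parsed =
        SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonLine) AS json
        FROM @lines;

    // Pull out individual fields by key.
    @result =
        SELECT json["id"] AS id,
               json["eventType"] AS eventType
        FROM @parsed;

    OUTPUT @result
    TO "/output/events.csv"
    USING Outputters.Csv();
    ```

    Because each line is extracted independently, this pattern lets the job parallelize across the file instead of loading one large JSON array into a single vertex.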

  • Anonymous commented

    Thanks for the sample toolkit. However, I have tested the JSON extractor and it does not work with large data files (> 1 GB).

  • Michael Rys commented

    Thanks for your suggestion. At the moment we are shipping a sample JSON (and XML) toolkit that includes extracting from JSON files. You can find it in the sample GitHub directory at I would like to ask you to use it and tell us about your experience with it. We plan to use this early feedback to improve the toolkit before making it native.
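
    For reference, a minimal sketch of using the sample toolkit's JsonExtractor, assuming the two sample assemblies are already registered in the database and the paths and field names are hypothetical:

    ```
    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

    USING Microsoft.Analytics.Samples.Formats.Json;

    // Extract top-level fields from a file containing a JSON array of objects.
    @events =
        EXTRACT id string,
                eventType string
        FROM "/input/events.json"   // hypothetical path
        USING new JsonExtractor();

    OUTPUT @events
    TO "/output/events.csv"
    USING Outputters.Csv();
    ```

    Note the scaling caveat raised elsewhere in this thread: this extractor parses the whole JSON document in one vertex, so very large single files will not parallelize.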

  • Benjamin Guinebertière commented

    Support JSON files, as well as files containing one JSON document per line.
    Scenario: web analytics sends a JSON document per web event; it must then be transformed before loading into SQL DW.
