How can we improve Microsoft Azure Data Lake?

JSON support for Data Analytics in the Azure portal

JSON data hosted in Data Lake Store needs to be processed by Data Lake Analytics jobs directly and easily; lots of customers are waiting for that. There is no clear guidance on how to query JSON data with U-SQL, and we might be losing a lot of business, since most customers nowadays have complex JSON data to query.

102 votes

    Dharmesh Rathod shared this idea

    7 comments

      • Craig R commented

        One challenge with the sample JSON extractor is that it does not scale and only handles JSON arrays, whereas most parsers also support JSON Lines, i.e. a sequence of JSON objects, one per line.

        The scaling issue is similar to early XML parsers that loaded the entire DOM into memory rather than streaming the content (e.g. SAX) for scalability.
        To expose the limitation, produce an array of JSON objects that is sufficiently large (100 GB works, but smaller probably does too): you'll see a single vertex allocated and the job will hang indefinitely, with no indication of what it is doing. :)

        ADLA does not handle binary data well at all and, in my opinion, really doesn't have an auto-scale capability for input and output. For example, reads are generally one vertex / AU per file, and many of these datasets are a single file.

        To handle this, by the way, the guidance is to use ADF to split the file into "many" small input files, so that ADLA will scale (or switch to Spark and hack it out).

      • Steven Mayer commented

        The JSON extractor has been updated to work with files larger than 1 GB. For files that contain one JSON document per line, you can combine Extractors.Text with a \n delimiter (I believe this is the default) and the JSON function JsonTuple to parse each line of JSON.
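
        A minimal sketch of that pattern (the assembly registration names, input path, and property names below are placeholders; the sketch also uses '\b' rather than '\n' as the column delimiter, since '\b' should never appear in the JSON text and so leaves each whole line in a single string column):

        REFERENCE ASSEMBLY [Newtonsoft.Json];
        REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

        USING Microsoft.Analytics.Samples.Formats.Json;

        // Read each line of the JSON Lines file as one string column.
        @lines =
            EXTRACT jsonLine string
            FROM "/input/events.jsonl"
            USING Extractors.Text(delimiter : '\b', quoting : false);

        // JsonTuple parses a JSON document into a map of its top-level properties.
        @parsed =
            SELECT JsonFunctions.JsonTuple(jsonLine) AS jsonMap
            FROM @lines;

        @events =
            SELECT jsonMap["eventType"] AS EventType,
                   jsonMap["timestamp"] AS EventTime
            FROM @parsed;

        OUTPUT @events
        TO "/output/events.csv"
        USING Outputters.Csv();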

      • Anonymous commented

        Thanks for the sample toolkit. However, I have tested the JSON extractor and it does not work with large data files (> 1 GB).

      • Michael Rys commented

        Thanks for your suggestion. At the moment we are shipping a sample JSON (and XML) toolkit that includes extracting from JSON files. You can find it in the sample GitHub directory at http://usql.io. I would like to ask you to use it and tell us about your experience with it. We are planning on using the early feedback on the toolkit to improve it before making it native.
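
        For reference, a minimal sketch of extracting from a JSON file with the toolkit's JsonExtractor (the assembly registration names, input path, and columns are illustrative placeholders):

        REFERENCE ASSEMBLY [Newtonsoft.Json];
        REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

        USING Microsoft.Analytics.Samples.Formats.Json;

        // JsonExtractor maps top-level JSON properties to the requested columns;
        // an optional JSONPath argument can select a nested array of objects as the rowset.
        @events =
            EXTRACT eventType string,
                    userId    string
            FROM "/input/events.json"
            USING new JsonExtractor();

        OUTPUT @events
        TO "/output/events.csv"
        USING Outputters.Csv();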

      • Benjamin Guinebertière commented

        Support JSON files, as well as files containing one JSON document per line.
        Scenario: Web Analytics sends a JSON document per web event; it then must be transformed before going to SQL DW.
