JSON support for data analytics in the Azure portal
JSON data hosted in Data Lake Store needs to be processed by Data Analytics jobs directly and easily. Lots of customers are waiting for this. There is NO clear guidance on how to query JSON data with U-SQL, and we might be losing a lot of business, as most customers nowadays have complex JSON data to be queried.
Craig R commented
One challenge with the sample JSON extractor is that it does not scale and only handles arrays of JSON, while the majority of parsers support JSON Lines, i.e. sets of JSON objects.
The scaling issue is similar to early XML parsers that loaded the entire DOM into memory rather than streaming the content (e.g. SAX) for scalability.
To expose the limitation, produce an array of JSON objects that is sufficiently large (100 GB works, though smaller probably does too): you'll see one vertex allocated and your job will hang forever, with no indication of what it is doing. :)
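The array-vs-lines distinction above can be sketched in plain Python (a conceptual illustration only; the sample data and variable names are made up, and U-SQL/ADLA internals are not shown):

```python
import json

# A large JSON *array* is one document: a standard parser cannot emit the
# first object until it has consumed the closing bracket, so the whole
# file lands on a single worker (one vertex in ADLA terms).
array_payload = '[{"id": 1}, {"id": 2}, {"id": 3}]'
records = json.loads(array_payload)  # entire document in memory at once

# JSON Lines (one object per line) can be consumed incrementally, one
# record at a time, which is what makes the input splittable and streamable.
jsonlines_payload = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'
streamed = [json.loads(line) for line in jsonlines_payload.splitlines() if line]

assert records == streamed  # same data, very different parsing behavior
```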
ADLA does not handle binary data well at all, and in my opinion it really doesn't have an auto-scale capability for input and output. For example, reads are generally one vertex/AU per file, and many of these inputs are a single file.
To handle this, btw, the guidance is to use ADF to split the file into "many" small input files; then ADLA will scale. (Or switch to Spark and hack it out.)
Mike R commented
You can find an example JSON extractor, along with sample U-SQL queries that use it, at https://github.com/Azure/usql/tree/master/Examples/DataFormats
If you need a built-in JSON extractor, please look at this request: https://feedback.azure.com/forums/327234-data-lake/suggestions/10828575-support-for-json
Alexander Batishchev commented
+1 for json extractor as a first-class citizen
Steven Mayer commented
The JSON extractor has been updated to work with files > 1 GB. For files with one JSON document per line, you can combine Extractors.Text with a delimiter of \n (I believe this is the default) and the JSON function JsonTuple to parse each line of JSON.
Thanks for the sample toolkit; however, I have tested the JSON extractor and it does not work with large data files (> 1 GB).
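The per-line pattern described above (Extractors.Text on \n, then JsonTuple over each line) can be approximated in Python terms; this is a rough sketch of the idea, not the actual Microsoft.Analytics.Samples.Formats library, and the `json_tuple` name and sample record are made up:

```python
import json

def json_tuple(line):
    """Rough analogue of the sample library's JsonTuple: parse one line of
    JSON and expose its top-level properties as a string-keyed map. Nested
    objects/arrays stay as JSON text for a second parsing pass, mirroring
    how JsonTuple is typically chained in U-SQL."""
    obj = json.loads(line)
    return {k: json.dumps(v) if isinstance(v, (dict, list)) else str(v)
            for k, v in obj.items()}

# One extracted "row" (one line of the input file):
row = json_tuple('{"event": "click", "user": {"id": 7}}')
```

Each line is an independent record, so the extractor never needs the whole file in memory, which is what makes this approach work past the 1 GB mark.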
Michael Rys commented
Thanks for your suggestion. At the moment we are shipping a sample JSON (and XML) toolkit that includes an extractor for JSON files. You can find it in the sample GitHub repository at http://usql.io. I would like to ask you to use it and tell us about your experience with it. We are planning to use the early feedback on the toolkit to improve it before making it native.
Benjamin Guinebertière commented
Support JSON files, as well as files containing one JSON document per line.
Scenario: web analytics sends a JSON document per web event, which must then be transformed before going to SQL DW.