How can we improve Microsoft Azure Data Lake?

U-SQL string data type has a size limit of 128 KB

The U-SQL string column data type has a size limit of 128 KB. This prevents uploading or processing text values larger than 128 KB through a U-SQL job. For example, if a SQL text column holds XML content larger than 300 KB, uploading or processing it with U-SQL fails. Can the string data type size be increased?
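
For reference, the failure can be reproduced with a job like the following sketch (the file path and schema are hypothetical). The EXTRACT compiles, but the job fails at runtime as soon as any xml_content cell exceeds 128 KB:

    // Hypothetical input: tab-separated rows of (id, xml_content).
    @docs =
        EXTRACT id int,
                xml_content string    // U-SQL string cells are capped at 128 KB
        FROM "/input/documents.tsv"
        USING Extractors.Tsv();

    @matches =
        SELECT id
        FROM @docs
        WHERE xml_content.Contains("<order>");

    OUTPUT @matches
    TO "/output/matches.tsv"
    USING Outputters.Tsv();

Declaring the column as byte[] instead of string is the workaround discussed in the comments below.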

169 votes

Meer Alam shared this idea


  • Daniel commented

    Each line of my input file is a serialized JSON object. It can easily be over 128 KB, and currently I have no option to parse it.

  • Li Li commented

    We are trying to use Data Lake to process some spatial data, and some LINESTRING values can be very long. The strings will be used with the SqlServerSpatial assemblies, so the byte array workaround is impossible for us. Looking forward to a BIG data processing engine.

  • Saul Cruz commented

    An example of how to put the data into a byte array (byte[]) using the Extractors would be nice.
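
    One way to do this is with a custom extractor in C# code-behind (it is not clear that the built-in Extractors can populate byte[] columns directly, so the custom route is the general one). A minimal sketch; the class and column names are hypothetical, not a shipped API:

        using Microsoft.Analytics.Interfaces;
        using System.Collections.Generic;
        using System.IO;
        using System.Text;

        // Hypothetical extractor: emits each input line as a byte[] column,
        // sidestepping the 128 KB string limit (rows are still capped at 4 MB).
        [SqlUserDefinedExtractor(AtomicFileProcessing = true)]
        public class LineBytesExtractor : IExtractor
        {
            public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
            {
                using (var reader = new StreamReader(input.BaseStream))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        // .NET strings have no 128 KB cap, so long lines survive
                        // the round trip into raw bytes.
                        output.Set<byte[]>("line", Encoding.UTF8.GetBytes(line));
                        yield return output.AsReadOnly();
                    }
                }
            }
        }

    And in the U-SQL script:

        @raw =
            EXTRACT line byte[]
            FROM "/input/big_rows.json"
            USING new LineBytesExtractor();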

  • Kaz Gwozdz commented

    Any update on this? Is this limit going to be increased in the foreseeable future?

  • J Meade commented

    We want to use U-SQL for data prep, NLP processing, and/or merging multiple smaller data files into larger consolidated files. We're running into issues when reading any rows containing fields beyond this limit. It would be really helpful if we could work under the same constraints as .NET and/or T-SQL.

  • Flaskepost commented

    Using the R.Reducer, I want to save some R objects that took me a loooooong time to calculate. They are quite large (10-50 MB), but I need to be able to get them out of R, save them to ADLS, and send them back to ADLA+R when doing other computations.

    We need to be able to pass single objects larger than 128 KB in and out of R. (Yes, I know I can use DEPLOY RESOURCE, but it does not have the flexibility of passing things around through the Reducer.)

  • Alex Kyllo commented

    Cosmos strings don't seem to have this limitation; a string in C#/.NET can be up to 2 GB, and in T-SQL an nvarchar(max) field can also be up to 2 GB. So it's very strange that this limitation exists in U-SQL, especially since it is marketed as a big data processing language. Also, the 4 MB per-row limit is too restrictive for data such as JSON, XML, and unstructured text. Please consider improving the U-SQL runtime to remove these limitations.

  • Carolus Holman commented

    When trying to access an array of elements, the Extractor fails when typing the JSON array object as a string. If I load the array directly using the JSON path in the extractor, e.g. JsonExtractor("body.telemetry.sensors[*]"), I can load the fragment, but with this method I cannot get to the other parts of the JSON object, such as the header. I have scoured the internet but cannot find a solution.
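
    One pattern that can help is a sketch using the Microsoft.Analytics.Samples.Formats JSON sample library (the paths and column names here are hypothetical): extract one row per message at a rowpath above the array, so the header and the array arrive as JSON string columns, then drill in with JsonFunctions.JsonTuple. Note this only works while each fragment stays under the 128 KB string limit, which is exactly the constraint this thread is about.

        REFERENCE ASSEMBLY [Newtonsoft.Json];
        REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

        USING Microsoft.Analytics.Samples.Formats.Json;

        // One row per message; nested objects arrive as JSON text columns.
        @msgs =
            EXTRACT header string,
                    sensors string
            FROM "/input/telemetry.json"
            USING new JsonExtractor("body.telemetry");

        // JsonTuple returns a SqlMap keyed by property name.
        @parsed =
            SELECT JsonFunctions.JsonTuple(header)["deviceId"] AS deviceId,
                   sensors
            FROM @msgs;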

  • Sha commented

    The workaround of reading it as byte[] does not work when dealing with Gzip-compressed files.
    In a lot of cases, there is a need to read a larger string and then parse out a smaller portion of it.

    Please add support for reading larger string values, similar to Hive.

  • Rukmani Gopalan commented

    While a byte[] works fine, it is restrictive in that string operations can't be used on it. E.g., think of a scenario of mining long error messages (with a huge error stack) where you are looking for a specific tag; a substring function would be handy there. Another scenario: working with string encodings of huge objects like images to perform operations on them.
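
    Until a larger string type exists, one stop-gap is to do the string work inside a code-behind UDF, where .NET strings are not subject to the 128 KB cap, and return only a small fragment that fits back into a U-SQL string cell. A sketch; the helper name and tag format are hypothetical:

        using System;
        using System.Text;

        public static class BigStringHelpers
        {
            // Decode the oversized payload, find the first <tag>...</tag>
            // fragment, and return its contents. The return value must itself
            // stay under 128 KB to be valid as a U-SQL string cell.
            public static string ExtractTag(byte[] payload, string tag)
            {
                if (payload == null) return null;
                var text = Encoding.UTF8.GetString(payload); // .NET string: no 128 KB cap
                var open = "<" + tag + ">";
                var close = "</" + tag + ">";
                var start = text.IndexOf(open, StringComparison.Ordinal);
                if (start < 0) return null;
                var end = text.IndexOf(close, start + open.Length, StringComparison.Ordinal);
                if (end < 0) return null;
                return text.Substring(start + open.Length, end - start - open.Length);
            }
        }

    which could then be called from the script, e.g. SELECT BigStringHelpers.ExtractTag(error_blob, "errorStack") AS stack FROM @errors;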

  • Michael Rys commented

    Thanks for filing. Currently the recommended workaround is to put the data into a byte array (byte[]). Note that for XML documents that may have a self-contained encoding, that may be the better way anyway.

    What is the expectation for such a type? Do you still want to dot into it and have it be type-compatible with the core string type? Or do you want it as a different type?
