How can we improve Microsoft Azure Data Lake?

U-SQL string data type has a size limit of 128 KB

The U-SQL string column data type has a size limit of 128 KB. This prevents uploading or processing text data larger than 128 KB through a U-SQL job. For example, if a text column in SQL holds XML content larger than 300 KB, uploading/processing it with U-SQL fails. Can the string data type size be increased?
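
A minimal sketch of where the limit bites, assuming a hypothetical TSV input with one XML document per row; the script compiles, but any xmlPayload value above 128 KB fails at runtime:

```
// Hypothetical input: /input/documents.tsv with one XML document per row.
// Extraction into a string column fails as soon as a value exceeds 128 KB.
@xmlRows =
    EXTRACT id int,
            xmlPayload string
    FROM "/input/documents.tsv"
    USING Extractors.Tsv();

@lengths =
    SELECT id,
           xmlPayload.Length AS payloadLength
    FROM @xmlRows;

OUTPUT @lengths
TO "/output/lengths.csv"
USING Outputters.Csv();
```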

61 votes

Meer Alam shared this idea

7 comments

Flaskepost commented:

Using the R Reducer, I want to save some R objects that took a very long time to calculate. They are quite large (10-50 MB), but I need to be able to get them out of R, save them to ADLS, and send them back to ADLA + R when doing other computations.

We need to be able to pass single objects larger than 128 KB in and out of R. (Yeah, I know I can use DEPLOY RESOURCE, but it does not have the flexibility of passing things around through the Reducer.)
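
A rough sketch of the pattern being described, assuming the U-SQL R extension (REFERENCE ASSEMBLY [ExtR]), the base64enc R package, and hypothetical input/output paths; the serialized model comes back as an ordinary string column, which is exactly where the 128 KB cap gets in the way:

```
REFERENCE ASSEMBLY [ExtR];

@trainingData =
    EXTRACT grp string, x double, y double
    FROM "/input/training.csv"
    USING Extractors.Csv();

// R script: fit a model per group and hand it back as a base64-encoded string.
DECLARE @myRScript string = @"
    model <- lm(y ~ x, data = inputFromUSQL)
    blob  <- base64enc::base64encode(serialize(model, NULL))
    outputToUSQL <- data.frame(grp = inputFromUSQL$grp[1], model_blob = blob)
";

// model_blob is a plain U-SQL string column, so any serialized object larger
// than 128 KB cannot be passed back out of the reducer this way.
@models =
    REDUCE @trainingData ON grp
    PRODUCE grp string, model_blob string
    USING new Extension.R.Reducer(command : @myRScript, rReturnType : "dataframe");

OUTPUT @models
TO "/output/models.csv"
USING Outputters.Csv();
```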

Alex Kyllo commented:

Cosmos strings don't seem to have this limitation, a string in C# .NET can be up to 2 GB in size, and in T-SQL an nvarchar(max) field can also be up to 2 GB. So it's very strange that this limitation exists in U-SQL, especially since it is marketed as a big data processing language. Also, the 4 MB per-row limitation is too restrictive for data such as JSON, XML, and unstructured text. Please consider improving the U-SQL runtime to remove these limitations.

Carolus Holman commented:

When trying to access an array of elements, the extractor fails when the JSON array object is typed as a string. If I load the array directly using the JSON path in the extractor, e.g. JsonExtractor(body.telemetry.sensors[*]), I can load the fragment, but with this method I cannot get to the other parts of the JSON object, such as the header. I have scoured the internet but cannot find a solution.
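
A workaround sketch for getting at both the header and the sensors array from the same document, assuming the sample Microsoft.Analytics.Samples.Formats assembly is registered and the document still fits into a single string value (i.e. stays under the 128 KB limit, which is the real problem here):

```
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// Read each JSON document as one raw line of text.
@raw =
    EXTRACT jsonLine string
    FROM "/input/telemetry.json"
    USING Extractors.Text(delimiter : '\b', quoting : false);

// JsonTuple turns a JSON string into a SqlMap of its properties, so the
// document can be walked level by level instead of committing to a single
// JSON path in the extractor.
@body =
    SELECT JsonFunctions.JsonTuple(jsonLine)["body"] AS body
    FROM @raw;

@telemetry =
    SELECT JsonFunctions.JsonTuple(body)["telemetry"] AS telemetry
    FROM @body;

@result =
    SELECT JsonFunctions.JsonTuple(telemetry)["header"] AS header,
           JsonFunctions.JsonTuple(telemetry)["sensors"] AS sensors
    FROM @telemetry;

OUTPUT @result
TO "/output/telemetry.csv"
USING Outputters.Csv();
```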

Sha commented:

The workaround of reading it as byte[] does not work when dealing with Gzip-compressed files.
In a lot of cases, there is a need to read a larger string and then parse out a smaller portion of it.

Please add support for reading larger string values, similar to Hive.
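
One possible direction is a small custom extractor (C# code-behind) that decompresses the stream itself and hands back only the fragment that is actually needed, so the output column stays under 128 KB. The class name, the tag-based parsing, and the assumption that the platform is not already decompressing the file are all illustrative:

```
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using Microsoft.Analytics.Interfaces;

[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
public class GzipFragmentExtractor : IExtractor
{
    private readonly string startTag;
    private readonly string endTag;

    public GzipFragmentExtractor(string startTag, string endTag)
    {
        this.startTag = startTag;
        this.endTag = endTag;
    }

    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        using (var gzip = new GZipStream(input.BaseStream, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            // The decompressed document may be far larger than 128 KB; only the
            // slice between the two tags is emitted as a string column.
            var document = reader.ReadToEnd();
            var start = document.IndexOf(startTag);
            var end = start < 0 ? -1 : document.IndexOf(endTag, start);
            if (start >= 0 && end > start)
            {
                output.Set<string>("fragment", document.Substring(start, end - start + endTag.Length));
                yield return output.AsReadOnly();
            }
        }
    }
}
```

In the script this would be wired up as `EXTRACT fragment string FROM ... USING new GzipFragmentExtractor("<Header>", "</Header>");` with tags chosen to match the payload.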

Rukmani Gopalan commented:

While a byte[] works fine, it is restrictive in that string operations cannot be used. E.g., think of a scenario of mining long error messages (with a huge error stack) where you are looking for a specific tag; a substring function would be handy. Another scenario is working with string encodings of huge objects, such as images, to perform operations.
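
As a stop-gap, string operations can sometimes be pushed into the C# expression itself so that the oversized value only exists transiently and never has to be stored into a string column. A tiny, self-contained sketch (the inline VALUES row just stands in for a real byte[] payload produced by an extractor):

```
// The inline row stands in for a rowset whose payload column is byte[]; real
// payloads would be far larger than 128 KB.
@errors =
    SELECT * FROM
        (VALUES
            ("e1", System.Text.Encoding.UTF8.GetBytes("... System.OutOfMemoryException at Foo.Bar() ..."))
        ) AS T(errorId, errorBlob);

// The decoded string exists only inside the expression; only small derived
// values are written back into columns, so they stay under the 128 KB limit.
@tagged =
    SELECT errorId,
           System.Text.Encoding.UTF8.GetString(errorBlob).Contains("OutOfMemoryException") AS isOutOfMemory
    FROM @errors;

OUTPUT @tagged
TO "/output/tagged.csv"
USING Outputters.Csv();
```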

Michael Rys commented:

Thanks for filing. Currently the recommended workaround is to put the data into a byte array (byte[]). Note that for XML documents that may have a self-contained encoding, that may be the better approach anyway.

        What is the expectation for such a type? Do you still want to dot into it and have it type compatible with the core string type? Or do you want it as a different type?
