How can we improve Microsoft Azure Data Lake?

U-SQL string data type has a size limit of 128 KB

The U-SQL string column data type has a size limit of 128 KB. This prevents uploading or processing text data larger than 128 KB in a U-SQL job. For example, if a text column in SQL holds XML content larger than 300 KB, uploading or processing it with U-SQL fails. Can we increase the string data type size?
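
To make the limit concrete, here is a minimal sketch (in Python, not U-SQL) of a pre-flight check that flags which fields of a row would exceed 128 KB. It assumes the limit is measured on the UTF-8 encoded size of the value, which the post does not state; the helper names and the sample row are hypothetical.

```python
# Hypothetical pre-flight check; assumes the 128 KB limit applies to the
# UTF-8 encoded size of each string value.
LIMIT_BYTES = 128 * 1024  # the 128 KB limit quoted in the post


def utf8_size(text: str) -> int:
    """Size of a string in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def oversized_fields(row: dict) -> dict:
    """Return {column: size_in_bytes} for string fields over the limit."""
    sizes = {name: utf8_size(value)
             for name, value in row.items()
             if isinstance(value, str)}
    return {name: size for name, size in sizes.items() if size > LIMIT_BYTES}


# Sample row with an XML payload of roughly 300 KB, as in the post.
row = {"id": "42", "xml": "<doc>" + "x" * 300_000 + "</doc>"}
print(oversized_fields(row))
```

Running a check like this before submitting a job would at least show which columns need a different representation.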

91 votes

    Meer Alam shared this idea

    8 comments

      • J Meade commented

        We want to use U-SQL for data prep, NLP processing, and/or merging multiple smaller data files into larger consolidated files. We're running into issues when reading any rows containing fields that exceed this limit. It would be really helpful if we could work under the same constraints as .NET and/or T-SQL.

      • Flaskepost commented

        Using the R.Reducer, I want to save some R objects that took me a loooooong time to calculate. They are quite large (10-50MBs) but I need to be able to get them out of R, save them to ADLS and send them back to ADLA+R when doing some other computations.

        We need to be able to pass single objects larger than 128 KB in and out of R. (Yeah, I know I can use DEPLOY RESOURCE, but it does not have the flexibility of passing things around using the Reducer.)

      • Alex Kyllo commented

        Cosmos strings don't seem to have this limitation, a string in C# .NET can be up to 2 GB in size, and in T-SQL an nvarchar(max) field can also be up to 2 GB. So it's very strange that this limitation exists in U-SQL, especially since it is marketed as a big data processing language. Also, the 4 MB per row limitation is too restrictive for data such as JSON, XML, and unstructured text. Please consider improving the U-SQL runtime to remove these limitations.

      • Carolus Holman commented

        When I try to access an array of elements, the extractor fails if the JSON array object is typed as a string. If I load the array directly using the JSONPath in the extractor, e.g. JsonExtractor(body.telemetry.sensors[*]), I can load the fragment, but with this method I cannot get to the other parts of the JSON object, such as the header. I have scoured the internet but cannot find a solution.

      • Sha commented

        The workaround of reading it as byte[] does not work when dealing with Gzip-compressed files.
        In a lot of cases, there is a need to read a larger string and then parse out a smaller portion of it.

        Please add support for reading larger string values, similar to Hive.

      • Rukmani Gopalan commented

        While a byte[] works fine, it is restrictive because string operations are not available. For example, think of mining long error messages (with a huge error stack) for a specific tag, where a substring function would be handy. Another scenario is working with string encodings of huge objects, such as images, to perform operations on them.

      • Michael Rys commented

        Thanks for filing. Currently the recommended workaround is to put the data into a byte array (byte[]). Note that for XML documents, which may carry a self-contained encoding, that may be the better approach anyway.

        What is the expectation for such a type? Do you still want to dot into it and have it type compatible with the core string type? Or do you want it as a different type?
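
The pattern several commenters describe, keeping a large value as raw bytes and pulling out only a small string portion, can be sketched as follows. This is an illustration in Python rather than U-SQL; the function name, the tag, and the context size are hypothetical, and it assumes the payload is UTF-8 text.

```python
from typing import Optional


def extract_around_tag(payload: bytes, tag: bytes, context: int = 200) -> Optional[str]:
    """Find `tag` in a large byte payload and decode only a small window
    around it, so the resulting string stays far below any size limit."""
    pos = payload.find(tag)
    if pos == -1:
        return None  # tag not present
    start = max(0, pos - context)
    end = min(len(payload), pos + len(tag) + context)
    # errors="replace" guards against a window edge that cuts through a
    # multi-byte UTF-8 character.
    return payload[start:end].decode("utf-8", errors="replace")


# A payload of about 1 MB with one tag of interest in the middle.
payload = b"a" * 500_000 + b"<ErrorCode>E42</ErrorCode>" + b"b" * 500_000
snippet = extract_around_tag(payload, b"<ErrorCode>", context=20)
print(snippet)
```

The search runs on the bytes and only the window is decoded, so no string operation ever touches the full payload. As noted in the thread, this does not help when the bytes are Gzip-compressed, since the tag cannot be found without decompressing first.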
