USQL String Data Type has a size limit of 128KB
The U-SQL string column data type has a size limit of 128 KB, which prevents uploading or processing text values larger than 128 KB in a U-SQL job. For example, if a SQL text column contains XML content larger than 300 KB, uploading or processing it with U-SQL fails. Can the string data type size be increased?
Using the R.Reducer, I want to save some R objects that took me a very long time to calculate. They are quite large (10-50 MB), but I need to be able to get them out of R, save them to ADLS, and send them back to ADLA+R when doing other computations.
We need to be able to move single objects larger than 128 KB into and out of R. (Yes, I know I can use DEPLOY RESOURCE, but it does not have the flexibility of passing things around using the Reducer.)
Paul Andrew commented
For info, this relates to this SO question/answer: https://stackoverflow.com/questions/44631022/value-too-long-failure-when-attempting-to-convert-column-data
Alex Kyllo commented
Cosmos strings don't seem to have this limitation; a string in C#/.NET can be up to 2 GB, and an nvarchar(max) field in T-SQL can also be up to 2 GB. So it is very strange that this limitation exists in U-SQL, especially since it is marketed as a big data processing language. Also, the 4 MB per-row limitation is too restrictive for data such as JSON, XML, and unstructured text. Please consider improving the U-SQL runtime to remove these limitations.
Carolus Holman commented
When trying to access an array of elements, the extractor fails if the JSON array object is typed as a string. If I load the array directly using a JSONPath in the extractor, e.g. JsonExtractor("body.telemetry.sensors[*]"), I can load the fragment, but with this method I cannot get to the other parts of the JSON object, such as the header. I have scoured the internet but cannot find a solution.
The workaround of reading the value as byte[] does not work when dealing with Gzip-compressed files.
In a lot of cases, there is a need to read a larger string and then parse out a smaller portion of it.
Please add support for reading larger string values, similar to Hive.
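For the header-plus-array case above, one pattern that is sometimes suggested uses JsonFunctions.JsonTuple from the Microsoft.Analytics.Samples.Formats JSON library to pull both the header and the nested array from the same document in one pass. This is a hedged sketch: the file path, column names, and JSON paths are illustrative assumptions, and it still requires each document to fit within the 128 KB string limit, so it does not remove the underlying size restriction.

```usql
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// One JSON document per line; '\b' as delimiter avoids splitting on commas.
@docs =
    EXTRACT jsonLine string
    FROM "/input/telemetry.json"
    USING Extractors.Text(delimiter : '\b', quoting : false);

// JsonTuple returns a map of the requested paths, so the header and the
// sensors array come out of the same document rather than separate extracts.
@parts =
    SELECT JsonFunctions.JsonTuple(jsonLine, "header", "body.telemetry.sensors") AS m
    FROM @docs;

@result =
    SELECT m["header"] AS header,
           m["body.telemetry.sensors"] AS sensors
    FROM @parts;

OUTPUT @result TO "/output/parts.csv" USING Outputters.Csv();
```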
Rukmani Gopalan commented
While byte[] works fine, it is restrictive in that string operations cannot be used on it. Think, for example, of mining long error messages (with a huge error stack) for a specific tag: a substring function would be handy. Another scenario is working with string encodings of huge objects such as images.
Michael Rys commented
Thanks for filing. Currently the recommended workaround is to put the data into a byte array (byte[]). Note that for XML documents, which may carry a self-contained encoding declaration, that may be the better approach anyway.
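The byte[] workaround can be sketched as follows. This is a minimal sketch, not a definitive implementation: the file path, schema, and the 200-character snippet are illustrative assumptions, and the row is still subject to the 4 MB row limit.

```usql
// Extract the oversized value as byte[] instead of string, then decode it
// inside a C# expression and emit only a small string result, so no string
// column in a rowset ever exceeds the 128 KB limit.
@rows =
    EXTRACT id int,
            payload byte[]
    FROM "/input/large_text.tsv"
    USING Extractors.Tsv();

@result =
    SELECT id,
           // The full decoded string exists only inside the expression;
           // only the short fragment becomes a column value.
           System.Text.Encoding.UTF8.GetString(payload).Substring(0, 200) AS snippet
    FROM @rows;

OUTPUT @result TO "/output/snippets.csv" USING Outputters.Csv();
```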
What is the expectation for such a type? Do you still want to dot into it and have it type compatible with the core string type? Or do you want it as a different type?