When reading files, it's easy to read from many files at once by using wildcard characters and dynamic parts of the URI (see the Month variable).
EXTRACT Customer string,
USING new USQLExtractors.CustomExtractor();
I also want this behavior on the output side. I want to split the output by customer (and I don't know the customers up front).
Please make this possible.
218 votes
The private preview is in full swing and will soon move to public preview. See https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2018/2018_Spring/USQL_Release_Notes_2018_Spring.md#data-driven-output-partitioning-with-output-fileset-is-in-private-preview
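A minimal sketch of the requested output partitioning, assuming the `DataPartitionedOutput` preview flag named in the linked release notes. The input path and the `Customer`, `Amount`, and `Month` columns are hypothetical. The `{Customer}` token in the output path is filled in from the rowset, so the customers don't need to be known up front.

```usql
// Opt in to the preview feature (flag as named in the 2018 Spring release notes).
SET @@FeaturePreviews = "DataPartitionedOutput:on";

// Hypothetical input: one CSV per month; {Month} becomes a virtual column.
@sales =
    EXTRACT Customer string,
            Amount decimal,
            Month string
    FROM "/input/sales_{Month}.csv"
    USING Extractors.Csv();

// {Customer} is taken from each row, so one file is written per distinct customer.
OUTPUT @sales
TO "/output/{Customer}.csv"
USING Outputters.Csv();
```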
Parquet is becoming the standard format for storing columnar data in the big data community. Almost all open-source projects, such as Spark, Hive, and Drill, support Parquet as a first-class citizen.
It's a common alternative to ORC files.
142 votes
Here are the release notes and documentation. Thanks for all the votes and the patience!
Please add the ability to read from and write to the ORC file format. This format is very popular due to its high compression and predicate push-down features.
120 votes
Allow the built-in outputters to generate a header row so that the output can be opened easily in tools like Excel.
60 votes
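The built-in outputters accept an `outputHeader` parameter that writes the column names as the first row. A minimal sketch; the rowset values and the output path are hypothetical:

```usql
// Hypothetical two-row rowset built inline.
@result =
    SELECT *
    FROM (VALUES ("Contoso", 10), ("Fabrikam", 20)) AS T(Customer, Orders);

// outputHeader: true writes "Customer","Orders" as the first line of the file.
OUTPUT @result
TO "/output/orders.csv"
USING Outputters.Csv(outputHeader: true);
```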
Today EXTRACT can automatically decompress .gz files. The same should be supported on OUTPUT: if the target file name ends in .gz, the resulting file should be compressed.
58 votes
This will become generally available (GA) in the next refresh. For now, it is in public preview:
Many files created on Windows today use the OS's ANSI code page (Windows-125x or Shift-JIS). Currently the U-SQL extractors and outputters only support the ASCII, UTF-8 and UTF-16 encodings. Please add the common ANSI code pages and the Shift-JIS code pages to the extractors and outputters.
54 votes
Please note Michael’s comments below. This is complete.
Use it in one of the following ways. We support the Windows-125x and ISO code pages. Shift-JIS is currently not supported, because that encoding uses overlapping code points and can produce wrong data if a parallelization split happens in the middle of an encoded character.
// Using code page number with a variable.
DECLARE @windows1252 = System.Text.Encoding.GetEncoding(1252); … USING Extractors.Tsv(encoding: @windows1252);
// Using direct code page name. Inlined into invocation.
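A fuller sketch of both variants described above, assuming a hypothetical Windows-1252 encoded input file; the paths and columns are illustrative only. The extractor takes the encoding from a variable, and the outputter gets it inlined by code page name.

```usql
// Variant 1: code page number stored in a variable (1252 = Windows-1252).
DECLARE @windows1252 = System.Text.Encoding.GetEncoding(1252);

// Hypothetical Windows-1252 encoded TSV file.
@rows =
    EXTRACT Customer string,
            City string
    FROM "/input/customers_1252.tsv"
    USING Extractors.Tsv(encoding: @windows1252);

// Variant 2: code page name inlined into the invocation.
OUTPUT @rows
TO "/output/customers_1252.csv"
USING Outputters.Csv(encoding: System.Text.Encoding.GetEncoding("windows-1252"));
```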
When you read a flat file, a header row is usually present. It would be nice if the extractors had an option to skip the header automatically.
35 votes
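The built-in extractors accept a `skipFirstNRows` parameter for this. A minimal sketch, with a hypothetical input file and columns:

```usql
// skipFirstNRows: 1 skips the header line at the start of the input file.
@customers =
    EXTRACT Customer string,
            Country string
    FROM "/input/customers.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

OUTPUT @customers
TO "/output/customers_no_header.csv"
USING Outputters.Csv();
```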
One of our scenarios has data partitioned along multiple dimensions in Azure Blobs, which results in many small blobs that we would like to run analytics against. Currently, compilation seems to time out after 10 minutes when too many blobs are accessed. We would like ADL to support this scenario efficiently, without compilation or execution problems.
29 votes
The part about many files has been released:
The part about small files is in public preview: https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2018/2018_Spring/USQL_Release_Notes_2018_Spring.md#input-file-set-uses-less-resources-when-operating-on-many-small-files-is-in-public-preview
Use the industry-standard mechanism (NuGet.org) for publishing assemblies instead of forcing me to manually locate the assemblies in "C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\PublicAssemblies\" and then copy and register them in my project. This would also simplify updating to newer versions of the assemblies.
23 votes
Thanks for the feedback. We have posted the SDKs you were looking for: https://www.nuget.org/packages/Microsoft.Azure.DataLake.USQL.SDK/ and https://www.nuget.org/packages/Microsoft.Azure.DataLake.USQL.Interfaces/1.0.0
I would like to store image files (such as JPEG) in ADLS alongside other semi-structured data. It would be good to have a way to read file properties such as date/time from U-SQL.
22 votes
This feature is supported now: https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2018/2018_Spring/USQL_Release_Notes_2018_Spring.md#u-sql-adds-support-for-computed-file-property-columns-on-extract
Please send us feedback about it.
I have a TVF with many EXTRACTS. I have to repeat the schema multiple times - one for each extract. I'd rather have a way of defining the schema in one place and then reusing that in the EXTRACTS.
I see that there is a CREATE TYPE DDL that allows me to name a table type. Can I use this in an EXTRACT?
NOTE: I can't merge these into a single EXTRACT with stream sets. The file names are not organized in a way that makes that possible, so I must use multiple EXTRACTs.
22 votes
Hi Saveen. Thanks for the patience :). This is now supported. See https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2018/2018_Spring/USQL_Release_Notes_2018_Spring.md#the-extract-expressions-schema-can-be-specified-with-a-table-type
Please provide PIVOT and UNPIVOT in U-SQL.
Voters, please indicate whether you are OK with a static PIVOT à la https://msdn.microsoft.com/en-us/library/ms177410(SQL.105).aspx.
18 votes
will result in 4 columns. Please parse single and double quotes correctly.
12 votes
We’ve updated the management portal to include these capabilities.
Something like System.Security.Principal.WindowsIdentity.GetCurrent().Name in Windows or USER_NAME() in T-SQL.
The main objective is to get hold of the user name and use it as a filter in a WHERE clause.
8 votes
Thanks for your patience. This functionality is now available. Please see https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2018/2018_Spring/USQL_Release_Notes_2018_Spring.md#u-sql-adds-job-information-system-variable-jobinfo
Having an #if directive to make code conditional on the execution environment would be nice.
@source = "local source file"
@source = "adl source"
Another way would be to declare U-SQL variables (@source in the previous example) and set their values based on the execution environment (similar to configuration settings in Azure Tools).
8 votes
The constant foldable IF has shipped. See https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2016/2016_08_01/USQL_Release_Notes_2016_08_01.md#u-sql-supports-a-compile-time-if-statement for details.
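A minimal sketch of the environment switch from the original request using the compile-time IF, assuming a hypothetical `@env` parameter declared with `DECLARE EXTERNAL` so that a submitting tool can override its default. The account name and paths are placeholders. Because `@env` is known at compile time, only the matching branch ends up in the job.

```usql
// Hypothetical parameter; DECLARE EXTERNAL lets the caller override the default.
DECLARE EXTERNAL @env string = "local";

// Constant-foldable condition: evaluated by the compiler, not at run time.
IF @env == "local" THEN
    @localRows =
        EXTRACT Customer string
        FROM "/samples/customers.csv"
        USING Extractors.Csv();
    OUTPUT @localRows TO "/output/customers.csv" USING Outputters.Csv();
ELSE
    // Placeholder ADLS account name.
    @adlRows =
        EXTRACT Customer string
        FROM "adl://youraccount.azuredatalakestore.net/customers.csv"
        USING Extractors.Csv();
    OUTPUT @adlRows TO "/output/customers.csv" USING Outputters.Csv();
END;
```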
Add XML namespace support to the XML extractor. See, for example, http://stackoverflow.com/questions/35132327/adla-xmlextractor-cant-read-properties.
7 votes
The XmlDomExtractor has been updated to support namespaces.
This was recently shipped in the Azure portal for monitoring jobs.
In my TVF, I have to specify the schema in the function's return-type syntax, even though the query that produces the result already knows the schema.
It would be awesomely convenient to have the return type be optional so that the schema could be derived from the rowset being returned.
6 votes
Allow ADLA job metadata to be accessible directly from U-SQL scripts and procedures through variables such as @@jobid and @@jobsubmitter.
Having this information available from within U-SQL scripts and procedures will help to support scenarios where processing activity needs to be logged to an ADLA table.
This feature request is closely related to: https://feedback.azure.com/forums/327234-data-lake/suggestions/13701351-need-a-provision-to-capture-the-job-id-of-a-adla-j
5 votes
Hi Michael. This capability is now available. See https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2018/2018_Spring/USQL_Release_Notes_2018_Spring.md#u-sql-adds-job-information-system-variable-jobinfo
Please let us know if you need more information exposed.
Our entities allow adding columns, and we would like to pass that down to the managed table.
5 votes
This capability has been added. Please see https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2016/2016_07_14/USQL_Release_Notes_2016_07_14.md#u-sql-now-supports-adding-and-removing-columns-on-u-sql-tables for details.
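A minimal sketch of the syntax, assuming a hypothetical managed table `MyDb.dbo.Customers`:

```usql
// Add a new column to the existing managed table.
ALTER TABLE MyDb.dbo.Customers ADD COLUMN Region string;

// Remove the column again.
ALTER TABLE MyDb.dbo.Customers DROP COLUMN Region;
```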