[In preview] PolyBase: allow escaping string delimiters within string fields
This feedback item is currently in preview here:
If you want to quote the string columns, and if one of the quoted columns has a quote character in it, the proper way to encode this is to double the quote character. (Build a CSV file in Excel and you will see that's how it saves it. I would say Excel is the authority on CSV formats.)
it appears it's impossible for Polybase to properly load that file. For example the following format is not interpreted right with STRING_DELIMITER = '"' since it divides the value 2.5" into two columns
"3.5 feet"|"3.5 ft"
Please improve Polybase to properly parse these files.
Any update on this????
This is highly needed
Mitch vH commented
There is a DBLoader utility in the DW samples (https://github.com/Microsoft/sql-data-warehouse-samples/tree/master/samples/utility/DBLoader) that allows you to do this by using a back slash in front of a double quote. Note that this is not a replacement for Polybase, but it provides a work around until the product team has a change to release this functionality in Polybase.
Hello @SQL DW Team,
any news on this?
It is an essential functionality to read escaped fields inside a CSV.
Do you have a workaround if there is no way to change the source file?
Hasan Iqbal commented
Want to Improve ADW/PolyBase experience or Data Import? Enable the developer with the following requests:
1. Enable error logging in file or somewhere that can be reviewed for debugging
2. You not only provide the Column Ordinal Value, CAN you PLEASE include the row number for which PolyBase query to import data from EXTERNAL Table failed?
3. Allow double_quote " character to be used as TEXT or STRING field QUALIFIER, which also may be embedded within the TEXT or STRING field itself. Just allow standard CSV file to be processed
4. Make PolyBase query richer for data format
You can use custom separator like char ascii 127 and get rid of text separators. Polybase does not support using function CHAR() in format of file but you can just copy-paste a value generated in notepad++. It works fine, and eliminates all of your problems. Polybase should support custome char on row delimiter also ;)
Hi Sql DW Team. Any news about this ? Because this is a huge issue since we cannot control the data that is comming and if we start parsing the performance goes down
Chris Thompson (m1nkeh) commented
need this ASAP!
Brynn Borton commented
Hi Sahaj This is a big priority for me to make this usable in a production scenario for clients. In general you do not have control over source files you are loading, for example I am dealing with large extracts of web analytics data from Omniture. Without this feature my only other option seems to be parse each individual file separately to remove any escaped double quotes. Which kind of defeats the point of PolyBase as I may as well load it directly into the warehouse while parsing. I am dealing with around 5GB of data an hour. Adding what on the surface seems to me to be a fairly simple fix would add a massive amount to the overall product offering.
Sahaj Saini commented
Thanks for the feedback Greg. You are correct about how Excel builds CSV files. However, today PolyBase requires you to choose a column or string delimiter that does not conflict with the data to correctly parse the file. We appreciate your suggestion and will look into improving this experience.