PolyBase: allow escaping string delimiters within string fields
If you want to quote the string columns, and if one of the quoted columns has a quote character in it, the proper way to encode this is to double the quote character. (Build a CSV file in Excel and you will see that's how it saves it. I would say Excel is the authority on CSV formats.)
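The doubled-quote convention is what any RFC 4180-style CSV writer produces, not just Excel. A small illustrative Python sketch (not PolyBase itself; the field values are made up):

```python
import csv
import io

# Write a field containing an embedded double quote; the csv module
# (like Excel) escapes it by doubling the quote character.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(['2.5" drive', 'internal'])
line = buf.getvalue().strip()
print(line)  # "2.5"" drive","internal"

# Reading it back recovers the original value, quote included.
row = next(csv.reader(io.StringIO(line)))
print(row[0])  # 2.5" drive
```

This round trip is exactly what the request asks PolyBase to support when STRING_DELIMITER is a double quote.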
It appears it's impossible for PolyBase to load such a file correctly. For example, a file like the following is not interpreted correctly with STRING_DELIMITER = '"', because the doubled quote that encodes the value 2.5" splits it into two columns:
"3.5 feet"|"3.5 ft"
"2.5"""|"2.5 in"
Please improve PolyBase to parse these files properly.
awais aslam commented
any update on it?
Andy Steinke commented
Is there any update on this? It feels like this is more of a requirement than a "nice to have" for something intended for enterprise data management. Keep up the good work!
Well, where are we at with this? Please add asap.
I vote for, the feature is needed.
Raghavendra Narayana commented
The alternative I am trying is to use " (double quotes) for character fields when exporting the data, and when importing the data to use Replace(column_name, '\~', '~'), assuming the escape character used is \ and the delimiter used is ~. Let me know if you see any issues with this approach, please.
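At the string level the proposed encode/decode does round-trip. A minimal Python sketch of the idea (assuming \ as the escape character and ~ as the delimiter, as described above; the helper names are made up):

```python
# Export side: escape any literal delimiter occurring in the data.
def encode(value, delim='~', esc='\\'):
    return value.replace(delim, esc + delim)

# Import side: the equivalent of Replace(column_name, '\~', '~').
def decode(value, delim='~', esc='\\'):
    return value.replace(esc + delim, delim)

original = 'approx ~5 GB'
assert decode(encode(original)) == original

# Caveat: a field-splitter that is unaware of the escape character will
# still split on the ~ inside \~, so this only helps if the loader
# honors the escape or the split pieces are re-joined afterwards.
```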
Charlie Cook commented
Voting for this as well. Scenarios where a text input file has tabs, newlines, embedded double quotes all should be able to be handled by having double quotes around the field. This type of file is processed fine by say Excel, but not by polybase / ADW external tables. This makes it impossible to simply ingest data into ADLA and put polybase external tables on them if there is any data from text fields that may be user entered comments for instance.
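As a point of comparison, a standard quote-aware reader copes with tabs, newlines, and embedded quotes inside a single quoted field. An illustrative Python sketch with made-up data:

```python
import csv
import io

# One quoted field containing a tab, a newline, and a doubled quote,
# followed by a second plain field.
data = '"line one\tline two\nwith a "" quote","plain"\r\n'

row = next(csv.reader(io.StringIO(data)))
print(len(row))  # 2 fields, despite the embedded newline
print(row[0])    # the tab, newline, and quote survive intact
```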
I vote for, the feature is needed.
When will this very necessary feature be released? Any timeline?
Anu Bosky commented
How can I insert a record that has line breaks into a single nvarchar(max) column of Azure Data Warehouse? Kindly help.
Any update on this????
This is highly needed
Mitch vH commented
There is a DBLoader utility in the DW samples (https://github.com/Microsoft/sql-data-warehouse-samples/tree/master/samples/utility/DBLoader) that allows you to do this by using a backslash in front of a double quote. Note that this is not a replacement for PolyBase, but it provides a workaround until the product team has a chance to release this functionality in PolyBase.
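The backslash convention the utility uses (\" for an embedded quote inside a quoted field) can be mimicked with the escape-character support in Python's csv module. A sketch of the idea, not the DBLoader code itself:

```python
import csv
import io

# A row where the embedded quote is escaped with a backslash,
# DBLoader-style, rather than doubled.
data = '"2.5\\" drive","internal"\r\n'

# doublequote=False plus escapechar tells the reader to honor \" instead
# of the "" convention.
row = next(csv.reader(io.StringIO(data), doublequote=False, escapechar='\\'))
print(row[0])  # 2.5" drive
```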
Hello @SQL DW Team,
any news on this?
It is an essential functionality to read escaped fields inside a CSV.
Do you have a workaround if there is no way to change the source file?
Hasan Iqbal commented
Want to Improve ADW/PolyBase experience or Data Import? Enable the developer with the following requests:
1. Enable error logging in file or somewhere that can be reviewed for debugging
2. You already provide the column ordinal value; can you PLEASE also include the row number at which the PolyBase query importing data from the EXTERNAL table failed?
3. Allow the double-quote (") character to be used as a TEXT or STRING field qualifier, including when it is embedded within the TEXT or STRING field itself. Just allow a standard CSV file to be processed.
4. Make PolyBase queries richer in supported data formats.
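Points 1 to 3 above can be prototyped outside PolyBase. A hypothetical Python sketch that validates a standard CSV and records the row number of each failure for debugging (the schema width and sample data are made up):

```python
import csv
import io

EXPECTED_COLUMNS = 2  # hypothetical schema width

def validate(text):
    """Return (good_rows, errors); each error carries its 1-based row number."""
    good, errors = [], []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != EXPECTED_COLUMNS:
            errors.append((lineno, row))  # row number plus offending row
        else:
            good.append(row)
    return good, errors

sample = 'a,b\n"embedded "" quote",ok\nonly_one_field\n'
good, errors = validate(sample)
print(errors)  # [(3, ['only_one_field'])]
```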
You can use a custom separator such as ASCII character 127 and get rid of text qualifiers altogether. PolyBase does not support using the CHAR() function in the file format, but you can just copy-paste the character generated in Notepad++. It works fine and eliminates all of these problems. PolyBase should also support a custom character for the row delimiter ;)
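The ASCII 127 trick can be sketched in a few lines of Python; the assumption (which should be verified against your actual data) is that the DEL character never occurs in user-entered text, so no text qualifier is needed at all:

```python
SEP = chr(127)  # DEL: rarely present in real text (an assumption to verify)

fields = ['2.5" drive', 'has | pipes, commas, and "quotes"']
line = SEP.join(fields)

# With a delimiter that never occurs in the data, no quoting or escaping
# is needed to split the line back into its fields.
print(line.split(SEP) == fields)  # True
```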
Hi SQL DW Team. Any news about this? This is a huge issue, since we cannot control the data that is coming in, and if we start parsing it ourselves, performance goes down.
Chris Thompson (m1nkeh) commented
need this ASAP!
Brynn Borton commented
Hi Sahaj, this is a big priority for me to make this usable in production scenarios for clients. In general you do not have control over the source files you are loading; for example, I am dealing with large extracts of web analytics data from Omniture. Without this feature my only other option seems to be to parse each individual file separately to remove any escaped double quotes, which rather defeats the point of PolyBase, as I may as well load it directly into the warehouse while parsing. I am dealing with around 5 GB of data an hour. Adding what seems on the surface to be a fairly simple fix would add a massive amount to the overall product offering.
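Pending a product fix, the per-file preprocessing described above can be scripted. A minimal sketch, assuming doubled-quote CSV input and a quote-free, pipe-delimited output whose delimiter is known not to occur in the data (both assumptions must hold for your files):

```python
import csv
import io

def preprocess(csv_text, out_delim='|'):
    """Re-emit a standard CSV as delimiter-separated text with no quotes,
    so a quote-unaware loader can split it safely."""
    out = io.StringIO()
    for row in csv.reader(io.StringIO(csv_text)):
        # Assumes out_delim never appears in the data; verify before use.
        out.write(out_delim.join(row) + '\n')
    return out.getvalue()

src = '"2.5"" drive",internal\r\n"plain",value\r\n'
print(preprocess(src))
```

The obvious drawback, as noted above, is that every file must be read and rewritten once before PolyBase ever sees it.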
Sahaj Saini commented
Thanks for the feedback, Greg. You are correct about how Excel builds CSV files. However, today PolyBase requires you to choose a column or string delimiter that does not conflict with the data in order to parse the file correctly. We appreciate your suggestion and will look into improving this experience.
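Until that improvement lands, the non-conflict requirement can at least be checked mechanically before a delimiter is baked into an external file format. A hypothetical Python sketch:

```python
def delimiter_conflicts(text, candidate):
    """True if the candidate field or string delimiter appears in the data,
    which would make PolyBase mis-split the file."""
    return candidate in text

data = 'first field\tsecond "quoted" field'
print(delimiter_conflicts(data, '|'))  # False: safe to use
print(delimiter_conflicts(data, '"'))  # True: conflicts with the data
```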