How can we improve Microsoft Azure SQL Data Warehouse ?

PolyBase: allow escaping string delimiters within string fields

If you want to quote the string columns, and if one of the quoted columns has a quote character in it, the proper way to encode this is to double the quote character. (Build a CSV file in Excel and you will see that's how it saves it. I would say Excel is the authority on CSV formats.)

it appears it's impossible for Polybase to properly load that file. For example the following format is not interpreted right with STRING_DELIMITER = '"' since it divides the value 2.5" into two columns

SizeDescription|SizeCode
"3.5 feet"|"3.5 ft"
"2.5 inches"|"2.5"""

Please improve Polybase to properly parse these files.

201 votes
Sign in
(thinking…)
Sign in with: oidc
Signed in as (Sign out)

We’ll send you updates on this idea

Greg Galloway shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

19 comments

Sign in
(thinking…)
Sign in with: oidc
Signed in as (Sign out)
Submitting...
  • Andy Steinke commented  ·   ·  Flag as inappropriate

    Is there any update on this? It feels like this is more of a requirement than a "nice to have" for something intended for enterprise data management. Keep up the good work!

  • Raghavendra Narayana commented  ·   ·  Flag as inappropriate

    The alternate I am trying is to use " (double quotes) for Character Fields while exporting data, and when importing the data I will use Replace(column_name, '\~', '~'), assuming escape character used is \ and delimiter used is ~. Let me know if you see any issues in this approach please.

  • Charlie Cook commented  ·   ·  Flag as inappropriate

    Voting for this as well. Scenarios where a text input file has tabs, newlines, embedded double quotes all should be able to be handled by having double quotes around the field. This type of file is processed fine by say Excel, but not by polybase / ADW external tables. This makes it impossible to simply ingest data into ADLA and put polybase external tables on them if there is any data from text fields that may be user entered comments for instance.

  • Jaime commented  ·   ·  Flag as inappropriate

    When will this very necessary feature be released? Any timeline?

  • Anu Bosky commented  ·   ·  Flag as inappropriate

    How can i Insert a record which has line breaks to a single nvarchar(max) column of azure datawarehouse. Kindly help

  • Anonymous commented  ·   ·  Flag as inappropriate

    Hello @SQL DW Team,
    any news on this?
    It is an essential functionality to read escaped fields inside a CSV.
    Do you have a workaround if there is no way to change the source file?

  • Hasan Iqbal commented  ·   ·  Flag as inappropriate

    Want to Improve ADW/PolyBase experience or Data Import? Enable the developer with the following requests:
    1. Enable error logging in file or somewhere that can be reviewed for debugging
    2. You not only provide the Column Ordinal Value, CAN you PLEASE include the row number for which PolyBase query to import data from EXTERNAL Table failed?
    3. Allow double_quote " character to be used as TEXT or STRING field QUALIFIER, which also may be embedded within the TEXT or STRING field itself. Just allow standard CSV file to be processed
    4. Make PolyBase query richer for data format

  • Wojtek commented  ·   ·  Flag as inappropriate

    You can use custom separator like char ascii 127 and get rid of text separators. Polybase does not support using function CHAR() in format of file but you can just copy-paste a value generated in notepad++. It works fine, and eliminates all of your problems. Polybase should support custome char on row delimiter also ;)

  • Anonymous commented  ·   ·  Flag as inappropriate

    Hi Sql DW Team. Any news about this ? Because this is a huge issue since we cannot control the data that is comming and if we start parsing the performance goes down

  • Brynn Borton commented  ·   ·  Flag as inappropriate

    Hi Sahaj This is a big priority for me to make this usable in a production scenario for clients. In general you do not have control over source files you are loading, for example I am dealing with large extracts of web analytics data from Omniture. Without this feature my only other option seems to be parse each individual file separately to remove any escaped double quotes. Which kind of defeats the point of PolyBase as I may as well load it directly into the warehouse while parsing. I am dealing with around 5GB of data an hour. Adding what on the surface seems to me to be a fairly simple fix would add a massive amount to the overall product offering.

  • Sahaj Saini commented  ·   ·  Flag as inappropriate

    Thanks for the feedback Greg. You are correct about how Excel builds CSV files. However, today PolyBase requires you to choose a column or string delimiter that does not conflict with the data to correctly parse the file. We appreciate your suggestion and will look into improving this experience.

Feedback and Knowledge Base