Polybase: allow field/row terminators within string fields
If a line end is present within a text field (qualified) importing data using Polybase will fail.
Loading this line will work fine:
Loading this will fail:
CREATE EXTERNAL FILE FORMAT text_file_format_raw
FORMAT_TYPE = DELIMITEDTEXT,
STRING_DELIMITER = '0x22',
USE_TYPE_DEFAULT = TRUE
CREATE EXTERNAL TABLE dbo.test_external
LOCATION = 'test_failing.txt',
DATA_SOURCE = azure_storage,
FILE_FORMAT = text_file_format_raw
CREATE TABLE dbo.test
WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN)
SELECT * FROM dbo.test_external;
Thanks for the suggestion. We are looking into this scenario for a future release. “PolyBase should allow escaping row, field and string delimiters in string fields” – 8826021.
Mitch vH commented
There is a DBLoader utility in the samples (https://github.com/Microsoft/sql-data-warehouse-samples/tree/master/samples/utility/DBLoader) that allows you to do this by using a \ in front of the double quote. Note that this is not a replacement for Polybase, but it provides a work around until the product team has a change to release this functionality in Polybase.
The other thing you can try is using Polybase with the ORC input file format.
Andrew Dale commented
Given that Azure Data Factory copy command allows you to specify both the columnDelimiter and the rowDelimiter, then EXTERNAL FILE FORMAT should do the same. This means you could then use the official ASCII codes. However, polybase does not allow you to specify a row terminaotr, only a field terminator.
Can you please let us know when can we expect this feature? This is urgent.
Himalayan Ninja commented
Because of this feature we are completely stuck and can't use polybase at all. How can a Dev team miss this basic and most important feature? Please fix this asap as it is very crucial for success of polybase
Himalayan Ninja commented
Can you please let us know when can we expect this feature? This is blocking our development
Ashish Kumar commented
This is an absolutely desired functionality. Minimally, we need to be able to escape the field if they are used in the date.
CREATE EXTERNAL FILE FORMAT TextFileFormat
( FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS ( FIELD_TERMINATOR = ','
, STRING_DELIMITER = '0x22'
, DATE_FORMAT = 'dd-MMM-yyyy HH:mm:ss'
, USE_TYPE_DEFAULT = FALSE
Does now allow the following string data to be loaded.
"Allowed codes are "A", "B", and "C"."
This is really important feature need, escaping characters is very basic in DB work.
Hi Team, Any deadline for availability of this feature.
Kapil Jain commented
Hi Team, Any update on this. Its really very important for us as our production jobs are failing due to this.
Chris Thompson (m1nkeh) commented
this was raised in Nov last year... we are not heading towards the end of July on a product that is going GA soon. This is a bug, and a massive bug at that. if i were to read this very same file in Excel, or SSIS, it would work absolutely fine.
What is the a) workaround, or b) timeline for resolution please?
Both of these will tell me what i need to know / do in terms of getting my system working right now.