[In preview] Polybase: allow field/row terminators within string fields
This feedback item is currently in preview here:
If a line end is present within a text field (qualified) importing data using Polybase will fail.
Loading this line will work fine:
Loading this will fail:
CREATE EXTERNAL FILE FORMAT textfileformat_raw
FORMAT_TYPE = DELIMITEDTEXT,
STRING_DELIMITER = '0x22',
USE_TYPE_DEFAULT = TRUE
CREATE EXTERNAL TABLE dbo.testexternal
LOCATION = 'testfailing.txt',
DATASOURCE = azurestorage,
FILEFORMAT = textfileformatraw
CREATE TABLE dbo.test
WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUNDROBIN)
SELECT * FROM dbo.testexternal;
i agree i am having problems with this too. its really bad that you can't ingest columns with line feeds in Polybase.
Still getting ignored. Either Yes/No would really help !!!!!
Wow. A massive bug completely ignored by Microsoft for years. Unbelieveable. Polybase is fine, but utterly unusable on most circustances due to the limitations and lack of support.
Vijay Suram commented
Any update on this yet? This is blocking our loading. Using alternative as Azure Data factory is expensive to create multiple pipelines and triggers.
can you provide update?
any update on this?
this is a major show stopper since 2016. when are you guys gonna do anything about it. you are selling the polybase pattern but it doesn't support this fundamental need.
Please provide some feedback on this as this is a huge blocker for us and it also seems that microsoft is completely ignoring this matter.
Any update on this? This is a huge blocker for us to go with polybase approach for loading data into Azure Datawarehouse
Daniel cundy commented
I'm guessing this still hasn't been actioned..... Three years on.
Vinodh Ramasubramanian commented
Is this feature available atleast now in 2018 ?
Mitch vH commented
There is a DBLoader utility in the samples (https://github.com/Microsoft/sql-data-warehouse-samples/tree/master/samples/utility/DBLoader) that allows you to do this by using a \ in front of the double quote. Note that this is not a replacement for Polybase, but it provides a work around until the product team has a change to release this functionality in Polybase.
The other thing you can try is using Polybase with the ORC input file format.
Andrew Dale commented
Given that Azure Data Factory copy command allows you to specify both the columnDelimiter and the rowDelimiter, then EXTERNAL FILE FORMAT should do the same. This means you could then use the official ASCII codes. However, polybase does not allow you to specify a row terminaotr, only a field terminator.
Can you please let us know when can we expect this feature? This is urgent.
Himalayan Ninja commented
Because of this feature we are completely stuck and can't use polybase at all. How can a Dev team miss this basic and most important feature? Please fix this asap as it is very crucial for success of polybase
Himalayan Ninja commented
Can you please let us know when can we expect this feature? This is blocking our development
Seems to be a duplicate of this suggestion, which as of March 2017 is Under Review: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/10600132-polybase-allow-field-row-terminators-within-strin
Ashish Kumar commented
This is an absolutely desired functionality. Minimally, we need to be able to escape the field if they are used in the date.
CREATE EXTERNAL FILE FORMAT TextFileFormat
( FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS ( FIELD_TERMINATOR = ','
, STRING_DELIMITER = '0x22'
, DATE_FORMAT = 'dd-MMM-yyyy HH:mm:ss'
, USE_TYPE_DEFAULT = FALSE
Does now allow the following string data to be loaded.
"Allowed codes are "A", "B", and "C"."
This is really important feature need, escaping characters is very basic in DB work.
Gerhard Brueckl commented
Polybase fails to read rows if a [delimited] text column contains a newline/carriage return character even if they are within a text delimiters
take this example: I write a line to a external table using CETAS and cannot read the very same table again:
CREATE EXTERNAL TABLE stage.testNewLine WITH
LOCATION = 'archive/NewLine',
DATA_SOURCE = myDatasource,
FILE_FORMAT = myFormat
'regular Text' AS Col1,
Line2' AS Col2
Everything that is written to SQL DW using INSERT should be able to be retrieved again using SELECT