PolyBase: allow skipping header rows in delimited text files
Setting reject rows in Polybase to ignore the header row is a hack. And if you have all string columns your header won't be rejected. Please add an IGNORE_HEADER_ROWS = <numRows> option to Polybase
Thank you all for the feedback and being patient while we implemented this functionality.
You can now skip header rows for delimited text files on Azure SQL DW by using the First_Row option in the external file format.
Documentation of the change is at https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql
Rizwan Hanif commented
there is a workaround by using 'EXTERNAL FILE FORMAT' with 'FIRST_ROW = 2'.
e.g. if we create a file format
CREATE EXTERNAL FILE FORMAT [CsvFormatWithHeader] WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FIELD_TERMINATOR = ',',
FIRST_ROW = 2,
STRING_DELIMITER = '"',
USE_TYPE_DEFAULT = False
And then use this file format with create external table
CREATE EXTERNAL TABLE [testdata].[testfile1]
[column1] [nvarchar](4000) NULL
WITH (DATA_SOURCE = data_source,
LOCATION = file_location,
FILE_FORMAT = [CsvFormatWithHeader],REJECT_TYPE = PERCENTAGE,REJECT_VALUE = 100,REJECT_SAMPLE_VALUE = 1000)
It will skip first row while executing queries for 'testdata.testfile1'.
Chi Huynh commented
Please add this feature to Polybase. We have a hard time skipping the headers manually.
Hi SQL DW team, Do we have any update regarding skipping header row in ploybase. As you mentioned regarding this scenario for release. 6891051. When can we have this option in Polybase,
Pawel Potasinski commented
This is annoying. So much talking about the elasticity and time-to-market of cloud-based solutions... While every data engineer will prefer to stay on-prem seeing nightmares like this one.
Chris Bailiss commented
Should have been MVP. Current functionality adds extra difficulty to data loading for no benefit whatsoever.
John Klacynski commented
Any update ? Must have for our business
Alejandro Leguizamo commented
Hi MS: Any updates on this?
Chris Bailiss commented
Very surprised that Polybase doesn't have this feature. At least one column heading/header row is standard in almost all CSV files. This adds another barrier to using/loading data into Azure SQL DW that shouldn't be present at all. Surely should have been part of the MVP.
PS. Reject Rows is not really a viable workaround, e.g. if all columns are text this won't strip out the heading.
The SSIS team is working on a new control flow task for SSIS Azure Feature Pack called SQL DW Upload Task. It uploads on-premises local data to SQL DW via blob storage using PolyBase. If local delimited-text files contain header rows, it will remove them automatically upon uploading to blob storage.
Binh Cao commented
this is a must have to remove the friction that customers might have.
also, we need to add templates utility (function/stored procedure) that would help with creation of hundreds of Polybase External Tables in scenarios where we have hundreds of files to work with. We do not want to manually create one by one external tables.
Ralph Kemperdick commented
Just skipping the Header rows seems to me not be worth the effort.
Why not map of first line header row to fields in the target table ?
And please do not load the header into the data !!
With this regards IGNOR_HEADER_ROWS makes some sense.
Reject_value=1 dose not make any sense, as the Option has a very different purpose.
I also want to suggest skiping header.
This is very important feature for loading data from any file. Current work arround is to use reject_value =1 which is not 100% proof. Even we use this option it creates an issue while inserting data into another table from external table.
Anthony Kunnel Jose commented
Stream Analytics puts out CSV files with header, however Data Factory does not parse CSV files with a header correctly. Please allign these services by correctly parsing CSV file headers when using them as an input.