PolyBase: allow retrieving rejected rows in an error file
Let's say you define an external table with:
REJECT_VALUE = 1000,
REJECT_TYPE = value
That means that it can reject and discard 1000 rows without failing the entire command.
Please enhance Polybase to give us visibility into which rows were rejected and why. I would propose you can define an error output file where rejected rows get written. Either that or let us query some DMV to get a copy of the complete row as a string and the error reason.
We are happy to announce that this is completed.
Please look at the release blog https://azure.microsoft.com/en-us/blog/load-confidently-with-sql-data-warehouse-polybase-rejected-row-location/ for more information.
Stange, James commented
Loading parquet files is a frustrating experience. Polybase's error messages are very brief and unhelpful, no column name, nor filename, nor record count is logged. In addition the REJECTED_ROW_LOCATION doesn't work with parquet files.
Manoj Kumar Gupta commented
When we execute the polybase to load the external table data into DW table, we observe reject files are getting generated in REJECTED_ROW_LOCATION as expected. Addition to this, our requirement is to get reject record count to stop the load if it reaches the reject records threshold. As of now it is not available to get reject record count to automate the process. Would like to know when can we expect it
AFAIK this has been released and you can use the following option to write ann error file in the table creation options "REJECTED_ROW_LOCATION = 'reject'"
Is there any plan to fix the issue?
Glen Swan commented
Yes please! If there is anyway to redirect reject row counts, not the actual details of the record, but just counts to the Azure admin panel so we could create alerts on them that would be awesome too!
I suggest you divert the bad records to a table containing one column with the entire bad record, one with the error message, a third with the import date. Perhaps there are other columns of interest like source and target tables. Row number was suggested below. I know you may have an issue if the entire source file does not contain a row delimiter, but you could limit the row to the max allowable characters or throw a different error indicating that you can parse the rows from the file(s).
Catching the output is essential when you want to automate!
Andrew Dale commented
Hi SQL DW Team, what does future release 6891260 mean? Is there a date for this?
Bhavik Merchant commented
Just adding that I have the same issue.. the improved error messaging is really good.. but i am bulk loading billions of rows and i need to be able to log the ones that failed
Abhijit More commented
This would be much awaited option as its difficult capture row level failure log.
Brynn Borton commented
Hi Sahaj I can see the message output which is useful but is this log kept anywhere when running a stored procedure from Data Factory for example.
Thanks for this improvement. One more thing is missing - line number. Error description is quite detailed, but it should contain line number.
Sahaj Saini commented
The SQL Data Warehouse team has addressed this feedback by improving the PolyBase error message for row rejections; it now provides the reason for row rejection. You can read more details in the November blog: https://azure.microsoft.com/en-us/blog/sql-data-warehouse-november-2015-updates/
Sahaj Saini commented
Thanks for the feedback Greg. We understand that a number of our customers are using PolyBase as their primary data loading mechanism to SQL Data Warehouse and have recognized the need to improve this experience. We are currently working on improving the reject experience so that a customer can easily discover the rejected rows and the reason for rejection so they can fix the source data.