bcp utility should support RFC 4180 CSV files
The BCP utility (and it's BULK INSERT sibling) represents the fastest way to load bulk data into SQL Server. Unfortunately, it only has the most basic support for delimited files. It's impossible for the delimited file to have the delimiter in any of its field values.
CSV, a type of delimited file, has a standard way for dealing with special characters in its values, defined in RFC 4180 ( http://www.ietf.org/rfc/rfc4180.txt ). But BCP only supports use of comma as a delimiter; there is no support for the additional features of the CSV file format.
Upvotes: 21<=-=Mar 24 2006 10:28AM=-=>
This is a reasonable suggestion and will will consider it in the future. Could you supply more information why you need this enhancement? You can have commas in values that are quoted. Could you explain why this does not meet your needs?<=-=Mar 25 2006 3:23PM=-=>
It’s a common question on how to load files where some fields are enclosed in quotes, and where there is a header where it is not. Example:
12,“This is col2”,“And col3”,19
This particular file is not loadable by BCP at all. On the other hand:
“12”,“This is col2”,“And col3”,19
this one is, because in the format file you can defined an initial dummy column that will swallow the header.
I have previously suggested that it should be possible to describe a header, so that BCP can skip it.
Adding pre-defined support for CSV seems like a good idea, as it could relieve people from writing format files for a very common format.<=-=Mar 27 2006 9:14AM=-=>
Some of the more exotic uses of CSV can be found in the RFC itself ( http://www.ietf.org/rfc/rfc4180.txt ):6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example: “aaa”,"b CRLF bb",“ccc” CRLF zzz,yyy,xxx 7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example: “aaa”,"b""bb",“ccc”
As you can see, CSV has defined ways for dealing with quotes and newlines. It’s possible to store any type of data in a CSV field, so BCP should have direct support for CSV so no tricks with format files would be necessary. I also like Erland’s suggestion regarding the first-row-is-a-header handling (a common practice in CSVs).
Many data processing vendors prefer the data in CSV files because the format provides universal compatiblity. If BCP ( and bulk insert ) had full support for this format, it would save us signficant coding and maintenance work.
Excel has full support for CSV files (2003 even adds full Unicode support), so the format provides a great way to get data to and from Excel fans.<=-=Oct 18 2006 9:04AM=-=>
Of Course, the single most fustrating thing about not support the CSV format properly is that can you have a comma in the middle of a column, for example
bcp and OpenRowset bulk does not read this correctly whenever I try it.<=-=Nov 1 2007 1:34PM=-=>
Thank you for this additional feedback. We will give CSV support in BCP serious consideration for a future release of SQL Server, but this will be after SQL Server 2008.<=-=Apr 4 2010 1:39PM=-=>
Microsoft posted an update that this will be improved in the next “major” version in the comments section: https://connect.microsoft.com/SQLServer/feedback/details/281398/sql-2008-flat-file-provider-still-broken<=-=Aug 10 2012 4:54PM=-=>
OK, SQL Server 2012 is out now. What happened to it was going to be fixed “in the next major version”? This is pathetic.