Delimiters within data fields are a common problem with delimited files. Some common tactics to address this include:
- Recreate the data file with all occurrences of the delimiter stripped out of the data fields before they are written to file: this eliminates OPENROWSET errors, but does not preserve the integrity of the data.
- Recreate the data file with a different delimiter character: in my experience, a tab is a better choice, since tabs turn up within data less often than commas. It's certainly not unheard of, though; I've seen tabs within data too.
- Enclose data fields in double quotes: this requires some tweaks to the XML format file.
Manually editing the data file might be doable for any of the above options, but it can be tedious, especially for large files. (Just opening a file of several GB in Notepad.exe is an exercise in patience.) Realistically, you'd want the author to re-create it for you. Option #1 should always "work", but again, there's the data integrity issue you may not be able to live with. Option #2 will probably work in many cases, but it's not bulletproof. Option #3 isn't bulletproof either (it's always possible for a data field to contain the terminator sequence itself), but it's about as close as you can get. Plus, it preserves data integrity.
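If the author can script the export, re-creating the file with every field quoted (Option #3) is only a few lines. Here is a minimal Python sketch, assuming the rows are already available as lists of strings; the function name `requote_rows` and the sample rows are hypothetical, made up for illustration.

```python
import csv
import io


def requote_rows(rows, out_stream):
    """Write rows so that every field is wrapped in double quotes."""
    # QUOTE_ALL quotes every field; "\r\n" matches the row terminator
    # that the XML format file expects.
    writer = csv.writer(out_stream, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerows(rows)


# Hypothetical sample rows; the first Name contains a comma.
sample_rows = [
    ["1", "Smith, John", "12 Main St"],
    ["2", "Jane Doe", "34 Oak Ave"],
]

buffer = io.StringIO()
requote_rows(sample_rows, buffer)
quoted_text = buffer.getvalue()
print(quoted_text)
```

One caveat: Python's csv module doubles any double quote that appears inside a field (`"` becomes `""`). As far as I know, a format file treats terminators as literal strings and will not undo that, so data containing double quotes would still need separate handling.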
Here is one possibility for your XML format file:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' MAX_LENGTH="5"/>
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' MAX_LENGTH="128" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='"\r\n' MAX_LENGTH="128" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="Reference" xsi:type="SQLVARYCHAR"/>
    <COLUMN SOURCE="2" NAME="Name" xsi:type="SQLVARYCHAR"/>
    <COLUMN SOURCE="3" NAME="Street" xsi:type="SQLVARYCHAR"/>
  </ROW>
</BCPFORMAT>
Notice the FIELD TERMINATOR attributes: I wrapped the values in single quotes so that the terminators themselves can contain double quotes. "," is the field terminator and "\r\n is the row terminator (the terminator for FIELD ID="3"). I made an educated guess that Name and Street are up to 128 characters; edit that as needed.
Problems:
- OPENROWSET() queries will return Reference with a leading " double quote character. And because of that...
- Reference cannot be returned as an INT (or SMALLINT, BIGINT, etc.). It gets returned as a VARCHAR (xsi:type="SQLVARYCHAR").
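To see where the leading double quote comes from, it can help to mimic the terminator matching by hand. The Python sketch below is only a simplified model of the parsing, not how SQL Server implements it; the helper `split_by_terminators` and the sample row are hypothetical.

```python
def split_by_terminators(line, terminators):
    """Consume one field per terminator, left to right, as literal strings."""
    fields = []
    position = 0
    for terminator in terminators:
        end = line.index(terminator, position)  # raises ValueError if absent
        fields.append(line[position:end])
        position = end + len(terminator)
    return fields


# Terminators copied from the format file: FIELD 1, FIELD 2, FIELD 3.
TERMINATORS = ['","', '","', '"\r\n']

# Hypothetical sample row; the Name field contains a comma.
row = '"1","Smith, John","12 Main St"\r\n'
fields = split_by_terminators(row, TERMINATORS)
print(fields)  # ['"1', 'Smith, John', '12 Main St']
```

The comma inside Name survives because a lone comma is no longer a terminator. The opening double quote of the row has no terminator to consume it, though, so it stays attached to Reference, and a value like `"1` cannot be converted to an integer.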
For the particular data sample provided, I'd remove the double-quotes from Reference data fields, adjust the XML format file so that FIELD ID="1" has TERMINATOR=',"', and further adjust the XML format file so that COLUMN SOURCE="1" has xsi:type="SQLINT".
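If the file cannot be re-created at the source, stripping the quotes from the Reference field could be scripted. A rough Python sketch follows, assuming Reference is always the first field on the line and never contains a double quote itself; `unquote_first_field` and the sample row are hypothetical.

```python
import re


def unquote_first_field(line):
    """Drop the double quotes around the leading field of one line."""
    # Matches an opening quote at the start of the line, the field text,
    # and the closing quote; keeps only the field text.
    return re.sub(r'^"([^"]*)"', r"\1", line, count=1)


# Hypothetical sample row in the fully quoted layout.
row = '"1","Smith, John","12 Main St"\r\n'
adjusted_row = unquote_first_field(row)

# With the adjusted FIELD ID="1" terminator (,") the first field is clean.
reference_text, _, remainder = adjusted_row.partition(',"')
reference = int(reference_text)
print(adjusted_row)
print(reference)
```

With the quotes gone, the text before the first `,"` is a bare number, which is why COLUMN SOURCE="1" can then be declared as SQLINT.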
For some additional info, this blog post may help: Getting Started With OPENROWSET and the BULK Rowset Provider - Part 2