6

I have an XML stored procedure in MS SQL 2005 which I use the SqlCommand.ExecuteXmlReader to get an XmlReader, then parse through the data and form an XML document. The problem is that the data in SQL contains some binary characters which are illegal within a UTF-8 XML document, so an exception is thrown.

Has anyone else dealt with this problem? I've considered filtering the data on input into the DB, but then I'd have to put the filtering everywhere, and every character would need to be checked.

Any other suggestions?

EDIT: The data is typically stored in varchar columns of various length. The data is actually input from users on web forms (ASP .NET app). So sometimes they copy-paste from MS Word or something and it puts these strange binary characters in.

5 Answers 5

1

I've have seen the DotNet SqlClient "scramble" data from nvarchar columns in the database, our theory that was its something to do with "surrogate code points", see:

http://www.siao2.com/2005/07/27/444101.aspx

http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=rzaaxsurrogate.htm

http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0004816.htm

SqlClient seemed to "interpret" some of the bytes meaing that our Xml was no longer well formed, converting to nvarchar(max) seemed to stop this (although this did have a performance impact):

SELECT CONVERT(NVARCHAR(MAX), MyValue) FROM ...

Note that you need to use NVARCHAR(MAX), NVARCHAR( N ) doesnt work.

We also found that the OleDB provider works correctly as well (although it is slower than the SqlClient).

Sign up to request clarification or add additional context in comments.

Comments

0

How did the bad data get into the database? Are you using an XML column?

You can put the filtering (it's called "validation", actually) into the stored procedures used to enter data into the database, or you can add triggers to check the data regardless of where it comes from.

In general, don't allow bad data to get into the database!

1 Comment

The data is user-input stored in varchar columns in the database.
0

Is this a matter of encoding? Or is the xml just malformed? If malformed, I can't help. But for encoding... it is unfortunate that ExecuteXmlReader doesn't let you specify the encoding, but you could treat the data as a BLOB, and process it separately with your own encoding and XmlReader?

If the data is large, you'd probably want to use ExecuteReader with CommandBehavior.SequentialAccess and write it to a temporary file (Path.GetTempFileName()) - then process that file as a Stream with XmlReader.

Comments

0

I've abstracted the making of SqlParameter objects everywhere in the application already, so I'll scrub the input at that point. My abstraction method creates and returns a SqlParameter object to use in a stored procedure call. If it's a varchar the caller wants, I'll loop through each character of the string they want to make into a SqlParameter object and filter out those illegal binary XML characters. That will eliminate the bad data from entering the database in the first place.

Comments

0

How is your stored procedure produce the XML? If you use any of the FOR XML options in SQL Server, binary characters in text fields will be properly escaped:

CREATE TABLE test (
   id int identity(1,1) not null primary key, 
   data nvarchar(50))
INSERT INTO test (data) values (char(0))
SELECT * FROM test FOR XML RAW

produces:

<row ID="1" data="&#x0;" />

2 Comments

I am using "For Xml Explicit"
That shouldn't matter; FOR XML EXPLICIT properly escapes binary XML characters too.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.