I have a UTF-8 encoded XML file that starts with this declaration:

<?xml version="1.0" encoding="UTF-8"?>

When I create the reader as shown below, I assume it uses UTF-8 encoding to parse the XML file.

 using (XmlReader reader = XmlReader.Create(inputUri))

I get the following exception:

System.Xml.XmlException occurred
  HResult=-2146232000
  LineNumber=18750
  LinePosition=13
  Message=Invalid character in the given encoding. Line 18750, position 13.

But with the following version of XmlReader:

using (XmlReader reader = XmlReader.Create(new StreamReader(inputUri,Encoding.UTF8)))

The XML is parsed successfully. Why do these two versions behave differently, given that both use the same encoding to parse the file?

PS: I am fairly sure the first version uses UTF-8 encoding.

Below is a snippet from XmlTextReaderImpl.cs, the class whose instance the first version returns.

        private void SetupEncoding( Encoding encoding ) {
            if ( encoding == null ) {
                Debug.Assert( ps.charPos == 0 );
                ps.encoding = Encoding.UTF8;
                ps.decoder = new SafeAsciiDecoder(); // This falls back to UTF-8 decoder
            }
        }
  • Which .NET version are you using? Commented Dec 19, 2016 at 9:20
  • .NET Framework 4.5 Commented Dec 19, 2016 at 9:38

2 Answers

3

I got the answer on the MSDN forum:

"XmlReader will mark any illegal character as illegal because the XML format is broken.

On the second case, because StreamReader is a general purpose Text reader, when it encounters data that is not within range defined by Encoding, it replace the character with a replacement fallback. And therefore when you pass the resulting stream to XmlReader, all characters it can see now falls in legal range defined by the encoding."
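
To make the quoted explanation concrete, here is a minimal sketch of the two decoding behaviours. The class name Utf8DecodingDemo, its method names, and the sample bytes are made up for illustration; the stray 0xFF byte stands in for whatever invalid sequence sits at line 18750 of the real file, which the question does not show.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

// Illustrative only: names and sample data are assumptions, not from the question.
public static class Utf8DecodingDemo
{
    // A tiny document containing one byte (0xFF) that never occurs in valid UTF-8.
    public static readonly byte[] BrokenXml =
        Encoding.ASCII.GetBytes("<?xml version=\"1.0\" encoding=\"UTF-8\"?><r>a")
            .Concat(new byte[] { 0xFF })
            .Concat(Encoding.ASCII.GetBytes("b</r>"))
            .ToArray();

    // Encoding.UTF8 substitutes U+FFFD for bytes it cannot decode.
    public static string DecodeLenient(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    // A UTF8Encoding created with throwOnInvalidBytes = true raises instead.
    public static bool StrictDecodeFails(byte[] data)
    {
        var strict = new UTF8Encoding(false, true);
        try { strict.GetString(data); return false; }
        catch (DecoderFallbackException) { return true; }
    }

    // XmlReader decodes the raw bytes itself and rejects the invalid byte.
    public static bool XmlReaderFailsOnRawBytes(byte[] data)
    {
        try
        {
            using (var reader = XmlReader.Create(new MemoryStream(data)))
            {
                while (reader.Read()) { }
            }
            return false;
        }
        catch (XmlException) { return true; }
    }

    // StreamReader decodes first, so XmlReader only ever sees U+FFFD.
    public static string ReadThroughStreamReader(byte[] data)
    {
        using (var text = new StreamReader(new MemoryStream(data), Encoding.UTF8))
        using (var reader = XmlReader.Create(text))
        {
            reader.ReadToFollowing("r");
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        Console.WriteLine("Lenient decode contains U+FFFD: " + DecodeLenient(BrokenXml).Contains("\uFFFD"));
        Console.WriteLine("Strict decode throws: " + StrictDecodeFails(BrokenXml));
        Console.WriteLine("XmlReader on raw bytes throws: " + XmlReaderFailsOnRawBytes(BrokenXml));
        Console.WriteLine("Via StreamReader, content length: " + ReadThroughStreamReader(BrokenXml).Length);
    }
}
```

If silently replaced characters are not acceptable, passing new UTF8Encoding(false, true) to the StreamReader should make the second approach fail on the same input instead of hiding the problem.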

0
using (XmlReader reader = XmlReader.Create(inputUri))

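The above lets XmlReader determine the encoding itself, from the byte order mark or the encoding declaration, and it decodes the bytes strictly: a byte sequence that is invalid in that encoding raises an exception.

That is why the exception occurs. The second method works because the StreamReader does the decoding with Encoding.UTF8, which tolerates invalid bytes, and XmlReader only receives the already decoded characters.

N.B. When there is no byte order mark and no encoding declaration, the default encoding is UTF-8 (as the SetupEncoding snippet in the question shows), not UTF-16.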
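
Because the StreamReader route succeeds by turning undecodable bytes into U+FFFD rather than by repairing them, it may be worth checking where such substitutions happened. The following is a rough sketch, assuming the file never contains a legitimate U+FFFD; ReplacementCharFinder and the command-line path argument are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Illustrative helper: reports where U+FFFD appears in decoded text.
public static class ReplacementCharFinder
{
    // Returns 1-based (line, column) pairs for every U+FFFD in the input.
    public static List<Tuple<int, int>> Find(TextReader reader)
    {
        var hits = new List<Tuple<int, int>>();
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            for (int i = 0; i < line.Length; i++)
            {
                if (line[i] == '\uFFFD')
                {
                    hits.Add(Tuple.Create(lineNumber, i + 1));
                }
            }
        }
        return hits;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: ReplacementCharFinder <path-to-xml>");
            return;
        }
        // Same lenient decoding as the second version in the question.
        using (var reader = new StreamReader(args[0], Encoding.UTF8))
        {
            foreach (var hit in Find(reader))
            {
                Console.WriteLine("U+FFFD at line " + hit.Item1 + ", column " + hit.Item2);
            }
        }
    }
}
```

The reported positions should land near the LineNumber and LinePosition from the XmlException, though the two may not match exactly if the readers count line breaks or columns differently.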
