
I have an issue parsing XML that declares UTF-16 encoding; the same code works perfectly fine with UTF-8.
Can anyone help me with this issue?

I get the following error:

System.Web.HttpUnhandledException' was thrown.
System.Xml.XmlException: There is no Unicode byte order mark.
Cannot switch to Unicode

XML Header:

<?xml version="1.0" encoding="utf-16"?>
<RiskAssessmentRequestValue xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

C# code-behind:

    rptTransformedXml.DataSource = parser.ExtractData(xml);
    rptTransformedXml.DataBind();


    public List<XmlDataExtract> ExtractData(string xml)
    {
        MemoryStream stream = new MemoryStream(Encoding.ASCII.GetBytes(xml));
        return ExtractData(stream);
    }


    public List<XmlDataExtract> ExtractData(Stream xmlStream)
    {
        XmlReaderSettings settings = new XmlReaderSettings
                                         {
                                             IgnoreComments = true,
                                             IgnoreWhitespace = true,
                                             CloseInput = true
                                         };

        XmlReader reader = XmlReader.Create(xmlStream, settings);
        XmlPathBuilder pathBuilder = new XmlPathBuilder(reader);
        List<XmlDataExtract> xmlDataList = new List<XmlDataExtract>();

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.XmlDeclaration)
                continue;
            pathBuilder.Add();
            CollectAttributeData(reader, xmlDataList, pathBuilder);
            CollectElementData(reader, xmlDataList, pathBuilder);
        }
        return xmlDataList;
    }
  • Try changing Encoding.ASCII.GetBytes(xml) to Encoding.GetBytes(xml) Commented Oct 9, 2013 at 17:32
  • I don't think it works. Commented Oct 9, 2013 at 18:29
  • How do the bytes make it into xml in the first place? Commented Oct 9, 2013 at 18:52
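A minimal reproduction of the error, offered as a sketch. It assumes the failure comes from the `Encoding.ASCII.GetBytes(xml)` step: that produces single-byte data with no byte order mark, while the declaration promises `utf-16`, so `XmlReader` refuses to switch encodings partway through the stream. `Reproduce` is a hypothetical helper, not part of the original code, and the documents are trimmed-down versions of the question's header.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: encode the XML string as ASCII (as the question's
// ExtractData(string) overload does) and try to read the whole document.
string Reproduce(string xml)
{
    var stream = new MemoryStream(Encoding.ASCII.GetBytes(xml));
    try
    {
        using (XmlReader reader = XmlReader.Create(stream))
        {
            while (reader.Read()) { }
        }
        return "parsed";
    }
    catch (XmlException ex)
    {
        // No BOM, so the reader starts out assuming UTF-8; the declaration then
        // asks for utf-16, which it cannot switch to without a byte order mark.
        return ex.Message;
    }
}

string utf16Doc = "<?xml version=\"1.0\" encoding=\"utf-16\"?><RiskAssessmentRequestValue />";
string utf8Doc = "<?xml version=\"1.0\" encoding=\"utf-8\"?><RiskAssessmentRequestValue />";

Console.WriteLine(Reproduce(utf16Doc)); // prints the XmlException message
Console.WriteLine(Reproduce(utf8Doc));  // prints "parsed"
```

The UTF-8 case only works by coincidence: ASCII bytes are also valid UTF-8, so the declaration and the actual bytes happen to agree.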

1 Answer

You can get the Encoding object from the encoding name declared in the XML content and use it to turn the string into bytes:

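using System;
using System.IO;
using System.Text;
using System.Xml;

string encoding = "UTF-8"; // should match the encoding in the XML declaration
string xml = @"<?xml version='1.0' encoding='UTF-8'?><table><row>1</row></table>";

// Encode the string with the encoding named in the declaration,
// so the bytes agree with what XmlReader reads in encoding='...'
var ms = new MemoryStream(Encoding.GetEncoding(encoding).GetBytes(xml));

var xdrs = new XmlReaderSettings
{
    IgnoreComments = true,
    IgnoreWhitespace = true,
    CloseInput = true
};

using (var xdr = XmlReader.Create(ms, xdrs))
{
    while (xdr.Read())
    {
        Console.WriteLine(xdr.NodeType);
    }
}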
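Applied to the question's `utf-16` header, the same idea might look like the sketch below. It assumes the string is encoded with the encoding looked up from the declared name (for "utf-16" that is UTF-16 little-endian) and, to be safe, writes that encoding's byte order mark first, since `GetBytes` never emits it. `CountElements` is a hypothetical helper and the sample document is made up.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: encode the string with the encoding named in the
// declaration (BOM first), then count the elements XmlReader reports.
int CountElements(string xml, string encodingName)
{
    Encoding encoding = Encoding.GetEncoding(encodingName);
    byte[] bom = encoding.GetPreamble();   // FF FE for utf-16 (little-endian)
    byte[] body = encoding.GetBytes(xml);  // GetBytes does not include the BOM

    var bytes = new byte[bom.Length + body.Length];
    Buffer.BlockCopy(bom, 0, bytes, 0, bom.Length);
    Buffer.BlockCopy(body, 0, bytes, bom.Length, body.Length);

    var settings = new XmlReaderSettings
    {
        IgnoreComments = true,
        IgnoreWhitespace = true,
        CloseInput = true
    };

    int count = 0;
    using (XmlReader reader = XmlReader.Create(new MemoryStream(bytes), settings))
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                count++;
        }
    }
    return count;
}

string doc = "<?xml version=\"1.0\" encoding=\"utf-16\"?><table><row>1</row><row>2</row></table>";
Console.WriteLine(CountElements(doc, "utf-16")); // 3
```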

For more information about encodings, there is a related question.
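Since the question's `ExtractData(string xml)` already receives a decoded .NET string, another option (not part of the answer above) is to skip the byte conversion entirely and give `XmlReader` a `StringReader`. When `XmlReader` reads from a `TextReader`, it ignores the `encoding` attribute in the declaration, so `utf-16` and `utf-8` headers both parse. A sketch, with `ElementNames` as a hypothetical stand-in for the question's extraction logic and a made-up `<Item />` child:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical stand-in for ExtractData(string): read straight from the string.
List<string> ElementNames(string xml)
{
    var settings = new XmlReaderSettings
    {
        IgnoreComments = true,
        IgnoreWhitespace = true,
        CloseInput = true
    };

    var names = new List<string>();
    // A TextReader supplies characters, not bytes, so nothing is decoded and
    // the encoding="utf-16" declaration cannot conflict with the input.
    using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                names.Add(reader.LocalName);
        }
    }
    return names;
}

string header = "<?xml version=\"1.0\" encoding=\"utf-16\"?>" +
    "<RiskAssessmentRequestValue xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" " +
    "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><Item /></RiskAssessmentRequestValue>";

Console.WriteLine(string.Join(", ", ElementNames(header))); // RiskAssessmentRequestValue, Item
```

In the question's code this would mean replacing the `MemoryStream` overload call with `XmlReader.Create(new StringReader(xml), settings)`; the rest of the reading loop stays the same.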
