I have an XML file that contains a special character (& is the character causing the issue), and I use the code below to deserialize it:

            XMLDATAMODEL imported_data;

            // Create an instance of the XmlSerializer for the target type.
            XmlSerializer serializer = new XmlSerializer(typeof(XMLDATAMODEL));

            // A FileStream is needed to read the XML document.
            using (FileStream fs = new FileStream(path, FileMode.Open))
            using (XmlReader reader = XmlReader.Create(fs))
            {
                // Use the Deserialize method to restore the object's state.
                imported_data = (XMLDATAMODEL)serializer.Deserialize(reader);
            }

The structure of my XML model is like this:

    [XmlRoot(ElementName = "XMLDATAMODEL")]
    public class XMLDATAMODEL
    {
        [XmlElement(ElementName = "EventName")]
        public string EventName { get; set; }
        [XmlElement(ElementName = "Location")]
        public string Location { get; set; }
    }

I also tried this code, which specifies the encoding explicitly, but with no success:

            // Declare an object variable of the type to be deserialized.

            StreamReader streamReader = new StreamReader(path, System.Text.Encoding.UTF8, true);
            XmlSerializer serializer = new XmlSerializer(typeof(XMLDATAMODEL));
            imported_data = (XMLDATAMODEL)serializer.Deserialize(streamReader);
            streamReader.Close();

Both approaches failed. If I put the special character inside CDATA, it seems to work. How can I make it work for XML data without CDATA as well?

Here is my XML file content

http://pastebin.com/Cy7icrgS

The error I am getting is: There is an error in XML document (2, 17).

  • What kind of special character causes it to fail? For example, < in the inner text? Commented Dec 28, 2015 at 8:28
  • No, & is causing the issue. Commented Dec 28, 2015 at 8:31
  • You should be entity-encoding the ampersands in the source data. Commented Dec 28, 2015 at 8:34
  • I see, I get a better picture now. But it would be best if you could post the XML file data itself, since that would make the error much easier to reproduce. Commented Dec 28, 2015 at 8:35
  • How was the serialization done, in that case? Commented Dec 28, 2015 at 8:41

1 Answer

The best answer I could find after looking around is that, unless you serialized the data yourself, it is quite troublesome to deserialize XML with special characters.

In your case the special character is &, so it has to be converted to &amp; before you deserialize. Until that is done, XmlSerializer cannot deserialize the file. You can create a reader with character checking turned off:

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.CheckCharacters = false; // turn off character checking on the reader
    FileStream fs = new FileStream(xmlfolder + "\\xmltest.xml", FileMode.Open);
    XmlReader reader = XmlReader.Create(fs, settings);

But we still cannot deserialize it this way. As far as I can tell from the documentation, for text input this setting only relaxes the checking of character entity references, so a bare & remains a well-formedness error.

As for how to convert & to &amp;, there are several ways, each with pros and cons. The bottom line for all of them is: do not pass the file stream directly to the deserializer. Read the file into a string, for example with File.ReadAllText, and do the string processing first. After that, wrap the result in a MemoryStream and deserialize from it.
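The read, repair, then deserialize flow described above could be sketched roughly as follows. This is only a minimal sketch: `PreprocessedXmlLoader`, `Load`, and the `fixup` parameter are hypothetical names introduced for illustration, the sample XML is made up, and the file is assumed to be UTF-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[XmlRoot(ElementName = "XMLDATAMODEL")]
public class XMLDATAMODEL
{
    [XmlElement(ElementName = "EventName")]
    public string EventName { get; set; }
    [XmlElement(ElementName = "Location")]
    public string Location { get; set; }
}

public static class PreprocessedXmlLoader
{
    // Hypothetical helper: reads the whole file as text, lets the caller
    // repair it, then deserializes from an in-memory stream.
    public static XMLDATAMODEL Load(string path, Func<string, string> fixup)
    {
        string raw = File.ReadAllText(path, Encoding.UTF8); // assumes UTF-8 input
        string repaired = fixup(raw);

        var serializer = new XmlSerializer(typeof(XMLDATAMODEL));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(repaired)))
        {
            return (XMLDATAMODEL)serializer.Deserialize(stream);
        }
    }

    public static void Main()
    {
        // Made-up sample file containing a bare ampersand.
        string path = Path.GetTempFileName();
        File.WriteAllText(path,
            "<XMLDATAMODEL><EventName>Food & Wine</EventName>" +
            "<Location>Paris</Location></XMLDATAMODEL>", Encoding.UTF8);

        // Naive repair step; only safe when the file has no existing entities.
        XMLDATAMODEL data = Load(path, text => text.Replace("&", "&amp;"));
        Console.WriteLine(data.EventName); // prints "Food & Wine"

        File.Delete(path);
    }
}
```

Passing the repair step in as a delegate keeps the loading code the same no matter which string-processing approach is chosen.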

For the string processing before deserialization, there are a couple of ways to do it.

The easiest, but often the least safe, is string.Replace("&", "&amp;"). It is unsafe because it also rewrites ampersands that already belong to an entity such as &lt;.

Another way, harder but safer, is to use Regex. Because the problematic text in your case follows a recognizable pattern (it is the content you would otherwise wrap in CDATA), this could be a good option.

A third way, harder still but safer, is to write your own parser that processes the file line by line.

I have yet to find a common, safe way to do this conversion.

For your example, though, string.Replace would work, and you could also exploit the pattern mentioned above with Regex.
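As a rough illustration of the Regex option, the sketch below escapes only ampersands that do not already start an entity or character reference. The class and method names are hypothetical, and the pattern is an assumption about what counts as "already escaped" (the five predefined entities plus numeric references), not an established recipe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpersandEscaper
{
    // Matches '&' unless it is followed by a predefined entity name or a
    // decimal/hex character reference and a terminating ';'.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)",
        RegexOptions.Compiled);

    // Hypothetical helper: replaces only the bare ampersands with "&amp;".
    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    public static void Main()
    {
        string input = "<EventName>Food & Wine &lt;2015&gt; &amp; more</EventName>";
        Console.WriteLine(EscapeBareAmpersands(input));
        // prints: <EventName>Food &amp; Wine &lt;2015&gt; &amp; more</EventName>
    }
}
```

One caveat: this also rewrites ampersands inside CDATA sections, where they are meant to stay literal, so it fits best when the file does not mix both styles.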

Edit:

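As for what counts as a special character in XML and how to process it beforehand: according to this question (stackoverflow.com/questions/4899872/…), non-Roman characters can count as well, although in practice they are legal XML as long as the file's encoding matches what the parser expects.

Apart from the non-Roman characters, this article (weblogs.sqlteam.com/mladenp/archive/2008/10/21/…) lists 5 special characters: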

<   ->  &lt;
>   ->  &gt;
"   ->  &quot;
'   ->  &apos;
&   ->  &amp;

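And from this page (technet.microsoft.com/en-us/library/ms145315%28v=sql.90%29.aspx) we get one more. Note that % is not one of XML's five predefined entities, so escaping it is specific to that context: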

%   -> &#37;
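If you produce the XML text yourself, the five predefined replacements listed above do not have to be applied by hand. A minimal sketch, assuming System.Security.SecurityElement.Escape is available on your target framework (the sample string is made up):

```csharp
using System;
using System.Security;

public static class EscapeDemo
{
    public static void Main()
    {
        // Made-up raw value containing all five characters from the list.
        string raw = "Tom & Jerry <\"live\"> at Joe's";

        // SecurityElement.Escape replaces <, >, ", ' and & with their entities.
        string escaped = SecurityElement.Escape(raw);

        Console.WriteLine(escaped);
        // prints: Tom &amp; Jerry &lt;&quot;live&quot;&gt; at Joe&apos;s
    }
}
```

This is meant for escaping individual values before they go into a document. Running it over a whole XML file would escape the markup too.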

Hope they can help you!


3 Comments

I really appreciate the effort to explain the solution. Now I suspect I will face the same issue with other special characters as well. Are there any other special characters that we need to take care of?
I cannot guarantee that I know the complete list of special characters, since I have never run into this problem myself. Moreover, according to stackoverflow.com/questions/4899872/… even non-Roman alphabets are special characters! But apart from them, take a look at weblogs.sqlteam.com/mladenp/archive/2008/10/21/… where at least 5 special characters are listed: <, >, ', ", &. See the article for how to escape each of them.
And take a look at this: technet.microsoft.com/en-us/library/ms145315%28v=sql.90%29.aspx there is one more: %
