
I'm serializing an object that contains HTML data in a String Property.

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim fs As New FileStream(FilePath, FileMode.Create)
Formatter.Serialize(fs, Ob)
fs.Close()

But when I'm reading the XML back to the Object:

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim fs As New FileStream(FilePath, FileMode.Open)
Dim Ob = CType(Formatter.Deserialize(fs), MyObject)
fs.Close()

I get this error:

"'', hexadecimal value 0x14, is an invalid character. Line 395, position 22."

Shouldn't .NET prevent this kind of error by escaping the invalid characters?

What's happening here and how can I fix it?

4 Answers

I set the XmlReaderSettings property CheckCharacters to false. I would only advise doing this if you have serialized the data yourself via XmlSerializer. If it's from an unknown source then it's not really a good idea.

public static T Deserialize<T>(string xml)
{
    var xmlReaderSettings = new XmlReaderSettings() { CheckCharacters = false };

    XmlReader xmlReader = XmlTextReader.Create(new StringReader(xml), xmlReaderSettings);
    XmlSerializer xs = new XmlSerializer(typeof(T));

    return (T)xs.Deserialize(xmlReader);
}

3 Comments

-1: don't use XmlTextReader.Create. Use XmlReader.Create.
CheckCharacters = false is exactly what I needed to know about. Thanks!
Great! It finally works. You can also use these XmlReaderSettings with XmlSerializer when reading a file: create the XmlReader with them and pass that reader to Deserialize. That's what I needed to do.
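Putting the answer and these comments together, here is a minimal sketch that uses `XmlReader.Create` (not `XmlTextReader.Create`), disposes the stream and reader with `using` blocks, and reads either from a file (as in the question) or from a string (as in the answer). The class name `LenientXml` and the method names `DeserializeFile` and `DeserializeString` are hypothetical helpers, not part of the framework. As the answer says, turning off `CheckCharacters` is only advisable for XML you serialized yourself.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper: deserializes XML that may contain character
// references such as &#x14; which XML 1.0 does not allow.
public static class LenientXml
{
    private static XmlReaderSettings CreateSettings()
    {
        // CheckCharacters = false skips the check that rejects &#x14;.
        return new XmlReaderSettings { CheckCharacters = false };
    }

    // Reads from a file, mirroring the FileStream code in the question.
    public static T DeserializeFile<T>(string filePath)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        using (var reader = XmlReader.Create(stream, CreateSettings()))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    // Reads from an in-memory string, mirroring the answer's signature.
    public static T DeserializeString<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var stringReader = new StringReader(xml))
        using (var reader = XmlReader.Create(stringReader, CreateSettings()))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

For the question's VB code this would be called as `LenientXml.DeserializeFile(Of MyObject)(FilePath)`.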

It should really have failed in the serialize step, because 0x14 is an invalid value for XML. There is no way to escape it, not even with &#x14;, since it is excluded as a valid character from the XML model. I am actually surprised that the serializer lets this through, as it makes the serializer a non-conforming one.
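If you want the failure to happen at serialize time, one option is to serialize through an `XmlWriter` created with `XmlWriterSettings.CheckCharacters = true` (the default for `XmlWriter.Create`) instead of handing the `FileStream` straight to `Serialize`. This is a minimal sketch under that assumption; `StrictXml` and `Serialize` are hypothetical names, and the exact exception type you see may vary, since `XmlSerializer` typically wraps writer errors in an `InvalidOperationException`.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper: serializes with character checking enabled so that
// a value containing U+0014 throws instead of producing unreadable XML.
public static class StrictXml
{
    public static string Serialize<T>(T value)
    {
        var settings = new XmlWriterSettings { CheckCharacters = true };
        var serializer = new XmlSerializer(typeof(T));

        using (var stringWriter = new StringWriter())
        {
            // Dispose the XmlWriter first so everything is flushed
            // into the StringWriter before we read it back.
            using (var writer = XmlWriter.Create(stringWriter, settings))
            {
                serializer.Serialize(writer, value);
            }
            return stringWriter.ToString();
        }
    }
}
```

The same idea applies to files: pass a `FileStream` to `XmlWriter.Create` instead of a `StringWriter`.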

Is it possible for you to remove the invalid characters from the string before serializing it? Why does your HTML contain a 0x14 character at all?
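A minimal sketch of removing the invalid characters before serializing, assuming .NET Framework 4.0 or later, where `XmlConvert.IsXmlChar` and `XmlConvert.IsXmlSurrogatePair` are available. `XmlText.RemoveInvalidXmlChars` is a hypothetical helper name. Note that it silently discards data, so it only fits cases where the stripped characters don't matter.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper: drops every character XML 1.0 does not allow.
public static class XmlText
{
    public static string RemoveInvalidXmlChars(string text)
    {
        if (text == null) return null;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Ordinary legal character (includes tab, CR, LF).
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // High surrogate followed by a low surrogate: keep the pair.
                // Note the argument order: (lowChar, highChar).
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else, such as U+0014 or a lone surrogate, is skipped.
        }
        return sb.ToString();
    }
}
```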

Or, is it possible you are writing with one encoding, and reading with a different one?

5 Comments

Well, I've gone with this solution: I removed the invalid chars from the String before serializing. But I still don't understand why XmlSerializer can't deserialize an object that it serialized itself.
You're in good shape, unless the invalid characters were actually important.
I found a more comprehensive description of this problem here: seattlesoftware.wordpress.com/2008/09/11/…
Yeah, in my case the invalid character is important. What then?
While 0x14 is an illegal character in an XML document, the encoded string representation &#x14; is perfectly valid - it consists completely of legal characters. It's called a "numeric character reference".

You should really post the code of the class you're trying to serialize and deserialize. In the meantime, I'll make a guess.

Most likely, the invalid character is in a field or property of type string. You will need to serialize that as an array of bytes, assuming you can't avoid having that character present at all:

using System.Xml.Serialization;

[XmlRoot("root")]
public class HasBase64Content
{
    internal HasBase64Content()
    {
    }

    [XmlIgnore]
    public string Content { get; set; }

    [XmlElement]
    public byte[] Base64Content
    {
        get
        {
            // Guard against null: Encoding.GetBytes throws on a null string.
            return Content == null ? null : System.Text.Encoding.UTF8.GetBytes(Content);
        }
        set
        {
            if (value == null)
            {
                Content = null;
                return;
            }

            Content = System.Text.Encoding.UTF8.GetString(value);
        }
    }
}

This produces XML like the following:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Base64Content>AAECAwQFFA==</Base64Content>
</root>

I see you'd probably prefer VB.NET:

Imports System.Xml.Serialization

<XmlRoot("root")> _
Public Class HasBase64Content

    Private _content As String
    <XmlIgnore()> _
    Public Property Content() As String
        Get
            Return _content
        End Get
        Set(ByVal value As String)
            _content = value
        End Set
    End Property

    <XmlElement()> _
    Public Property Base64Content() As Byte()
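        Get
            ' Guard against Nothing: Encoding.GetBytes throws on a null string.
            If Content Is Nothing Then Return Nothing
            Return System.Text.Encoding.UTF8.GetBytes(Content)
        End Get
        Set(ByVal value As Byte())
            If value Is Nothing Then
                Content = Nothing
                Return
            End If
            Content = System.Text.Encoding.UTF8.GetString(value)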
        End Set
    End Property
End Class

6 Comments

Hi John. The problem here is not Serializing an Object with invalid characters. The problem is why Xml.Serialization.XmlSerializer doesn't escape the invalid characters when Serializing.
Depending on what he's serializing, it's probably not supposed to escape it. He needs to show what he is serializing.
BTW, DK39, check my profile. I'm a bit of an expert in this area. It's not about escaping.
OK, but I still don't understand why XmlSerializer doesn't deserialize an object with a String that it serialized itself.
It might very well be a bug. Maybe it should have failed to serialize it. It doesn't matter - it won't be fixed. The question is what's the right way to always be able to get arbitrary strings that don't fit the XML definition of a string serialized and deserialized. The answer is above.

I would expect .NET to handle this, but you can also have a look at the XmlSerializer class and XmlReaderSettings (see the sample generic method below):

public static T Deserialize<T>(string xml)
{
    var xmlReaderSettings = new XmlReaderSettings()
                                {
                                    ConformanceLevel = ConformanceLevel.Fragment,
                                    ValidationType = ValidationType.None
                                };

    XmlReader xmlReader = XmlTextReader.Create(new StringReader(xml), xmlReaderSettings);
    XmlSerializer xs = new XmlSerializer(typeof(T), "");

    return (T)xs.Deserialize(xmlReader);
}

I would also check if there are no encoding (Unicode, UTF8, etc.) issues in your code. Hexadecimal value 0x14 is not a char you would expect in XML :)

4 Comments

-1 for: no using blocks, using XmlTextReader, suggesting a solution without knowing the problem.
What's the issue not using 'using' blocks?
Resource leaks. Both XmlReader and StringReader implement IDisposable.
You're right, John, thanks. However, it seems you did not know the problem either, and yet you tried to force your solution on DK39. And BTW, voting my answer down to get yours higher seems soooo lame :-P
