10

I have an object that I am serializing to XML. It appears that a value in one of the properties contains the hex character 0x1E. I've tried setting the Encoding property of XmlWriterSettings to both "utf-16" and "unicode", but I still get an exception thrown:

There was an error generating the XML document. ---> System.InvalidOperationException: There was an error generating the XML document. ---> System.ArgumentException: '', hexadecimal value 0x1E, is an invalid character.

Is there any way to get these characters into the xml? If not, are there other characters that will cause problems?

2
  • Please show some code that reproduces the problem. How can we help you when we don't know what you're doing to cause the problem? Commented Oct 30, 2009 at 1:26
  • 2
    Set XmlWriterSettings.CheckCharacters to false. That will allow writing illegal XML characters to the document without throwing an exception. With that flag disabled, the writer automatically escapes illegal characters in the appropriate places (e.g. different escaping in attributes) as of .Net 2.0. Commented Mar 18, 2022 at 13:12
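A minimal sketch of the CheckCharacters suggestion from the comment above. The Record class and the CheckCharactersDemo helper are hypothetical stand-ins for the object in the question. The sketch assumes the same relaxation is needed on the reading side (XmlReaderSettings.CheckCharacters), because the document it produces is not well-formed XML 1.0 and a strict parser is expected to reject it.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type standing in for the object from the question.
public class Record
{
    public string Value { get; set; }
}

public static class CheckCharactersDemo
{
    public static string Serialize(Record record)
    {
        // Disable character checking so 0x1E no longer raises ArgumentException.
        var settings = new XmlWriterSettings { CheckCharacters = false };
        var text = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            new XmlSerializer(typeof(Record)).Serialize(writer, record);
        }
        return text.ToString();
    }

    public static Record Deserialize(string xml)
    {
        // The reader must relax checking too, or it rejects the character reference.
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return (Record)new XmlSerializer(typeof(Record)).Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml = Serialize(new Record { Value = "a\u001Eb" });
        Console.WriteLine(xml);
        Record copy = Deserialize(xml);
        Console.WriteLine(copy.Value == "a\u001Eb");
    }
}
```

This only helps when you control both ends of the exchange; other XML tools may still refuse the file.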

6 Answers

7

The XML Recommendation (aka spec) http://www.w3.org/TR/2000/REC-xml-20001006 outlines which characters are allowed. Characters outside this range cannot appear in an XML 1.0 document at all, not even as numeric character references:


2.2 Characters

[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] (see also [ISO/IEC 10646-2000]). Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors must accept any character in the range specified for Char. The use of "compatibility characters", as defined in section 6.8 of [Unicode] (see also D21 in section 3.6 of [Unicode3]), is discouraged.]

Character Range

[2]     Char       ::=      #x9 | #xA | #xD | [#x20-#xD7FF] |
            [#xE000-#xFFFD] | [#x10000-#x10FFFF]    
     /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3 Character Encoding in Entities.



3

You can escape them like you would for HTML. 0x1E is the same as decimal 30, so just replace your record separator character with the string "&#30;" and it should be OK.

1 Comment

Hey... I tried to use it, but browsers and XML readers still report it as an invalid character. For example, this XML is invalid: <aaa>bbb&#30;ccc</aaa>
3

I know this is an old question, but I found a link and am posting it here, as it will be useful to anyone who comes across this question. It worked for me.

http://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/

Here is the code from that site, in case the site goes down:

// Requires: using System; using System.Text;

/// <summary>
/// Remove illegal XML characters from a string.
/// </summary>
public string SanitizeXmlString(string xml)
{
    if (xml == null)
    {
        throw new ArgumentNullException("xml");
    }

    StringBuilder buffer = new StringBuilder(xml.Length);

    // Note: this loop visits UTF-16 code units one at a time, so the
    // surrogate halves of characters above U+FFFF are dropped as well.
    foreach (char c in xml)
    {
        if (IsLegalXmlChar(c))
        {
            buffer.Append(c);
        }
    }

    return buffer.ToString();
}

/// <summary>
/// Whether a given character is allowed by XML 1.0.
/// </summary>
public bool IsLegalXmlChar(int character)
{
    return
    (
         character == 0x9 /* == '\t' == 9   */          ||
         character == 0xA /* == '\n' == 10  */          ||
         character == 0xD /* == '\r' == 13  */          ||
        (character >= 0x20    && character <= 0xD7FF  ) ||
        (character >= 0xE000  && character <= 0xFFFD  ) ||
        (character >= 0x10000 && character <= 0x10FFFF)
    );
}

1 Comment

Useful, but that only removes the offending characters. It does not provide a way to include them in XML.
1

XML is a human-readable format, and most non-printable control characters are forbidden. A decimal character reference like &#30; is only accepted by XML 1.1 parsers; XML 1.0 rejects it as well, so the portable option is to base-64 encode the content.


1

Since you didn't give any details, I'm going to guess that your property is of type System.String. If so, then you cannot serialize it as-is. Instead, you must serialize it as a byte[]:

[XmlRoot("root")]
public class HasBase64Content
{
    [XmlIgnore]
    public string Content { get; set; }

    [XmlElement("Content")]
    public byte[] Base64Content
    {
        get
        {
            // GetBytes throws on null, so pass null through unchanged.
            return Content == null ? null : System.Text.Encoding.UTF8.GetBytes(Content);
        }
        set
        {
            if (value == null)
            {
                Content = null;
                return;
            }

            Content = System.Text.Encoding.UTF8.GetString(value);
        }
    }
}
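A usage sketch for the approach above, repeated here so it compiles on its own. The Base64Demo class is hypothetical, and the getter includes a null check as an assumption. XmlSerializer writes a byte[] member as base-64 text, so the control character never reaches the writer directly.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[XmlRoot("root")]
public class HasBase64Content
{
    [XmlIgnore]
    public string Content { get; set; }

    [XmlElement("Content")]
    public byte[] Base64Content
    {
        // Null-safe in both directions (an assumption added for this sketch).
        get { return Content == null ? null : Encoding.UTF8.GetBytes(Content); }
        set { Content = value == null ? null : Encoding.UTF8.GetString(value); }
    }
}

public static class Base64Demo
{
    public static string Serialize(HasBase64Content item)
    {
        var text = new StringWriter();
        new XmlSerializer(typeof(HasBase64Content)).Serialize(text, item);
        return text.ToString();
    }

    public static HasBase64Content Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(HasBase64Content));
        return (HasBase64Content)serializer.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        string xml = Serialize(new HasBase64Content { Content = "a\u001Eb" });
        Console.WriteLine(xml); // the Content element holds base-64 text
        Console.WriteLine(Deserialize(xml).Content == "a\u001Eb");
    }
}
```

The trade-off is that the element is no longer readable in the raw XML, even for values that contain no control characters.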


1

If your data never contains characters from the Unicode Control Pictures block (U+2400 to U+243F), you can maintain human readability by writing those pictures in place of the control characters upon serialization and converting them back upon deserialization.

Below are the characters:

␀ ␁ ␂ ␃ ␄ ␅ ␆ ␇ ␈ ␉ ␊ ␋ ␌ ␍ ␎ ␏

␐ ␑ ␒ ␓ ␔ ␕ ␖ ␗ ␘ ␙ ␚ ␛ ␜ ␝ ␞ ␟

␠ ␡

Hopefully, they render in your browser and editors. Even if they don't, they are legal in XML.
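The substitution can be sketched as follows, assuming (as stated above) that the data never contains Control Pictures of its own. The ControlPictures class and its Encode/Decode methods are hypothetical names. Only the control characters that XML 1.0 forbids are mapped; tab, line feed, and carriage return are left alone.

```csharp
using System;
using System.Text;

public static class ControlPictures
{
    private const int PictureBase = 0x2400; // U+2400 is the picture for U+0000

    // True for the C0 controls that XML 1.0 does not allow.
    private static bool IsForbiddenControl(char c)
    {
        return c < 0x20 && c != '\t' && c != '\n' && c != '\r';
    }

    // Replace each forbidden control character with its picture.
    public static string Encode(string text)
    {
        var buffer = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            buffer.Append(IsForbiddenControl(c) ? (char)(PictureBase + c) : c);
        }
        return buffer.ToString();
    }

    // Reverse the mapping for exactly the pictures that Encode can produce.
    public static string Decode(string text)
    {
        var buffer = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool isPicture = c >= PictureBase && c <= PictureBase + 0x1F
                && IsForbiddenControl((char)(c - PictureBase));
            buffer.Append(isPicture ? (char)(c - PictureBase) : c);
        }
        return buffer.ToString();
    }

    public static void Main()
    {
        string encoded = Encode("a\u001Eb");
        Console.WriteLine(encoded == "a\u241Eb");
        Console.WriteLine(Decode(encoded) == "a\u001Eb");
    }
}
```

Encode would run on each string before serialization and Decode after deserialization, for example inside a wrapper property.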
