14

I'm using Python's xml.etree.ElementTree to do some XML parsing on a file. However, I get this error mid-way through the document:

xml.parsers.expat.ExpatError: not well-formed (invalid token): line X, column Y

So I go to line X, column Y in vim and I see an ampersand (&) with red background highlighting. What does this mean?

Also the two characters preceding it are >>, so maybe there's something special about >>&?

Anyone know how to fix this?

1
  • I received this error message because my input file had a UTF-8 Byte order mark. Stripping the first three bytes off the input resolved the issue. (Obviously not applicable to the OP's question, but this page was still the first search hit, so perhaps this is helpful to a future visitor.) Commented Aug 20, 2020 at 12:32
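A minimal sketch of the byte-order-mark workaround described in the comment above, assuming the document is available as raw bytes. The helper name `parse_xml_bytes` and the sample data are made up for illustration.

```python
import codecs
import xml.etree.ElementTree as ET


def parse_xml_bytes(raw: bytes) -> ET.Element:
    """Parse XML from raw bytes, dropping a leading UTF-8 BOM if present.

    Hypothetical helper, not part of any library.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # The UTF-8 BOM is the three bytes EF BB BF.
        raw = raw[len(codecs.BOM_UTF8):]
    return ET.fromstring(raw)


# Illustrative input: a tiny document with a BOM in front of it.
sample = codecs.BOM_UTF8 + b"<root><item>ok</item></root>"
root = parse_xml_bytes(sample)
print(root.find("item").text)
```

If the file is read as text rather than bytes, opening it with `encoding="utf-8-sig"` also discards the BOM.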

3 Answers

18

The & is a special character in XML, used to start character and entity references. If your XML has a & sitting there by itself, not as part of a reference like &amp;amp; or &amp;#x450; or the like, then the XML is not well-formed.
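A small sketch that reproduces the problem, using the one-element example quoted in the comments on this answer. On Python 3, ElementTree reports the failure as `ET.ParseError` rather than the `ExpatError` shown in the question. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

bad = "<element>some text & that character</element>"       # bare ampersand
good = "<element>some text &amp; that character</element>"  # escaped ampersand

try:
    ET.fromstring(bad)
    bad_parsed = True
except ET.ParseError as exc:
    # The parser rejects the bare "&" as an invalid token.
    bad_parsed = False
    print("parse failed:", exc)

# The escaped version parses, and the parser hands back a plain "&".
element = ET.fromstring(good)
print(element.text)
```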


5 Comments

I think the issue might be that I have a multi-line (string) element. Essentially for this one element I did a grep (regex) | head -5, to get back 5 lines, then stuck it in the file as an xml element. Would I be better off making 5 separate elements somehow?
It's not a matter of how many elements there are in it, it's a matter of what characters are in it. You just can't put the & character in an XML document by itself. You need to escape it by replacing it with &amp;amp;.
<element>some text & that character</element> is no good you're saying? Also I'm reading in these lines from many different files, so I'm not sure how I could automatically escape them (read in from a bash script using grep and then outputted to a file)
Right, that is invalid XML. Somehow or other you need to escape the & if you want it in the file. You will encounter similar problems if your data contains < or > characters. One possibility is to just do a simple replace of & with &amp;. If that doesn't meet your needs you'll need to google around for info on how to handle XML escaping.
Thanks for telling me about this stuff
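One possible way to do the "simple replace" mentioned in the comments without double-escaping references that are already present is a regular expression with a negative lookahead. This is only a sketch: the pattern, the helper name `escape_bare_ampersands`, and the sample line are assumptions made for illustration, and a named reference the parser does not know (for example `&foo;`) would still be rejected.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does NOT begin a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace only stray ampersands with &amp; (hypothetical helper)."""
    return BARE_AMP.sub("&amp;", text)


line = "a >>& b &amp; c &#38; d"
fixed = escape_bare_ampersands(line)
print(fixed)

# The repaired text can now be embedded in an element and parsed.
root = ET.fromstring("<element>" + fixed + "</element>")
print(root.text)
```

This approach suits repairing a file that is already partly escaped. If the lines are raw text straight from grep, escaping every `&`, `<`, and `>` before writing them into the file is probably the safer choice, because an `&amp;` in the raw text is then preserved literally.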
0

You can use the escape function from xml.sax.saxutils in the standard library:

from xml.sax.saxutils import escape

my_string = "Some string with an &"

# escape() converts &, <, and > to &amp;amp;, &amp;lt;, and &amp;gt;.
print(escape(my_string))

# Prints: Some string with an &amp;

Reference: Escaping strings for use in XML

2 Comments

I get this error when I try the above: "TypeError: a bytes-like object is required, not 'str'"
Sounds like it might be a difference in py2 and py3 encoding.
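One plausible cause of that TypeError on Python 3 is passing `bytes` to `escape`, for example data read from a file opened in binary mode or captured from a subprocess. A sketch, assuming the data is UTF-8 encoded (the sample value is made up):

```python
from xml.sax.saxutils import escape

raw = b"Some bytes with an & and a <tag>"  # illustrative input

try:
    escape(raw)
    bytes_accepted = True
except TypeError as exc:
    # escape() calls str-style replace(), which fails on bytes.
    bytes_accepted = False
    print("escape(bytes) failed:", exc)

# Decode to str first (UTF-8 is an assumption about the data), then escape.
escaped = escape(raw.decode("utf-8"))
print(escaped)
```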
-1

I solved it by using yattag instead:

from yattag import indent
print(indent(xml_string))  # Python 3: pass the str directly, no .encode()

