
I'm trying to pull an escaped node out of an XML document. The raw text for the node looks like this:

<Notes>{&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0, &quot;Guide&quot;: 0,     
&quot;Sample&quot;: 0, &quot;Triangle8&quot;: 0, &quot;Triangle5&quot;: 0,     
&quot;Triangle4&quot;: 0, &quot;Triangle7&quot;: 0, &quot;Triangle6&quot;: 0,     
&quot;Triangle1&quot;: 0, &quot;Triangle3&quot;: 0, &quot;Triangle2&quot;: 0}</Notes> 

I'm pulling the text out as follows:

import xml.etree.ElementTree as ET

infile = ET.parse("C:/userfiles/EXP011/SESAME_60/SESAME_60_runinfo.xml")
r = infile.getroot()
XMLNS = "{http://example.com/foo/bar/runinfo_v4_3}"
x = r.find(".//" + XMLNS + "Notes")
print(x.text)

I expected to get:

{"Phase": 0, "Flipper": 0, "Guide": 0,
"Sample": 0, "Triangle8": 0, "Triangle5": 0,     
"Triangle4": 0, "Triangle7": 0, "Triangle6": 0,     
"Triangle1": 0, "Triangle3": 0, "Triangle2": 0}

but, instead, I got:

 {&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0, &quot;Guide&quot;: 0,      
 &quot;Sample&quot;: 0, &quot;Triangle8&quot;: 0, &quot;Triangle5&quot;: 0,   
 &quot;Triangle4&quot;: 0, &quot;Triangle7&quot;: 0, &quot;Triangle6&quot;: 0, 
 &quot;Triangle1&quot;: 0, &quot;Triangle3&quot;: 0, &quot;Triangle2&quot;: 0}

How do I get the unescaped string?

    ElementTree does not unescape &quot; because you normally don't need to escape " in XML. My answer was wrong for the same reasons. Commented Sep 10, 2012 at 17:31

3 Answers


Use HTMLParser.HTMLParser().unescape() (Python 2):

In [8]: import HTMLParser    

In [11]: HTMLParser.HTMLParser().unescape('&quot;')
Out[11]: u'"'

saxutils handles &lt;, &gt; and &amp;, but it does not handle &quot;.

In [9]: import xml.sax.saxutils as saxutils

In [10]: saxutils.unescape('&quot;')
Out[10]: '&quot;'    
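
If you would rather stay with saxutils, its unescape function accepts an optional dictionary of extra entities to replace. A minimal sketch of that approach; the helper name unescape_with_quotes and the EXTRA_ENTITIES table are made up here for illustration and are not part of the original answer:

```python
import xml.sax.saxutils as saxutils

# Extra entities to translate on top of the built-in &lt;, &gt; and &amp;.
# (Hypothetical table, defined only for this example.)
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def unescape_with_quotes(text):
    """Unescape the standard XML entities plus the quote entities."""
    return saxutils.unescape(text, EXTRA_ENTITIES)


print(unescape_with_quotes("{&quot;Phase&quot;: 0}"))
```

This only covers the entities you list yourself, so HTMLParser is still the broader tool if other named entities might show up.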

2 Comments

Quite correct; " does not need to be quoted in XML, so the saxutils module doesn't deal with that (just like ElementTree).
Thank you. That did it. Someday I'll have to talk with the server devs and find out why the server is escaping the quotes in the first place.

Since Python 3.4 you can use html.unescape.

>>> from html import unescape
>>> unescape('&quot;')
'"'
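
Applied to the question, this might look roughly like the sketch below. The raw_text string is a shortened stand-in for x.text, and the json.loads step assumes the note really is valid JSON once unescaped, which the sample suggests but the question does not confirm:

```python
import json
from html import unescape

# Shortened stand-in for x.text from the question (assumed, two keys only).
raw_text = "{&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0}"

# Turn &quot; back into " and then parse the result as JSON.
notes = json.loads(unescape(raw_text))
print(notes["Phase"])
```

If the note is not guaranteed to be JSON, stop after unescape and work with the plain string.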


For some reason I have not managed to unescape &quot; in Python 2.7.5, but I found a workaround that writes " instead of &quot; to the XML file, using the replace function as below:

with open(xmlfilename, 'w') as f:
     f.write(myxml.toprettyxml().replace("&quot;",'"'))
