When I'm manipulating XML parsed with the Python lxml library (specifically lxml.objectify, but I don't think that makes a difference), how can I preserve CDATA sections?

Given the following results:

>>> from lxml import objectify, etree
>>> xml = '''
  <Root>
   <Child>
    <![CDATA[abcd]]>
   </Child>
  </Root>
  '''
>>> parser = objectify.makeparser(strip_cdata=False)
>>> parsed = objectify.XML(xml, parser=parser)
>>> etree.tostring(parsed)
'<Root><Child><![CDATA[abcd]]></Child></Root>'
>>> type(parsed.Child)
<type 'lxml.objectify.StringElement'>
>>> parsed.Child.text
'abcd'
>>> parsed.Child = 'efgh'
>>> etree.tostring(parsed)
'<Root><Child xmlns:py="http://codespeak.net/lxml/objectify/pytype" py:pytype="str">efgh</Child></Root>'

I'd like that last line to still contain the <![CDATA[...]]> section, but I can't see any way of either preserving it or recreating it. Accessing the content of the <Child> element yields a bare string, and assigning new content to that element silently drops the CDATA section.

What's the right way of doing this?

  • Note that <![CDATA[x]]> and x are equivalent if x contains no XML-specific characters. Why do you need the output to say CDATA? Commented Jun 6, 2016 at 15:58
  • Because it feeds legacy software that cares about the difference. Commented Jun 6, 2016 at 16:06

1 Answer

>>> from lxml import etree
>>> parser = etree.XMLParser(strip_cdata=False)
>>> parsed = etree.XML('''
...   <Root>
...    <Child>
...     <![CDATA[abcd]]>
...    </Child>
...   </Root>
... ''', parser)
>>> print etree.tostring(parsed)
<Root>
   <Child>
    <![CDATA[abcd]]>
   </Child>
  </Root>
>>> parsed.getchildren()[0].text = etree.CDATA('efgh updated')
>>> etree.tostring(parsed)
'<Root>\n   <Child><![CDATA[efgh updated]]></Child>\n  </Root>'
>>>
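
The session above can be condensed into a small script. This is only a minimal sketch, assuming plain lxml.etree rather than lxml.objectify and Python 3 string handling; the helper name set_cdata_text is hypothetical and is not part of lxml.

```python
from lxml import etree


def set_cdata_text(element, value):
    """Hypothetical helper: store `value` as a CDATA section on `element`."""
    # etree.CDATA wraps the string so the serializer emits <![CDATA[...]]>
    # instead of escaped character data.
    element.text = etree.CDATA(value)


# strip_cdata=False keeps CDATA sections from the input when parsing.
parser = etree.XMLParser(strip_cdata=False)
root = etree.fromstring("<Root><Child><![CDATA[abcd]]></Child></Root>", parser)

child = root.find("Child")
set_cdata_text(child, "efgh")

# encoding="unicode" returns a str rather than bytes on Python 3.
result = etree.tostring(root, encoding="unicode")
print(result)
```

The CDATA wrapper has to be assigned to .text on a plain etree element; assigning it through objectify's attribute syntax stringifies the wrapper object instead, as the first comment below reports.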
3 Comments

No good - this produces '<Root><Child>&lt;lxml.etree.CDATA object at 0x7f3ab54a9e70&gt;</Child></Root>'.
It could be an issue with the objectify library. I've updated the answer to use just the etree parser.
Yes, this is what I've ended up doing.
