Preserving CDATA with lxml means creating a parser with strip_cdata=False, but what about XSLT? For example:

from lxml import etree

parser = etree.XMLParser(strip_cdata=False)
tree = etree.parse('sample_with_cdata.xml', parser)
transform = etree.XSLT(etree.parse('dupe.xsl'))
xml_out = transform(tree)
xml_out.write('processed.xml')

If I process an XML file containing CDATA through the lxml XSLT processor, all CDATA sections are stripped. How can I tell the XSLT processor to leave CDATA as is?

PS. FYI, using the same parser for etree.XSLT doesn't change the outcome.

2 Answers

1

As far as XSLT is concerned, CDATA sections in XML are just noise. XSLT treats <![CDATA["]]> the same as &quot; which it treats the same as "; they are different ways for the document author to write the same thing.
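A minimal sketch of that equivalence, using lxml's XPath support (the data model XSLT works on). The element name a and the variable names are made up for illustration:

```python
from lxml import etree

# Three spellings of the same one-character text content.
variants = [
    b'<a><![CDATA["]]></a>',
    b'<a>&quot;</a>',
    b'<a>"</a>',
]

# Keep CDATA nodes in the tree so the comparison is not trivially
# decided by the parser stripping them.
parser = etree.XMLParser(strip_cdata=False)

# XPath's string value is what XSLT expressions see for each element.
values = [etree.fromstring(doc, parser).xpath("string(.)") for doc in variants]

# True when all three documents expose the same string value.
all_equal = len(set(values)) == 1
print(values, all_equal)
```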

If you are using CDATA sections in your input to convey information, that is, if <![CDATA[xxx]]> means something different from xxx, then you need to change your XML design.

1 Comment

My XML is RSS, and the description element contains CDATA for basic formatting purposes.
1

This doesn't seem to be related to lxml. It's my lack of knowledge...

CDATA in XSLT output should be handled with the "cdata-section-elements" attribute of the xsl:output declaration. For example, if the description element in the XML file contains CDATA:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" cdata-section-elements='description' />
...
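For completeness, here is a rough end-to-end sketch of how this might look from Python. The sample XML, the identity template, and the variable names are my own assumptions and not part of the stylesheet above. The result is serialized with str(), which lets the XSLT processor apply the xsl:output settings.

```python
from lxml import etree

# Hypothetical RSS-like input; the description text is wrapped in CDATA.
xml_source = (
    b"<item><title>News</title>"
    b"<description><![CDATA[<b>Bold</b> text]]></description></item>"
)

# Assumed stylesheet: an identity transform plus the xsl:output
# declaration that asks for CDATA sections inside <description>.
xslt_source = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8"
              cdata-section-elements="description"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

parser = etree.XMLParser(strip_cdata=False)
tree = etree.fromstring(xml_source, parser)
transform = etree.XSLT(etree.fromstring(xslt_source))
result = transform(tree)

# str() serializes the result according to the stylesheet's xsl:output.
serialized = str(result)
print(serialized)
```

Only the elements named in cdata-section-elements get their text wrapped in CDATA, so title should come out as ordinary text here.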
