Consider running two XSLT transformations. The first builds another XSLT stylesheet that lists every text-bearing element in cdata-section-elements and contains a scripted identity transform. The second processes the original source with the stylesheet produced by the first, so CDATA is dynamically defined for every text node.
The XSLT script is borrowed from guru @DimitreNovatchev's answer (https://stackoverflow.com/a/15697496/1422451). With Python, you can pass the result of the first transformation directly into the second, all in memory, without saving anything to disk. See the demo below, which uses top StackOverflow users of XSLT and Python:
Input XML (no indent or line breaks)
<?xml version="1.0"?><stackoverflow> <group lang="python"><topusers>
<user>Martijn Pieters</user> <link>https://stackoverflow.com/users
/100297/martijn-pieters</link> <location>Cambridge, United Kingdom
</location> <year_rep>14,102</year_rep> <total_rep>624,972</total_rep>
<tag1>python</tag1> <tag2>python-3.x</tag2> <tag3>python-2.7</tag3>
</topusers><topusers> <user>Alex Martelli</user>
<link>https://stackoverflow.com/users/95810/alex-martelli</link>
<location>Sunnyvale, CA</location> <year_rep>10,292</year_rep>
<total_rep>565,346</total_rep> <tag1>python</tag1> <tag2>list</tag2>
<tag3>c++</tag3></topusers><topusers> <user>unutbu</user>
<link>https://stackoverflow.com/users/190597/unutbu</link> <location/>
<year_rep>11,788</year_rep> <total_rep>482,061</total_rep>
<tag1>python</tag1> <tag2>pandas</tag2> <tag3>numpy</tag3></topusers>
</group> <group lang="xslt"><topusers> <user>Dimitre Novatchev</user>
<link>https://stackoverflow.com/users/36305/dimitre-novatchev</link>
<location>United States</location> <year_rep>2,028</year_rep>
<total_rep>201,945</total_rep> <tag1>xslt</tag1> <tag2>xml</tag2>
<tag3>xpath</tag3></topusers><topusers> <user>Martin Honnen</user>
<link>https://stackoverflow.com/users/252228/martin-honnen</link>
<location>Germany</location> <year_rep>2,463</year_rep>
<total_rep>99,292</total_rep> <tag1>xslt</tag1> <tag2>xml</tag2>
<tag3>xpath</tag3></topusers><topusers> <user>Michael Kay</user>
<link>https://stackoverflow.com/users/415448/michael-kay</link>
<location>Reading, United Kingdom </location> <year_rep>2,256</year_rep>
<total_rep>97,620</total_rep> <tag1>xml</tag1> <tag2>xslt</tag2>
<tag3>xpath</tag3></topusers> </group></stackoverflow>
Python (no pretty_print or tostring needed)
from lxml import etree
import sys

# get input file and derive the output file name
infile = sys.argv[1]
outfile = infile[0:infile.rindex(".")] + ".trns.xml"

# get XSLT file, if given, else use the stylesheet-building script below
if len(sys.argv) > 2:
    xslt = etree.parse(sys.argv[2])
else:
    # CREDIT: Dimitre Novatchev - https://stackoverflow.com/a/15697496/1422451
    xslstr = '''<xsl:stylesheet version="1.0" xmlns:x="http://www.w3.org/1999/XSL/Transform"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xxx="xxx">
      <xsl:namespace-alias stylesheet-prefix="xxx" result-prefix="xsl"/>
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>
      <xsl:key name="kElemByName" match="*[text()[normalize-space()]]" use="name()"/>
      <xsl:variable name="vDistinctNamedElems" select=
        "//*[generate-id()=generate-id(key('kElemByName',name())[1])]"/>
      <xsl:variable name="vDistinctNames">
        <xsl:for-each select="$vDistinctNamedElems">
          <xsl:value-of select="concat(name(), ' ')"/>
        </xsl:for-each>
      </xsl:variable>
      <xsl:template match="node()|@*">
        <xxx:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xxx:output omit-xml-declaration="yes" indent="yes"
              cdata-section-elements="{$vDistinctNames}"/>
          <xxx:strip-space elements="*"/>
          <xxx:template match="node()|@*">
            <xxx:copy>
              <xxx:apply-templates select="node()|@*"/>
            </xxx:copy>
          </xxx:template>
        </xxx:stylesheet>
      </xsl:template>
    </xsl:stylesheet>'''
    xslt = etree.XML(xslstr)

# strip_cdata=False keeps CDATA sections already in the source;
# huge_tree=True lifts the parser's limits for very large documents
parser = etree.XMLParser(huge_tree=True, strip_cdata=False)

# transform 1: build a new xslt script with cdata elems defined
transform_function = etree.XSLT(xslt)
transformed_1 = transform_function(etree.parse(infile, parser))

# transform 2: modify source with new xslt
transform_function = etree.XSLT(transformed_1)
transformed_2 = transform_function(etree.parse(infile, parser))

# write to output (the result tree serializes itself per xsl:output)
with open(outfile, 'wb') as f:
    f.write(transformed_2)
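To see what the first pass feeds into cdata-section-elements, here is a minimal sketch in plain Python of the same selection: the distinct names of elements that directly contain non-whitespace text. It uses the standard library's xml.etree.ElementTree rather than lxml, the function name `distinct_text_element_names` and the sample document are made up for illustration, and it ignores namespaces, so treat it as an approximation of the XSLT key, not a replacement for it.

```python
import xml.etree.ElementTree as ET


def distinct_text_element_names(xml_text):
    """Return names of elements with non-blank direct text, in first-seen order."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        # An element's direct text nodes are elem.text plus the tails of its children.
        chunks = [elem.text] + [child.tail for child in elem]
        has_text = any(chunk and chunk.strip() for chunk in chunks)
        if has_text and elem.tag not in names:
            names.append(elem.tag)
    return names


# Hypothetical, shortened version of the input document above.
sample = (
    '<stackoverflow><group lang="xslt"><topusers>'
    '<user>Dimitre Novatchev</user><location/>'
    '<tag1>xslt</tag1></topusers> <topusers>'
    '<user>Michael Kay</user><tag1>xml</tag1>'
    '</topusers></group></stackoverflow>'
)

names = distinct_text_element_names(sample)
# Joined with spaces, this is the shape of value the generated stylesheet
# receives in its cdata-section-elements attribute.
print(" ".join(names))
```

Container elements such as topusers and the empty location element are left out, which is why they appear without CDATA in the output.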
Output XML (where commenter above, Michael Kay, is included)
<stackoverflow>
  <group lang="python">
    <topusers>
      <user><![CDATA[Martijn Pieters]]></user>
      <link><![CDATA[https://stackoverflow.com/users/100297/martijn-pieters]]></link>
      <location><![CDATA[Cambridge, United Kingdom ]]></location>
      <year_rep><![CDATA[14,102]]></year_rep>
      <total_rep><![CDATA[624,972]]></total_rep>
      <tag1><![CDATA[python]]></tag1>
      <tag2><![CDATA[python-3.x]]></tag2>
      <tag3><![CDATA[python-2.7]]></tag3>
    </topusers>
    <topusers>
      <user><![CDATA[Alex Martelli]]></user>
      <link><![CDATA[https://stackoverflow.com/users/95810/alex-martelli]]></link>
      <location><![CDATA[Sunnyvale, CA]]></location>
      <year_rep><![CDATA[10,292]]></year_rep>
      <total_rep><![CDATA[565,346]]></total_rep>
      <tag1><![CDATA[python]]></tag1>
      <tag2><![CDATA[list]]></tag2>
      <tag3><![CDATA[c++]]></tag3>
    </topusers>
    <topusers>
      <user><![CDATA[unutbu]]></user>
      <link><![CDATA[https://stackoverflow.com/users/190597/unutbu]]></link>
      <location/>
      <year_rep><![CDATA[11,788]]></year_rep>
      <total_rep><![CDATA[482,061]]></total_rep>
      <tag1><![CDATA[python]]></tag1>
      <tag2><![CDATA[pandas]]></tag2>
      <tag3><![CDATA[numpy]]></tag3>
    </topusers>
  </group>
  <group lang="xslt">
    <topusers>
      <user><![CDATA[Dimitre Novatchev]]></user>
      <link><![CDATA[https://stackoverflow.com/users/36305/dimitre-novatchev]]></link>
      <location><![CDATA[United States]]></location>
      <year_rep><![CDATA[2,028]]></year_rep>
      <total_rep><![CDATA[201,945]]></total_rep>
      <tag1><![CDATA[xslt]]></tag1>
      <tag2><![CDATA[xml]]></tag2>
      <tag3><![CDATA[xpath]]></tag3>
    </topusers>
    <topusers>
      <user><![CDATA[Martin Honnen]]></user>
      <link><![CDATA[https://stackoverflow.com/users/252228/martin-honnen]]></link>
      <location><![CDATA[Germany]]></location>
      <year_rep><![CDATA[2,463]]></year_rep>
      <total_rep><![CDATA[99,292]]></total_rep>
      <tag1><![CDATA[xslt]]></tag1>
      <tag2><![CDATA[xml]]></tag2>
      <tag3><![CDATA[xpath]]></tag3>
    </topusers>
    <topusers>
      <user><![CDATA[Michael Kay]]></user>
      <link><![CDATA[https://stackoverflow.com/users/415448/michael-kay]]></link>
      <location><![CDATA[Reading, United Kingdom ]]></location>
      <year_rep><![CDATA[2,256]]></year_rep>
      <total_rep><![CDATA[97,620]]></total_rep>
      <tag1><![CDATA[xml]]></tag1>
      <tag2><![CDATA[xslt]]></tag2>
      <tag3><![CDATA[xpath]]></tag3>
    </topusers>
  </group>
</stackoverflow>
`<a><![CDATA[<]]></a>` and `<a>&lt;</a>` are just different ways of inputting the same data; the user of the data shouldn't care about the detailed keystrokes used to input it. Unfortunately XML doesn't define a standard data model, but the model used by XPath and XSLT is pretty widely accepted.
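The point about equivalent inputs can be checked with a small sketch. It uses Python's standard xml.etree.ElementTree as the parser, which is an assumption for illustration; other conforming parsers should report the same text content, though what they do on serialization varies.

```python
import xml.etree.ElementTree as ET

# Two spellings of an element whose content is a single "<" character.
cdata_form = ET.fromstring("<a><![CDATA[<]]></a>")
escaped_form = ET.fromstring("<a>&lt;</a>")

# After parsing, both carry the same text; the CDATA markup itself is gone.
same_data = cdata_form.text == escaped_form.text
print(repr(cdata_form.text), repr(escaped_form.text), same_data)

# On output the serializer picks a spelling; ElementTree writes the escaped one.
print(ET.tostring(cdata_form, encoding="unicode"))
```

This is why a CDATA requirement has to be met at serialization time, for example through cdata-section-elements, instead of being stored in the parsed tree.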