I'm unable to flatten an XML file and transform it to CSV using XSLT when the XML file is large.
Currently, I'm parsing a nested XML file with lxml, using an XSL file to flatten the output, and then writing the output to a CSV file.
My XML looks something like this:
<root>
<level1>
<level2>
<topid>1</topid>
<level3>
<subtopid>1</subtopid>
<level4>
<subid>1</subid>
<descr>test</descr>
</level4>
<level4>
<subid>2</subid>
<descr>test2</descr>
</level4>
...
</level3>
...
</level2>
</level1>
</root>
I want to end up with the following CSV file:
topid,subtopid,subid,descr
1,1,1,test
1,1,2,test2
....
My XSLT:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="utf-8" use-character-maps="map"/>
<xsl:character-map name="map">
<xsl:output-character character="," string=" "/>
</xsl:character-map>
<xsl:strip-space elements="*"/>
<xsl:variable name="delimiter" select="','"/>
<xsl:variable name="newline" select="'&#10;'" />
<xsl:template match="/root">
<xsl:text>topid,subtopid,subid,descr</xsl:text>
<xsl:value-of select="$newline" />
<xsl:for-each select="level1/level2/level3/level4">
<xsl:value-of select="ancestor::level2/topid" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="ancestor::level3/subtopid" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="subid" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="descr" />
<xsl:value-of select="$newline" />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
My Python code:
import lxml.etree as ET
xsltfile = ET.XSLT(ET.parse('transactions.xsl'))
xmlfile = ET.parse('myxmlfile.xml')
xsltfile(xmlfile).write_output('output.csv')
This works great for small files, but now I want to do the same with an XML file of roughly 2.5 GB. etree.parse loads the whole document into memory, which obviously won't work for a file that size.
I want to iterate over the XML somehow, so that the whole file is never loaded into memory and the CSV is written line by line, while still using the XSLT for the transformation. I'm using XSLT because it's the only way I currently know to flatten a nested XML file.
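A minimal sketch of the kind of iteration described above, under some assumptions: the file is streamed with lxml's iterparse, each completed <level2> subtree is transformed on its own with a small XSLT 1.0 stylesheet, and the subtree is then released. The names CHUNK_XSLT and xml_to_csv are hypothetical. The sketch also assumes that a single <level2> subtree fits in memory and that field values contain no commas, so the character map is left out.

```python
import copy
import lxml.etree as ET

# Hypothetical per-chunk stylesheet (XSLT 1.0): it matches one <level2>
# element as the document root and emits data rows only, no header.
CHUNK_XSLT = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/level2">
    <xsl:for-each select="level3/level4">
      <xsl:value-of select="ancestor::level2/topid"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="ancestor::level3/subtopid"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="subid"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="descr"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
"""


def xml_to_csv(xml_source, csv_path):
    """Stream xml_source (path or binary file object) into a CSV file."""
    transform = ET.XSLT(ET.XML(CHUNK_XSLT))
    with open(csv_path, "w", encoding="utf-8", newline="") as out:
        # The header is written once, outside the per-chunk transform.
        out.write("topid,subtopid,subid,descr\n")
        # Only fully parsed <level2> elements are yielded.
        for _event, elem in ET.iterparse(xml_source, events=("end",), tag="level2"):
            # Copy the subtree into its own document so "/level2" matches.
            chunk = ET.ElementTree(copy.deepcopy(elem))
            out.write(str(transform(chunk)))
            # Release the processed subtree and any already-handled siblings.
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]
```

With the file names from the question, this would be called as xml_to_csv('myxmlfile.xml', 'output.csv'). Chunking at <level2> is a design choice here: it is the outermost element that carries everything a row needs (topid), so each chunk can be transformed without looking at the rest of the document.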