2

Sharing Sample XML file. Need to convert this fie to CSV, even if extra tags are added in this file. {without using tag names}. And XML file tag names should be used as column names while converting it to CSV}

Example Data:

<?xml version="1.0" encoding="UTF-8"?>

<Food>
    <Info>
        <Msg>Food Store items.</Msg>
    </Info>

    <store slNo="1">
        <foodItem>meat</foodItem>
        <price>200</price>
        <quantity>1kg</quantity>
        <discount>7%</discount>
    </store>

    <store slNo="2">
        <foodItem>fish</foodItem>
        <price>150</price>
        <quantity>1kg</quantity>
        <discount>5%</discount>
    </store>

    <store slNo="3">
        <foodItem>egg</foodItem>
        <price>100</price>
        <quantity>50 pieces</quantity>
        <discount>5%</discount>
    </store>

    <store slNo="4">
        <foodItem>milk</foodItem>
        <price>50</price>
        <quantity>1 litre</quantity>
        <discount>3%</discount>
    </store>

</Food>

Tried Below code but getting error with same.

import xml.etree.ElementTree as ET
import pandas as pd

ifilepath = r'C:\DATA_DIR\feeds\test\sample.xml'
ofilepath = r'C:\DATA_DIR\feeds\test\sample.csv'
root = ET.parse(ifilepath).getroot()

print(root)
with open(ofilepath, "w") as file:
    for child in root:
        print(child.tag, child.attrib)
        # naive example how you could save to csv line wise
        file.write(child.tag+";"+child.attrib)

Above code is able to find root node, but unable to concatenate its attributes though

Tried one more code, but this works for 1 level nested XML, who about getting 3-4 nested tags in same XML file. And currently able to print values of all tags and their text. need to convert these into relational model { CSV file}

import xml.etree.ElementTree as ET

tree = ET.parse(ifilepath)
root = tree.getroot()
for member in root.findall('*'):
    print(member.tag,member.attrib)
    for i in (member.findall('*')):
        print(i.tag,i.text)

Above example works well with pandas read_xml { using lxml parser}

But when we try to use the similar way out for below XML data, it doesn't produce indicator ID value and Country ID value as output in CSV file

Example Data ::

<?xml version="1.0" encoding="UTF-8"?>
<du:data xmlns:du="http://www.dummytest.org" page="1" pages="200" per_page="20" total="1400" sourceid="5" sourcename="Dummy ID Test" lastupdated="2022-01-01">
   <du:data>
      <du:indicator id="AA.BB">various, tests</du:indicator>
      <du:country id="MM">test again</du:country>
      <du:date>2021</du:date>
      <du:value>1234567</du:value>
      <du:unit />
      <du:obs_status />
      <du:decimal>0</du:decimal>
   </du:data>
   <du:data>
      <du:indicator id="XX.YY">testing, cases</du:indicator>
      <du:country id="DD">coverage test</du:country>
      <du:date>2020</du:date>
      <du:value>3456223</du:value>
      <du:unit />
      <du:obs_status />
      <du:decimal>0</du:decimal>
   </du:data>
</du:data>

Solution Tried ::

import pandas as pd
    
pd.read_xml(ifilepath, xpath='.//du:data', namespaces= {"du": "http://www.dummytest.org"}).to_csv(ofilepath, sep=',', index=None, header=True)

Output Got ::

indicator,country,date,value,unit,obs_status,decimal
"various, tests",test again,2021,1234567,,,0
"testing, cases",coverage test,2020,3456223,,,0

Expected output ::

indicator id,indicator,country id,country,date,value,unit,obs_status,decimal
AA.BB,"various, tests",MM,test again,2021,1234567,,,0
XX.YY,"testing, cases",DD,coverage test,2020,3456223,,,0

Adding Example data , having usage of 2 or more xpath's. Looking for ways to convert the same using pandas to_csv()

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl'?>
<CATALOG>
    <PLANT>
    <COMMON>rose</COMMON>
    <BOTANICAL>canadensis</BOTANICAL>
    <ZONE>4</ZONE>
    <LIGHT>Shady</LIGHT>
    <PRICE>202</PRICE>
    <AVAILABILITY>446</AVAILABILITY>
    </PLANT>
    <PLANT>
    <COMMON>mango</COMMON>
    <BOTANICAL>sunny</BOTANICAL>
    <ZONE>3</ZONE>
    <LIGHT>shady</LIGHT>
    <PRICE>301</PRICE>
    <AVAILABILITY>569</AVAILABILITY>
    </PLANT>
    <PLANT>
    <COMMON>Marigold</COMMON>
    <BOTANICAL>palustris</BOTANICAL>
    <ZONE>4</ZONE>
    <LIGHT>Sunny</LIGHT>
    <PRICE>500</PRICE>
    <AVAILABILITY>799</AVAILABILITY>
    </PLANT>
    <PLANT>
    <COMMON>carrot</COMMON>
    <BOTANICAL>Caltha</BOTANICAL>
    <ZONE>4</ZONE>
    <LIGHT>sunny</LIGHT>
    <PRICE>205</PRICE>
    <AVAILABILITY>679</AVAILABILITY>
    </PLANT>
    <FOOD>
    <NAME>daal fry</NAME>
    <PRICE>300</PRICE>
    <DESCRIPTION>
    Famous daal tadka from surat
    </DESCRIPTION>
    <CALORIES>60</CALORIES>
    </FOOD>
    <FOOD>
    <NAME>Dhosa</NAME>
    <PRICE>350</PRICE>
    <DESCRIPTION>
    The famous south indian dish
    </DESCRIPTION>
    <CALORIES>80</CALORIES>
    </FOOD>
    <FOOD>
    <NAME>Khichdi</NAME>
    <PRICE>150</PRICE>
    <DESCRIPTION>
    The famous gujrati dish
    </DESCRIPTION>
    <CALORIES>40</CALORIES>
    </FOOD>
    <BOOK>
      <AUTHOR>Santosh Bihari</AUTHOR>
      <TITLE>PHP Core</TITLE>
      <GENER>programming</GENER>
      <PRICE>44.95</PRICE>
      <DATE>2000-10-01</DATE>
   </BOOK>
   <BOOK>
      <AUTHOR>Shyam N Chawla</AUTHOR>
      <TITLE>.NET Begin</TITLE>
      <GENER>Computer</GENER>
      <PRICE>250</PRICE>
      <DATE>2002-17-05</DATE>
   </BOOK>
   <BOOK>
      <AUTHOR>Anci C</AUTHOR>
      <TITLE>Dr. Ruby</TITLE>
      <GENER>Computer</GENER>
      <PRICE>350</PRICE>
      <DATE>2001-04-11</DATE>
   </BOOK>
</CATALOG>
6
  • StackOverflow is not a free code-writing service. Please research for solutions to this regular problem and make an earnest attempt at solution. Come back with a specific issue regarding your implementation. Commented Oct 25, 2022 at 13:50
  • We understand what StackOverflow is . Have tried many a ways , but looking for a generic way to convert nested XML to CSV format. Commented Oct 27, 2022 at 12:31
  • Error :: file.write(child.tag+";"+child.attrib) TypeError: can only concatenate str (not "dict") to str <Element 'Food' at 0x000002603F6139A8> Info {} Commented Oct 27, 2022 at 12:32
  • 1
    Please edit your post with attempted code and not in long, hard-to-read comments. Once done, please delete your comments. Commented Oct 27, 2022 at 20:28
  • Done with formatting of tried code Commented Nov 1, 2022 at 4:44

2 Answers 2

1

ElementTree is not really the best tool for what I believe you're trying to do. Since you have well-formed, relatively simple xml, try using pandas:

import pandas as pd

#from here, it's just a one liner
pd.read_xml('input.xml',xpath='.//store').to_csv('output.csv',sep=',', index = None, header=True)

and that should get you your csv file.

Sign up to request clarification or add additional context in comments.

3 Comments

thanks, this works well on simple XML files. But is there a way we can convert XML files to CSV generically, without knowing its xpath. And every subtag can be appended to its previous tag name as new column.
Tried this way on shared data in edited Description , output is not providing all the column values { specially not for "indicator id="AA.BB"" and country id="MM"
@Eja Of course it doesn't work. You newly edited sample xml is nothing like the pre-edit sample xml...
1

Given parsing element values and their corresponding attributes involves a second layer of iteration, consider a nested list/dict comphrehension with dictionary merge. Also, use csv.DictWriter to build CSV via dictionaries:

from csv import DictWriter
import xml.etree.ElementTree as ET

ifilepath = "Input.xml"

tree = ET.parse(ifilepath)
nmsp = {"du": "http://www.dummytest.org"}

data = [
     {
       **{el.tag.split('}')[-1]: (el.text.strip() if el.text is not None else None) for el in d.findall("*")},
       **{f"{el.tag.split('}')[-1]} {k}":v for el in d.findall("*") for k,v in el.attrib.items()},
       **d.attrib
     }     
     for d in tree.findall(".//du:data", namespaces=nmsp)    
]

dkeys = list(data[0].keys())

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator,country,date,value,unit,obs_status,decimal,indicator id,country id
"various, tests",test again,2021,1234567,,,0,AA.BB,MM
"testing, cases",coverage test,2020,3456223,,,0,XX.YY,DD

While above will add attributes to last columns of CSV. For specific ordering, re-order the dictionaries:

data = [ ... ]

cols = ["indicator id", "indicator", "country id", "country", "date", "value", "unit", "obs_status", "decimal"]

data = [
    {k: d[k] for k in cols} for d in data
]

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=cols)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator id,indicator,country id,country,date,value,unit,obs_status,decimal
AA.BB,"various, tests",MM,test again,2021,1234567,,,0
XX.YY,"testing, cases",DD,coverage test,2020,3456223,,,0

10 Comments

This works well Parfait. But if i try to run the same code for a simple XML file { the first example with <?xml version="1.0" encoding="UTF-8"?> }. This gives me error as "*{el.tag.split('}')[1]: (el.text.strip() if el.text is not None else None) for el in d.findall("")}, IndexError: list index out of range"
Any generic way to handle both the XML files conversion to CSV. ?
See edit, adjusting index to -1 and adding the top-level attributes.
Yes, using -1 as index, and using top-level attributes works well on simple XML file also. But can we look for some generic code, that can cover both the examples together. ?
Also, instead of writing this data in to a file, can we store the same in a dataframe. That will be an ease to then convert that dataframe in to xls,csv or any other form
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.