I know this question has been asked before, and my last one was put on hold, so now I'm describing it in more detail. I have a CSV file of population information. I read it into pandas and now have to transform it to XML, for example like this:
<?xml version="1.0" encoding="utf-8"?>
<populationdata>
  <municipality>
    <name> Akaa </name>
    <year> 2014 </year>
    <total> 17052 ......

This is the reading part of my code:
import pandas as pd
pop = pd.read_csv(r'''directory\population.csv''', delimiter=";")

I tried the approach from the linked question, How do convert a pandas/dataframe to XML?, using a function and a loop, but I haven't succeeded. Are there any other recommendations?

This is an example of my dataframe:
       Alahärmä  2014      0   0.1   0.2
0      Alajärvi  2014  10171  5102  5069
1      Alastaro  2014      0     0     0
2     Alavieska  2014   2687  1400  1287
3        Alavus  2014  12103  6102  6001
4  Anjalankoski  2014      0     0     0

I'm fairly new to Python, so any help is appreciated.

  • Could you show us a head() of your dataframe Commented Nov 7, 2017 at 12:14
  • this is what it showed: Alahärmä 2014 0 0.1 0.2 0 Alajärvi 2014 10171 5102 5069 1 Alastaro 2014 0 0 0 2 Alavieska 2014 2687 1400 1287 3 Alavus 2014 12103 6102 6001 4 Anjalankoski 2014 0 0 0 Commented Nov 7, 2017 at 12:19
  • Does this answer your question? How do convert a pandas/dataframe to XML? Commented Mar 25, 2021 at 9:26

1 Answer

The question you linked to actually has a great answer to your question, but I guess you're having difficulty adapting your data to that solution, so I've done it below for you.

OK, your level of detail is a bit sketchy. If your specific situation differs slightly then you'll need to tweak my answer, but here's something that works for me:

First off, assuming you have a text file as follows:

0       Alahärmä  2014      0   0.1   0.2
1      Alajärvi  2014  10171  5102  5069
2      Alastaro  2014      0     0     0
3     Alavieska  2014   2687  1400  1287
4        Alavus  2014  12103  6102  6001
5  Anjalankoski  2014      0     0     0

Moving on to the Python script, we first import that text file using the following line:

pop = pd.read_csv(r'directory\population.csv', delimiter=r"\s+", names=['cityname', 'year', 'total', 'male', 'females'])

This brings in the text file as a dataframe and gives the new dataframe the correct column headers.
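
To see what that call does without touching the file system, here is a small sketch that feeds two of the lines above through io.StringIO (a stand-in for the real file; the variable names sample and pop_sample are just for illustration). Because each line has six whitespace-separated fields but only five names are given, pandas treats the leading number as the row index.

```python
import io
import pandas as pd

# Two sample lines in the same layout as the text file above.
sample = (
    "0 Alahärmä 2014 0 0.1 0.2\n"
    "1 Alajärvi 2014 10171 5102 5069\n"
)

# Same arguments as the read_csv call above, reading from memory instead of disk.
pop_sample = pd.read_csv(
    io.StringIO(sample),
    delimiter=r"\s+",
    names=['cityname', 'year', 'total', 'male', 'females'],
)
print(pop_sample)
```

If your real file is semicolon-separated, as in the question, you would keep delimiter=";" instead.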

Then, taking the function from the answer to the question you linked to, we add the following to our Python script:

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

print('\n'.join(pop.apply(func, axis=1)))
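
One caveat with func: values are dropped into the markup as-is, so a name containing & or < would give invalid XML. Here is a minimal sketch of an escaping variant using xml.sax.saxutils from the standard library. The name row_to_xml is made up for this example and is not part of the linked answer.

```python
from xml.sax.saxutils import escape, quoteattr

def row_to_xml(row):
    """Render one row (a pandas Series or a plain dict) as an <item> element."""
    xml = ['<item>']
    for field, value in row.items():
        # quoteattr adds the surrounding quotes and escapes the attribute value;
        # escape handles &, < and > in the element text.
        xml.append('  <field name={0}>{1}</field>'.format(
            quoteattr(str(field)), escape(str(value))))
    xml.append('</item>')
    return '\n'.join(xml)

# Plain dict used here so the example runs without the population file.
escaped_item = row_to_xml({'cityname': 'A & B', 'year': 2014})
print(escaped_item)
```

With the dataframe it should slot in the same way as func, i.e. pop.apply(row_to_xml, axis=1).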

Now we put it all together and we get the below:

import pandas as pd
pop = pd.read_csv(r'directory\population.csv', delimiter=r"\s+", names=['cityname', 'year', 'total', 'male', 'females'])

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

print('\n'.join(pop.apply(func, axis=1)))

When we run the above file we get the following output:

<item>
  <field name="cityname">Alahärmä</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.1</field>
  <field name="females">0.2</field>
</item>
<item>
  <field name="cityname">Alajärvi</field>
  <field name="year">2014</field>
  <field name="total">10171</field>
  <field name="male">5102.0</field>
  <field name="females">5069.0</field>
</item>
<item>
  <field name="cityname">Alastaro</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.0</field>
  <field name="females">0.0</field>
</item>
<item>
  <field name="cityname">Alavieska</field>
  <field name="year">2014</field>
  <field name="total">2687</field>
  <field name="male">1400.0</field>
  <field name="females">1287.0</field>
</item>
<item>
  <field name="cityname">Alavus</field>
  <field name="year">2014</field>
  <field name="total">12103</field>
  <field name="male">6102.0</field>
  <field name="females">6001.0</field>
</item>
<item>
  <field name="cityname">Anjalankoski</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.0</field>
  <field name="females">0.0</field>
</item>
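
The output above uses generic <item>/<field> elements rather than the layout shown in the question. If the <populationdata>/<municipality> layout is what you need, here is a rough sketch with the standard library's xml.etree.ElementTree. The function name and the column-to-tag mapping are assumptions, since the question only shows <name>, <year> and <total>; the records list stands in for what pop.to_dict('records') would give you.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from dataframe column names to the tags in the question.
TAGS = {'cityname': 'name', 'year': 'year', 'total': 'total'}

def build_population_xml(records):
    """Build a <populationdata> tree from a list of row dicts."""
    root = ET.Element('populationdata')
    for record in records:
        municipality = ET.SubElement(root, 'municipality')
        for column, tag in TAGS.items():
            # ElementTree escapes special characters in text on output.
            ET.SubElement(municipality, tag).text = str(record[column])
    return root

records = [
    {'cityname': 'Alajärvi', 'year': 2014, 'total': 10171},
    {'cityname': 'Alavus', 'year': 2014, 'total': 12103},
]
root = build_population_xml(records)
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

Add more entries to TAGS if you also want the male/female counts in each <municipality>.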

3 Comments

You are gold! I was getting a few errors because of brackets or ' in the wrong place. Thanks!
Last question: how would I be able to save this output to an XML file?
If it still helps you or someone:
import lxml.etree as ET
parser = ET.XMLParser(recover=True)
tree = ET.ElementTree(ET.fromstring(xml, parser=parser))
tree.write(filelocation)
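
If lxml isn't installed, a plain-Python alternative might look like the sketch below. The printed output consists of several top-level <item> elements, so the sketch wraps them in a single root element to make the file a well-formed XML document. The helper name save_items, the root tag items, and the output path are all placeholders, not anything from the answer.

```python
import os
import tempfile

def save_items(item_strings, path, root_tag='items'):
    """Wrap <item> fragments in one root element and write them as UTF-8."""
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write('<?xml version="1.0" encoding="utf-8"?>\n')
        handle.write('<{0}>\n'.format(root_tag))
        handle.write('\n'.join(item_strings))
        handle.write('\n</{0}>\n'.format(root_tag))

# With the dataframe from the answer, item_strings would be pop.apply(func, axis=1).
items = ['<item>\n  <field name="cityname">Alavus</field>\n</item>']
out_path = os.path.join(tempfile.gettempdir(), 'population_example.xml')
save_items(items, out_path)
```

Opening the file with an explicit encoding='utf-8' keeps names like Alahärmä intact regardless of the platform default.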
