0

Trying to convert XML to CSV. I am new to python parsing.

Data sample ("Dummy data")

<users>
  <user firstName="Hannah" lastName="Jones" age="21" sex="Female" retired="False" dependants="2" marital_status="married or civil partner" salary="20603" pension="0" company="Ward and Sons" commute_distance="6.56" address_postcode="N06 4LG"/>
  <user firstName="Tracy" lastName="Rowley" age="50" sex="Female" retired="False" dependants="1" marital_status="single" salary="39509" pension="0" company="Fuller, King and Robinson" commute_distance="11.01" address_postcode="M1 6JD"/>
  <user firstName="Shane" lastName="Thompson" age="87" sex="Male" retired="True" dependants="2" marital_status="single" salary="53134" pension="13409" company="N/A" commute_distance="0" address_postcode="WF84 1EA"/>
  <user firstName="Michael" lastName="Anderson" age="85" sex="Male" retired="True" dependants="2" marital_status="married or civil partner" salary="58524" pension="39479" company="N/A" commute_distance="0" address_postcode="BN1 7TL"/>
</users>

I tried this code

import csv
import xml.etree.ElementTree as Xet
import pandas as pd

cols = ["firstName", "lastName", "age", "sex", "retired", "dependants", "marital_status", "salary", "pension", "company", "commute_distance", "address_postcode"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('/content/drive/MyDrive/DATAtask1/user_data.xml')
root = xmlparse.getroot()
for user in root:
    firstName = user.find("firstName").text
    lastName = user.find("lastName").text
    age = user.find("age").text
    sex = user.find("sex").text
    retired = user.find("retired").text
    dependants = user.find("dependants").text
    marital_status = user.find("marital_status").text
    salary = user.find("salary").text
    pension = user.find("pension").text
    company = user.find("company").text
    commute_distance = user.find("commute_distance").text
    address_postcode = user.find("address_postcode").text
  
    rows.append({"firstName": firstName,
                 "lastName": lastName,
                 "age": age,
                 "sex": sex,
                 "retired": retired,
                 "dependants": dependants,
                 "marital_status": marital_status,
                 "pension": pension,
                 "salary": salary,
                 "company": company,
                 "commute_distance": commute_distance,
                 "address_postcode": address_postcode})
  
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('/content/drive/MyDrive/DATAtask1/XMLtoCSV.csv') 

Getting this error

AttributeError Traceback (most recent call last)

<ipython-input-3-c6016197ed71> in <module>()
      7 root = xmlparse.getroot()
      8 for user in root:
----> 9     firstName = user.find("firstName").text
     10     lastName = user.find("lastName").text
     11     age = user.find("age").text

AttributeError: 'NoneType' object has no attribute 'text'
1
  • Consider the new pandas.read_xml which can parse attribute-centric XML like yours: df = pd.read_xml("user_data.xml") Commented Jan 27, 2022 at 15:14

1 Answer 1

1

The tag has an attribute with firstName. Therefore you should use:

user.attrib['firstName']

If you check: user.attrib, it will return a dictionary (this is not true, it returns lxml.etree._Attrib which can be converted to a dictionary using (dict(user.attrib)). That will give you an opportunity to make your code easier since you can just use the dictionary like a normal python dictionary.

For example you can create a list and append all dictionaries to the list. At the end it is possible to convert a list of dictionaries to a pandas dataframe:

d1 = {'name': 'john', 'age': 19}
d2 = {'name': 'Steve', 'age': 16}
# A dictionary with an extra key:
d3 = {'name': 'Jim', 'age': 25, 'additional': 'something'}
df = pd.DataFrame([d1, d2, d3])

    name  age additional
0   john   19        NaN
1  Steve   16        NaN
2    Jim   25  something
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.