I am trying to take data from an API call that returns an XML object and parse a few data points into a CSV file, with each data point in its own column.

The XML looks like this:

<?xml version="1.0" encoding="utf-8" ?>

<YourMembership_Response>
<Items>
<Item>
<ItemID></ItemID>
<ID>92304823A-2932</ID>
<WebsiteID>0987</WebsiteID>
<NamePrefix></NamePrefix>
<FirstName>John</FirstName>
<MiddleName></MiddleName>
<LastName>Smith</LastName>
<Suffix></Suffix>
<Nickname></Nickname>
<EmployerName>abc company</EmployerName>
<WorkTitle>manager</WorkTitle>
<Date>3/14/2013 2:12:39 PM</Date>
<Description>Removed from group by Administration.</Description>
</Item>
<Item>
<ItemID></ItemID>
<ID>92304823A-2932</ID>
<WebsiteID>0987</WebsiteID>
<NamePrefix></NamePrefix>
<FirstName>John</FirstName>
<MiddleName></MiddleName>
<LastName>Smith</LastName>
<Suffix></Suffix>
<Nickname></Nickname>
<EmployerName>abc company</EmployerName>
<WorkTitle>manager</WorkTitle>
<Date>3/14/2013 2:12:39 PM</Date>
<Description>Removed from group by Administration.</Description>
</Item>
</Items>
</YourMembership_Response>

I have written this code to write just IDs into CSV, which works fine.

with open("output1.csv", "wb") as f:
    writer = csv.writer(f)
    for node in tree.findall('.//ID'):
        writer.writerow([node.text])

Now, when I attempt to write multiple data points into the CSV, the data points are simply appended into one column. This is the code I have been attempting it with:

with open("test1.csv", "wb") as f:
    writer = csv.writer(f)
    for node in tree.findall('.//ID'):
        writer.writerow([node.text])
    for node in tree.findall('.//FirstName'):
        writer.writerow([node.text])
    for node in tree.findall('.//LastName'):
        writer.writerow([node.text]) 

I need the data to look like this in the CSV, with other data points of my choosing added later on. What am I doing wrong?

ID                    FirstName     LastName
92304823A-2932         John           Smith

Thank you in advance.

  • How big is the input xml? Commented Sep 12, 2017 at 19:26
  • I don't have an answer for the input size, but there are roughly 15,000 members I have to do this for. Commented Sep 12, 2017 at 19:40

1 Answer

This is, in essence, how to collect the data.

>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse('api.xml')
>>> tree.findall('.//Item')
[<Element 'Item' at 0x0000000006679EA8>, <Element 'Item' at 0x0000000006681318>]
>>> for item in tree.findall('.//Item'):
...     item.find('ID').text, item.find('FirstName').text, item.find('LastName').text
... 
('92304823A-2932', 'John', 'Smith')
('92304823A-2932', 'John', 'Smith')

In contrast, when you use a construct like tree.findall('.//ID') you are asking the XPath engine to start at tree (that's the '.' part) and search down through every branch for all occurrences of 'ID' at once. For your sample XML that returns a flat list of the two ID elements, in document order, with nothing tying each ID to the FirstName and LastName of the same Item. Each writerow([node.text]) call then writes a one-column row, so the three loops simply stack all the IDs, then all the first names, then all the last names in a single column. What you need to do is first find all of the Item entries, then find the three data pieces of interest within each Item.
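To make the difference concrete, here is a minimal sketch comparing the two approaches on a small in-memory document. The sample data is made up for illustration (the second Item deliberately has no FirstName), and findtext with default='' is my own choice for handling a missing child, not something taken from the question.

```python
from xml.etree import ElementTree

# Hypothetical sample data: the second Item has no FirstName element.
SAMPLE = """<YourMembership_Response>
<Items>
<Item><ID>A-1</ID><FirstName>John</FirstName><LastName>Smith</LastName></Item>
<Item><ID>B-2</ID><LastName>Jones</LastName></Item>
</Items>
</YourMembership_Response>"""

root = ElementTree.fromstring(SAMPLE)

# Flat searches: three independent lists with no link back to the parent Item.
ids = [node.text for node in root.findall('.//ID')]
first_names = [node.text for node in root.findall('.//FirstName')]
last_names = [node.text for node in root.findall('.//LastName')]

# Per-Item search: one tuple per Item, with '' when a child element is absent.
rows = [
    (item.findtext('ID', default=''),
     item.findtext('FirstName', default=''),
     item.findtext('LastName', default=''))
    for item in root.findall('.//Item')
]
```

Here the flat lists have different lengths (two IDs, one first name, two last names), so they cannot be reliably lined up afterwards, while rows keeps each Item's values together.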

Addendum:

>>> import csv
>>> with open('api.csv', 'w', newline='') as csvfile:
...     fieldnames = ['ID', 'FirstName', 'LastName']
...     writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
...     writer.writeheader()
...     for item in tree.findall('.//Item'):
...         writer.writerow({
...             'ID': item.find('ID').text,
...             'FirstName': item.find('FirstName').text,
...             'LastName': item.find('LastName').text})

Resulting output file:

ID,FirstName,LastName
92304823A-2932,John,Smith
92304823A-2932,John,Smith
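
Since you mention adding other data points later, one possible way to generalise this is to drive the columns from a list of element names. This is only a sketch: write_items is a hypothetical helper name, the sample string is a cut-down version of your XML, and I am assuming that empty or missing elements should become empty cells. It writes to an in-memory buffer so it can be run as-is; for a real file you would pass the object returned by open('api.csv', 'w', newline='') and the root from tree.getroot().

```python
import csv
import io
from xml.etree import ElementTree


def write_items(root, out, fieldnames):
    """Write a header plus one CSV row per <Item>; return the number of Items."""
    writer = csv.writer(out)
    writer.writerow(fieldnames)
    count = 0
    for item in root.iter('Item'):
        # findtext returns '' for an empty element and the default for a missing one.
        writer.writerow([item.findtext(name, default='') for name in fieldnames])
        count += 1
    return count


# Cut-down, hypothetical sample in the same shape as the question's XML.
sample = (
    "<Items>"
    "<Item><ID>92304823A-2932</ID><FirstName>John</FirstName>"
    "<MiddleName></MiddleName><LastName>Smith</LastName></Item>"
    "</Items>"
)
buffer = io.StringIO()
rows_written = write_items(
    ElementTree.fromstring(sample),
    buffer,
    ['ID', 'FirstName', 'MiddleName', 'LastName'],
)
csv_text = buffer.getvalue()
```

Adding a column is then just a matter of adding its element name to the list.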