Parse xml with sub-nodes and create a Pandas dataframe

Question

I have the following xml format:

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <run>
      <information>
         <logfile>s.log</logfile>
         <version>33</version>
         <mach>1</mach>
         <problemname>mm1</problemname>
         <timestamp>20201218.165122.053486</timestamp>
      </information>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>

I wrote the sample code below to parse this file after reading the post "How to convert an XML file to nice pandas dataframe?", but it returns None. Is there a fast way to create a dataframe whose index comes from the value of item (i.e., VARS) and whose four columns are status, time, obj, and gap?

import pandas as pd
from xml.etree import ElementTree as et

root = (et.parse('test.xml').getroot()).getchildren()


tags = {"tags":[]}
for elem in root:
    tag = {}
    tag["status"] = elem.attrib['status']
    tag["time"] = elem.attrib['time']
    tag["obj"] = elem.attrib['obj']
    tag["gap"] = elem.attrib['gap']
    tags["tags"]. append(tag)

df_users = pd.DataFrame(tags["tags"])
df_users.head()

This is the output I am looking for:


      status  time  obj   gap
VARS  4        3    1.0   0.15

What is etree outputting for you? We sort of don't care about the xml, we care about etree's output since that is what you are trying to make a df.
– noah, Commented Dec 22, 2020 at 22:45
Also, see How to convert an XML file to nice pandas dataframe?
– noah, Commented Dec 22, 2020 at 22:46
Your xml isn't well formed - for example, where do <run> and <results> close?
– Jack Fleeting, Commented Dec 22, 2020 at 23:14
@noah Thanks for sharing the post. Updated my question according to that.
– Alex Man, Commented Dec 22, 2020 at 23:26

perl · Accepted Answer · 2021-01-08 01:29:41Z

1

+50

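We can use the findall and find methods of the parsed root element to extract the elements we need (the children of result as columns, and controls/item as the index). Here et is the xml.etree.ElementTree module, imported as in the question:

root = et.parse('test.xml').getroot()
pd.DataFrame({x.tag: x.text for x in root.findall('./run/result/*')},
             index=[root.find('./run/controls/item').text])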

Output:

     status time  obj   gap
VARS      4    3  1.0  0.15
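
For anyone who wants to try this without a test.xml on disk, here is a minimal self-contained sketch of the same idea. It assumes the XML is held in a string (the XML constant is a trimmed copy of the question's file) and adds one optional step that is not part of the answer: converting the extracted text to numbers with pd.to_numeric.

```python
import pandas as pd
from xml.etree import ElementTree as ET

# Trimmed inline copy of the question's XML so the sketch runs on its own.
XML = """<results>
   <run>
      <information>
         <logfile>s.log</logfile>
         <problemname>mm1</problemname>
      </information>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>"""

root = ET.fromstring(XML)

# Children of <result> become columns; the <item> text becomes the index label.
row = {child.tag: child.text for child in root.findall('./run/result/*')}
df = pd.DataFrame(row, index=[root.find('./run/controls/item').text])

# Element text is always a string, so convert each column to a numeric dtype.
# (Optional extra step, assumed useful here; not part of the original answer.)
df = df.apply(pd.to_numeric)
print(df)
```

Without the to_numeric step the columns hold strings, which is easy to miss because the printed frame looks almost the same.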

edited Jan 8, 2021 at 1:29

answered Jan 8, 2021 at 1:04

ABC · Accepted Answer · 2021-01-05 03:08:52Z

1

I think you still need to loop through the etree and extract the bits and pieces yourself.

import pandas as pd
from xml.etree import ElementTree as et

root = et.parse('test.xml').getroot()

results = []
for ele in root.findall('run'):
    # assume each run contains only one control item
    control = ele.find('controls').find('item').text
    # collect all children of result into a single row for this run
    result = {'control': control}
    for attr in ele.find('result'):
        result[attr.tag] = attr.text
    results.append(result)
# finally, convert to a dataframe and set control as the index
results = pd.DataFrame(results)
results = results.set_index('control')
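
The loop approach pays off when the file holds more than one run. The sketch below is only an illustration under stated assumptions: the second run (item "CONS" and its numbers) is invented, since the question shows a single run, and collect_rows is a hypothetical helper name. It uses only the standard library and builds one dict per run.

```python
from xml.etree import ElementTree as ET

# Hypothetical input: the first run comes from the question; the second run
# (item "CONS" and its values) is made up purely for illustration.
XML = """<results>
   <run>
      <controls><item>VARS</item></controls>
      <result><status>4</status><time>3</time><obj>1.0</obj><gap>0.15</gap></result>
   </run>
   <run>
      <controls><item>CONS</item></controls>
      <result><status>2</status><time>7</time><obj>2.5</obj><gap>0.0</gap></result>
   </run>
</results>"""


def collect_rows(root):
    """Return one dict per <run>: the control item plus every <result> child."""
    rows = []
    for run in root.findall('run'):
        # findtext returns the text of the first matching element
        row = {'control': run.findtext('controls/item')}
        for child in run.find('result'):
            row[child.tag] = child.text
        rows.append(row)
    return rows


rows = collect_rows(ET.fromstring(XML))
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows).set_index('control') should then give one dataframe row per run, with the values still stored as strings.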

answered Jan 5, 2021 at 3:08

1 Comment

Wev Over a year ago

How is "eles" created here?

T.kowshik Yedida · Accepted Answer · 2021-01-11 11:40:17Z

0

Note that status is not directly under the root, but your code looks for it there.

status is a child of result.

You need to search recursively through the children to reach status.

Refer to the documentation, which describes each method with examples. As others suggested, findall is useful here.
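
A small sketch of the point about nesting, using only the standard library and a trimmed inline copy of the question's XML (the XML constant is an assumption made so the example runs on its own). It shows that run has no attributes at all and that status is reached either by its path or by a recursive search.

```python
from xml.etree import ElementTree as ET

# Trimmed inline copy of the question's XML.
XML = """<results>
   <run>
      <controls><item>VARS</item></controls>
      <result><status>4</status><time>3</time><obj>1.0</obj><gap>0.15</gap></result>
   </run>
</results>"""

root = ET.fromstring(XML)
run = root[0]  # the only child of <results> is <run>

# <run> carries no XML attributes, so run.attrib['status'] would raise KeyError.
print(run.attrib)

# status is a child element of <result>; its value is the element's text.
status_by_path = run.find('result/status').text
print(status_by_path)

# './/status' searches all descendants, so it also works from the root.
status_recursive = root.find('.//status').text
print(status_recursive)

# iter() walks the whole subtree, which makes the nesting easy to inspect.
tags = [el.tag for el in root.iter()]
print(tags)
```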

answered Jan 11, 2021 at 11:40
