I have the following xml format:

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <run>
      <information>
         <logfile>s.log</logfile>
         <version>33</version>
         <mach>1</mach>
         <problemname>mm1</problemname>
         <timestamp>20201218.165122.053486</timestamp>
      </information>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>

After reading the post "How to convert an XML file to nice pandas dataframe?", I wrote the sample code below to parse this file, but it returns None. Is there a fast way to create a dataframe whose index is the value of the item element (i.e., VARS) and whose four columns are status, time, obj, and gap?

import pandas as pd
from xml.etree import ElementTree as et

root = (et.parse('test.xml').getroot()).getchildren()


tags = {"tags":[]}
for elem in root:
    tag = {}
    tag["status"] = elem.attrib['status']
    tag["time"] = elem.attrib['time']
    tag["obj"] = elem.attrib['obj']
    tag["gap"] = elem.attrib['gap']
    tags["tags"].append(tag)

df_users = pd.DataFrame(tags["tags"])
df_users.head()

This is the output I am looking for:


      status  time  obj   gap
VARS  4        3    1.0   0.15
  • What is etree outputting for you? We sort of don't care about the xml, we care about etree's output since that is what you are trying to make a df. Commented Dec 22, 2020 at 22:45
  • Also, see How to convert an XML file to nice pandas dataframe? Commented Dec 22, 2020 at 22:46
  • Your xml isn't well formed - for example, where do <run> and <results> close? Commented Dec 22, 2020 at 23:14
  • @JackFleeting. Thanks. Just updated that. Commented Dec 22, 2020 at 23:25
  • @noah Thanks for sharing the post. Updated my question according to that. Commented Dec 22, 2020 at 23:26

3 Answers

We can use the findall and find methods of ElementTree to extract the elements we need (the children of result as columns, and controls/item as the index). Here tree is the parsed document, with et and pd imported as in the question:

tree = et.parse('test.xml')
pd.DataFrame({x.tag: x.text for x in tree.findall('./run/result/*')},
             index=[tree.find('./run/controls/item').text])

Output:

     status time  obj   gap
VARS      4    3  1.0  0.15
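
The values pulled out this way are all strings. As a minimal sketch, assuming the XML is held in an inline string (`XML_TEXT` is a hypothetical name, not part of the original answer) and that every column should be numeric, the same lookup can be followed by `pd.to_numeric`:

```python
import pandas as pd
from xml.etree import ElementTree as ET

# Inline copy of the relevant part of the question's XML (illustrative only).
XML_TEXT = """<results>
   <run>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>"""

root = ET.fromstring(XML_TEXT)
run = root.find("run")

# Children of <result> become the columns; their text is still str here.
row = {child.tag: child.text for child in run.find("result")}
index = [run.findtext("controls/item")]

# Convert each column from str to an integer or float dtype.
df = pd.DataFrame(row, index=index).apply(pd.to_numeric)
print(df.dtypes)
```

Without the conversion, arithmetic or comparisons on columns such as gap would operate on strings rather than numbers.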

I think you still need to loop through the tree and extract the bits and pieces you need from the XML.

import pandas as pd
from xml.etree import ElementTree as et

root = et.parse('test.xml').getroot()

results = []
for run in root.findall('run'):
    # assumes each run contains exactly one control item
    control = run.find('controls').find('item').text
    # build one row per run: the control name plus every child of <result>
    result = {'control': control}
    for child in run.find('result'):
        result[child.tag] = child.text
    results.append(result)
# finally, convert to a DataFrame and use the control name as the index
results = pd.DataFrame(results)
results = results.set_index('control')

1 Comment

How is "eles" created here?

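Note that status is not a direct child of the root element, yet your code looks for it there, and as an attribute rather than as a child element.

status is a child of result, which itself sits under run.

You need to search the descendants of the root for status rather than only its immediate children.

Refer to the ElementTree documentation, which describes each method with examples. findall is useful, as the other answers suggest.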
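
To illustrate the point, here is a small sketch using an inline copy of the XML (`XML_TEXT` is a hypothetical name used only for this example). It shows that the elements carry no attributes, that status is not a direct child of the root, and two ways to reach it among the descendants:

```python
from xml.etree import ElementTree as ET

# Trimmed inline copy of the question's XML (illustrative only).
XML_TEXT = """<results>
   <run>
      <controls><item>VARS</item></controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>"""

root = ET.fromstring(XML_TEXT)

# The only direct child of <results> is <run>, and it has no attributes,
# so a lookup like run.attrib['status'] raises KeyError.
run = root[0]
print(run.tag, run.attrib)

# status is not a direct child of the root, so this lookup finds nothing.
direct = root.find("status")

# Option 1: spell out the path from the root down to the element.
status_via_path = root.find("run/result/status").text

# Option 2: search all descendants, whatever their depth.
status_via_iter = next(root.iter("status")).text

print(direct, status_via_path, status_via_iter)
```

The explicit path is stricter and fails loudly if the layout changes, while `iter` tolerates extra nesting but returns the first match anywhere in the tree.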
