1

I want to extract the name and d tags for each food item from the XML file.

I thought about making all the d tags children of the name tag and then looping over the contents of name, but I'm not sure how to go about that or whether there are more efficient ways. I'm open to other solutions. I have some code, but it's not there yet. Thank you!

## XML 

<?xml version="1.0"?>
<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <d>price 5.95</d>
        <d>Two of our famous Belgian Waffles 
with plenty of real maple syrup</d>
        <d>650 cal</d>
        <name>Belgian Waffles Light</name>
        <d>price 5.15</d>
        <d>Two of our famous Belgian Waffles with less calories</d>
        <d>450 cal</d> 
    </food>
    <food>
        <name>Strawberry Belgian Waffles</name>
        <d>price 7.95</d>
        <d>Light Belgian waffles covered 
with strawberries and whipped cream</d>
        <d>900 cal</d>
    </food>
    <food>
        <name>French Toast</name>
        <d>price 4.50</d>
        <d>Thick slices made from our 
homemade sourdough bread</d>
        <d>600 cal</d>
    </food>
</breakfast_menu>
## My code

import xml.etree.ElementTree as ET
import pandas as pd
  
tree = ET.parse('xml_doc_txt.txt')
root = mytree.getroot()

[elem.tag for elem in root.iter()]

for node in root.iter('food'):
    for name in node.findall('name'):
        Name = name.text
    for d in node.findall('d'):
        description = node.findtext('d')       
        action = action.append(pd.DataFrame(data={'Name': Name, 'Description': description}, index = [0]), ignore_index = True)

df = pd.DataFrame(action, columns=['Name', 'Description'])
df

The desired df should have 2 columns like so:


| Name | Description |
| --- | --- |
| Belgian Waffles | price 5.95, Two of our famous..., 650 cal |
| Belgian Waffles Light | price 5.15, Two of our famous..., 450 cal |
| Strawberry Belgian Waffles | price 7.95, Light Belgian waffles..., 900 cal |
| ... | ... |

4
  • Are you sure you want 2 and not 3 columns (name, price,description)? Also, it's probably easier with lxml instead of ElementTree, if you have it installed. Commented Jun 8, 2021 at 16:38
  • Yeah, no need for a price column. Thanks for asking, though! Commented Jun 8, 2021 at 16:40
  • OK, and ET vs lxml? Commented Jun 8, 2021 at 16:41
  • As long as it gets me to the result, I'm happy to use it. I hadn't heard of that library. Commented Jun 8, 2021 at 16:43

3 Answers

1

Using lxml:

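import pandas as pd
from lxml import etree

menu = """your xml above"""
root = etree.fromstring(menu)
columns = ['name', 'desc']
rows = []
# Each <name> is followed by its own <d> siblings; take the first two (price, description)
for item in root.xpath('//food/name'):
    d1 = item.xpath('./following-sibling::d[1]/text()')[0]
    d2 = item.xpath('./following-sibling::d[2]/text()')[0]
    rows.append([item.text, d1 + " " + d2])

pd.DataFrame(rows, columns=columns)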

Output (sorry about the formatting):

    name    desc
0   Belgian Waffles     price 5.95 Two of our famous Belgian Waffles \...
1   Belgian Waffles Light   price 5.15 Two of our famous Belgian Waffles w...
2   Strawberry Belgian Waffles  price 7.95 Light Belgian waffles covered \nwit...
3   French Toast    price 4.50 Thick slices made from our \nhomema...

0

The code below should work:

import pandas as pd
import xml.etree.ElementTree as ET

xml = '''<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <d>price 5.95</d>
        <d>Two of our famous Belgian Waffles 
with plenty of real maple syrup</d>
        <d>650 cal</d>
    </food>
    <food>
        <name>Strawberry Belgian Waffles</name>
        <d>price 7.95</d>
        <d>Light Belgian waffles covered 
with strawberries and whipped cream</d>
        <d>900 cal</d>
    </food>
    <food>
        <name>French Toast</name>
        <d>price 4.50</d>
        <d>Thick slices made from our 
homemade sourdough bread</d>
        <d>600 cal</d>
    </food>
</breakfast_menu>'''

root = ET.fromstring(xml)
data = []
for food in root.findall('.//food'):
    data.append({'name': food.find('name').text, 'description': ','.join([d.text for d in food.findall('d')])})
df = pd.DataFrame(data)
print(df)

output

                         name                                        description
0             Belgian Waffles  price 5.95,Two of our famous Belgian Waffles \...
1  Strawberry Belgian Waffles  price 7.95,Light Belgian waffles covered \nwit...
2                French Toast  price 4.50,Thick slices made from our \nhomema...
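
One caveat worth noting if a single food element can hold several name elements, as in the question's updated XML: find returns only the first matching child, while findall returns all of them. The sketch below illustrates the difference on a trimmed-down sample (the sample XML and the variable names are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical sample: one <food> holding two <name> elements
xml = """<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <d>price 5.95</d>
        <d>650 cal</d>
        <name>Belgian Waffles Light</name>
        <d>price 5.15</d>
        <d>450 cal</d>
    </food>
</breakfast_menu>"""

food = ET.fromstring(xml).find('food')

first_name = food.find('name').text                 # only the first <name>
all_names = [n.text for n in food.findall('name')]  # every <name> under this <food>
all_d = [d.text for d in food.findall('d')]         # every <d>, whichever <name> it follows

print(first_name)
print(all_names)
print(len(all_d))
```

With such input, a loop that calls find('name') once per food would produce one row per food and merge the d texts of both items into it.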

4 Comments

Hi, thank you for your answer. I updated the XML with a more representative sample of the document I'm working with: it now includes the case of two different foods under the same food tag.
I can't see any difference in the XML.
Check out the first food tag; you'll see that there are now 2 options under the same food tag. Thanks!
See the lxml-based answer.
0

Your code has some naming errors: you assign tree but then call mytree.getroot(), and action is never defined even though you append to it. You also don't need findall every time, for example for a tag that occurs only once. This code generates your desired df:

import xml.etree.ElementTree as ET
import pandas as pd
  
tree = ET.parse('xml_doc_txt.txt')
root = tree.getroot()


breakfast_lst = []
descriptions_lst = []

for node in root.iter('food'):
    breakfasts = node.findall('name')
    descriptions =  node.findall('d')
    d_tag_num = 3  # assumes each name is followed by exactly three d tags

    for i,breakfast_name in enumerate(breakfasts):
        breakfast_lst.append(breakfast_name.text)
        full_description = ', '.join([descrpt.text for descrpt in descriptions[i*d_tag_num:(i*d_tag_num)+d_tag_num]])
        descriptions_lst.append(full_description)


df = pd.DataFrame(data={'Name':breakfast_lst,'Description':descriptions_lst})
print(df)
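
If the number of d tags per name is not always three, one possible alternative is to walk the children of each food in document order and start a new row whenever a name appears. The following is only a sketch using the standard library; the helper name group_by_name and the shortened sample XML are hypothetical, and it assumes every d belongs to the closest preceding name:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the question's layout: flat <name>/<d> siblings
xml = """<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <d>price 5.95</d>
        <d>650 cal</d>
        <name>Belgian Waffles Light</name>
        <d>price 5.15</d>
    </food>
    <food>
        <name>French Toast</name>
        <d>price 4.50</d>
        <d>Thick slices</d>
        <d>600 cal</d>
    </food>
</breakfast_menu>"""


def group_by_name(root):
    """Return one {'Name', 'Description'} dict per <name>, in document order."""
    rows = []
    for food in root.iter('food'):
        current = None
        for child in food:  # direct children, in document order
            text = (child.text or '').strip()
            if child.tag == 'name':
                # A new <name> starts a new row
                current = {'Name': text, 'Description': []}
                rows.append(current)
            elif child.tag == 'd' and current is not None:
                # Attach this <d> to the most recent <name>
                current['Description'].append(text)
    # Join the collected <d> texts into a single comma-separated string
    for row in rows:
        row['Description'] = ', '.join(row['Description'])
    return rows


rows = group_by_name(ET.fromstring(xml))
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) to get the two-column frame, the same way the list of dicts is used in the ElementTree answer above.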

1 Comment

Thank you for your answer. I've updated the XML with a more representative sample of my data. Could you update your response?
