How to parse XML with multiple attribute values within a single tag to DataFrame?

Question

<?xml version="2.0" encoding="UTF-8" ?><timestamp="20220113">
<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>

How can I parse an XML file that looks like this? Each tag carries multiple attributes. I want to extract attribute values such as id and old_id into a list or DataFrame.

Rolled the question back to its previous version because the additional question is now posted separately: stackoverflow.com/questions/75210241/… Thanks
– HedgeHog, Commented Jan 23, 2023 at 13:36

HedgeHog · Accepted Answer · 2023-01-23 11:25:57Z

2

You could use BeautifulSoup with the xml parser: select the elements you need, then iterate over the ResultSet and extract the attribute values via .get(). To read the XML from a file:

from bs4 import BeautifulSoup

with open('filename.xml', 'r') as f:
    file = f.read()

soup = BeautifulSoup(file, 'xml')

Example

from bs4 import BeautifulSoup
import pandas as pd

xml = '''<?xml version="2.0" encoding="UTF-8" ?><timestamp="20220113">
<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>
'''
soup = BeautifulSoup(xml, 'xml')

pd.DataFrame(
    [
        (e.get('id'), e.get('old_id'))
        for e in soup.select('defintion')
    ],
    columns=['id', 'old_id']
)

Output

	id	old_id
0	1	0
1	7	1
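
The element text (Lang, Eng) can be collected in the same pass. The following is a minimal sketch, assuming bs4 and lxml are installed and using a well-formed variant of the sample (closing tag added, the invalid `<timestamp="20220113">` part dropped); the `rows` name is only illustrative.

```python
from bs4 import BeautifulSoup

# Well-formed variant of the sample XML (an assumption, not the asker's exact file)
xml = '''<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>
</defintions>'''

soup = BeautifulSoup(xml, 'xml')

# One tuple per <defintion>: both attributes plus the element text
rows = [
    (e.get('id'), e.get('old_id'), e.get_text())
    for e in soup.find_all('defintion')
]
print(rows)
```

Passing `rows` to `pd.DataFrame(rows, columns=['id', 'old_id', 'defintion'])` would give one column per field. The values are strings, since attribute values are not converted automatically.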

edited Jan 23, 2023 at 11:25

answered Jan 23, 2023 at 11:08


3 Comments

x89 Over a year ago

Could you also help with a second use case? In this case, I need to extract a combination: the attributes of one tag (i.e. offer, as we did earlier), the contents of some tags themselves (e.g. level, name), and the attributes of the first tag (timestamp), whose value would repeat across all fields. I edited the question.

HedgeHog Over a year ago

To keep the original question clean, this would be better asked as a new question with exactly this focus. Simply drop the link in the comments to reference your new question. That would be great.

x89 Over a year ago

stackoverflow.com/questions/75210241/…

Francisco Rodrigues · Accepted Answer · 2023-01-23 10:53:05Z

0

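Using Python's Beautiful Soup, you can parse the .xml file into a BeautifulSoup object and call .find_all('defintion') on it. Then loop through the tags you find and read the desired attribute values:

from bs4 import BeautifulSoup

# 'filename.xml' is a placeholder path
with open('filename.xml', 'r') as f:
    soup = BeautifulSoup(f.read(), 'xml')

# The attributes live on each <defintion> element, not on <defintions>
for defintion in soup.find_all('defintion'):
    old_id = defintion['old_id']
    id_ = defintion['id']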

references: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ https://linuxhint.com/parse_xml_python_beautifulsoup/

answered Jan 23, 2023 at 10:53


3 Comments

x89 Over a year ago

How do you define "object" if you are reading the content from a file?

HedgeHog Over a year ago

In newer code, avoid the old findAll() syntax; use find_all() or select() with CSS selectors instead. For more, take a minute to check the docs.

Francisco Rodrigues Over a year ago

with open('teachers.xml', 'r') as f:
    file = f.read()

# 'xml' is the parser used. For html files, which BeautifulSoup is typically used for, it would be 'html.parser'.
soup = BeautifulSoup(file, 'xml')

ref: stackabuse.com/parsing-xml-with-beautifulsoup-in-python

Hermann12 · Accepted Answer · 2023-01-24 18:34:15Z

0

If you have valid XML like the following (a tag cannot carry a bare value the way <timestamp="20220113"> does; the value has to be an attribute of a named element):

<?xml version='1.0' encoding='utf-8'?>
<root timestamp='20220113'>
<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>
</defintions>
</root>

Then you can use pandas:

import pandas as pd

df = pd.read_xml('x89.xml', xpath='.//defintion')
print(df.to_string(index=False))

Output:

 id  old_id defintion
  1       0      Lang
  7       1       Eng
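
If pandas.read_xml is not an option, roughly the same table can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not what the answer above uses. The sketch assumes the well-formed XML shown above; the `rows` name and the choice to repeat the root's timestamp on every row (the use case raised in the comments on the first answer) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Well-formed sample taken from the XML shown above
xml_text = """<root timestamp='20220113'>
<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>
</defintions>
</root>"""

root = ET.fromstring(xml_text)

# One dict per <defintion>: its attributes, its text, and the root's timestamp
rows = [
    {
        "id": el.get("id"),
        "old_id": el.get("old_id"),
        "defintion": el.text,
        "timestamp": root.get("timestamp"),
    }
    for el in root.iter("defintion")
]

for row in rows:
    print(row)
```

All values are strings here. `pd.DataFrame(rows)` would turn the list into a DataFrame, with the dict keys as column names.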

edited Jan 24, 2023 at 18:34

answered Jan 24, 2023 at 18:27
