How can I convert XML into a Python object?

Question

I need to load an XML file and convert the contents into an object-oriented Python structure. I want to take this:

<main>
    <object1 attr="name">content</object1>
</main>

And turn it into something like this:

main
main.object1 = "content"
main.object1.attr = "name"

The XML data will have a more complicated structure than that, and I can't hard-code the element names. The element and attribute names need to be collected when parsing and used as the object properties.

How can I convert XML data into a Python object?

Stevoisiak · Accepted Answer · 2018-02-13 21:30:24Z

65

It's worth looking at lxml.objectify.

xml = """<main>
<object1 attr="name">content</object1>
<object1 attr="foo">contenbar</object1>
<test>me</test>
</main>"""

from lxml import objectify

main = objectify.fromstring(xml)
main.object1[0]             # content
main.object1[1]             # contenbar
main.object1[0].get("attr") # name
main.test                   # me
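If lxml isn't available, a rough standard-library approximation of the same attribute-style access can be built on xml.etree.ElementTree and types.SimpleNamespace. This is a minimal sketch, not part of lxml: the helper name `to_object` is hypothetical. Each element becomes a namespace object with its XML attributes copied onto it and its own text under `.text`, and child tags become attributes (a list when a tag repeats).

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def to_object(elem):
    """Hypothetical helper: turn an Element into nested SimpleNamespace objects.

    XML attributes become Python attributes, the element's own text is kept
    in .text, and each child tag becomes an attribute; a tag that occurs more
    than once becomes a list. Name clashes (e.g. an XML attribute called
    "text") are not handled in this sketch.
    """
    obj = SimpleNamespace(**elem.attrib)
    obj.text = (elem.text or "").strip() or None
    children = {}
    for child in elem:
        children.setdefault(child.tag, []).append(to_object(child))
    for tag, items in children.items():
        # one child -> the object itself; repeated tag -> list of objects
        setattr(obj, tag, items[0] if len(items) == 1 else items)
    return obj


xml = """<main>
<object1 attr="name">content</object1>
<object1 attr="foo">contenbar</object1>
<test>me</test>
</main>"""

main = to_object(ET.fromstring(xml))
print(main.object1[0].text)  # content
print(main.object1[0].attr)  # name
print(main.test.text)        # me
```

Unlike objectify, where `main.object1` is both the first match and indexable, this sketch only produces a list when a tag repeats; always building lists would be more predictable if the structure varies between files.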

Or the other way around to build xml structures:

item = objectify.Element("item")
item.title = "Best of python"
item.price = 17.98
item.price.set("currency", "EUR")

order = objectify.Element("order")
order.append(item)
order.item.quantity = 3
order.price = sum(item.price * item.quantity for item in order.item)

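import lxml.etree

# objectify annotates elements with py:pytype attributes; strip them and the
# now-unused namespace declarations so the output matches the listing below
objectify.deannotate(order)
lxml.etree.cleanup_namespaces(order)
print(lxml.etree.tostring(order, pretty_print=True, encoding="unicode"))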

Output:

<order>
  <item>
    <title>Best of python</title>
    <price currency="EUR">17.98</price>
    <quantity>3</quantity>
  </item>
  <price>53.94</price>
</order>
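For comparison, roughly the same document can be built with the standard library's xml.etree.ElementTree. It has no objectify-style attribute assignment, so each child is created explicitly with SubElement and values are stored as strings. This is a sketch under those assumptions: the total is formatted to two decimals by hand, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

order = ET.Element("order")
item = ET.SubElement(order, "item")
ET.SubElement(item, "title").text = "Best of python"
price = ET.SubElement(item, "price", currency="EUR")
price.text = "17.98"
ET.SubElement(item, "quantity").text = "3"

# ElementTree only stores text, so convert back to numbers for arithmetic
total = sum(
    float(i.findtext("price")) * int(i.findtext("quantity"))
    for i in order.findall("item")
)
ET.SubElement(order, "price").text = f"{total:.2f}"

ET.indent(order, space="  ")  # pretty-printing helper, Python 3.9+
print(ET.tostring(order, encoding="unicode"))
```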

edited Feb 13, 2018 at 21:30

Stevoisiak


answered Jan 7, 2009 at 4:51

Peter Hoffmann


4 Comments

Ryan Ginstrom Over a year ago

When I run your generation example using lxml version 2.2 beta1, my XML is full of type annotations ("<title py:pytype="str">..."). Is there a way to suppress that?

Peter Hoffmann Over a year ago

You can use lxml.etree.cleanup_namespaces(order).

Paul McMillan Over a year ago

You actually want to use both lxml.objectify.deannotate(order) and lxml.etree.cleanup_namespaces(order).

Nicolas Stubi Over a year ago

Beware of some vulnerabilities, in particular with objectify: bandit.readthedocs.io/en/1.7.9/blacklists/…

Soviut · Accepted Answer · 2009-01-07 01:52:25Z

9

I've been recommending this more than once today, but try Beautiful Soup (easy_install BeautifulSoup).

from BeautifulSoup import BeautifulSoup

xml = """
<main>
    <object attr="name">content</object>
</main>
"""

soup = BeautifulSoup(xml)
# look in the main node for objects with attr=name; attrs values can also be regexes
my_objects = soup.main.findAll("object", attrs={'attr':'name'})
for my_object in my_objects:
    # this will print a list of the contents of the tag
    print my_object.contents
    # if only text is inside the tag you can use this
    # print my_object.string
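BeautifulSoup 3 only runs on Python 2. As a swap to a different library (not BeautifulSoup's API), the same lookup, every `object` element whose `attr` is `name`, can be sketched with the standard library's xml.etree.ElementTree, whose limited XPath support includes `[@attr='value']` predicates.

```python
import xml.etree.ElementTree as ET

xml = """
<main>
    <object attr="name">content</object>
</main>
"""

root = ET.fromstring(xml.strip())  # root is the <main> element
# ".//object[@attr='name']": any descendant <object> whose attr equals "name"
my_objects = root.findall(".//object[@attr='name']")
for my_object in my_objects:
    print(my_object.text)  # text directly inside the tag
```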

edited Jan 7, 2009 at 1:52

answered Jan 7, 2009 at 0:15

Soviut


4 Comments

Stephen Belanger Over a year ago

main.findAll needs to be soup.findAll, but that helped a bit. Still not exactly what I wanted, but I think I may have an idea of how to get it to work. It's going to be used in external .py files that will be interpreted by the app, so I can probably just remap them before execution.

Soviut Over a year ago

I fixed the bugs in the code and updated the XML. I had simply copied the original code given in the question.

Nas Banov Over a year ago

BeautifulSoup (BeautifulStoneSoup) breaks with empty tags like <element />, e.g. <icon data="/ig/images/weather/partly_cloudy.gif"/>, and those are plentiful in XML :(

Stevoisiak Over a year ago

This should be updated to use BeautifulSoup4. The old version is no longer maintained, and is not compatible with Python 3.
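Regarding the empty-tag problem in the comments above: a conforming XML parser treats `<element />` as a complete, empty element and does not nest the following siblings inside it. A small check with the standard library's ElementTree; the `<weather>`/`<current>` wrapper and the `<condition>` element are invented for this example, only the `<icon .../>` line comes from the comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document around the <icon .../> tag from the comment
xml = """<weather>
  <current>
    <icon data="/ig/images/weather/partly_cloudy.gif"/>
    <condition data="Partly Cloudy"/>
  </current>
</weather>"""

current = ET.fromstring(xml).find("current")
icon = current.find("icon")
print(icon.get("data"))          # /ig/images/weather/partly_cloudy.gif
print(len(icon))                 # 0: a self-closing tag has no children
print([c.tag for c in current])  # ['icon', 'condition']: siblings stay siblings
```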

Erdogan Kurtur · Accepted Answer · 2020-11-19 13:04:15Z

4

David Mertz's gnosis.xml.objectify would seem to do this for you. Documentation's a bit hard to come by, but there are a few IBM articles on it, including this one (text only version).

from gnosis.xml import objectify

xml = "<root><nodes><node>node 1</node><node>node 2</node></nodes></root>"
root = objectify.make_instance(xml)

print root.nodes.node[0].PCDATA # node 1
print root.nodes.node[1].PCDATA # node 2

Creating xml from objects in this way is a different matter, though.
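For comparison, the same traversal with the standard library's xml.etree.ElementTree. There is no attribute-style access here, so repeated children come from `findall()` and the text from `.text` rather than `.PCDATA`.

```python
import xml.etree.ElementTree as ET

xml = "<root><nodes><node>node 1</node><node>node 2</node></nodes></root>"
root = ET.fromstring(xml)

nodes = root.findall("nodes/node")  # path is relative to <root>
print(nodes[0].text)  # node 1
print(nodes[1].text)  # node 2
```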

edited Nov 19, 2020 at 13:04

Erdogan Kurtur


answered Jan 7, 2009 at 1:21

Ryan Ginstrom


RoboDev · Accepted Answer · 2009-01-06 23:00:31Z

1

How about this?

http://evanjones.ca/software/simplexmlparse.html

answered Jan 6, 2009 at 23:00

RoboDev


JV. · Accepted Answer · 2009-01-07 01:06:09Z

#@Stephen:
#"can't hardcode the element names, so I need to collect them
#at parse and use them somehow as the object names."

# I don't think that's possible. Instead, you can do this,
# which lets you retrieve any object by its name.

import BeautifulSoup


class Coll(object):
    """A class which can hold your Foo class objects
    and retrieve them easily when you want,
    abstracting the storage and retrieval logic
    """
    def __init__(self):
        self.foos={}        

    def add(self, fooobj):
        self.foos[fooobj.name]=fooobj

    def get(self, name):
        return self.foos[name]

class Foo(object):
    """The required class
    """
    def __init__(self, name, attr1=None, attr2=None):
        self.name=name
        self.attr1=attr1
        self.attr2=attr2

s="""<main>
         <object name="somename">
             <attr name="attr1">value1</attr>
             <attr name="attr2">value2</attr>
         </object>
         <object name="someothername">
             <attr name="attr1">value3</attr>
             <attr name="attr2">value4</attr>
         </object>
     </main>
"""

#

soup=BeautifulSoup.BeautifulSoup(s)


bars=Coll()
for each in soup.findAll('object'):
    bar=Foo(each['name'])
    attrs=each.findAll('attr')
    for attr in attrs:
        setattr(bar, attr['name'], attr.renderContents())
    bars.add(bar)


#retrieve objects by name
print bars.get('somename').__dict__

print '\n\n', bars.get('someothername').__dict__

Output:

{'attr2': 'value2', 'name': u'somename', 'attr1': 'value1'}


{'attr2': 'value4', 'name': u'someothername', 'attr1': 'value3'}
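The code above is Python 2 with BeautifulSoup 3. A simplified Python 3 sketch of the same idea, using the standard library's xml.etree.ElementTree instead: a plain dict stands in for the `Coll` class, and `Foo` keeps only the `name` argument because the other attributes are set from the XML with `setattr`.

```python
import xml.etree.ElementTree as ET


class Foo:
    """Holds a name plus whatever <attr> values the XML provides."""
    def __init__(self, name):
        self.name = name


s = """<main>
    <object name="somename">
        <attr name="attr1">value1</attr>
        <attr name="attr2">value2</attr>
    </object>
    <object name="someothername">
        <attr name="attr1">value3</attr>
        <attr name="attr2">value4</attr>
    </object>
</main>"""

bars = {}  # name -> Foo, replacing the Coll helper
for each in ET.fromstring(s).iter("object"):
    bar = Foo(each.get("name"))
    for attr in each.findall("attr"):
        # attribute names come from the XML, not from the class definition
        setattr(bar, attr.get("name"), attr.text)
    bars[bar.name] = bar

print(vars(bars["somename"]))  # {'name': 'somename', 'attr1': 'value1', 'attr2': 'value2'}
```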

Hugo · Accepted Answer · 2023-08-18 18:56:24Z

1

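I would suggest xsdata (https://xsdata.readthedocs.io/en/latest/), which generates Python dataclasses from an XML Schema and parses XML into them. In this example, PurchaseOrder is one of the generated classes in xsdata's own test fixtures, so the import and the file path only resolve inside the xsdata repository.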

# Parse XML
from pathlib import Path
from tests.fixtures.primer import PurchaseOrder
from xsdata.formats.dataclass.parsers import XmlParser

xml_string = Path("tests/fixtures/primer/sample.xml").read_text()
parser = XmlParser()
order = parser.from_string(xml_string, PurchaseOrder)
order.bill_to
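Without a schema or a code generator, a similar typed result can be approximated by hand with a standard-library dataclass filled from ElementTree. This is a sketch, not xsdata's API: the `Item` class, its `from_element` method, and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical typed record for one <item> element."""
    title: str
    price: float
    currency: str
    quantity: int

    @classmethod
    def from_element(cls, elem):
        price = elem.find("price")
        return cls(
            title=elem.findtext("title"),
            price=float(price.text),          # text -> float
            currency=price.get("currency"),   # XML attribute
            quantity=int(elem.findtext("quantity")),
        )


xml = """<order><item>
<title>Best of python</title>
<price currency="EUR">17.98</price>
<quantity>3</quantity>
</item></order>"""

items = [Item.from_element(e) for e in ET.fromstring(xml).findall("item")]
print(items[0])  # Item(title='Best of python', price=17.98, currency='EUR', quantity=3)
```

The trade-off against a generator like xsdata is that every field and conversion is written by hand, which is only practical for small, stable structures.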

answered Aug 18, 2023 at 18:56

Hugo


user26294 · Accepted Answer · 2009-01-07 00:09:03Z

0

There are three common XML parsers for Python: xml.dom.minidom, ElementTree (xml.etree.ElementTree), and BeautifulSoup.

IMO, BeautifulSoup is by far the best.

http://www.crummy.com/software/BeautifulSoup/
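For reference, here is the question's sample read with the two standard-library options listed above (BeautifulSoup is a third-party package). The variable is named `data` rather than `xml` so it does not shadow the `xml` package.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

data = '<main><object1 attr="name">content</object1></main>'

# minidom: W3C DOM API; the element's text lives in a child Text node
node = minidom.parseString(data).getElementsByTagName("object1")[0]
print(node.getAttribute("attr"), node.firstChild.data)  # name content

# ElementTree: lighter API; text and attributes hang off the element itself
elem = ET.fromstring(data).find("object1")
print(elem.get("attr"), elem.text)  # name content
```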

answered Jan 7, 2009 at 0:09

user26294


1 Comment

Nas Banov Over a year ago

BeautifulSoup does not play well with XML: it has problems with empty tags like <element/>, which is OK for HTML because those are not common there.
