12

I'd like to parse a small, simple XML file using Python; however, work on PyXML seems to have ceased. I'd like to use Python 2.6 if possible. Can anyone recommend an XML parser that works with 2.6?

Thanks

5 Answers

19

If it's small and simple then just use the standard library:

from xml.dom.minidom import parse
doc = parse("filename.xml")

This returns a DOM tree that implements the standard Document Object Model API.
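As a minimal sketch of what working with that tree looks like, the following parses an inline string rather than a file so that it is self-contained. The element and attribute names (`config`, `item`, `name`) are made up for illustration. It relies only on `getElementsByTagName`, `getAttribute`, and the `data` of a text node, and is written to run on Python 2.6 as well as Python 3.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; use parse("filename.xml") to read a file instead.
XML = '<config><item name="host">localhost</item><item name="port">8080</item></config>'

doc = parseString(XML)

settings = {}
for item in doc.getElementsByTagName("item"):
    # Each <item> in this sample has exactly one child: a text node holding the value.
    settings[item.getAttribute("name")] = item.firstChild.data

print(settings["host"])
print(settings["port"])
```

Note that values come back as strings, so `settings["port"]` would still need an `int()` call if a number is wanted.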

If you later need to do complex things like schema validation or XPath querying then I recommend the third-party lxml module, which is a wrapper around the popular libxml2 C library.


6

For most of my tasks I have used minidom, the lightweight DOM implementation in the standard library. This example is from the official documentation:

from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name

datasource = open('c:\\temp\\mydata.xml')
dom2 = parse(datasource)   # parse an open file

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')


5

Here is also a very good example of how to use minidom, with explanations.

2 Comments

The link seems to redirect to the homepage without www in front of it. This is a working link: diveintopython.net/xml_processing/index.html
Fixed the link. Thank you!
3

Would lxml suit your needs? It's the first tool I turn to for XML parsing.

1 Comment

Additionally, Python 2.5+ ships ElementTree in the standard library as xml.etree.ElementTree. Its API amounts to a subset of what lxml offers. I use etree for simple XML processing and lxml when I need anything that etree doesn't quite cover.
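A minimal sketch of the standard-library ElementTree API for this kind of simple processing follows. The sample document and its tag names (`feed`, `entry`, `title`) are invented for illustration, and the code is written to run on Python 2.6 as well as Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; use ET.parse("filename.xml").getroot() for a file.
XML = ('<feed>'
       '<entry id="1"><title>First</title></entry>'
       '<entry id="2"><title>Second</title></entry>'
       '</feed>')

root = ET.fromstring(XML)

# findall() accepts a limited path syntax; "entry" matches direct children of root.
titles = [(entry.get("id"), entry.findtext("title"))
          for entry in root.findall("entry")]

for entry_id, title in titles:
    print("%s: %s" % (entry_id, title))
```

The calls used here (`fromstring`, `findall`, `get`, `findtext`) also exist in `lxml.etree`, so switching later should mostly be a matter of changing the import.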
1

A few years ago, I wrote a library for working with structured XML. It makes XML simpler by making some limiting assumptions.

You could use XML for something like a word processor document, where you have a complicated soup of content with XML tags embedded all over the place; my library would not be a good fit for that.

But if you are using XML for something like a config file, my library is rather convenient. You define classes that describe the structure of the XML you want, and once you have the classes done, there is a method to slurp in XML and parse it. The actual parsing is done by xml.dom.minidom, but then my library extracts the data and puts it in the classes.

The best part: you can declare a "Collection" type that will be a Python list with zero or more other XML elements inside it. This is great for things like Atom or RSS feeds (which was the original reason I designed the library).
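To make the general idea concrete without depending on the library itself, here is a rough sketch of mapping XML onto plain Python classes with xml.dom.minidom. This is not the library's actual API: `Entry`, `text_of`, and `load_entries` are names invented for this illustration, and the sample feed is made up.

```python
from xml.dom.minidom import parseString


class Entry(object):
    """Hypothetical class describing one <entry> element."""

    def __init__(self, title, link):
        self.title = title
        self.link = link


def text_of(parent, tag):
    """Return the text of the first <tag> under parent, or '' if missing or empty."""
    nodes = parent.getElementsByTagName(tag)
    if len(nodes) == 0 or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data


def load_entries(xml_text):
    """Parse with minidom, then copy the data into a plain list of Entry objects."""
    doc = parseString(xml_text)
    return [Entry(text_of(e, "title"), text_of(e, "link"))
            for e in doc.getElementsByTagName("entry")]


# Made-up sample feed with two entries.
XML = ("<feed>"
       "<entry><title>A</title><link>http://example.com/a</link></entry>"
       "<entry><title>B</title><link>http://example.com/b</link></entry>"
       "</feed>")

entries = load_entries(XML)
for e in entries:
    print(e.title + " -> " + e.link)
```

The list returned by `load_entries` plays the role of the "Collection" described above: zero or more elements of one kind, exposed as an ordinary Python list.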

Here's the URL: http://home.avvanta.com/~steveha/xe.html

I'd be happy to answer questions if you have any.

