Problems parsing an XML file with xml.etree.ElementTree

Question

I have to parse xml files which contain entries like

<error code="UnknownDevice">
    <description />
</error>

which are defined elsewhere as

<group name="error definitions">
     <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
     </errordef>
     ...
</group>

given

import xml.etree.ElementTree as ET

parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)

tree = ET.parse(inputFileName, parser=parser)
root = tree.getroot()

How can I get those values for errordef, i.e. the values of id and description?

How can I search for and extract those values, using UnknownDevice?

[Update] The error groups have differing names, but always of the format "XXX error definitions", "YYY error definitions", etc

Further, they seem to be nested at different depths in different documents.

Given the error's name, e.g. "UnknownDevice", how can I search everything under the root to get the corresponding id and description values?

Can I go directly to them, using e.g. "UnknownDevice", or do I have to search first for the error groups?

Are you using lxml? All current answers assume that (they use the xpath method, available in lxml but not in the std lib ElementTree module).
– mzjn, Commented Jul 23, 2015 at 17:01
No I am not (I am so new that I don't even know what it is). I will update the question when I get back to the office, to show what ET is. Thanks for pointing that out.
– Mawg, Commented Jul 23, 2015 at 18:37

larsks · Accepted Answer · 2015-07-23 16:19:39Z

1

If you have this:

<group name="error definitions">
     <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
     </errordef>
     ...
</group>

And you want to get the value of description and id for every errordef element, you could do this:

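# xpath() exists only in lxml; with xml.etree.ElementTree, findall() accepts the same './/errordef' path
for err in tree.findall('.//errordef'):
    print(err.get('id'), err.findtext('description'))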

Which would give you something like:

0x11 Indicated device is unknown
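
To go straight to a single definition by its name with the standard-library xml.etree.ElementTree, find() accepts a path with './/' (search at every depth) and an attribute predicate. The following is only a minimal sketch: the SAMPLE document is modelled on the snippets in the question, and lookup_error is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modelled on the snippets in the question.
SAMPLE = """
<root>
    <group name="error definitions">
        <errordef id="0x11" name="UnknownDevice">
            <description>Indicated device is unknown</description>
        </errordef>
    </group>
</root>
"""


def lookup_error(root, name):
    """Return (id, description) for the errordef called `name`, or None."""
    # './/' searches every depth below root; the predicate filters on the
    # name attribute. Assumes `name` contains no single quote.
    errordef = root.find(".//errordef[@name='%s']" % name)
    if errordef is None:
        return None
    return errordef.get("id"), errordef.findtext("description")


root = ET.fromstring(SAMPLE)
print(lookup_error(root, "UnknownDevice"))
# prints ('0x11', 'Indicated device is unknown')
```

The match is case-sensitive, so the name has to be spelled exactly as it appears in the name attribute.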

answered Jul 23, 2015 at 16:19

larsks


4 Comments

larsks Over a year ago

That usually means someone is angry their question got downvoted and is taking it out on everybody else. C'est la vie here on SO :)

alecxe Over a year ago

There's something strange about this thread. I've got two downvotes for what I personally think is a good answer. All of the answers got downvotes, but not the question. Sometimes, I just don't understand the community here.

Mawg Over a year ago

You and me, both. I particularly dislike downvotes without explanation, but my pet peeve is "drive-by" "me too" downvoting, where as soon as you go negative, they pile on the downvotes with no explanation. Btw, I am working on applying your answer. It is looking good so far

Mawg Over a year ago

Yup, this works - but I forgot to mention that the groups are nested at different depths in different documents. So, I asked a new question to cover that at stackoverflow.com/questions/31605115/… and have added my own brute force solution - feel free to improve upon it

alecxe · Accepted Answer · 2015-07-23 19:20:07Z

1

First, parse the error definitions into a dictionary:

errors = {
    errordef.attrib["name"]: {"id": errordef.attrib.get("id"), "description": errordef.findtext("description")}
    for errordef in root.xpath(".//group[@name='error definitions']/errordef[@name]")
}

Then, every time you need to get the error id and description, look it up by code:

error_code = root.find("error").attrib["code"]
print(errors.get(error_code, "Unknown Error"))

Note that the xpath() method is coming from lxml.etree. If you are using xml.etree.ElementTree, replace xpath() with findall() - the limited XPath support provided by xml.etree.ElementTree is enough for the provided expressions.
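
As a rough illustration of that substitution, here is the same dictionary built with xml.etree.ElementTree only. It is a sketch under assumptions: the SAMPLE document is made up from the snippets in the question, with the error entry placed directly under the root so that root.find("error") can reach it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modelled on the snippets in the question.
SAMPLE = """
<root>
    <group name="error definitions">
        <errordef id="0x11" name="UnknownDevice">
            <description>Indicated device is unknown</description>
        </errordef>
    </group>
    <error code="UnknownDevice">
        <description />
    </error>
</root>
"""

root = ET.fromstring(SAMPLE)

# findall() takes the same path expression that was passed to xpath().
errors = {
    errordef.get("name"): {
        "id": errordef.get("id"),
        "description": errordef.findtext("description"),
    }
    for errordef in root.findall(".//group[@name='error definitions']/errordef[@name]")
}

# find("error") only looks at direct children of root.
error_code = root.find("error").get("code")
print(errors.get(error_code, "Unknown Error"))
# prints {'id': '0x11', 'description': 'Indicated device is unknown'}
```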

edited Jul 23, 2015 at 19:20

answered Jul 23, 2015 at 15:59

alecxe


2 Comments

Mawg Over a year ago

Can I use a wildcard to build the dictionary? I find that groups are named "XXX error definitions", "YYY error definitions", etc. Btw, I am using xml.etree.ElementTree, so calling findall()

Mawg Over a year ago

It gets worse. The error groups seem to be nested at different depths in different documents. Given the error's title, e.g "unknownDevice", how can I search everything under the root to get the corresponding id and description values? I have updated the question to reflect my comments here.
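
One possible way to handle both follow-up points with xml.etree.ElementTree alone: its path syntax has no wildcard or ends-with test for attribute values, so the sketch below walks every group element with iter() (which visits all depths) and checks the name suffix in Python. The SAMPLE document, the "DeviceBusy" and "NotAnError" entries, and the collect_errors helper are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: groups with different prefixes at different depths.
SAMPLE = """
<root>
    <group name="XXX error definitions">
        <errordef id="0x11" name="UnknownDevice">
            <description>Indicated device is unknown</description>
        </errordef>
    </group>
    <section>
        <group name="YYY error definitions">
            <errordef id="0x12" name="DeviceBusy">
                <description>Device is busy</description>
            </errordef>
        </group>
        <group name="settings">
            <errordef id="0xFF" name="NotAnError">
                <description>Should be ignored</description>
            </errordef>
        </group>
    </section>
</root>
"""


def collect_errors(root, suffix="error definitions"):
    """Map errordef name -> {'id', 'description'} for all matching groups."""
    errors = {}
    # iter() visits matching elements at every depth below (and including) root.
    for group in root.iter("group"):
        if not group.get("name", "").endswith(suffix):
            continue  # not an "... error definitions" group
        for errordef in group.findall("errordef"):
            name = errordef.get("name")
            if name is not None:
                errors[name] = {
                    "id": errordef.get("id"),
                    "description": errordef.findtext("description"),
                }
    return errors


errors = collect_errors(ET.fromstring(SAMPLE))
print(errors.get("UnknownDevice", "Unknown Error"))
# prints {'id': '0x11', 'description': 'Indicated device is unknown'}
```

Building the dictionary once and looking names up in it avoids re-scanning the tree for every error entry.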

NITIN SRIVASTAV · Accepted Answer · 2017-07-12 08:56:51Z

1

If you want to get the values of id and description for an errordef element, you could do this:

import xml.etree.ElementTree as ET

tree = ET.parse('grpError.xml')
root = tree.getroot()  # here the root element is the <group> itself
print(root)
docExe = root.findall('errordef')  # direct <errordef> children of the root
dict01 = docExe[0].attrib  # attributes of the first errordef, as a dictionary
print(dict01)
print(dict01['id'])  # value of the id attribute
print(dict01['name'])  # value of the name attribute
print(docExe[0].find('description').text)  # text of the child <description> element

Output is:

<Element 'group' at 0x000001A582EDB4A8>
{'id': '0x11', 'name': 'UnknownDevice'}
0x11
UnknownDevice
Indicated device is unknown

answered Jul 12, 2017 at 8:56

NITIN SRIVASTAV


dkx22 · Accepted Answer · 2015-07-23 15:59:47Z

0

You need a selector, though I'm not sure how to do this with lxml. It has a CSS selector, but I couldn't find anything in the docs for selecting by "id"; I have only used lxml to add and remove elements in HTML. Maybe take a look at Scrapy? With Scrapy, once the document is loaded into a response, it would look like this:

response.xpath('//errordef[@id="0x11"]/description/text()').extract()

answered Jul 23, 2015 at 15:59

dkx22

