
I have to parse XML files that contain entries like

<error code="UnknownDevice">
    <description />
</error>

which are defined elsewhere as

<group name="error definitions">
     <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
     </errordef>
     ...
</group>

Given this setup:

import xml.etree.ElementTree as ET

parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)

tree = ET.parse(inputFileName, parser=parser)
root = tree.getroot()

How can I get the values for an errordef, i.e. the value of its id and of its description?

How can I search for and extract those values using UnknownDevice?


[Update] The error groups have differing names, but always of the format "XXX error definitions", "YYY error definitions", etc.

Further, they seem to be nested at different depths in different documents.

Given the error's title, e.g. "UnknownDevice", how can I search everything under the root to get the corresponding id and description values?

Can I go directly to them using e.g. "UnknownDevice", or do I have to search for the error groups first?

  • Are you using lxml? All current answers assume that (they use the xpath method, which is available in lxml but not in the standard-library ElementTree module). Commented Jul 23, 2015 at 17:01
  • No, I am not (I am so new that I don't even know what it is). I will update the question when I get back to the office to show what ET is. Thanks for pointing that out. Commented Jul 23, 2015 at 18:37

4 Answers


If you have this:

<group name="error definitions">
     <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
     </errordef>
     ...
</group>

and you want to get the value of description and id for every errordef element, you could do this:

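from lxml import etree  # xpath() is an lxml method; xml.etree.ElementTree has no xpath()
tree = etree.parse(inputFileName)
for err in tree.xpath('//errordef'):  # '//' matches errordef elements at any depth
    print(err.get('id'), err.find('description').text)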

Which would give you something like:

0x11 Indicated device is unknown
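Since the asker is using xml.etree.ElementTree rather than lxml, here is a minimal sketch of the same loop with the standard library only. It assumes a small hypothetical sample document; Element.iter() walks the whole subtree, so like '//errordef' it finds errordef elements regardless of nesting depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; real files may nest the groups more deeply.
SAMPLE = """<root>
  <group name="error definitions">
    <errordef id="0x11" name="UnknownDevice">
      <description>Indicated device is unknown</description>
    </errordef>
  </group>
</root>"""

root = ET.fromstring(SAMPLE)

# iter() visits every descendant with this tag, at any depth.
results = [(err.get('id'), err.findtext('description')) for err in root.iter('errordef')]
for errid, desc in results:
    print(errid, desc)
```

With a real file, replace ET.fromstring(SAMPLE) with ET.parse(inputFileName).getroot().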

4 Comments

That usually means someone is angry their question got downvoted and is taking it out on everybody else. C'est la vie here on SO :)
That's something strange about this thread. I've got two downvotes for what I personally think is a good answer. All of the answers got downvotes, but the question didn't. Sometimes I just don't understand the community here.
You and me both. I particularly dislike downvotes without explanation, but my pet peeve is "drive-by" "me too" downvoting, where as soon as you go negative, they pile on the downvotes with no explanation. Btw, I am working on applying your answer. It is looking good so far.
Yup, this works, but I forgot to mention that the groups are nested at different depths in different documents. So I asked a new question to cover that at stackoverflow.com/questions/31605115/… and have added my own brute-force solution; feel free to improve upon it.

First, parse the error definitions into a dictionary:

errors = {
    errordef.attrib["name"]: {"id": errordef.attrib.get("id"), "description": errordef.findtext("description")}
    for errordef in root.xpath(".//group[@name='error definitions']/errordef[@name]")
}

Then, every time you need to get the error id and description, look it up by code:

error_code = root.find("error").attrib["code"]
print(errors.get(error_code, "Unknown Error"))

Note that the xpath() method comes from lxml.etree. If you are using xml.etree.ElementTree, replace xpath() with findall(): the limited XPath support provided by xml.etree.ElementTree is enough for the expressions above.
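The question's update says the group names vary ("XXX error definitions", "YYY error definitions") and the groups sit at different depths. ElementTree's XPath subset has no ends-with() or contains(), so one option is to walk every group with iter() and filter on the name in Python. This is a sketch under those assumptions, using a hypothetical sample document (the "Timeout" entry is invented for illustration) and a hypothetical helper name, build_error_table; it only picks up errordef elements that are direct children of a matching group.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two differently named groups at different depths.
SAMPLE = """<doc>
  <error code="UnknownDevice"><description /></error>
  <section>
    <group name="XXX error definitions">
      <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
      </errordef>
    </group>
  </section>
  <group name="YYY error definitions">
    <errordef id="0x12" name="Timeout">
      <description>Operation timed out</description>
    </errordef>
  </group>
</doc>"""


def build_error_table(root):
    """Map errordef name -> {"id", "description"} for every group, at any depth,
    whose name ends with "error definitions"."""
    errors = {}
    for group in root.iter("group"):
        # No ends-with() in ElementTree's XPath subset, so filter here.
        if not group.get("name", "").endswith("error definitions"):
            continue
        # Direct <errordef> children that carry a name attribute.
        for errordef in group.findall("errordef[@name]"):
            errors[errordef.get("name")] = {
                "id": errordef.get("id"),
                "description": errordef.findtext("description"),
            }
    return errors


root = ET.fromstring(SAMPLE)
errors = build_error_table(root)

# Look up the first <error> element anywhere in the document by its code.
error_code = root.find(".//error").get("code")
print(errors.get(error_code, "Unknown Error"))
```

Building the dictionary once and then looking codes up in it avoids re-scanning the tree for every error element.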

2 Comments

Can I use a wildcard to build the dictionary? I find that groups are named "XXX error definitions", "YYY error definitions", etc. Btw, I am using xml.etree.ElementTree, so I am calling findall().
It gets worse. The error groups seem to be nested at different depths in different documents. Given the error's title, e.g. "UnknownDevice", how can I search everything under the root to get the corresponding id and description values? I have updated the question to reflect my comments here.

To get the id, name and description of an errordef element (here, the first one directly under the root group), you could do this:

import xml.etree.ElementTree as ET

tree = ET.parse('grpError.xml')
root = tree.getroot()
print(root)
docExe = root.findall('errordef')  # direct <errordef> children of the root <group>
dict01 = docExe[0].attrib  # attributes of the first <errordef>, as a dict
print(dict01)
print(dict01['id'])  # a single attribute value
print(dict01['name'])  # a single attribute value
print(docExe[0].find('description').text)  # text of the child <description> element

Output is:

<Element 'group' at 0x000001A582EDB4A8>
{'id': '0x11', 'name': 'UnknownDevice'}
0x11
UnknownDevice
Indicated device is unknown
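To go directly to one definition by name, as the question asks, ElementTree's find() accepts an attribute predicate, and './/' searches all descendants, so neither the group name nor the nesting depth matters. A minimal sketch, assuming a hypothetical sample document and a hypothetical helper name lookup_errordef; it also assumes the name contains no single quote, since the name is pasted into the path string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the group nested one level down.
SAMPLE = """<doc>
  <outer>
    <group name="XXX error definitions">
      <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
      </errordef>
    </group>
  </outer>
</doc>"""


def lookup_errordef(root, name):
    """Return (id, description) for the errordef with this name, or None."""
    # './/' searches every descendant; [@name='...'] filters on the attribute.
    errordef = root.find(f".//errordef[@name='{name}']")
    if errordef is None:
        return None
    return errordef.get("id"), errordef.findtext("description")


root = ET.fromstring(SAMPLE)
print(lookup_errordef(root, "UnknownDevice"))
print(lookup_errordef(root, "NoSuchError"))
```

Note that the match is case-sensitive, so "unknownDevice" would not find the "UnknownDevice" entry. For many lookups, building a dictionary once is cheaper than calling find() repeatedly.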


You need a selector, though I'm not really sure you can do this with lxml. It has a CSS selector, but I can't find anything in the docs for selecting by "id"... I have only used lxml to remove/add elements in HTML. Maybe take a look at Scrapy? With Scrapy, once you have loaded your document, it would look like this:

response.xpath('//errordef[@id="0x11"]/description/text()').extract()
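For comparison, selecting by an attribute value does not require Scrapy: even the standard-library ElementTree supports predicates such as [@id='0x11'] in findall(). A minimal sketch, assuming the question's group snippet as a hypothetical standalone document:

```python
import xml.etree.ElementTree as ET

# The question's definition snippet, used here as a standalone sample.
SAMPLE = """<group name="error definitions">
  <errordef id="0x11" name="UnknownDevice">
    <description>Indicated device is unknown</description>
  </errordef>
</group>"""

root = ET.fromstring(SAMPLE)
# Attribute predicates are part of ElementTree's limited XPath subset.
descriptions = [d.text for d in root.findall(".//errordef[@id='0x11']/description")]
print(descriptions)
```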
