Parsing XML with lxml and elementtree

Question

I'm trying to parse an XML document to return <input> nodes that contain a ref attribute. A toy example works, but the real document returns an empty list when it should show a match.

toy example

from lxml import etree
tree = etree.XML('<body><input ref="blabla"><label>Cats</label></input><input ref="blabla"><label>Dogs</label></input><input ref="blabla"><label>Birds</label></input></body>')
# I can return the relevant input nodes with:
print(len(tree.findall(".//input[@ref]")))
3

But working with the following (reduced) file for some reason fails:

example.xml

<?xml version="1.0"?>
<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <h:head>
    <h:title>A title</h:title>
  </h:head>
  <h:body>
    <group ref="blabla">
      <label>Group 1</label>
      <input ref="blabla">
        <label>Field 1</label>
      </input>
    </group>
  </h:body>
</h:html>

script

from lxml import etree
with open("example.xml", "r") as myfile:
    xml = myfile.read()
tree = etree.XML(xml)
print(len(tree.findall(".//input[@ref]")))
0

Any idea why this fails, and how to work around it? I think it may have something to do with the XML header. Very grateful for any assistance.

What's the error message? What exactly fails?

– kirbyfan64sos, Commented Aug 26, 2015 at 0:23

sideshowbarker · Accepted Answer · 2015-08-26 02:02:17Z

I think the problem is that the elements in your document are in namespaces, so the un-namespaced .findall(".//input[@ref]") expression doesn't match the input element in the document, which is actually a namespaced input element in the http://www.w3.org/2002/xforms namespace.

So maybe try this:

.findall(".//{http://www.w3.org/2002/xforms}input[@ref]")

Updated after my original answer, to use the xforms namespace instead of the xhtml namespace (as had been noted in another answer).
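
As a rough sketch of the same fix with a prefix map instead of the inline {uri} form: findall also accepts a namespaces mapping, both in the standard library's xml.etree.ElementTree (used here) and in lxml. The prefix xf is an arbitrary name chosen for this illustration, and the XML string is a trimmed copy of the document from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (assumption: same namespace layout).
XML = """<?xml version="1.0"?>
<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:h="http://www.w3.org/1999/xhtml">
  <h:body>
    <group ref="blabla">
      <input ref="blabla"><label>Field 1</label></input>
    </group>
  </h:body>
</h:html>"""

# "xf" is an arbitrary prefix; only the URI has to match the document.
NS = {"xf": "http://www.w3.org/2002/xforms"}

root = ET.fromstring(XML)

# The bare name looks for an <input> in no namespace, so nothing matches.
unqualified = root.findall(".//input[@ref]")

# The prefixed name is resolved through NS to the xforms namespace.
qualified = root.findall(".//xf:input[@ref]", NS)

print(len(unqualified), len(qualified))
```

The prefix used in the query does not need to match any prefix in the document, which is convenient here because the xforms namespace is the document's default and has no prefix at all.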

edited Aug 26, 2015 at 2:02

answered Aug 26, 2015 at 0:40

sideshowbarker

3 Comments

geotheory Over a year ago

Hi sideshowbarker. Sorry still an empty array for me.

sideshowbarker Over a year ago

OK, I didn’t actually test it, but will do that right now and see what I get

geotheory Over a year ago

Ha ha! .findall(".//{http://www.w3.org/2002/xforms}input[@ref]") is the ticket :)

Anand S Kumar · Accepted Answer · 2015-08-26 01:44:06Z

As your XML shows, the namespace for non-prefixed elements is "http://www.w3.org/2002/xforms", because that URI is declared as the default xmlns (without a prefix) on the root element h:html. Only elements prefixed with h: are in the "http://www.w3.org/1999/xhtml" namespace.

So you need to use that namespace in your query as well. For example:

root.findall(".//{http://www.w3.org/2002/xforms}input[@ref]")

Example/Demo -

>>> s = """<?xml version="1.0"?>
... <h:html xmlns="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...   <h:head>
...     <h:title>A title</h:title>
...   </h:head>
...   <h:body>
...     <group ref="blabla">
...       <label>Group 1</label>
...       <input ref="blabla">
...         <label>Field 1</label>
...       </input>
...     </group>
...   </h:body>
... </h:html>"""
>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring(s)
>>> root.findall(".//{http://www.w3.org/1999/xhtml}input[@ref]")
[]
>>> root.findall(".//{http://www.w3.org/2002/xforms}input[@ref]")
[<Element '{http://www.w3.org/2002/xforms}input' at 0x02288EA0>]
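
One way to check which namespace each element actually ended up in is to print the parsed tag names, which ElementTree stores in {uri}localname form. This is only a sketch: the XML string below is a shortened version of the document above, and the variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the document (assumption: same xmlns declarations).
XML = """<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:h="http://www.w3.org/1999/xhtml">
  <h:body>
    <input ref="blabla"><label>Field 1</label></input>
  </h:body>
</h:html>"""

root = ET.fromstring(XML)

# Element.iter() walks the root and all descendants in document order.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

The h:-prefixed elements come out under the xhtml URI, while the unprefixed input and label come out under the xforms URI, which is why the query has to name the xforms namespace.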
