python etree with xpath and namespaces with prefix

Question

I can't find information on how to parse my XML when it uses a namespace prefix.

I have this XML:

<par:Request xmlns:par="http://somewhere.net/actual">
  <par:actual>blabla</par:actual>
  <par:documentType>string</par:documentType>
</par:Request>

I tried to parse it like this:

from lxml import etree as ET  # .xpath() is an lxml method

dom = ET.parse(u'C:\\filepath\\1.xml')
rootxml = dom.getroot()
for subtag in rootxml.xpath(u'//par:actual'):
    # do something
    print(subtag)

That raised an exception, because the XPath evaluator doesn't know the namespace prefix. What is the best way to solve this, given that the script will not know in advance which file it is going to parse or which tag it is going to search for?

Searching the web and Stack Overflow, I found that passing a namespace mapping helps:

namespace = {u'par': u"http://somewhere.net/actual"}
for subtag in rootxml.xpath(u'//par:actual', namespaces=namespace):
    #do something
    print(subtag)

That works perfectly. But I don't know in advance which XML I will parse, and the tag to search for (such as //par:actual) is also unknown to my script. So I need a way to extract the namespace from the XML itself.

I found many ways to extract the namespace URI, such as:

print(rootxml.tag)
print(rootxml.xpath('namespace-uri(.)'))
print(rootxml.xpath('namespace-uri(/*)'))

But how do I extract the prefix to build the dictionary that ElementTree wants from me? I don't want to run a monster regular expression over the XML body to extract the prefix. I believe there has to be a supported way to do that, doesn't there?

Or maybe ETree has a method that returns the namespaces from the XML as a dictionary (in the form ETree wants), without manual manipulation?

Steven · Accepted Answer · 2014-11-18 13:27:49Z

5

You cannot rely on the namespace declarations on the root element: there is no guarantee that the declarations will even be there, or that the document will use the same prefix for the same namespace throughout. Assuming you will have some way of passing in the tag you want to search for (since you say your script does not know it), you should also provide a way to pass in a namespace mapping. Alternatively, use James Clark notation, like {http://somewhere.net/actual}actual. lxml's ETXPath class supports this syntax, whereas "normal" XPath does not, and you can also use other methods like .findall() if you don't need full XPath.
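As a minimal sketch of the Clark-notation route with .findall(): the example below assumes the sample document from the question, held in a string instead of a file, and uses the standard library's xml.etree.ElementTree, which accepts the same {uri}name syntax in .findall(). The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined as a string (assumption:
# the real script would read it from a file instead).
XML = """<par:Request xmlns:par="http://somewhere.net/actual">
  <par:actual>blabla</par:actual>
  <par:documentType>string</par:documentType>
</par:Request>"""

root = ET.fromstring(XML)

# Clark notation: the namespace URI in braces, followed by the local name.
# No prefix mapping is passed, so the prefix used in the file is irrelevant.
matches = root.findall(".//{http://somewhere.net/actual}actual")
texts = [el.text for el in matches]
print(texts)
```

Because the search is keyed on the URI, it keeps working if another document binds the same namespace to a different prefix.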

If you don't care about the prefix at all, you could also use the local-name() function in XPath, e.g. //*[local-name()="actual"] (but you won't "really" be sure it's the right "actual").
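The local-name() function needs a full XPath engine such as lxml's .xpath(). As a stand-in rather than the same technique, the standard library's ElementTree (Python 3.8 and later) accepts a {*} wildcard that likewise matches a local name in any namespace. The sketch below assumes a made-up document with two hypothetical namespaces (the example.com URIs are not from the question) to show why the match can be ambiguous.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two different namespaces both define "actual".
XML = """<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b">
  <a:actual>first</a:actual>
  <b:actual>second</b:actual>
  <a:other>ignored</a:other>
</root>"""

root = ET.fromstring(XML)

# {*}actual matches elements named "actual" in any namespace (or in none).
any_ns = root.findall(".//{*}actual")
texts = [el.text for el in any_ns]
tags = [el.tag for el in any_ns]
print(texts)
print(tags)
```

Both elements come back, which is the caveat above in practice: ignoring the namespace means the caller cannot tell which "actual" was intended without inspecting each tag.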

answered Nov 18, 2014 at 13:27

3 Comments

Arkady Over a year ago

Yes, I thought about it. At first I even tried to make the user pass {http://somewhere.net/actual}actual, to be sure they understand which 'actual' they mean. But that doesn't work either: etree doesn't understand '//{...}actual' and throws an exception. In the end I just added a namespace input from the user and compare it with the namespace present in the XML, so that problem was solved.

Steven Over a year ago

The ETXPath class should solve the problem of .xpath() not understanding the {} syntax. You can't use it through the .xpath() method, though; use it like the XPath class (as with compiled XPath expressions). Example: path = etree.ETXPath('//{http://somewhere.net/actual}actual'), and then, to use it, results = path(rootxml)

Arkady Over a year ago

That's true. I'll use your advice :-)

Arkady · Accepted Answer · 2014-11-18 13:43:07Z

5

Oh, I found it.

After parsing the file:

dom = ET.parse(u'C:\\filepath\\1.xml')
rootxml = dom.getroot()

The rootxml object has an nsmap dictionary, which maps each prefix declared on the root element to its namespace URI; that is everything I need here.

So the simplest solution I've found is:

dom = ET.parse(u'C:\\filepath\\1.xml')
rootxml = dom.getroot()
nss = rootxml.nsmap
for subtag in rootxml.xpath(u'//par:actual', namespaces=nss):
    #do something
    print(subtag)

That works.

UPD: that works if the user understands what 'par' means in the XML they are working with, for example by comparing the expected namespace with the existing namespace before any other operations.

Still, I much prefer the variant with an XPath that understands {...}actual; that was what I was trying to achieve.

edited Nov 18, 2014 at 13:43

answered Nov 18, 2014 at 11:49

2 Comments

kakyo Over a year ago

I failed to make .nsmap work with py3.6:

Traceback (most recent call last):
  File "ElementTree_Xpath_xmlns_namespace.py", line 61, in <module>
    nss = root.nsmap
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'nsmap'

Arkady Over a year ago

It was Python 2.7... If you find a solution for the 3.6 issue, please post a reply here.
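The traceback names xml.etree.ElementTree.Element, the standard-library class, whereas .nsmap is an lxml attribute, so the difference is probably the library rather than the Python version. One possible standard-library sketch, assuming the sample document from the question, collects the prefix-to-URI mapping from iterparse "start-ns" events; collect_namespaces is a hypothetical helper name, not part of ElementTree.

```python
import io
import xml.etree.ElementTree as ET

# Sample document from the question, inlined as bytes (assumption:
# a real script would pass a file path or open file instead).
XML = b"""<par:Request xmlns:par="http://somewhere.net/actual">
  <par:actual>blabla</par:actual>
  <par:documentType>string</par:documentType>
</par:Request>"""


def collect_namespaces(source):
    """Hypothetical helper: map each declared prefix to its URI."""
    nsmap = {}
    # Each "start-ns" event carries a (prefix, uri) tuple.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # Keep the first declaration if a prefix is declared again later.
        nsmap.setdefault(prefix, uri)
    return nsmap


nsmap = collect_namespaces(io.BytesIO(XML))
root = ET.fromstring(XML)
found = [el.text for el in root.findall(".//par:actual", nsmap)]
print(nsmap)
print(found)
```

Unlike nsmap on the root element, this walks the whole document, so declarations on nested elements are picked up too. It still cannot resolve a prefix that is bound to different URIs in different places, which is the limitation raised in the first answer.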

Billy Willoughby · Accepted Answer · 2021-03-11 21:42:50Z

1

I ran into the same issue with Python 3.8.2 and found this question.

This is the solution I found: put the namespace URI in the XPath query, between the {}.

# First try the tag without a namespace.
ApplicationArea = BOD_IN_tree.find('.//ApplicationArea', ns)
if ApplicationArea is None:
    # Not found: retry with the namespace URI in braces.
    ApplicationArea = BOD_IN_tree.find('.//{http://www.defaultNamespace.com/2}ApplicationArea', ns)

I search for the element without the namespace, then search again if it's not found. I have no control over the inbound documents, some have namespaces, some do not.

I hope this helps!
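The fallback search described above could be wrapped in a small function. This is only a sketch under assumptions: find_with_fallback and the two tiny BOD documents are made up for illustration, and only the namespace URI and the ApplicationArea tag come from the answer.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.defaultNamespace.com/2"  # URI taken from the answer


def find_with_fallback(root, local_name, uri):
    """Hypothetical helper: try the bare tag, then the namespaced one."""
    found = root.find(f".//{local_name}")
    if found is None:
        # Clark notation: {uri}local_name
        found = root.find(f".//{{{uri}}}{local_name}")
    return found


# Two made-up inbound documents: one without and one with the namespace.
plain = ET.fromstring("<BOD><ApplicationArea>a</ApplicationArea></BOD>")
namespaced = ET.fromstring(
    f'<BOD xmlns="{NS_URI}"><ApplicationArea>b</ApplicationArea></BOD>'
)

print(find_with_fallback(plain, "ApplicationArea", NS_URI).text)
print(find_with_fallback(namespaced, "ApplicationArea", NS_URI).text)
```

The explicit `is None` check matters here: an Element with no children is falsy, so a plain truthiness test could wrongly trigger the fallback.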

answered Mar 11, 2021 at 21:42
