I am writing a script that processes an RDF/SKOS file with Python 3 and lxml.

I learnt that I need to pass the namespaces used in the XML to findall. (OK, strange, since the XML file already declares them in its header, so this seems like an unnecessary step, but anyway.)

When calling

for concept in root.findall('.//skos:Concept', namespaces=root.nsmap):

that works, because a root.nsmap is constructed by lxml.

But later in my code I also need to test on xml:lang:

for pl in concept.findall(".//skos:prefLabel[@xml:lang='en']", namespaces=root.nsmap):

and here Python tells me

SyntaxError: prefix 'xml' not found in prefix map

OK, true, my SKOS file has no explicit declaration for the xml namespace. So I tried adding it to the root.nsmap dict:

root.nsmap['xml'] = "http://www.w3.org/XML/1998/namespace"

but that doesn't work either. The map still has no 'xml' entry:

nsmap = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'uneskos': 'http://purl.org/umu/uneskos#', 'iso-thes': 'http://purl.org/iso25964/skos-thes#', 'dcterms': 'http://purl.org/dc/terms/', 'skos': 'http://www.w3.org/2004/02/skos/core#', 'rdfs': 'http://www.w3.org/2000/01/rdf-schema#'}

It seems I am not allowed to modify root.nsmap?

Does anyone have an idea how this is done? I have processed tons of XML in the past with Perl's XML::Twig, which is very comfortable, and I assume the Python community has (at least) similarly comfortable ways to do that ... but how?

Any hint appreciated.

1 Answer

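Modifying root.nsmap has no effect: the property hands back a fresh dictionary on each access, so changes to it are not stored on the element. But you can keep that dictionary in your own variable and modify it there. Example: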

from lxml import etree

doc = """
<root xmlns:skos="http://www.w3.org/2004/02/skos/core#">
   <skos:prefLabel xml:lang='en'>FOO</skos:prefLabel>
   <skos:prefLabel xml:lang='de'>BAR</skos:prefLabel>
</root>"""

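root = etree.fromstring(doc)

# Keep our own reference to the dict and add the reserved "xml" prefix to it
nsmap = root.nsmap
nsmap["xml"] = "http://www.w3.org/XML/1998/namespace"

# Pass the extended dict, not root.nsmap, so the xml: prefix can be resolved
en = root.find(".//skos:prefLabel[@xml:lang='en']", namespaces=nsmap)
print(en.text)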

Output:

FOO
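
If you need this in several places, the same idea can be wrapped in a small function. The following is only a sketch, not part of lxml: the helper name namespaces_with_xml and the constant XML_NS are made up for illustration. It also shows that writing to root.nsmap directly is lost, and that the xml:lang value can be read from an element with the {uri}name notation, which needs no prefix map at all.

```python
from lxml import etree

# Namespace URI reserved for the "xml" prefix (constant name is our own choice)
XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = """
<root xmlns:skos="http://www.w3.org/2004/02/skos/core#">
   <skos:prefLabel xml:lang='en'>FOO</skos:prefLabel>
   <skos:prefLabel xml:lang='de'>BAR</skos:prefLabel>
</root>"""


def namespaces_with_xml(element):
    """Hypothetical helper: copy element.nsmap and add the 'xml' prefix."""
    ns = dict(element.nsmap)  # explicit copy that we own
    ns["xml"] = XML_NS
    return ns


root = etree.fromstring(doc)

# Writing to the property directly is lost; the next access has no "xml" key.
root.nsmap["xml"] = XML_NS
xml_known_to_root = "xml" in root.nsmap

ns = namespaces_with_xml(root)

# Read xml:lang from each label using the {uri}name attribute notation.
labels = {
    el.get("{%s}lang" % XML_NS): el.text
    for el in root.findall(".//skos:prefLabel", namespaces=ns)
}

# Filter on xml:lang inside the path expression, using the extended map.
english = [
    el.text
    for el in root.findall(".//skos:prefLabel[@xml:lang='en']", namespaces=ns)
]

print(xml_known_to_root, labels, english)
```

Copying with dict() makes it explicit that the extended map is yours and separate from whatever lxml reports for the element.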

2 Comments

Thanks a lot! Works like a breeze. Issue solved. (Still, if I may, I find it strange that lxml a) forces me to make special arrangements for the built-in "xml" prefix, which the original XML file does not even declare, and b) does not let me extend the nsmap that the parser creates ...) Anyway, glad it works. Thanks again!!
I agree that it is strange that you are forced to declare the "xml" prefix in the Python code when it does not need to be declared in the XML file.
