I am writing a script that processes an RDF/SKOS file with Python 3 and lxml.
I learnt that I need to pass the namespaces used in the XML to the findall method. (OK, strange, since the XML file already lists them in its header, so this seems like an unnecessary step, but anyway.)
When calling
for concept in root.findall('.//skos:Concept', namespaces=root.nsmap):
that works, because lxml constructs root.nsmap from the namespace declarations in the file.
But later in my code I also need to filter on xml:lang:
for pl in concept.findall(".//skos:prefLabel[@xml:lang='en']", namespaces=root.nsmap):
and here Python tells me:
SyntaxError: prefix 'xml' not found in prefix map
OK, true: my SKOS file has no explicit declaration for the xml namespace. So I tried to add it to the root.nsmap dict:
root.nsmap['xml'] = "http://www.w3.org/XML/1998/namespace"
but that doesn't work either. Printing root.nsmap afterwards still shows no 'xml' entry:
nsmap = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'uneskos': 'http://purl.org/umu/uneskos#', 'iso-thes': 'http://purl.org/iso25964/skos-thes#', 'dcterms': 'http://purl.org/dc/terms/', 'skos': 'http://www.w3.org/2004/02/skos/core#', 'rdfs': 'http://www.w3.org/2000/01/rdf-schema#'}
It seems I am not allowed to modify root.nsmap?
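To make this reproducible, here is a minimal sketch of what I am doing. The inline document and the example.org URI are made up as a stand-in for my real SKOS file; the lxml calls are the ones quoted above.

```python
from lxml import etree

# Hypothetical stand-in for my real SKOS file (made-up concept and labels)
SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/c1">
    <skos:prefLabel xml:lang="en">cat</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Katze</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

root = etree.fromstring(SAMPLE)

# Works: the skos prefix is declared in the document, so it is in root.nsmap
concepts = root.findall('.//skos:Concept', namespaces=root.nsmap)

# Attempt to register the xml prefix on the element's map
root.nsmap['xml'] = "http://www.w3.org/XML/1998/namespace"
xml_registered = 'xml' in root.nsmap  # still False after the assignment

# Same lookup as in my script; it raises the SyntaxError quoted above
try:
    concepts[0].findall(".//skos:prefLabel[@xml:lang='en']",
                        namespaces=root.nsmap)
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

print(xml_registered, error_message)
```

So the assignment runs without complaint, but the new entry is gone the next time I read root.nsmap.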
Does anyone have an idea how this is done? I have processed tons of XML in the past with Perl's XML::Twig, which is very comfortable, and I assume the Python community has (at least) similarly comfortable ways to do that ... but how?
Any hint appreciated.