
I'm new to Python and currently learning to parse XML. All seems to be going well until I hit a wall with nested namespaces.

Below is a snippet of my XML (the root element and the child element I'm trying to parse):

<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
<!-- Generated by orca_wrapping version 3.8.3-0 -->
<Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
...
<cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
<Id>urn:uuid:0607e57f-edcc-46ec-997a-d2fbc0c1ea3a</Id>
<EditRate>24 1</EditRate>
<IntrinsicDuration>2698</IntrinsicDuration>
</cc-cpl:MainClosedCaption>
...
</CompositionPlaylist>

What I need is a way to extract the namespace URI of the element whose local name is 'MainClosedCaption'. In this case, I'm trying to extract the string "http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#". I've looked through a lot of tutorials but can't seem to find a solution.
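A minimal sketch of one way to get that string, using the standard library's `xml.etree.ElementTree` instead of lxml (an assumption; the `{*}` wildcard needs Python 3.8+). ElementTree stores a namespaced tag in "Clark notation", `{uri}localname`, so the namespace URI is the part between the braces; the `cc-cpl` prefix itself is not kept. `CPL` and `namespace_of` are hypothetical names, and the XML is a trimmed copy of the snippet above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML; the elided parts are left out.
CPL = """<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
  <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
    <Id>urn:uuid:0607e57f-edcc-46ec-997a-d2fbc0c1ea3a</Id>
  </cc-cpl:MainClosedCaption>
</CompositionPlaylist>"""


def namespace_of(tag):
    """Return the URI from a '{uri}localname' tag, or None if it has no namespace."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


root = ET.fromstring(CPL)
# '{*}' matches the local name in any namespace (ElementTree, Python 3.8+).
caption = root.find(".//{*}MainClosedCaption")
print(namespace_of(caption.tag))
# prints http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#
```

The root element gives the default namespace the same way: `namespace_of(root.tag)` returns the `...CPL-20040511#` URI.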

If anyone out there can lend their expertise, it would be much appreciated.

Here's what I've done so far, with help from the two contributors:

#!/usr/bin/env python

from xml.etree import ElementTree as ET #import ElementTree module as an alias ET
from lxml import objectify, etree

def parse():

    import os
    import sys
    cpl_file = sys.argv[1]
    xml_file = os.path.abspath(__file__)
    xml_file = os.path.dirname(xml_file)
    xml_file = os.path.join(xml_file, cpl_file)

    with open(xml_file) as f:
        xml = f.read()

    tree = etree.XML(xml)

    caption_namespace = etree.QName(tree.find('.//{*}MainClosedCaption')).namespace

    print caption_namespace
    print tree.nsmap

    nsmap = {}

    for ns in tree.xpath('//namespace::*'):
        if ns[0]:
            nsmap[ns[0]] = ns[1]
    tree.xpath('//cc-cpl:MainClosedCaption', namespace=nsmap)

    return nsmap


if __name__ == "__main__":

    parse()

But it's not working so far. I get 'None' when I use QName to look up the tag and its namespace. And when I try to collect all the namespaces in the XML with a for loop, as suggested in another post, I get the error 'Unknown return type: dict'.
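On the second error: lxml's `xpath()` takes the prefix mapping as `namespaces=` (plural). Other keyword arguments, such as `namespace=nsmap`, are most likely treated as XPath variables, and a dict is not a value XPath can hold, which would explain 'Unknown return type: dict'. As a standard-library sketch of the same idea (collect every prefix declaration, then reuse the mapping in a path), ElementTree's `iterparse` reports each declaration as a `start-ns` event carrying a `(prefix, uri)` pair. This swaps lxml for ElementTree, and `declared_namespaces` is a hypothetical helper name:

```python
import io
import xml.etree.ElementTree as ET

CPL = """<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
    <EditRate>24 1</EditRate>
  </cc-cpl:MainClosedCaption>
</CompositionPlaylist>"""


def declared_namespaces(xml_text):
    """Map every declared prefix to its URI ('' stands for the default namespace).

    Simplification: if a prefix is declared more than once, the last one wins.
    """
    nsmap = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        nsmap[prefix] = uri
    return nsmap


nsmap = declared_namespaces(CPL)
print(nsmap["cc-cpl"])

# Pass the mapping to find() via namespaces= so the 'cc-cpl:' prefix resolves.
root = ET.fromstring(CPL)
caption = root.find(".//cc-cpl:MainClosedCaption", namespaces=nsmap)
print(caption.tag)
```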

Any suggestions, please?

  • I'm not following your description. In this example, exactly what string are you trying to extract? Commented May 8, 2015 at 0:01
  • I'm trying to extract the namespace associated with the tag 'MainClosedCaption' Commented May 8, 2015 at 0:21
  • In this case, the string that I'm trying to extract from the XML is 'digicine.com/PROTO-ASDCP-CC-CPL-20070926#' Commented May 8, 2015 at 0:22
  • I found this solution that might be helpful. Commented May 8, 2015 at 1:20
  • @DanielTan Post some code showing what you have tried so far. It is always easier for people to suggest a solution based on what you have, instead of starting over from scratch. And usually, that kind of solution is easier for the asker to understand too. Commented May 8, 2015 at 1:23

1 Answer


This program prints the namespace of the indicated tag:

from lxml import etree

xml = etree.XML('''<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
<!-- Generated by orca_wrapping version 3.8.3-0 -->
<Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
<cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
<Id>urn:uuid:0607e57f-edcc-46ec-997a-d2fbc0c1ea3a</Id>
<EditRate>24 1</EditRate>
<IntrinsicDuration>2698</IntrinsicDuration>
</cc-cpl:MainClosedCaption>
</CompositionPlaylist>
''')

print etree.QName(xml.find('.//{*}MainClosedCaption')).namespace

Result:

http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#

Reference: http://lxml.de/tutorial.html#namespaces
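The `{*}` wildcard only works where the library's path engine supports it; the standard library's ElementTree, for example, only gained it in Python 3.8. A sketch that avoids wildcards entirely walks every element and compares local names. It uses ElementTree rather than the answer's lxml, and `find_by_local_name` is a hypothetical helper, not a library function:

```python
import xml.etree.ElementTree as ET

CPL = """<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
    <IntrinsicDuration>2698</IntrinsicDuration>
  </cc-cpl:MainClosedCaption>
</CompositionPlaylist>"""


def find_by_local_name(root, local_name):
    """Return the first element, in document order, whose local name matches."""
    for elem in root.iter():
        # Skip comments/processing instructions, whose .tag is not a string.
        if not isinstance(elem.tag, str):
            continue
        # '{uri}local' -> 'local'; a tag without braces is returned unchanged.
        if elem.tag.rpartition("}")[2] == local_name:
            return elem
    return None


root = ET.fromstring(CPL)
caption = find_by_local_name(root, "MainClosedCaption")
# Everything between the braces is the namespace URI.
uri = caption.tag[1:].partition("}")[0] if caption.tag.startswith("{") else None
print(uri)
```

The linear walk is slower than an indexed path lookup on very large files, but it behaves the same regardless of wildcard support.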


5 Comments

I did what you suggested but got 'None' as a result. Please see my original post for my code.
When I run the code in your question against the XML in your question, I get http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#. (Of course, I have to fix the typos in your XML first.) Perhaps the XML snippet in your question doesn't represent the XML you are actually using?
The complete XML is different; it has more child elements under the root tag. But I also copied the exact code you pasted here, and I get 'None' as well.
I'm sorry, but I have no idea why we would each get different output from the exact same program.
By the way, Rob's suggestion worked for me. I'm currently having difficulty extracting the //MainClosedCaption/Id element. stackoverflow.com/questions/37038148/…
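On the follow-up about `//MainClosedCaption/Id`: in this XML the inner `<Id>` has no prefix, so it belongs to the default namespace declared on `CompositionPlaylist` (`...CPL-20040511#`), not to the `cc-cpl` namespace of its parent. A path to it therefore needs two namespaces. A minimal ElementTree sketch; the prefix names in `NS` are arbitrary choices for this lookup (`cpl` does not appear in the document), since only the URIs have to match:

```python
import xml.etree.ElementTree as ET

CPL = """<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
  <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
    <Id>urn:uuid:0607e57f-edcc-46ec-997a-d2fbc0c1ea3a</Id>
    <EditRate>24 1</EditRate>
  </cc-cpl:MainClosedCaption>
</CompositionPlaylist>"""

# Prefixes are local to this lookup; only the URIs must match the document.
NS = {
    "cpl": "http://www.digicine.com/PROTO-ASDCP-CPL-20040511#",  # default namespace
    "cc-cpl": "http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#",
}

root = ET.fromstring(CPL)
# The parent is in the cc-cpl namespace, but the unprefixed <Id> child is not.
caption_id = root.find(".//cc-cpl:MainClosedCaption/cpl:Id", NS)
print(caption_id.text)
# prints urn:uuid:0607e57f-edcc-46ec-997a-d2fbc0c1ea3a

# Using the parent's namespace for the child finds nothing.
print(root.find(".//cc-cpl:MainClosedCaption/cc-cpl:Id", NS))
# prints None
```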
