I have a dict that maps each XML tag to a key in a second dict. I want to loop through every tag and its text in the XML, and compare the text with the value stored under the mapped key in that second dict.

<2gMessage>
    <Request>
        <pid>daemon</pid>
        <emf>123456</emf>
        <SENum>2041788209</SENum>
        <MM>
            <MID>jbr1</MID>
            <URL>http://jimsjumbojoint.com</URL>
        </MM>
        <AppID>reddit</AppID>
        <CCS>
            <Mode>
                <CardPresent>true</CardPresent>
                <Recurring>false</Recurring>
            </Mode>
            <Date>
                <ASCII>B4788250000028291^RRR^15121015432112345601</ASCII>
            </Date>
            <Amount>100.00</Amount>
        </CCS>
    </Request>
</2gMessage>

The code I have so far:

from lxml import etree

# strRequest holds the XML above as a string
parser = etree.XMLParser(ns_clean=True, remove_blank_text=True)
tree = etree.fromstring(strRequest, parser)
for tag in tree.xpath('//Request'):
    # Iterating over an element yields only its direct children
    for subfield in tag:
        print(subfield.tag, subfield.text)

However, this only prints the tags that are direct children of Request. I want to reach the nested children as well, in the same loop, whenever a child has children of its own. I don't want to hardcode tag names, as the tags and structure could change.

  • I tried to run your code, but it gives errors. Besides that, the XML is also malformed. Commented Feb 9, 2015 at 17:22
  • Have you tried Beautiful Soup 4? Very intuitive DOM navigation. "Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility." Commented Feb 9, 2015 at 17:34

1 Answer

You could use the iter() method. It walks the whole tree in document order, yielding the root element and then every descendant, not just the direct children. The len(tag) check prints only the elements that have no children, so container tags such as Request or CCS are skipped.

A complete script:

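from lxml import etree

# 'xmlfile' is the path to the XML document
tree = etree.parse('xmlfile')
for tag in tree.iter():
    # len(tag) counts child elements, so 0 means a leaf element
    if not len(tag):
        print(tag.tag, tag.text)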

Yields:

pid daemon
emf 123456
SENum 2041788209
MID jbr1
URL http://jimsjumbojoint.com
AppID reddit
CardPresent true
Recurring false
ASCII B4788250000028291^RRR^15121015432112345601
Amount 100.00
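
To connect this back to the dict comparison described in the question, here is a minimal sketch of one way it could be done. The dict names and contents (TAG_TO_KEY, EXPECTED) and the helper compare_leaves are hypothetical, made up for illustration. The sample root element is renamed to Message because an XML element name cannot start with a digit, and the import falls back to the standard library's ElementTree, which offers the same fromstring() and iter() calls, in case lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question and answer
except ImportError:
    # Fallback: the standard library exposes the same fromstring()/iter() calls
    import xml.etree.ElementTree as etree

# Shortened sample request; the root is renamed to "Message" here
# because an XML element name cannot start with a digit.
XML = """\
<Message>
    <Request>
        <pid>daemon</pid>
        <MM>
            <MID>jbr1</MID>
        </MM>
        <CCS>
            <Mode>
                <CardPresent>true</CardPresent>
                <Recurring>false</Recurring>
            </Mode>
            <Amount>100.00</Amount>
        </CCS>
    </Request>
</Message>
"""

# Hypothetical mapping: XML tag -> key in the second dict.
TAG_TO_KEY = {"pid": "process", "MID": "merchant_id", "Amount": "amount"}

# Hypothetical dict holding the values to compare against.
EXPECTED = {"process": "daemon", "merchant_id": "jbr1", "amount": "99.00"}


def compare_leaves(xml_text, tag_to_key, expected):
    """Return {tag: (xml_value, expected_value, matches)} for mapped leaf tags."""
    root = etree.fromstring(xml_text)
    results = {}
    for element in root.iter():
        if len(element):  # has children, so it is a container, not a field
            continue
        key = tag_to_key.get(element.tag)
        if key is None:  # tag is not in the mapping; skip it
            continue
        wanted = expected.get(key)
        results[element.tag] = (element.text, wanted, element.text == wanted)
    return results


for tag, (found, wanted, ok) in compare_leaves(XML, TAG_TO_KEY, EXPECTED).items():
    print(tag, found, wanted, ok)
```

With these sample dicts, pid and MID match and Amount does not. Tags that are missing from the mapping, such as CardPresent, are skipped rather than hardcoded, so the loop keeps working if the structure changes. Note that a tag name appearing in more than one place would overwrite its earlier entry in the result; keying on the element's path instead of its tag would avoid that.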