For this XML:

<Departments orgID="123" name="xmllist">
    <Department>
        <orgID>124</orgID>
        <name>A</name>
        <type>type a</type>
        <status>Active</status>
            <Department>
                <orgID>125</orgID>
                <name>B</name>
                <type>type b</type>
                <status>Active</status>
                <Department>
                    <orgID>126</orgID>
                    <name>C</name>
                    <type>type c</type>
                    <status>Active</status>
                </Department>
            </Department>
    </Department>
    <Department>
        <orgID>109449</orgID>
        <name>D</name>
        <type>type d</type>
        <status>Active</status>
    </Department>
</Departments>

How can I get all parents of a node using lxml etree in Python?

Expected output: for input orgID=126, it should return all the parents, like:

{'A':124,'B':125,'C':126}

2 Answers

Using lxml and XPath:

>>> s = '''
... <Departments orgID="123" name="xmllist">
...     <Department>
...         <orgID>124</orgID>
...         <name>A</name>
...         <type>type a</type>
...         <status>Active</status>
...             <Department>
...                 <orgID>125</orgID>
...                 <name>B</name>
...                 <type>type b</type>
...                 <status>Active</status>
...                 <Department>
...                     <orgID>126</orgID>
...                     <name>C</name>
...                     <type>type c</type>
...                     <status>Active</status>
...                 </Department>
...             </Department>
...     </Department>
...     <Department>
...         <orgID>109449</orgID>
...         <name>D</name>
...         <type>type d</type>
...         <status>Active</status>
...     </Department>
... </Departments>
... '''

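The ancestor-or-self axis selects the context node itself plus its parent, grandparent, and so on up to the root. Restricting it to Department keeps only the enclosing Department elements: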

>>> import lxml.etree as ET
>>> root = ET.fromstring(s)
>>> for target in root.xpath('.//Department/orgID[text()="126"]'):
...     d = {
...         dept.find('name').text: int(dept.find('orgID').text)
...         for dept in target.xpath('ancestor-or-self::Department')
...     }
...     print(d)
...
{'A': 124, 'C': 126, 'B': 125}

4 Comments

Thanks. What if I also want to include orgID=123 and name=xmllist in d?
@Nishant, add for depts in target.xpath('ancestor-or-self::Departments'): d[depts.get('name')] = depts.get('orgID') before the print statement.
Thanks, but the output seems to be unordered. Is there any way to make it ordered? Here we get {'A': 124, 'C': 126, 'B': 125}; can we get {'A': 124, 'B': 125, 'C': 126} instead?
@Nishant, dict itself is an unordered data structure (before Python 3.7). Use collections.OrderedDict if you want to keep the order, or a list if you don't need a dict-like container.
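
Putting the two suggestions from these comments together might look like the sketch below. It assumes a trimmed copy of the question's XML (the type and status elements are left out for brevity), converts the root's orgID attribute to int for consistency with the Department values, and relies on lxml returning XPath results in document order, so the outermost ancestor comes first. The names XML and results are placeholders introduced for this example.

```python
from collections import OrderedDict

import lxml.etree as ET

# Trimmed copy of the XML from the question (assumption: type/status omitted).
XML = '''
<Departments orgID="123" name="xmllist">
    <Department>
        <orgID>124</orgID>
        <name>A</name>
        <Department>
            <orgID>125</orgID>
            <name>B</name>
            <Department>
                <orgID>126</orgID>
                <name>C</name>
            </Department>
        </Department>
    </Department>
    <Department>
        <orgID>109449</orgID>
        <name>D</name>
    </Department>
</Departments>
'''

root = ET.fromstring(XML)
results = []
for target in root.xpath('.//Department/orgID[text()="126"]'):
    d = OrderedDict()
    # The root <Departments> element keeps its data in attributes.
    for depts in target.xpath('ancestor-or-self::Departments'):
        d[depts.get('name')] = int(depts.get('orgID'))
    # Enclosing <Department> elements keep their data in child elements.
    for dept in target.xpath('ancestor-or-self::Department'):
        d[dept.find('name').text] = int(dept.find('orgID').text)
    results.append(d)
    print(list(d.items()))
```

Because the OrderedDict remembers insertion order, the entries run from the root down to the matched department regardless of the Python version.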

Use lxml's iterancestors() method.

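from lxml import etree

# xml is the XML document from the question, as a string
doc = etree.fromstring(xml)
rval = {}
for org in doc.xpath('//orgID[text()="126"]'):
    # Walk up from the matching orgID through every enclosing Department
    for ancestor in org.iterancestors('Department'):
        org_id = ancestor.find('./orgID').text
        name = ancestor.find('./name').text
        rval[name] = org_id

print(rval)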

Output:

{'A': '124', 'C': '126', 'B': '125'}

If you need to preserve the order of the elements, a plain dict won't do on Python versions before 3.7, where key order is arbitrary (as in the output above). Use a collections.OrderedDict or just a list of tuples:

doc = etree.fromstring(xml)
a = []
for org in doc.xpath('//orgID[text()="126"]'):
    for ancestor in org.iterancestors():
        if ancestor.find('./orgID') is not None:
            # Department: data is stored in child elements
            org_id = ancestor.find('./orgID').text
            name = ancestor.find('./name').text
        elif ancestor.get('orgID'):
            # Root Departments element: data is stored in attributes
            org_id = ancestor.get('orgID')
            name = ancestor.get('name')
        else:
            continue

        print("%s %s" % (org_id, name))
        a.append((name, org_id))

print("In order of discovery:\n     %s" % (a,))
print("From root to child\n     %s" % (list(reversed(a)),))
print("dict keys are not sorted\n     %s" % (dict(a),))

Output:

126 C
125 B
124 A
123 xmllist
In order of discovery:
     [('C', '126'), ('B', '125'), ('A', '124'), ('xmllist', '123')]
From root to child
     [('xmllist', '123'), ('A', '124'), ('B', '125'), ('C', '126')]
dict keys are not sorted
     {'A': '124', 'xmllist': '123', 'C': '126', 'B': '125'}
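
The OrderedDict alternative mentioned above is not shown, so here is a minimal sketch of it. It assumes each orgID occurs only once in the document and uses a trimmed copy of the question's XML. The function name ancestors_by_name and the XPath variable $oid are made up for this example; lxml fills XPath variables from keyword arguments to xpath().

```python
from collections import OrderedDict

from lxml import etree

# Trimmed copy of the XML from the question (assumption: type/status omitted).
xml = '''
<Departments orgID="123" name="xmllist">
    <Department>
        <orgID>124</orgID>
        <name>A</name>
        <Department>
            <orgID>125</orgID>
            <name>B</name>
            <Department>
                <orgID>126</orgID>
                <name>C</name>
            </Department>
        </Department>
    </Department>
    <Department>
        <orgID>109449</orgID>
        <name>D</name>
    </Department>
</Departments>
'''


def ancestors_by_name(doc, org_id):
    """Return an OrderedDict of name -> orgID, ordered from the root down."""
    pairs = []
    # $oid is an XPath variable bound to the oid keyword argument.
    for org in doc.xpath('//orgID[text()=$oid]', oid=str(org_id)):
        # iterancestors() yields the closest ancestor first.
        for ancestor in org.iterancestors():
            if ancestor.find('./orgID') is not None:
                name = ancestor.find('./name').text
                pairs.append((name, int(ancestor.find('./orgID').text)))
            elif ancestor.get('orgID'):
                pairs.append((ancestor.get('name'), int(ancestor.get('orgID'))))
    # Reverse so the root comes first and the matched department last.
    return OrderedDict(reversed(pairs))


doc = etree.fromstring(xml)
print(list(ancestors_by_name(doc, 126).items()))
```

An orgID that does not exist in the document simply yields an empty OrderedDict.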
