Unable to iterate through child tags within a child tag

I tried to collect all child tags with root.iter() and loop over them. However, the output does not follow the hierarchy of the tags.

for child in root.iter():
    child_tag = child.tag

    for child in root.findall('.//' + child_tag):          
        txt = "tag1/" + "tag2/" + str(child_tag) + "/" + str(child)
        print(txt)

Expected output:

tag1
tag1/tag2
tag1/tag2/tag3
tag1/tag2/tag3/tag4
tag1/tag2/tag3/tag5
tag1/tag2/tag3/tag5/tag6

xml file details:

<tag1>
    <tag2>
        <tag3>
                <tag4>         </tag4>
                <tag5>  
                    <tag6>        </tag6>      
                </tag5>
        </tag3>
    </tag2>
</tag1>

Output received:

tag1
tag1/tag2
tag1/tag2/tag3
tag1/tag2/tag3/tag4
tag1/tag2/tag3/tag5
tag1/tag2/tag5/tag6

The last line does not follow the hierarchy: tag3 is missing from the path to tag6.

  • If you look at your print statement, you only have room for 4 tag names: tag1, tag2, str(child_tag), str(child). So your last print won't be able to have the 5 levels of hierarchy like you want. You would need to store the grandparent of the current iterated element and output those after tag1/tag2. Commented Sep 12, 2019 at 12:00
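One way to keep track of ancestors, as the comment suggests, is to build a child-to-parent lookup first and then walk upward from each element. The following is only a minimal sketch: the names `build_paths` and `parent_of` are made up for illustration, and the XML is inlined as a string rather than read from the asker's file.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample document from the question (whitespace trimmed).
XML = """<tag1>
    <tag2>
        <tag3>
            <tag4></tag4>
            <tag5>
                <tag6></tag6>
            </tag5>
        </tag3>
    </tag2>
</tag1>"""


def build_paths(root):
    """Return the full tag path of every element, in document order."""
    # Map each element to its parent; the root has no entry in this dict.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    paths = []
    for elem in root.iter():
        parts = [elem.tag]
        node = elem
        # Climb toward the root, collecting ancestor tags along the way.
        while node in parent_of:
            node = parent_of[node]
            parts.append(node.tag)
        paths.append("/".join(reversed(parts)))
    return paths


paths = build_paths(ET.fromstring(XML))
print("\n".join(paths))
```

Because the lookup is keyed by element objects rather than tag names, two elements that share a tag name still get their own paths.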

1 Answer

Reference: [Python 3.Docs]: xml.etree.ElementTree - The ElementTree XML API.

Hardcoding node tags ("tag1", "tag2": why only those and not others?) is a sign that something is (terribly) wrong.
Here's a simple variant that handles each XML node recursively.

code00.py:

#!/usr/bin/env python3

import sys
from xml.etree import ElementTree as ET


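def iterate(node, path=""):
    # Extend the parent's path with this node's tag (the root has no parent path).
    if path:
        current_path = path + "/" + node.tag
    else:
        current_path = node.tag
    print("{0:s}".format(current_path))
    # Recurse into direct children only; each call handles its own subtree.
    for child in node:
        iterate(child, path=current_path)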


def main():
    xml_file_name = "./file00.xml"
    tree = ET.parse(xml_file_name)
    root = tree.getroot()
    iterate(root)


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
    print("\nDone.")

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q057906081]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32

tag1
tag1/tag2
tag1/tag2/tag3
tag1/tag2/tag3/tag4
tag1/tag2/tag3/tag5
tag1/tag2/tag3/tag5/tag6

Done.
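If the paths are needed as data rather than printed text, the same recursion can be written as a generator. This is a hedged variant of `iterate` above, not part of the original answer: the name `iter_paths` is hypothetical, and the XML is parsed from an inline string instead of "./file00.xml".

```python
from xml.etree import ElementTree as ET


def iter_paths(node, path=""):
    """Yield the slash-separated tag path of node and of all its descendants."""
    current_path = path + "/" + node.tag if path else node.tag
    yield current_path
    for child in node:
        # Delegate to the child's generator, passing down the path built so far.
        yield from iter_paths(child, current_path)


# Compact inline version of the question's sample document.
root = ET.fromstring(
    "<tag1><tag2><tag3><tag4/><tag5><tag6/></tag5></tag3></tag2></tag1>"
)
all_paths = list(iter_paths(root))
print(all_paths)
```

A plain list of strings like `all_paths` is easy to post-process, for example by splitting each entry on "/" to get one column per level.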

3 Comments

Thanks. The first two tags were hardcoded because they are present for all the nodes. I will remove the hardcoding now.
A follow-up question: how do I add the output to a dataframe?
Hmm, I don't know. I don't know the structure of the dataframe and other such details. You could ask another question though.
