I'm trying to parse an XML file in Python using root.findall.

My file looks like this, and I'm trying to access the elements under "Level3".

Edit: @trincot already provided a solution, but I have now added a default namespace (xmlns="http://xyz.abc/forms") to the sample data, and that is causing the trouble. Why would adding 'xmlns=' cause the issue?

<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://xyz.abc/forms" xmlns:abc="http://bus-message-envelope" xmlns:env="http://www.w3.org/2003/05/soap-envelope" abc:version="1-2">
    <env:Header>
        <abc:col1>col1Text</abc:col1>
        <abc:col2>col2Text</abc:col2>
        <abc:col3>col3Text</abc:col3>
    </env:Header>
    <env:Body>
        <Level1>
            <Level2 schemaVersion="1-1">
                <Level3>
                    <cell1>cell1Text</cell1>
                    <cell2>cell2Text</cell2>
                    <cell3>cell3Text</cell3>
                    <cell4>cell4Text</cell4>
                </Level3>
            </Level2>
        </Level1>
    </env:Body>
</env:Envelope>

I tried this, but it doesn't return anything:

from xml.etree import ElementTree
tree = ElementTree.parse("/tmp/test.xml")
root = tree.getroot()

for form in root.findall(".//Level3"):
    print(form.text)
    print("Inside Loop")  # Not even hitting this

Expected Output:

cell1Text
cell2Text
cell3Text
cell4Text

I was able to access the same elements with the code below, but how can I achieve this using findall?

for x in root[1][0][0][0]:
    print(x.text)

Output:

cell1Text
cell2Text
cell3Text
cell4Text

I searched Stack Overflow but couldn't find an answer to this. I tried many things, without success.

  • This is out of my wheelhouse, but FWIW, you should make a minimal reproducible example with complete code. I'm wondering how you created root exactly. Commented Aug 12, 2022 at 17:49
  • Oops! Added the code showing how I created the root. Commented Aug 12, 2022 at 18:04
  • That update changes everything. See duplicate links for how to account for namespaces in your XPath expressions. Commented Aug 12, 2022 at 20:30

1 Answer

In the first code snippet you access form.text, but form is the Level3 element, which has no text other than whitespace. The text you want to output sits in its child elements, so print(form.text) prints whitespace only.

The working code iterates the children of that same Level3 element:

for x in root[1][0][0][0]:
    print(x.text)

Here x is the deeper cellX element, which does have the text you expect.

To achieve this with findall do:

for x in root.findall(".//Level3/*"):
    print(x.text)

Note the extra level /* in the argument of findall, which means: any child element of Level3 elements.
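To make the difference between the two paths concrete, here is a minimal sketch. It assumes a trimmed copy of the question's XML without any namespaces, parsed from a string rather than from /tmp/test.xml; the variable names are only illustrative.

```python
from xml.etree import ElementTree

# Trimmed, namespace-free version of the question's XML (an assumption for this sketch).
XML = """<Envelope>
    <Body>
        <Level1>
            <Level2 schemaVersion="1-1">
                <Level3>
                    <cell1>cell1Text</cell1>
                    <cell2>cell2Text</cell2>
                </Level3>
            </Level2>
        </Level1>
    </Body>
</Envelope>"""

root = ElementTree.fromstring(XML)

# ".//Level3" matches the Level3 element itself; its own text is only whitespace.
level3_text = [el.text for el in root.findall(".//Level3")]

# ".//Level3/*" matches every direct child element of Level3.
cell_texts = [el.text for el in root.findall(".//Level3/*")]

print(repr(level3_text))
print(cell_texts)
```

The first list holds a single whitespace-only string, while the second holds the cell texts.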

See both the original and corrected code run on repl.it

If you didn't get any output with the first version, then please check the spelling. It looks suspicious that the element names in your XML sometimes start with a capital letter (like Level3) and sometimes do not (like cell1); since XML names are case-sensitive, a mismatch could be a reason for getting no output. However, I loaded your code and XML as-is, and it produced the message "Inside Loop", as you can see when you follow the link above.
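Regarding the question's later edit: once the root declares a default namespace (xmlns="http://xyz.abc/forms"), every unprefixed element such as Level3 belongs to that namespace, and ElementTree only matches it by its namespace-qualified name. The following is a sketch under that assumption, using a trimmed copy of the edited XML; the prefix "f" is an arbitrary name chosen for this example and does not appear in the document.

```python
from xml.etree import ElementTree

# Trimmed version of the edited XML, keeping the default namespace.
XML = """<env:Envelope xmlns="http://xyz.abc/forms" xmlns:env="http://www.w3.org/2003/05/soap-envelope">
    <env:Body>
        <Level1>
            <Level2 schemaVersion="1-1">
                <Level3>
                    <cell1>cell1Text</cell1>
                    <cell2>cell2Text</cell2>
                </Level3>
            </Level2>
        </Level1>
    </env:Body>
</env:Envelope>"""

root = ElementTree.fromstring(XML)

# "f" is a hypothetical prefix; only the URI has to match the document.
ns = {"f": "http://xyz.abc/forms"}

# An unqualified name finds nothing, because Level3 is in the default namespace.
unqualified = root.findall(".//Level3/*")

# Option 1: pass a prefix-to-URI mapping as the second argument.
qualified = root.findall(".//f:Level3/*", ns)

# Option 2: spell out the namespace in {uri}name form.
clark = root.findall(".//{http://xyz.abc/forms}Level3/*")

print(len(unqualified))
print([el.text for el in qualified])
print([el.text for el in clark])
```

On Python 3.8 and later, the path ".//{*}Level3/*" should also work when the namespace does not matter, since {*} matches a tag in any namespace.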

9 Comments

Good, +1, but how do you account for print("Inside Loop") --> Not even hitting this?
I cannot account for that, as I cannot reproduce that problem, @kjhughes. I have added a link to repl.it where the asker can see for themselves that the output is generated in their first version of the code.
Right, I couldn't see how that line would not have been executed either. Thanks.
Level3 in the sample has a text node: \n . Try print("Text: '" + form.text + "'")
@LMC, yes, you are right -- I should say it doesn't have text other than white space. Updated.