root.findall doesn't return anything on XML parsing [duplicate]

Question

I'm trying to parse an XML file in Python using root.findall.

Basically my file looks like this - and I'm trying to access elements under "Level3".

Edit: @trincot has already provided a solution, but I've now added a default namespace (xmlns="http://xyz.abc/forms") to the sample data, and that is what causes the trouble. Why would adding 'xmlns=' cause the issue?

<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://xyz.abc/forms" xmlns:abc="http://bus-message-envelope" xmlns:env="http://www.w3.org/2003/05/soap-envelope" abc:version="1-2">
    <env:Header>
        <abc:col1>col1Text</abc:col1>
        <abc:col2>col2Text</abc:col2>
        <abc:col3>col3Text</abc:col3>
    </env:Header>
    <env:Body>
        <Level1>
            <Level2 schemaVersion="1-1">
                <Level3>
                    <cell1>cell1Text</cell1>
                    <cell2>cell2Text</cell2>
                    <cell3>cell3Text</cell3>
                    <cell4>cell4Text</cell4>
                </Level3>
            </Level2>
        </Level1>
    </env:Body>
</env:Envelope>

I tried this, but it doesn't return anything:

from xml.etree import ElementTree

tree = ElementTree.parse("/tmp/test.xml")
root = tree.getroot()

for form in root.findall(".//Level3"):
    print(form.text)
    print("Inside Loop")  # not even reaching this line

Expected Output:

cell1Text
cell2Text
cell3Text
cell4Text

I was able to access the same elements with the code below. But how can I achieve this using findall?

for x in root[1][0][0][0]:
    print(x.text)

Output:

cell1Text
cell2Text
cell3Text
cell4Text

I did go through most of Stack Overflow, but couldn't get an answer to this. Tried many things but failed :( .

This is out of my wheelhouse, but FWIW, you should make a minimal reproducible example with complete code. I'm wondering how you created root exactly.
– wjandrea, Commented Aug 12, 2022 at 17:49
That update changes everything. See duplicate links for how to account for namespaces in your XPath expressions.
– kjhughes, Commented Aug 12, 2022 at 20:30

trincot · Accepted Answer · 2022-08-12 18:55:24Z

In the first code snippet you access form.text, but form corresponds to the Level3 element, which has no text other than whitespace. The text you want to output sits in its child elements, so print(form.text) prints only whitespace.

The working code iterates the children of that same Level3 element:

for x in root[1][0][0][0]:
    print(x.text)

Here x is the deeper cellX element, which does have the text you expect.

To achieve this with findall do:

for x in root.findall(".//Level3/*"):
    print(x.text)

Note the extra level /* in the argument of findall, which means: any child element of Level3 elements.
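The difference between the two paths can be seen in a small self-contained sketch. It assumes a trimmed, namespace-free copy of the sample XML parsed from a string rather than from /tmp/test.xml; the names XML, level3_texts and cell_texts are only for illustration.

```python
from xml.etree import ElementTree

# Trimmed, namespace-free copy of the sample (an assumption for illustration;
# the edited XML in the question additionally declares a default namespace).
XML = """<Body>
    <Level1>
        <Level2 schemaVersion="1-1">
            <Level3>
                <cell1>cell1Text</cell1>
                <cell2>cell2Text</cell2>
                <cell3>cell3Text</cell3>
                <cell4>cell4Text</cell4>
            </Level3>
        </Level2>
    </Level1>
</Body>"""

root = ElementTree.fromstring(XML)

# ".//Level3" matches the Level3 element itself; its .text is only the
# whitespace between <Level3> and its first child.
level3_texts = [form.text for form in root.findall(".//Level3")]

# ".//Level3/*" matches every direct child of Level3, where the text lives.
cell_texts = [cell.text for cell in root.findall(".//Level3/*")]

print(repr(level3_texts))  # repr() makes the whitespace visible
print(cell_texts)
```

Printing with repr() is a quick way to tell an empty result apart from a result that contains only whitespace.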

See both the original and corrected code run on repl.it

If you didn't get any output with the first version, then please check the spelling. It looks suspicious that the element names in your XML sometimes start with a capital (like Level3) and sometimes do not (like cell1). This could be a reason for getting no output. However, I loaded your code and XML as-is, and it printed the message "Inside Loop", as you can see when you follow the link above.
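Another possible cause applies to the edited sample in the question, which declares a default namespace (xmlns="http://xyz.abc/forms"). With that declaration, ElementTree stores unprefixed elements under qualified names such as {http://xyz.abc/forms}Level3, so the unqualified path .//Level3 matches nothing. The following is a minimal sketch under that assumption, using a shortened copy of the sample parsed from a string; the prefix "f" and the variable names are arbitrary choices for illustration, not part of the original code.

```python
from xml.etree import ElementTree

# Shortened copy of the edited sample, including its default namespace.
XML = """<env:Envelope xmlns="http://xyz.abc/forms"
    xmlns:abc="http://bus-message-envelope"
    xmlns:env="http://www.w3.org/2003/05/soap-envelope" abc:version="1-2">
    <env:Body>
        <Level1>
            <Level2 schemaVersion="1-1">
                <Level3>
                    <cell1>cell1Text</cell1>
                    <cell2>cell2Text</cell2>
                    <cell3>cell3Text</cell3>
                    <cell4>cell4Text</cell4>
                </Level3>
            </Level2>
        </Level1>
    </env:Body>
</env:Envelope>"""

root = ElementTree.fromstring(XML)

# Inspecting .tag shows the qualified name that findall has to match.
print(root.tag)

# Unqualified name: no match, because the stored tag is
# "{http://xyz.abc/forms}Level3", not "Level3".
unqualified = root.findall(".//Level3")

# Option 1: map a prefix of our own choosing to the namespace URI.
ns = {"f": "http://xyz.abc/forms"}
with_prefix = [cell.text for cell in root.findall(".//f:Level3/*", ns)]

# Option 2: spell out the namespace URI in braces directly in the path.
with_uri = [
    cell.text for cell in root.findall(".//{http://xyz.abc/forms}Level3/*")
]

print(unqualified)
print(with_prefix)
print(with_uri)
```

The prefix used in the path does not have to match anything in the XML file; only the namespace URI it maps to matters.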

edited Aug 12, 2022 at 18:55

answered Aug 12, 2022 at 18:16

trincot


9 Comments

kjhughes Over a year ago

Good, +1, but how do you account for print("Inside Loop") --> Not even hitting this?

trincot Over a year ago

I cannot account for that, as I cannot reproduce that problem, @kjhughes. I have added a link to repl.it where the asker can see for themselves that the output is generated in their first version of the code.

kjhughes Over a year ago

Right, I couldn't see how that line would not have been executed either. Thanks.

LMC Over a year ago

Level3 in the sample has a text node: \n . Try print("Text: '" + form.text + "'")

trincot Over a year ago

@LMC, yes, you are right -- I should say it doesn't have text other than white space. Updated.
