Is there a simpler way to get all nested text inside of ElementTree?

Question

I am currently using the xml.etree Python library to parse HTML.

After finding a target DOM element, I am attempting to extract its text. Unfortunately, it seems that the .text attribute is severely limited in its functionality and will only return the immediate inner text of an element (and not anything nested). Do I really have to loop through all the children of the ElementTree? Or is there a more elegant solution?

Hermann12 · Accepted Answer · 2025-05-10 19:12:13Z

1

You can use itertext() for this. If you don't want the extra whitespace, indentation, and line breaks, normalize the result with split() and join() (strip() alone only trims the two ends of the string).

import xml.etree.ElementTree as ET

html = """<html>
    <head>
        <title>Example page</title>
    </head>
    <body>
        <p>Moved to <a href="http://example.org/">example.org</a>
        or <a href="http://example.com/">example.com</a>.</p>
    </body>
</html>"""

root = ET.fromstring(html)

target_element = root.find(".//body")

# get all text nested inside the element, in document order
all_text = ''.join(target_element.itertext())

# collapse indentation and line breaks into single spaces
all_text_clear = ' '.join(all_text.split())

print(all_text)
print(all_text_clear)

Output:

        Moved to example.org
        or example.com.
    
Moved to example.org or example.com.
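The same idea can be wrapped in a small helper. This is only a sketch: the name inner_text and its normalize parameter are made up for illustration and are not part of xml.etree. It also shows why .text alone falls short: it holds only the text before the first child element.

```python
import xml.etree.ElementTree as ET


def inner_text(element, normalize=True):
    """Return all text nested inside element, in document order.

    Hypothetical helper, not part of xml.etree.
    """
    text = "".join(element.itertext())
    # Collapse runs of whitespace (indentation, line breaks) into single spaces.
    return " ".join(text.split()) if normalize else text


doc = ET.fromstring(
    "<body><p>Moved to <a href='http://example.org/'>example.org</a>"
    " or <a href='http://example.com/'>example.com</a>.</p></body>"
)
p = doc.find("p")

print(repr(p.text))   # only the text before the first child element
print(inner_text(p))  # text of the element and everything nested in it
```

The text that follows each nested element is stored in that element's .tail attribute, which is why a manual loop over children has to read both .text and .tail; itertext() does that for you.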

edited May 10 at 19:12

answered May 10 at 18:22


LMC · Accepted Answer · 2025-05-10 03:41:09Z

1

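The XPath descendant axis returns all descendant nodes, including whitespace-only text nodes. Note that xml.etree.ElementTree supports only a limited XPath subset and does not evaluate text() or the descendant:: axis, so these expressions need a full XPath 1.0 engine such as lxml.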

For example:

//body/descendant::text() or //body/descendant::*/text()

As a generic case

//xpath/to/target/element/descendant::text()
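For readers staying with the standard library, where that axis is not available, a rough analogue is to list the individual strings that itertext() yields. The sketch below assumes a small sample document of my own. It is not an XPath evaluation, but it shows the same kind of result: one entry per text node, whitespace-only ones included.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<html><body>\n  <p>Moved to "
    "<a href='http://example.org/'>example.org</a>.</p>\n</body></html>"
)
body = root.find("body")

# One list entry per text node, whitespace-only nodes included.
nodes = list(body.itertext())
print(nodes)

# Drop whitespace-only nodes if they are not wanted.
non_blank = [t for t in nodes if t.strip()]
print(non_blank)
```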

answered May 10 at 3:41
