Python and Selenium - get text excluding child node's text

Question

Using Python 3.

Supposing:

<whatever>
  text
  <subchild>
    other
  </subchild>
</whatever>

If I do:

elem = driver.find_element_by_xpath("//whatever")

elem.text contains "text other"

If I do:

elem = driver.find_element_by_xpath("//whatever/text()[normalize-space()]")

elem is not a WebElement.

How may I proceed to grab only "text" (and not "other")?

That is: grab only the text directly inside the node, not the text of its child nodes.

UPDATE:

Original HTML is:

<div class="border-ashes the-code text-center">
VIVEGRPN
  <span class="cursor"></span>
  <button class="btn btn-ashes zclip" data-clipboard-target=".the-code" data-coupon-code="VklWRUdSUE4=">
  <span class="r">Hen, la.</span>
</div>

Do the tags have ids or classes? Or are they just plain HTML tags?
– Oceanic_Panda, Commented Jul 21, 2017 at 13:03
They have ids and classes. I updated the question with the original HTML.
– Álvaro N. Franz, Commented Jul 21, 2017 at 13:04
So if the div has a class of "border-ashes the-code text-center", what text does this return: driver.find_element_by_xpath("//div[@class='border-ashes the-code text-center']")
– Oceanic_Panda, Commented Jul 21, 2017 at 13:06
Does this answer your question? How to get text of an element in Selenium WebDriver, without including child element text?
– Pikamander2, Commented Apr 15, 2020 at 4:01

Benjamin Loison · Accepted Answer · 2023-06-23 14:31:04Z

Bear in mind that the replacement approach mentioned by @Guy doesn't work for many structures.

For instance, having this structure:

<div>
    Hello World
    <b>e</b>
</div>

The parent text would be "Hello World e", the child text would be "e", and the replacement would result in "Hllo World" instead of "Hello World".
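
The failure is easy to reproduce without a browser. A minimal sketch, assuming the two strings below stand in for what .text would return for the parent and the child:

```python
# Hypothetical values standing in for elem.text of the parent and the child.
all_text = "Hello World e"
child_text = "e"

# str.replace removes every occurrence of the substring, not only the child's.
parent_text = all_text.replace(child_text, "")
print(repr(parent_text))
```

The "e" inside "Hello" is removed together with the child's text, which is why the substring approach is unreliable.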

A safe solution

To get an element's own text safely, you have to iterate over the node's children and concatenate the text nodes. Since you can't do that in pure Selenium, you have to execute JavaScript code.

OWN_TEXT_SCRIPT = "if(arguments[0].hasChildNodes()){var r='';var C=arguments[0].childNodes;for(var n=0;n<C.length;n++){if(C[n].nodeType==Node.TEXT_NODE){r+=' '+C[n].nodeValue}}return r.trim()}else{return arguments[0].innerText}"
parent_text = driver.execute_script(OWN_TEXT_SCRIPT, elem)

The script is a minified version of this simple function:

if (arguments[0].hasChildNodes()) {
    var res = '';
    var children = arguments[0].childNodes;
    for (var n = 0; n < children.length; n++) {
        if (children[n].nodeType == Node.TEXT_NODE) {
            res += ' ' + children[n].nodeValue;
        }
    }
    return res.trim()
}
else {
    return arguments[0].innerText
}
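
For readability, the same script can be kept unminified in a Python string and wrapped in a helper. This is only a sketch: get_own_text is a hypothetical name, and it assumes that driver exposes Selenium's execute_script(script, *args) and that elem is a WebElement located with that driver.

```python
# Unminified version of the script above, kept as a Python string.
OWN_TEXT_SCRIPT = """
var node = arguments[0];
if (!node.hasChildNodes()) {
    return node.innerText;
}
var res = '';
var children = node.childNodes;
for (var n = 0; n < children.length; n++) {
    if (children[n].nodeType == Node.TEXT_NODE) {
        res += ' ' + children[n].nodeValue;
    }
}
return res.trim();
"""


def get_own_text(driver, elem):
    """Return the text held directly by elem, ignoring its child elements.

    get_own_text is a hypothetical helper name; driver is assumed to be a
    Selenium WebDriver and elem a WebElement located with it.
    """
    return driver.execute_script(OWN_TEXT_SCRIPT, elem)
```

Usage would then be parent_text = get_own_text(driver, elem).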

Guy · Accepted Answer · 2017-07-21 13:22:59Z


You can remove the child node's text from the full text:

all_text = driver.find_element_by_xpath("//whatever").text
child_text = driver.find_element_by_xpath("//subchild").text

parent_text = all_text.replace(child_text, '')


This could cause problems if both elements have the same text.
– Alon G, Commented over a year ago

Radan · Accepted Answer · 2017-07-21 13:12:48Z


I had a similar problem recently, where Selenium always gave me all the text inside the element, including the spans. I ended up splitting the string on the newline character "\n" and keeping the first part, e.g.:

all_text = driver.find_element_by_xpath(xpath).text
req_text = all_text.split("\n")[0]
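
Note that this only helps when the child's text is rendered on its own line. A small sketch with made-up strings (assumptions, not actual Selenium output) shows both cases; first_line is a hypothetical helper name:

```python
def first_line(all_text):
    """Return everything before the first newline (hypothetical helper)."""
    return all_text.split("\n")[0]


# Assumed rendering where the child text lands on a separate line.
print(first_line("VIVEGRPN\nHen, la."))

# Assumed rendering where parent and child share one line, as in the question.
print(first_line("text other"))
```

In the second case there is no newline to split on, so the child's text is still part of the result.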


Jack Miller · Accepted Answer · 2021-06-09 11:37:32Z


You can first extract the element's outerHTML, then build the soup with BeautifulSoup and remove any element you want.

Small example:

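from bs4 import BeautifulSoup

el = driver.find_element_by_css_selector('whatever')
outerHTML = el.get_attribute('outerHTML')
soup = BeautifulSoup(outerHTML, 'html.parser')  # name the parser explicitly
inner_elem = soup.select('subchild')[0].extract()  # detach the child from the tree
text_inner_elem = inner_elem.text  # text of the removed child
text_outer_elem = soup.text  # remaining text, without the child's text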
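
If adding BeautifulSoup is not an option, a similar result can be sketched with the standard library's html.parser. This is a swapped-in technique, not part of the answer above: OwnTextParser and own_text_from_html are hypothetical names, and the input is assumed to be the well-formed outerHTML string of a single element (unclosed non-void tags would throw off the depth counter).

```python
from html.parser import HTMLParser


class OwnTextParser(HTMLParser):
    """Collect text that sits directly inside the outermost element."""

    # Void elements never get a closing tag, so they must not change depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        # depth == 1: inside the root element, but not inside any child.
        if self.depth == 1:
            self.parts.append(data)


def own_text_from_html(outer_html):
    """Return the root element's own text with whitespace collapsed."""
    parser = OwnTextParser()
    parser.feed(outer_html)
    parser.close()
    # Join text nodes with a space, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


html = "<whatever>\n  text\n  <subchild>\n    other\n  </subchild>\n</whatever>"
print(own_text_from_html(html))
```

With Selenium, the input string would come from el.get_attribute('outerHTML'), as in the example above.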

answered May 21, 2020 at 9:02 by magic_of_gnu, edited Jun 9, 2021 at 11:37 by Jack Miller
