Get second element text with XPath?

Question

<span class='python'>
  <a>google</a>
  <a>chrome</a>
</span>

I want to get chrome, and I already have it working like this:

q = item.findall('.//span[@class="python"]//a')
t = q[1].text # first element = 0

I'd like to combine it into a single XPath expression and just get one item instead of a list.
I tried this but it doesn't work.

t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1
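To reproduce the two attempts, here is a minimal sketch using Python's standard xml.etree.ElementTree. This is an assumption: the question does not say whether item comes from ElementTree or lxml, and the wrapping <div> is hypothetical. On the simplified markup, the list index and the single findtext call both return chrome.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element standing in for `item` from the question.
item = ET.fromstring("""<div>
  <span class='python'>
    <a>google</a>
    <a>chrome</a>
  </span>
</div>""")

# Approach 1: collect every matching <a>, then index the Python list (0-based).
q = item.findall('.//span[@class="python"]//a')
t_list = q[1].text

# Approach 2: a positional predicate inside the path (1-based).
t_path = item.findtext('.//span[@class="python"]//a[2]')

print(t_list, t_path)  # chrome chrome
```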

The actual (not simplified) HTML looks like this:

<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
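Running both forms against this nested markup with Python's xml.etree.ElementTree shows the difference. The parser choice is an assumption, and the wrapping <div> is hypothetical. Indexing the list still returns chrome, but the positional predicate matches nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper around the actual (nested) markup from the question.
item = ET.fromstring("""<div>
<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
</div>""")

# Indexing the list of all matches still works: document order is google, chrome.
q = item.findall('.//span[@class="python"]//a')
print([a.text for a in q])  # ['google', 'chrome']
print(q[1].text)            # chrome

# The positional predicate now matches nothing: each <a> is the first
# <a> child of its own parent, so no element satisfies a[2].
print(item.findtext('.//span[@class="python"]//a[2]'))  # None
```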

Your expression .//span[@class="python"]//a[2] works for me. – Ken Bloom, Commented Nov 7, 2010 at 13:42
Hmmm it seems I have a mistake somewhere, or the simplification of the actual HTML I posted is too simple. I'll try and then modify the question. – user479870, Commented Nov 7, 2010 at 13:47
@pdnsk: Good question, +1. See my answer for an explanation and for a simple solution. :) – Dimitre Novatchev, Commented Nov 7, 2010 at 15:37
so glad you posted this question. Been trying to figure out a similar problem for about a day. – Fractal, Commented Jun 19, 2019 at 14:58

Dimitre Novatchev · Accepted Answer · 2014-03-09 18:51:59Z

42

I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]')

This is a FAQ about the // abbreviation.

.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So this may select more than one element, or no element at all, depending on the concrete XML document.

To put it more simply, the [] operator has higher precedence than //.
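A small sketch of that behavior, using Python's xml.etree.ElementTree, whose limited XPath subset supports the positional predicate with the same per-parent meaning. The documents are hypothetical: one has two parents holding two a children each, the other has no parent with a second a child.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two <p> parents with two <a> each, one with a single <a>.
root = ET.fromstring(
    "<r><p><a>1</a><a>2</a></p><p><a>3</a><a>4</a></p><p><a>5</a></p></r>"
)

# .//a[2] keeps every <a> that is the second <a> child of its own parent.
print([a.text for a in root.findall(".//a[2]")])  # ['2', '4']

# With no parent holding two <a> children, the same expression selects nothing.
flat = ET.fromstring("<r><p><a>1</a></p><p><a>2</a></p></r>")
print(flat.findall(".//a[2]"))  # []
```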

If you want just one (the second) of all the nodes returned, you have to use parentheses to force the precedence you want:

(.//a)[2]

This really selects the second a descendant of the current node.

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or change it to:

(.//span[@class="python"]//a)[2]/text()
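ElementTree's XPath subset does not support the parenthesized grouping or text(), so with the Python standard library one rough stand-in for (path)[n]/text() is to take the n-th item returned by findall. The helper name nth_match_text is hypothetical. It returns None when there are fewer than n matches instead of raising IndexError, and it reads only the element's first text node (ElementTree's .text), not every text child.

```python
import xml.etree.ElementTree as ET

def nth_match_text(item, path, n):
    """Hypothetical helper: rough ElementTree stand-in for (path)[n]/text().

    Collects all matches in document order and picks the n-th (1-based).
    Returns None when there are fewer than n matches instead of raising.
    Only the first text node (.text) of the chosen element is returned.
    """
    matches = item.findall(path)
    if n < 1 or len(matches) < n:
        return None
    return matches[n - 1].text

# The actual nested markup from the question, inside a hypothetical <div>.
item = ET.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)
print(nth_match_text(item, './/span[@class="python"]//a', 2))  # chrome
print(nth_match_text(item, './/span[@class="python"]//a', 3))  # None
```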

edited Mar 9, 2014 at 18:51

user479870

answered Nov 7, 2010 at 15:37

Dimitre Novatchev


4 Comments

user479870 Over a year ago

Thank you for the explanation, but I have one question, or actually two. If there is only one matching element, will [2] throw an exception or return None? And do you know why this works with xpath but not findtext?

Dimitre Novatchev Over a year ago

@pdnsk: My answer is pure XPath. I don't know Python.

user479870 Over a year ago

I tried and it just returns no element, which is good because one reason why I wanted to avoid lists and have it in a single expression is to not have an additional check.

Fractal Over a year ago

Been trying to figure out a similar answer for a full day. Thanks a ton for the help!

MattH · Accepted Answer · 2010-11-07 13:56:22Z

2

I'm not sure what the problem is...

>>> d = """<span class='python'>
...   <a>google</a>
...   <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>
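This uses a child step (/a[2]) rather than //a[2], and it runs against the simplified markup. A sketch of the same check with Python's standard ElementTree instead of lxml (a substitution, with hypothetical wrapper <div> elements) suggests that the child step finds chrome in the simplified markup but nothing in the actual nested markup from the question.

```python
import xml.etree.ElementTree as ET

simplified = ET.fromstring(
    "<div><span class='python'><a>google</a><a>chrome</a></span></div>"
)
actual = ET.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

path = './/span[@class="python"]/a[2]'
print(simplified.findtext(path))  # chrome
print(actual.findtext(path))      # None: the python span has no <a> children at all
```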

answered Nov 7, 2010 at 13:56

MattH


user357812 · Accepted Answer · 2010-11-07 14:29:33Z

2

From Comments:

or the simplification of the actual HTML I posted is too simple

You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:

self::node()
 /descendant-or-self::node()
  /child::span[attribute::class="python"]
   /descendant-or-self::node()
    /child::a[position()=2]

It will finally select every a element that is the second a child of its parent (position() refers to the child axis). So nothing will be selected if your document looks like this:

<span class='python'> 
  <span> 
    <span> 
      <img></img> 
      <a>google</a><!-- This is the first "a" child of its parent --> 
    </span> 
    <a>chrome</a><!-- This is also the first "a" child of its parent --> 
  </span> 
</span>
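To see why, here is a sketch with Python's ElementTree that computes, for each a in this markup, its position among the a children of its own parent, which is what [2] is tested against. ElementTree elements do not store their parent, so the parent map is built by hand.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span>"
)

# Map every element to its parent (ElementTree elements don't store parents).
parent = {child: p for p in root.iter() for child in p}

# For each <a>, its 1-based position among the <a> children of its own parent
# is what the predicate [2] (i.e. [position()=2]) is compared against.
positions = {
    a.text: parent[a].findall("a").index(a) + 1
    for a in root.iter("a")
}
print(positions)  # {'google': 1, 'chrome': 1}
```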

If you want the second of all descendants, use:

descendant::span[@class="python"]/descendant::a[2]
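ElementTree does not support explicit axes such as descendant::, so the following is only a rough Python emulation of this expression. The helper second_a_per_python_span is hypothetical and assumes python spans are not nested inside each other. The second part uses a hypothetical document with two python spans to show where the per-span form and indexing the combined match list differ.

```python
import xml.etree.ElementTree as ET

def second_a_per_python_span(item):
    """Hypothetical helper: rough emulation of
    descendant::span[@class="python"]/descendant::a[2].

    Assumes python spans are not nested. For each one, take its second
    <a> descendant in document order.
    """
    result = []
    for span in item.iterfind('.//span[@class="python"]'):
        descendants = list(span.iter("a"))
        if len(descendants) >= 2:
            result.append(descendants[1])
    return result

item = ET.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)
print([a.text for a in second_a_per_python_span(item)])  # ['chrome']

# With two python spans, the per-span form picks one <a> from each span,
# while indexing the combined match list (like (...)[2]) picks only one overall.
two = ET.fromstring(
    "<div><span class='python'><a>a1</a><a>a2</a></span>"
    "<span class='python'><a>b1</a><a>b2</a></span></div>"
)
per_span = [a.text for a in second_a_per_python_span(two)]
overall = two.findall('.//span[@class="python"]//a')[1].text
print(per_span, overall)  # ['a2', 'b2'] a2
```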

edited Nov 7, 2010 at 14:29

answered Nov 7, 2010 at 14:10

user357812

4 Comments

user479870 Over a year ago

It works with xpath but not with findtext, and returns a list with one item.

user357812 Over a year ago

@pdknsk: That's because this XPath expression returns a node-set: it could be empty, it could be a singleton, or it could contain many elements (one for each span with a "python" class and a second a descendant). If you want the string value of the first of these results, use the string() function with this expression as its argument. I don't know what data type your xpath method returns...
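A sketch of what string() does to a node-set, in Python with ElementTree (the helper xpath_string is hypothetical): an empty node-set gives the empty string; otherwise the result is the string value of the first node in document order.

```python
import xml.etree.ElementTree as ET

def xpath_string(nodes):
    """Hypothetical helper: sketch of XPath 1.0 string() on a node-set.

    `nodes` is assumed to be a list of Elements already in document order
    (findall returns them that way). An empty list yields ''; otherwise the
    result is the string value of the first node: all of its descendant
    text concatenated.
    """
    return "".join(nodes[0].itertext()) if nodes else ""

item = ET.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)
print(xpath_string(item.findall('.//span[@class="python"]//a')))           # google
print(repr(xpath_string(item.findall('.//span[@class="python"]//a[2]'))))  # ''
```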

user479870 Over a year ago

It works. I used a combination of the previous answer, with /text(), and this answer, but I'll accept this answer because it details the problem. I only have one question: what is the short equivalent of /descendant::?

user357812 Over a year ago

@pdknsk: First, text() returns all the text-node children, while string() (or the DOM method for the string value) returns the concatenation of all descendant text nodes. They are not the same. Second, there is no abbreviated form for the descendant axis. My last expression is equivalent to (.//span[@class="python"]//a)[2] when there is only one python span: in the parenthesized form, the position() predicate gets applied to the whole expression, not just the last step.
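The difference between text() and the string value shows up with mixed content. A sketch with Python's ElementTree, using a hypothetical <a>chr<b>o</b>me</a> element: the text-node children are collected from .text and the children's .tail, while itertext() gives the full string value. Note that ElementTree's .text alone holds only the first text node.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a>chr<b>o</b>me</a>")

# Like XPath text() on <a>: its text-node children are 'chr' and 'me'
# (the 'o' belongs to <b>, not to <a>).
text_children = [t for t in [a.text] + [child.tail for child in a] if t]

# Like XPath string(): all descendant text concatenated, including inside <b>.
string_value = "".join(a.itertext())

print(text_children)  # ['chr', 'me']
print(string_value)   # chrome
```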
