6

I use tree.xpath to iterate over all interesting HTML elements but I need to be able to tell whether the current element is part of a certain CSS class or not.

from lxml import html

mypage = """
<div class="otherclass exampleclass">some</div>
<div class="otherclass">things</div>
<div class="exampleclass">are</div>
<div class="otherclass">better</div>
<div>left</div>"""

tree = html.fromstring(mypage)

for item in tree.xpath( "//div" ):
  print("testing")
  #if "exampleclass" in item.getListOfClasses():
  #  print("foo")
  #else:
  #  print("bar")

The overall structure should remain the same.

What is a fast way to check whether the current div has the exampleclass class?

In the above example, item is of lxml.html.HtmlElement class, which has the property classes, but I don't understand what this means:

classes
A set-like wrapper around the 'class' attribute.

Get Method:
unreachable.classes(self) - A set-like wrapper around the 'class' attribute.

Set Method:
unreachable.classes(self, classes)

It returns an lxml.html.Classes object, which has an __iter__ method, and it turns out iter() works. So I constructed this code:

for item in tree.xpath( "//div" ):
  match = False
  for classname in iter(item.classes):
    if classname == "exampleclass":
      match = True
  if match:
    print("foo")
  else:
    print("bar")

But I'm hoping there is a more elegant method.

I tried searching for similar questions, but all I found were various "how do I get all elements of 'classname'" questions. I need all divs in the loop; I just want to treat some of them differently.

2 Answers

8

There is no need for iter: if "exampleclass" in item.classes: does the exact same thing, only more efficiently.

from lxml import html

mypage = """
<div class="otherclass exampleclass">some</div>
<div class="otherclass">things</div>
<div class="exampleclass">are</div>
<div class="otherclass">better</div>
<div>left</div>"""

tree = html.fromstring(mypage)

for item in tree.xpath("//div"):
    if "exampleclass" in item.classes:
        print("foo")

The difference is that calling iter on a set turns the membership test into a linear scan, so it is not an efficient way to search a set. It matters little here, but in some cases the difference would be monumental:

In [1]: st = set(range(1000000))

In [2]: timeit 100000 in st
10000000 loops, best of 3: 51.4 ns per loop

In [3]: timeit 100000 in iter(st)
100 loops, best of 3: 1.82 ms per loop

You can also use css selectors using lxml:

for item in tree.cssselect("div.exampleclass"):
    print("foo")

Depending on the case, you may also be able to use contains:

for item in tree.xpath("//div[contains(@class, 'exampleclass')]"):
    print("foo")
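
A caveat on contains: it does a plain substring match on the attribute value, so a class such as exampleclass-numbertwo would also match. A common XPath 1.0 workaround is to pad the normalized attribute with spaces and search for the padded name. The following is only a sketch; the sample markup and variable names are made up for illustration and are not from the question:

```python
from lxml import html

# Hypothetical sample markup: one exact match, one lookalike, one mixed.
page = """
<div class="exampleclass">exact</div>
<div class="exampleclass-numbertwo">lookalike</div>
<div class="otherclass exampleclass">mixed</div>
"""
tree = html.fromstring(page)

# Substring match: also picks up "exampleclass-numbertwo".
naive = tree.xpath("//div[contains(@class, 'exampleclass')]")

# Whole-token match: pad the attribute with spaces, then look for the
# space-delimited class name.
strict = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' exampleclass ')]"
)

naive_texts = [div.text for div in naive]
strict_texts = [div.text for div in strict]
print(naive_texts)
print(strict_texts)
```

If you need every div in the loop anyway, the item.classes check above avoids the problem, since it compares whole class names.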

2 Comments

Nice, thanks. I can't use selectors though because I need divs with and without the class in the loop, updated sample code to hopefully make that clearer. xpath contains would be problematic in cases where the class exampleclass-numbertwo exists, see stackoverflow.com/a/1604480/188159
@qubodup, yep, that was why I added Depending on the case. Are you looking for more than one class or just that single class?
0

You can elegantly use the membership test operator in:

for item in tree.xpath( "//div" ):
  if "exampleclass" in iter(item.classes):
    print("foo")

For user-defined classes which do not define __contains__() but do define __iter__(), x in y is true if some value z with x == z is produced while iterating over y.
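
To illustrate that fallback rule, here is a small sketch with two hypothetical containers (IterOnly and WithContains are invented for this example and have nothing to do with lxml). The first defines only __iter__, so in has to walk the items one by one; the second also defines __contains__, so in never iterates:

```python
class IterOnly:
    """Hypothetical container: defines __iter__ but not __contains__."""

    def __init__(self, names):
        self._names = list(names)
        self.steps = 0  # counts how many items iteration has produced

    def __iter__(self):
        for name in self._names:
            self.steps += 1
            yield name


class WithContains(IterOnly):
    """Same container, plus a __contains__ backed by a set."""

    def __init__(self, names):
        super().__init__(names)
        self._lookup = set(self._names)

    def __contains__(self, name):
        return name in self._lookup


slow = IterOnly(["otherclass", "exampleclass"])
fast = WithContains(["otherclass", "exampleclass"])

found_slow = "exampleclass" in slow  # falls back to iterating
found_fast = "exampleclass" in fast  # uses __contains__ directly

print(found_slow, slow.steps)
print(found_fast, fast.steps)
```

Wrapping an object in iter() forces the first behaviour even when the object has its own __contains__.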
