python lxml - simply get/check class of HTML element

Question

I use tree.xpath to iterate over all interesting HTML elements, but I need to be able to tell whether the current element has a certain CSS class.

from lxml import html

mypage = """
<div class="otherclass exampleclass">some</div>
<div class="otherclass">things</div>
<div class="exampleclass">are</div>
<div class="otherclass">better</div>
<div>left</div>"""

tree = html.fromstring(mypage)

for item in tree.xpath( "//div" ):
  print("testing")
  #if "exampleclass" in item.getListOfClasses():
  #  print("foo")
  #else:
  #  print("bar")

The overall structure should remain the same.

What is a fast way to check whether the current div has the exampleclass class?

In the above example, item is an lxml.html.HtmlElement, which has a classes property, but I don't understand what the documentation means:

classes
A set-like wrapper around the 'class' attribute.

Get Method:
unreachable.classes(self) - A set-like wrapper around the 'class' attribute.

Set Method:
unreachable.classes(self, classes)

It returns an lxml.html.Classes object, which has an __iter__ method, and it turns out iter() works on it. So I wrote this code:

for item in tree.xpath("//div"):
  match = False
  for classname in iter(item.classes):
    if classname == "exampleclass":
      match = True
  if match:
    print("foo")
  else:
    print("bar")

But I'm hoping there is a more elegant method.

I tried searching for similar questions, but everything I found was a variant of "how do I get all elements with class 'classname'". I need all divs in the loop; I just want to treat some of them differently.

Padraic Cunningham · Accepted Answer · 2016-09-19 22:40:52Z

8

There is no need for iter: if "exampleclass" in item.classes: does exactly the same thing, only more efficiently.

from lxml import html

mypage = """
<div class="otherclass exampleclass">some</div>
<div class="otherclass">things</div>
<div class="exampleclass">are</div>
<div class="otherclass">better</div>
<div>left</div>"""

tree = html.fromstring(mypage)

for item in tree.xpath("//div"):
    if "exampleclass" in item.classes:
        print("foo")
    else:
        print("bar")
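For reference, here is a small sketch of what the set-like classes wrapper from the question supports, assuming an lxml version that provides HtmlElement.classes. A membership test with in matches whole class names only, iteration yields the names in attribute order, and add()/discard() write the change back to the class attribute.

```python
from lxml import html

# A single <div>; html.fromstring returns that element directly.
el = html.fromstring('<div class="otherclass exampleclass">some</div>')

# "in" calls the wrapper's own __contains__, which compares whole class names.
print("exampleclass" in el.classes)   # True
print("example" in el.classes)        # False: a substring is not a class name

# Iterating yields the individual class names in attribute order.
print(list(el.classes))               # ['otherclass', 'exampleclass']

# The wrapper is writable: add() and discard() update the class attribute.
el.classes.add("newclass")
el.classes.discard("otherclass")
print(el.get("class"))
```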

The difference is that calling iter on a set turns the hash lookup into a linear scan, which is definitely not an efficient way to search a set. It makes little difference here, but in some cases the difference would be huge:

In [1]: st = set(range(1000000))

In [2]: timeit 100000 in st
10000000 loops, best of 3: 51.4 ns per loop

In [3]: timeit 100000 in iter(st)
100 loops, best of 3: 1.82 ms per loop
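The same comparison can be run as a plain script with the standard-library timeit module instead of IPython's %timeit magic. This is a sketch; absolute timings depend on the machine and Python version, and only the gap between the two numbers matters.

```python
import timeit

st = set(range(1_000_000))

# Hash-based membership test: roughly constant time regardless of set size.
set_time = timeit.timeit(lambda: 100_000 in st, number=100)

# Wrapping the set in iter() makes "in" walk the elements one by one
# until it reaches 100000, so the cost grows with the element's position.
iter_time = timeit.timeit(lambda: 100_000 in iter(st), number=100)

print(f"in set:       {set_time / 100 * 1e9:.0f} ns per test")
print(f"in iter(set): {iter_time / 100 * 1e9:.0f} ns per test")
```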

You can also use CSS selectors with lxml (this requires the separate cssselect package to be installed):

for item in tree.cssselect("div.exampleclass"):
    print("foo")

Depending on the case, you may also be able to use XPath's contains() function (note that it matches substrings, not whole class names):

for item in tree.xpath("//div[contains(@class, 'exampleclass')]"):
    print("foo")
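When substring matches are a problem (for example a class named exampleclass-numbertwo), a common XPath 1.0 workaround is to pad the normalized class string with spaces and search for the class name surrounded by spaces. A sketch comparing the two expressions on sample markup invented for this illustration:

```python
from lxml import html

mypage = """
<div class="otherclass exampleclass">some</div>
<div class="exampleclass-numbertwo">things</div>
<div class="exampleclass">are</div>
<div>left</div>"""

tree = html.fromstring(mypage)

# Plain contains() is a substring test, so it also matches "exampleclass-numbertwo".
loose = tree.xpath("//div[contains(@class, 'exampleclass')]")

# Padding the normalized class string with spaces and searching for
# " exampleclass " matches the class name only as a whole word.
exact = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' exampleclass ')]"
)

print([d.text for d in loose])   # ['some', 'things', 'are']
print([d.text for d in exact])   # ['some', 'are']
```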

edited Sep 19, 2016 at 22:40

answered Sep 19, 2016 at 19:13

Padraic Cunningham


2 Comments

qubodup Over a year ago

Nice, thanks. I can't use selectors, though, because I need divs both with and without the class in the loop; I updated the sample code to hopefully make that clearer. XPath contains() would be problematic in cases where a class like exampleclass-numbertwo exists, see stackoverflow.com/a/1604480/188159

Padraic Cunningham Over a year ago

@qubodup, yep, that's why I added "Depending on the case". Are you looking for more than one class, or just that single class?
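If every div has to stay in the loop but you still want a whole-word XPath test, one option is to evaluate a boolean XPath expression against each item; lxml returns a Python bool for such expressions. This is a sketch, not a recommendation over "exampleclass" in item.classes, which is simpler. HAS_EXAMPLECLASS is a name introduced here for illustration, and the loop uses .//div because html.fromstring wraps several top-level elements in an extra div, which //div would also match.

```python
from lxml import html

mypage = """
<div class="otherclass exampleclass">some</div>
<div class="otherclass">things</div>
<div class="exampleclass">are</div>
<div class="otherclass">better</div>
<div>left</div>"""

tree = html.fromstring(mypage)

# Hypothetical helper: a boolean XPath expression evaluated relative to each item.
HAS_EXAMPLECLASS = "contains(concat(' ', normalize-space(@class), ' '), ' exampleclass ')"

results = []
# ".//div" selects only the descendants of the wrapper element that
# html.fromstring created, so the wrapper itself is not visited.
for item in tree.xpath(".//div"):
    if item.xpath(HAS_EXAMPLECLASS):
        results.append("foo")
    else:
        results.append("bar")

print(results)   # ['foo', 'bar', 'foo', 'bar', 'bar']
```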

qubodup · Accepted Answer · 2016-09-19 15:18:32Z

0

You can elegantly use the membership test operator in:

for item in tree.xpath( "//div" ):
  if "exampleclass" in iter(item.classes):
    print("foo")

For user-defined classes which do not define __contains__() but do define __iter__(), x in y is true if some value z with x == z is produced while iterating over y.
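The quoted rule can be checked with two small hypothetical classes, OnlyIter and WithContains, which exist only for this illustration: the first defines just __iter__, so in has to iterate; the second adds __contains__, which in calls directly. lxml.html.Classes is a set-like (MutableSet) type that defines its own __contains__, which is why the iter() call in the snippet above is not needed.

```python
class OnlyIter:
    """Defines __iter__ but not __contains__, so "in" falls back to iteration."""

    def __init__(self, names):
        self.names = names
        self.visited = 0

    def __iter__(self):
        for name in self.names:
            self.visited += 1  # count how many items "in" had to look at
            yield name


class WithContains(OnlyIter):
    """Adds __contains__, which "in" calls instead of iterating."""

    def __contains__(self, name):
        # Checks the underlying list directly, without going through __iter__.
        return name in self.names


a = OnlyIter(["otherclass", "exampleclass"])
print("exampleclass" in a, a.visited)   # True 2

b = WithContains(["otherclass", "exampleclass"])
print("exampleclass" in b, b.visited)   # True 0
```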

answered Sep 19, 2016 at 15:18

qubodup
