Remove class attribute from HTML using Python and lxml

Question

How do I remove class attributes from HTML using Python and lxml?

Example

I have:

<p class="DumbClass">Lorem ipsum dolor sit amet, consectetur adipisicing elit</p>

I want:

<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit</p>

What I've tried so far

I've looked at lxml.html.clean.Cleaner; however, it has no option dedicated to stripping class attributes. You can set safe_attrs_only=True, but that alone does not remove the class attribute.

Significant searching has turned up nothing workable. I think the fact that class is a term in both HTML and Python further muddies the search results. Many of the results also seem to deal strictly with XML.

I'm open to other Python modules that offer humane interfaces as well.

Thanks much.

Solution

Thanks to @Dan Roberts' answer below, I came up with the following solution. I'm presenting it here for folks who arrive in the future trying to solve the same problem.

import lxml.html

# The HTML string we want to remove the class attribute from
html_string = '<p class="DumbClass">Lorem ipsum dolor sit amet, consectetur adipisicing elit</p>'

# Parse the HTML
html = lxml.html.fromstring(html_string)

# Print out our "Before" (tostring returns bytes unless encoding='unicode' is passed)
print(lxml.html.tostring(html, encoding='unicode'))

# .xpath below gives us a list of all elements that have a class attribute
# xpath syntax explained:
# // = select matching elements anywhere in the document
# * = match any tag
# [@class] = keep only elements that have a class attribute
for tag in html.xpath('//*[@class]'):
    # For each element with a class attribute, remove that class attribute
    tag.attrib.pop('class')

# Print out our "After"
print(lxml.html.tostring(html, encoding='unicode'))

Thanks. I figure if folks are nice enough to help me, I gotta pay it forward and make it easy for them and others in the future :)
Jeff, commented Apr 6, 2012 at 16:09

Dan Roberts · Accepted Answer · 2012-04-05 23:27:48Z

18

I can't test this at the moment, but this appears to be the general idea:

for tag in node.xpath('//*[@class]'):
    tag.attrib.pop('class')


1 Comment

Jeff Over a year ago

Thanks Dan. Your code worked. I added my solution based on your suggestion here as an addendum to my question for others.

Benoît Galy · Accepted Answer · 2019-11-24 15:14:03Z

3

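lxml.html.clean.Cleaner does work, but it needs proper configuration: enable safe_attrs_only and set safe_attrs to a whitelist that omits class. Note that every attribute missing from the whitelist is stripped, not only class.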

from lxml import html
# On lxml 5.2+, the cleaner lives in the separate lxml_html_clean package
from lxml.html import clean

html_string = '<p id="test" class="DumbClass">Lorem ipsum dolor sit amet, consectetur adipisicing elit</p>'
tree = html.fromstring(html_string)

# Keep only whitelisted attributes; everything else, including class, is dropped
cleaner = clean.Cleaner()
cleaner.safe_attrs_only = True
cleaner.safe_attrs = frozenset(['id'])
cleaned = cleaner.clean_html(tree)
print(html.tostring(cleaned))

Result:

b'<p id="test">Lorem ipsum dolor sit amet, consectetur adipisicing elit</p>'
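If the goal is to drop only class while keeping the other attributes the cleaner normally allows, one option is to start from lxml's default whitelist and subtract class from it. A rough sketch, assuming lxml.html.defs.safe_attrs is the default whitelist in your lxml version (allowed_attrs is just an illustrative name):

```python
from lxml.html import defs

# 'class' is part of the default whitelist, which is why
# safe_attrs_only=True on its own leaves class attributes in place
print('class' in defs.safe_attrs)

# Same whitelist minus 'class'; assign this to cleaner.safe_attrs
allowed_attrs = frozenset(defs.safe_attrs) - {'class'}
print('class' in allowed_attrs, 'id' in allowed_attrs)
```

Assigning allowed_attrs to cleaner.safe_attrs in place of frozenset(['id']) should then keep attributes such as id, href, and title while removing class.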


hahakubile · Accepted Answer · 2014-07-28 08:52:18Z

0

For an lxml element, the .attrib object is a dict-like mapping of its attributes, so you can delete entries from it with del as needed.

Below is a simple example showing how to rename an attribute in HTML.

Given html:

<div><img src="http://www.example.com/logo.png"></div>

Code:

from lxml.html import fromstring, tostring

html = "<div><img src=\"http://www.example.com/logo.png\"></div>"
doc = fromstring(html)
for el in doc.iter('img'):
    if "src" in el.attrib:
        # Copy the value to the new attribute name, then drop the old one
        el.set('data-src', el.get('src'))
        del el.attrib["src"]
# tostring is the public serializer; encoding='unicode' returns str instead of bytes
print(tostring(doc, encoding='unicode'))

Result:

<div><img data-src="http://www.example.com/logo.png"></div>

