How to extract text from an HTML table row

Question

This is my string:

content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'

I tried the regular expression below to extract the text between the h5 tags:

   reg = re.search(r'<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>([A-Za-z0-9%s]+)</h5></span></td></tr>' % string.punctuation,content)

It returns exactly what I want.

Is there a more Pythonic way to do this?

I want a regular-expression solution rather than BeautifulSoup or Scrapy.
– Veera Balla Deva, Commented Jan 18, 2018 at 12:30
Do NOT use regex for parsing HTML/XML/tag-style data. See here
– James, Commented Jan 18, 2018 at 12:33

Srevilo · Accepted Answer · 2018-01-18 12:34:03Z

2

Dunno whether this qualifies as more pythonic or not, but it handles it as HTML data.

from lxml import html
content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'
HtmlData = html.fromstring(content)
ListData = HtmlData.xpath('//text()')

And to get the last element:

ListData[-1]
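
Relying on the last text node works for this string, but it depends on the h5 being the final piece of text in the row. A minimal sketch of a more targeted variant, assuming the value always sits inside an h5 element (the names tree, h5_texts and office are only illustrative):

```python
from lxml import html

content = ('<tr class="cart-subtotal"><th>RTO / Registration office :</th>'
           '<td><span class="amount"><h5>Yadgiri</h5></span></td></tr>')

tree = html.fromstring(content)

# Select only the text nodes that sit directly inside an <h5> element,
# instead of collecting every text node in the fragment.
h5_texts = tree.xpath('//h5/text()')

# Guard against a missing <h5> rather than indexing blindly.
office = h5_texts[0].strip() if h5_texts else None
print(office)
```

This way the lookup does not break if more cells are later added after the h5.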


1 Comment

Srevilo Over a year ago

To install lxml on a Debian-based system, use the python3-lxml package.
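
If installing lxml is not an option, the standard library's html.parser module can probably do a similar job. A rough sketch, assuming the text of interest is always inside an h5 element; H5TextParser is a hypothetical helper name, not part of any library:

```python
from html.parser import HTMLParser


class H5TextParser(HTMLParser):
    """Collects the text found inside <h5> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # how many <h5> elements are currently open
        self.texts = []   # text chunks seen while inside an <h5>

    def handle_starttag(self, tag, attrs):
        if tag == "h5":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "h5" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Only keep data that appears while an <h5> is open.
        if self._depth:
            self.texts.append(data)


content = ('<tr class="cart-subtotal"><th>RTO / Registration office :</th>'
           '<td><span class="amount"><h5>Yadgiri</h5></span></td></tr>')

parser = H5TextParser()
parser.feed(content)
parser.close()
print(parser.texts)
```

It is more code than the lxml version, but it needs no extra package.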
