Python BeautifulSoup: trying to remove HTML 'span' tags

Question

I am trying to strip the markup from the following value:

[<span class="street-address">
            510 E Airline Way
           </span>]

I have used this clean function to remove everything between < and >:

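import re

def clean(val):
    # Coerce non-string input (e.g. a list of tags) to its string form.
    if not isinstance(val, str):
        val = str(val)
    val = re.sub(r'<.*?>', '', val)  # drop anything between < and >
    val = re.sub(r'\s+', ' ', val)   # collapse runs of whitespace
    return val.strip()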

It produces [ 510 E Airline Way ].

I would like to extend the clean function so that it also removes the '[' and ']' characters. Basically, I just want to get "510 E Airline Way".

Does anyone have a clue what I can add to the clean function?

Thank you.

Where exactly are you using BeautifulSoup here?

– SilentGhost, Commented Mar 27, 2010 at 16:24
See also stackoverflow.com/questions/1732348/…

– a paid nerd, Commented Apr 4, 2010 at 1:56

Max Shawabkeh · Accepted Answer · 2010-03-27 16:45:50Z

9

Using re:

>>> import re
>>> s='[<span class="street-address">\n            510 E Airline Way\n           </span>]'
>>> re.sub(r'\[|\]|\s*<[^>]*>\s*', '', s)
'510 E Airline Way'
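
The same idea can be folded into the clean function from the question. This is only a minimal sketch, assuming Python 3 and assuming that square brackets never appear legitimately in the text being cleaned; the bracket-stripping line is an addition and not part of the original function.

```python
import re

def clean(val):
    # Coerce non-string input (e.g. a list of tags) to its string form.
    if not isinstance(val, str):
        val = str(val)
    val = re.sub(r'<[^>]*>', '', val)  # drop tags
    val = re.sub(r'[\[\]]', '', val)   # drop '[' and ']' (assumed unwanted everywhere)
    val = re.sub(r'\s+', ' ', val)     # collapse runs of whitespace
    return val.strip()

s = '[<span class="street-address">\n            510 E Airline Way\n           </span>]'
print(clean(s))  # 510 E Airline Way
```

Note that this removes every bracket in the input, so it would also alter an address that genuinely contains one. Extracting the text from the parsed element, as in the options that follow, avoids the brackets altogether.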

Using BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup
>>> s='[<span class="street-address">\n            510 E Airline Way\n           </span>]'
>>> b = BeautifulSoup(s)
>>> b.find('span').getText()
u'510 E Airline Way'

Using lxml:

>>> from lxml import html
>>> s='[<span class="street-address">\n            510 E Airline Way\n           </span>]'
>>> h = html.document_fromstring(s)
>>> h.cssselect('span')[0].text.strip()
'510 E Airline Way'
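
If neither third-party library is available, a parser-based variant can be sketched with the standard library alone. This is a swapped-in technique rather than one of the three options above: it assumes Python 3, and TextExtractor and span_text are hypothetical helper names. It collects all text in the input, not just the contents of the span.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def span_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    # Join the text nodes and collapse whitespace runs to single spaces.
    text = ' '.join(''.join(parser.parts).split())
    # Trim the surrounding list brackets and any spaces next to them.
    return text.strip('[] ')

s = '[<span class="street-address">\n            510 E Airline Way\n           </span>]'
print(span_text(s))  # 510 E Airline Way
```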

