Testing for tags with no content with BeautifulSoup in Python

Question

I'm working with BeautifulSoup in Python to scrape a webpage. The HTML in question looks like this:

<td><a href="blah.html">blahblah</a></td>
<td>line2</td>
<td></td>

I want to take the contents of each td tag. For the first td I need the text "blahblah", for the next td I want to write "line2", and for the last td I want "blank", because it has no content.

My code snippet looks like this:

row = [] 
for each_td in td:                        
    link = each_td.find_all('a')                                                
    if link:
        row.append(link[0].contents[0])
        row.append(link[0]['href'])
    elif each_td.contents[0] is None:
        row.append('blank')                
    else:
        row.append(each_td.contents[0])
print row

However, when I run it, I get this error:

elif each_td.contents[0] is None:
IndexError: list index out of range

Note: I am working with BeautifulSoup.

How do I test for a td with no content and write the appropriate value? Why is the "... is None" check not working?

user2665694 · Accepted Answer · 2012-09-09 06:24:27Z

11

Who said that 'contents' always has at least one element? You are evidently hitting a case where 'contents' is an empty list, so indexing it with [0] raises this error before the comparison with None is ever evaluated.

A more appropriate check would be:

if each_td.contents:

or

if len(each_td.contents) > 0:

In any case, your assumption that 'contents' always holds at least one element is simply wrong.
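To make the check concrete, here is a minimal sketch of the question's loop with the emptiness test swapped in. It assumes the bs4 package with the built-in "html.parser", and it wraps the three cells in a table/tr that is not in the original snippet, purely so the example parses on its own.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the table/tr wrapper is an assumption.
html = """
<table><tr>
<td><a href="blah.html">blahblah</a></td>
<td>line2</td>
<td></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

row = []
for each_td in soup.find_all("td"):
    link = each_td.find("a")
    if link is not None:
        # Cell contains a link: keep its text and its target.
        row.append(link.get_text())
        row.append(link["href"])
    elif not each_td.contents:
        # An empty tag has contents == [], which is falsy.
        row.append("blank")
    else:
        # Convert the NavigableString to a plain str.
        row.append(str(each_td.contents[0]))

print(row)  # ['blahblah', 'blah.html', 'line2', 'blank']
```

The empty list is tested before any indexing happens, so the IndexError cannot occur.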

Sufian Latif · Accepted Answer · 2012-09-09 06:20:17Z

4

You can use .text to get the text.

row = [] 
for each_td in td:
    row.append(each_td.text)
print row
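Note that .text gives an empty string for an empty td rather than "blank". A possible variant, sketched here assuming bs4 with "html.parser" and the same sample cells as the question (wrapped in a table/tr for the example), uses get_text(strip=True) and falls back to a placeholder:

```python
from bs4 import BeautifulSoup

# Same three cells as in the question; the table/tr wrapper is an assumption.
html = (
    '<table><tr>'
    '<td><a href="blah.html">blahblah</a></td>'
    '<td>line2</td>'
    '<td></td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) returns "" for an empty cell, and "" is falsy,
# so `or` substitutes the placeholder.
texts = [td.get_text(strip=True) or "blank" for td in soup.find_all("td")]

print(texts)  # ['blahblah', 'line2', 'blank']
```

This collects only the text; if the href is also needed, the link still has to be looked up separately.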

borngold · Accepted Answer · 2012-09-09 06:27:23Z

1

You can handle the exception instead. The code is below:

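try:
    row.append(each_td.contents[0])
except IndexError:
    # contents is empty, so there is nothing at index 0;
    # do whatever is required for an empty tag, e.g. record a placeholder
    row.append('blank')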
