I'm working with BeautifulSoup in Python to scrape a webpage. The HTML in question looks like this:

<td><a href="blah.html">blahblah</a></td>
<td>line2</td>
<td></td>

I want to get the contents of each td tag. For the first td I need the "blahblah" text, for the second I want "line2", and for the last one I want to write "blank" because it has no content.

My code snippet looks like this:

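from bs4 import BeautifulSoup

html = '<td><a href="blah.html">blahblah</a></td><td>line2</td><td></td>'
td = BeautifulSoup(html, 'html.parser').find_all('td')

row = []
for each_td in td:
    link = each_td.find_all('a')
    if link:
        row.append(link[0].contents[0])
        row.append(link[0]['href'])
    elif each_td.contents[0] is None:  # fails on the empty <td>
        row.append('blank')
    else:
        row.append(each_td.contents[0])
print(row)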

However, when I run it, I get this error:

elif each_td.contents[0] is None:
IndexError: list index out of range

Note: I am using BeautifulSoup.

How do I test for the empty td and write the appropriate value? Why doesn't the "... is None" check work?

3 Answers

Who said that 'contents' always has at least one element? You are evidently hitting a td whose 'contents' list is empty, and indexing an empty list with [0] raises this IndexError before the "is None" comparison is ever evaluated.

A more appropriate check would be:

if each_td.contents:

or

if len(each_td.contents) > 0:

But your underlying assumption is simply wrong: an empty tag has an empty 'contents' list, not a list containing None.
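
Dropping that check into the question's loop could look roughly like the sketch below. It assumes bs4 with the built-in html.parser; the helper name td_to_row and the sample HTML are illustrative only, and it uses find('a') plus get_text() for the link cell rather than indexing the anchor's contents.

```python
from bs4 import BeautifulSoup


def td_to_row(tds):
    """Collect cell values, using 'blank' for cells with no children (hypothetical helper)."""
    row = []
    for each_td in tds:
        link = each_td.find('a')
        if link is not None:
            # Anchor cell: keep the link text and its href.
            row.append(link.get_text())
            row.append(link['href'])
        elif each_td.contents:
            # Non-empty list, so indexing [0] is safe.
            row.append(each_td.contents[0])
        else:
            # Empty <td>: contents == [], there is no [0] to compare with None.
            row.append('blank')
    return row


html = '<td><a href="blah.html">blahblah</a></td><td>line2</td><td></td>'
soup = BeautifulSoup(html, 'html.parser')
print(td_to_row(soup.find_all('td')))
```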

You can use .text to get the text. For an empty td it returns an empty string instead of raising an error.

row = []
for each_td in td:
    row.append(each_td.text)
print(row)
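
Because an empty cell yields '', which is falsy, one way to get the "blank" placeholder the question asks for is to fall back with `or`. This is a sketch assuming bs4 and html.parser; passing strip=True is an extra choice here so that whitespace-only cells also count as blank. Note that it keeps only the link text, not the href.

```python
from bs4 import BeautifulSoup

html = '<td><a href="blah.html">blahblah</a></td><td>line2</td><td></td>'
soup = BeautifulSoup(html, 'html.parser')

row = []
for each_td in soup.find_all('td'):
    # get_text(strip=True) returns '' for an empty cell, so `or`
    # substitutes the placeholder string.
    row.append(each_td.get_text(strip=True) or 'blank')

print(row)  # ['blahblah', 'line2', 'blank']
```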

You can handle the exception instead:

try:
    row.append(each_td.contents[0])
except IndexError:
    # The td is empty; append a placeholder instead.
    row.append('blank')
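
For comparison, here is that try/except placed inside the question's full loop, as a sketch assuming bs4 and html.parser (the function name td_values is hypothetical). Only the indexing sits inside the try block, so an IndexError raised anywhere else is not silently swallowed.

```python
from bs4 import BeautifulSoup


def td_values(tds):
    """Collect cell values, catching IndexError for empty cells (hypothetical helper)."""
    row = []
    for each_td in tds:
        link = each_td.find('a')
        if link is not None:
            row.append(link.get_text())
            row.append(link['href'])
            continue
        try:
            first = each_td.contents[0]
        except IndexError:
            # contents is [] for an empty <td>.
            first = 'blank'
        row.append(first)
    return row


soup = BeautifulSoup('<td><a href="blah.html">blahblah</a></td><td>line2</td><td></td>',
                     'html.parser')
print(td_values(soup.find_all('td')))
```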
