1
import os, re, sys, urllib2
from bs4 import BeautifulSoup
import lxml

def get_epg(channel, html):
    soup = BeautifulSoup(html, "lxml")
    main_div = soup.find("div", {"class":"viewport-container"})
    elements = main_div.find_all("li")
    for element in elements:
        cmp = element.find("div", { "class" : "channel" } ).getText()
        #return cmp
        if channel == cmp:
            print "found"
            return element

EPG_URL = "http://www.hoerzu.de/tv-programm/jetzt/"
html = urllib2.urlopen(EPG_URL)
print get_epg("ZDF", html)

results in:

Traceback (most recent call last):
  File "epg.py", line 17, in <module>
    print get_epg("ZDF", html)
  File "epg.py", line 10, in get_epg
    cmp = element.find("div", { "class" : "channel" } ).getText()
AttributeError: 'NoneType' object has no attribute 'getText'

I really don't get what is wrong here, because when I do:

    for element in elements:
        cmp = element.find("div", { "class" : "channel" } ).getText()
        return cmp

the error doesn't show up and everything works as expected ...

1
  • Downvote for an improper claim and improper analysis. Commented Apr 16, 2013 at 4:27

2 Answers

3

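On the second iteration, element.find("div", {"class": "channel"}) returns None, because not every li contains a channel div. Printing the result of that call for each element gives: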

<div class="channel">Das Erste</div>
None
None
None
<div class="channel">ZDF</div>
None
None
None
<div class="channel">RTL</div>
None
None
None
<div class="channel">Sat.1</div>
None
None
None
<div class="channel">ProSieben</div>
None
None
None
<div class="channel">kabel eins</div>
None
None
None
<div class="channel">RTL II</div>
None
None
None
<div class="channel">VOX</div>
None
None
None
<div class="channel">Arte</div>
None
None
None
<div class="channel">3sat</div>
None
None
None
<div class="channel">Super RTL</div>
None
None
None
<div class="channel">KiKA</div>
None
None
None
<div class="channel">NDR</div>
None
None
None
<div class="channel">WDR</div>
None
None
None
<div class="channel">MDR</div>
None
None
None
<div class="channel">BR</div>
None
None
None
<div class="channel">SWR</div>
None
None
None
<div class="channel">HR</div>
None
None
None
<div class="channel">RBB</div>
None
None
None
<div class="channel">n-tv</div>
None
None
None
<div class="channel">N24</div>
None
None
None
<div class="channel">Servus TV</div>
None
None
None
<div class="channel">SPORT1</div>
None
None
None
<div class="channel">TV.Berlin</div>
None
None
None
<div class="channel">Hamburg 1</div>
None
None
None
<div class="channel">Eurosport</div>
None
None
None
<div class="channel">München TV</div>
None
None
None
<div class="channel">Franken Fernsehen</div>
None
None
None
<div class="channel">Tele 5</div>
None
None
None
<div class="channel">Das VIERTE</div>
None
None
None
<div class="channel">NRW TV</div>
None
None
None
<div class="channel">Nickelodeon / Comedy Central</div>
None
None
None

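So you have to check for this condition instead of blindly calling getText(). Your second version only appears to work because it returns on the first iteration, before it ever reaches an li without a channel div.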
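A minimal sketch of that check, assuming a simplified stand-in for the page markup (SAMPLE_HTML is hypothetical, not the real hoerzu.de HTML) and the built-in "html.parser" instead of lxml so that no extra parser is needed. It skips every li that has no channel div instead of calling a method on None:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the real page: only some <li>
# elements contain a <div class="channel">.
SAMPLE_HTML = """
<div class="viewport-container">
  <ul>
    <li><div class="channel">Das Erste</div></li>
    <li><span>programme entry</span></li>
    <li><div class="channel">ZDF</div></li>
    <li><span>programme entry</span></li>
  </ul>
</div>
"""

def get_epg(channel, html):
    """Return the first <li> whose channel div matches, or None."""
    soup = BeautifulSoup(html, "html.parser")
    main_div = soup.find("div", {"class": "viewport-container"})
    if main_div is None:
        # The container itself may be missing if the page layout changed.
        return None
    for element in main_div.find_all("li"):
        channel_div = element.find("div", {"class": "channel"})
        if channel_div is None:
            # This <li> has no channel div, so skip it.
            continue
        if channel_div.get_text().strip() == channel:
            return element
    return None

match = get_epg("ZDF", SAMPLE_HTML)
print(match)
```

The single-argument print(...) call is used so the snippet runs under both Python 2 and Python 3. With the real page you would pass the response from urlopen instead of SAMPLE_HTML; the None checks stay the same.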

-1

from bs4 import BeautifulSoup

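You could try "main_div.findAll" instead of "main_div.find_all".

Note, though, that in bs4 findAll is only the older BeautifulSoup 3 spelling, kept as an alias of find_all, so both calls return the same result.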

1 Comment

You should look at improving the quality of this answer.
