
I am using the following code from this tutorial (http://jeriwieringa.com/blog/2012/11/04/beautiful-soup-tutorial-part-1/).

from bs4 import BeautifulSoup

soup = BeautifulSoup (open("43rd-congress.html"))

final_link = soup.p.a
final_link.decompose()

trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
print fulllink #print in terminal to verify results

tds = tr.find_all("td")

try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
    names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
    years = str(tds[1].get_text())
    positions = str(tds[2].get_text())
    parties = str(tds[3].get_text())
    states = str(tds[4].get_text())
    congress = tds[5].get_text()

except:
    print "bad tr string"
    continue #This tells the computer to move on to the next item after it encounters an error

print names, years, positions, parties, states, congress

However, I get an error saying that 'continue' is not properly in the loop on line 27. I am using Notepad++ and Windows PowerShell. How do I make this code work?

  • If you want execution to carry on with the code that follows, use "pass" instead of "continue". "continue" skips to the next iteration of the enclosing loop, but here you are using it outside any loop. Commented Oct 21, 2013 at 3:56
  • Same thing as in your last questions: indent your code properly, and please read a beginner's tutorial for Python. Commented Oct 21, 2013 at 7:49
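
To make the difference between the two keywords concrete, here is a minimal sketch. It assumes Python 3 syntax (the question's code is Python 2), and the `rows` list and `collect` helper are hypothetical stand-ins for the table rows, not part of the tutorial.

```python
# Hypothetical rows standing in for the <td> cells of each <tr>.
rows = [["Adams", "1873"], [], ["Baker", "1874"]]

def collect(rows, skip_bad_rows):
    """Return one label per processed row; the flag picks continue or pass."""
    seen = []
    for cells in rows:
        try:
            name = cells[0]          # raises IndexError on an empty row
        except IndexError:
            name = "bad row"
            if skip_bad_rows:
                continue             # jump straight to the next row
            else:
                pass                 # do nothing; fall through to the append
        seen.append(name)
    return seen

with_continue = collect(rows, skip_bad_rows=True)
with_pass = collect(rows, skip_bad_rows=False)
print(with_continue)  # ['Adams', 'Baker']
print(with_pass)      # ['Adams', 'bad row', 'Baker']
```

With `continue` the empty row is skipped entirely; with `pass` the rest of the loop body still runs for that row.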

5 Answers


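Everything from print fulllink down is outside the for loop, so the continue is no longer inside any loop. Indent those lines one level so they belong to the outer loop: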

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
    ## indented here!!!!!
    print fulllink #print in terminal to verify results

    tds = tr.find_all("td")

    try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
        names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()

    except:
        print "bad tr string"
        continue #This tells the computer to move on to the next item after it encounters an error

    print names, years, positions, parties, states, congress

1 Comment

The spacing from the left margin varies, and that spacing tells Python which block each line belongs to. Indentation is mandatory in Python. @RobB.

Looks like your indentation is off. Try this:

from bs4 import BeautifulSoup

soup = BeautifulSoup (open("43rd-congress.html"))

final_link = soup.p.a
final_link.decompose()

trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')

        print fulllink #print in terminal to verify results

        tds = tr.find_all("td")

        try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
            names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
            years = str(tds[1].get_text())
            positions = str(tds[2].get_text())
            parties = str(tds[3].get_text())
            states = str(tds[4].get_text())
            congress = tds[5].get_text()

        except:
            print "bad tr string"
            continue #This tells the computer to move on to the next item after it encounters an error

        print names, years, positions, parties, states, congress

3 Comments

What is a good rule or way to figure out when you need to indent something?
Every time a line ends with a colon (:), you need to indent the block of code after it. For instance, a for loop executes all of the code in the indented block beneath it once per item. When you unindent, Python knows that the code comes after the whole loop has finished. If you indent just one line after a for x: line, Python executes only that one line inside the for loop.
sortfiend is on the right track. In addition, you should familiarize yourself with the concept of blocks: a block is a group of statements that run together as a unit.
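
The colon rule described in these comments can be sketched as follows. This is an illustration only, written in Python 3 syntax; the `log` list and the numbers are made up for the example.

```python
log = []
for n in [1, 2, 3]:                 # the colon opens a block
    log.append("in loop %d" % n)    # indented: runs once per item
    if n == 2:                      # another colon, one more level
        log.append("n is two")      # runs only when the condition holds
log.append("after loop")            # unindented: runs once, after the loop
print(log)
```

Moving the last `append` four spaces to the right would make it part of the loop body, so it would run three times instead of once.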

Whitespace is significant in Python.

This is where things go downhill:

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
print fulllink #print in terminal to verify results

Indent each line you want inside the loop, and keep indenting consistently to the same level for as long as the code should stay inside the loop.

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
        print fulllink #print in terminal to verify results
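
How far a line is indented decides which loop it belongs to, and therefore how often it runs. A small sketch under assumptions: Python 3 syntax, with the hypothetical `rows` list of lists standing in for table rows and the links inside them.

```python
# Hypothetical data: each inner list plays the role of the links in one row.
rows = [["a1", "a2"], ["b1"]]

inner = []
outer = []
for row in rows:
    for link in row:
        last = link
        inner.append(last)   # inner-loop level: runs once per link
    outer.append(last)       # outer-loop level: runs once per row

print(inner)  # ['a1', 'a2', 'b1']
print(outer)  # ['a2', 'b1']
```

At the outer level only the last link of each row is recorded, because the inner loop has already finished by the time that line runs.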

2 Comments

No, I just wanted to write one loop.
Ah, no problem, think everyone's answered the question pretty solidly now :)

You have to indent your code one more level (i.e. 4 spaces or 1 tab) beyond the indentation of the for loop. The try/except isn't in the for loop, which is why you get the continue error.

Indentation shows which statements belong to the same block. A for loop starts a new block, and everything that should run inside the loop must be indented underneath it.
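
The error can be reproduced without any HTML at all. The sketch below, in Python 3 syntax, uses the built-in compile() to check two small snippets; the snippet strings and the `compiles` helper are hypothetical and exist only for this illustration.

```python
# continue inside a try/except that is NOT inside a loop.
outside_loop = (
    "try:\n"
    "    x = 1\n"
    "except Exception:\n"
    "    continue\n"
)

# The same try/except, indented one level so it sits inside the for loop.
inside_loop = (
    "for item in [1, 2]:\n"
    "    try:\n"
    "        x = item\n"
    "    except Exception:\n"
    "        continue\n"
)

def compiles(source):
    """Return True if Python accepts the source, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(outside_loop))  # False
print(compiles(inside_loop))   # True
```

The only difference between the two snippets is the extra level of indentation, which is what places the continue inside the loop.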

Comments


My answer may be simple, but the continue really is not inside a loop. It must be inside a loop, just as break must be. Your indentation is probably off; correct indentation is mandatory in Python.

Comments
