2

This webpage has multiple tables on it: http://www.nfl.com/player/tombrady/2504211/gamelogs

Within the HTML, all of the tables are declared with exactly the same tag and attributes:

<table class="data-table1" width="100%" border="0" summary="Game Logs For Tom Brady In 2014">

I can scrape data from the first table (Preseason) only. I do not know how to skip it and scrape data from the second and third tables (Regular Season and Postseason).

I'm trying to scrape specific numbers.

My code:

import pickle
import math
import urllib2
from lxml import etree
from bs4 import BeautifulSoup
from urllib import urlopen

year = '2014'
lastWeek = '2'
favQB1 = "Tom Brady"

favQBurl2 = 'http://www.nfl.com/player/tombrady/2504211/gamelogs'
favQBhtml2 = urlopen(favQBurl2).read()
favQBsoup2 = BeautifulSoup(favQBhtml2)
favQBpass2 = favQBsoup2.find("table", { "summary" : "Game Logs For %s In %s" % (favQB1, year)})
favQBrows2 = []

for row in favQBpass2.findAll("tr"):
    if lastWeek in row.findNext('td'):  
        for item in row.findAll("td"):
            favQBrows2.append(item.text)
print ("Enter: Starting Quarterback QB Rating of Favored Team for the last game played (regular season): "),
print favQBrows2[15]

2 Answers

2

Rely on the table title, which is located in the td element in the first table row:

def find_table(soup, label):
    return soup.find("td", text=label).find_parent("table", summary=True)

Usage:

find_table(soup, "Preseason")
find_table(soup, "Regular Season")
find_table(soup, "Postseason")

For reference, see the find_parent() section of the BeautifulSoup documentation.
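To make the idea concrete, here is a minimal, self-contained sketch of the same lookup run against a small inline HTML sample rather than the live page. The sample markup, the stat values in it, and the `week_row` helper are assumptions made up for illustration; the real page has many more columns. It also uses `string=` instead of `text=`, which is the spelling newer BeautifulSoup releases prefer.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: three tables with an identical summary
# attribute, each titled by a td in its first row. All values are invented.
SAMPLE_HTML = """
<table class="data-table1" summary="Game Logs For Tom Brady In 2014">
<tr><td colspan="3">Preseason</td></tr>
<tr><td>2</td><td>PHI</td><td>80.1</td></tr>
</table>
<table class="data-table1" summary="Game Logs For Tom Brady In 2014">
<tr><td colspan="3">Regular Season</td></tr>
<tr><td>1</td><td>@MIA</td><td>69.7</td></tr>
<tr><td>2</td><td>@MIN</td><td>102.3</td></tr>
</table>
<table class="data-table1" summary="Game Logs For Tom Brady In 2014">
<tr><td colspan="3">Postseason</td></tr>
<tr><td>19</td><td>BAL</td><td>99.3</td></tr>
</table>
"""


def find_table(soup, label):
    """Return the table whose title cell holds exactly `label`, or None."""
    title_cell = soup.find("td", string=label)
    if title_cell is None:
        return None
    return title_cell.find_parent("table", summary=True)


def week_row(table, week):
    """Return the cell texts of the first row whose first cell equals `week`."""
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells and cells[0] == week:
            return cells
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
regular = find_table(soup, "Regular Season")
print(week_row(regular, "2"))
```

Note that the match on the title cell is exact, so this only works if the td contains nothing but the label text. Returning None for a missing label lets the caller handle a player who has no postseason table instead of failing with an AttributeError.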


1

The following should work as well. It collects every table with the matching summary and selects one by position:

from bs4 import BeautifulSoup
from urllib import urlopen  # Python 2; on Python 3 this lives in urllib.request

year = '2014'
lastWeek = '2'
favQB1 = "Tom Brady"

favQBurl2 = 'http://www.nfl.com/player/tombrady/2504211/gamelogs'
favQBhtml2 = urlopen(favQBurl2).read()
favQBsoup2 = BeautifulSoup(favQBhtml2)
# All tables share the same summary, so fetch them all and pick by position:
# [0] is Preseason, [1] is Regular Season, [2] is Postseason.
favQBpass2 = favQBsoup2.find_all("table", { "summary" : "Game Logs For %s In %s" % (favQB1, year)})[1]
favQBrows2 = []

# Collect the cell text of the row whose first cell is the requested week.
for row in favQBpass2.findAll("tr"):
    if lastWeek in row.findNext('td'):
        for item in row.findAll("td"):
            favQBrows2.append(item.text)
print ("Enter: Starting Quarterback QB Rating of Favored Team for the last game played (regular season): "),
print favQBrows2[15]
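As a hedged illustration of the index-based approach, the sketch below runs `find_all` on a small inline sample instead of the live page and guards against an index that does not exist (for example, a season with no postseason table). The sample markup and the `nth_table` helper are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

SUMMARY = "Game Logs For Tom Brady In 2014"

# Hypothetical sample: three tables that differ only in their title cell.
SAMPLE_HTML = """
<table summary="Game Logs For Tom Brady In 2014"><tr><td>Preseason</td></tr></table>
<table summary="Game Logs For Tom Brady In 2014"><tr><td>Regular Season</td></tr></table>
<table summary="Game Logs For Tom Brady In 2014"><tr><td>Postseason</td></tr></table>
"""


def nth_table(soup, summary, index):
    """Return the table at position `index` among those with `summary`, or None."""
    tables = soup.find_all("table", {"summary": summary})
    if index < 0 or index >= len(tables):
        return None
    return tables[index]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for position in range(4):
    table = nth_table(soup, SUMMARY, position)
    title = table.find("td").get_text(strip=True) if table is not None else None
    print(position, title)
```

Selecting by position is shorter than looking up the title cell, but it depends on the page always listing the tables in the same order, so it is worth checking the title of the table you get back.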

3 Comments

Never mind, ignore my last comment. It worked perfectly! Thanks!
What error? I just tried it and it's working fine for me.
Not a problem. @alecxe's answer is good as well. In my answer, you just need to change the index position. Please upvote my answer :)
