
Running Python 3.6.1 |Anaconda 4.4.0 (64-bit) on a Windows device.

Using selenium I gather the following html source:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

url = "https://nextgenstats.nfl.com/stats/receiving#yards"
driver = webdriver.Chrome(executable_path=r"C:/Program Files (x86)/Google/Chrome/chromedriver.exe")
driver.get(url)
htmlSource = driver.page_source

If you visit the URL, you will see a nice table that is loaded dynamically. I am unsure how this table can be extracted from htmlSource so that a pandas DataFrame can be constructed from it.

  • pandas has read_html() which can find all <table> in a file. Commented Dec 15, 2017 at 8:45
  • @furas read_html() without BeautifulSoup returned an error saying no tables were found. The answer from COLDSPEED works. Commented Dec 15, 2017 at 8:49
  • It was not an answer, only a comment suggesting what to use. Commented Dec 15, 2017 at 8:57

2 Answers


You're pretty close; you just need to help pandas a bit. In a nutshell, here's what you need to do:

  1. Load the source into BeautifulSoup
  2. Find the table in question. Use soup.find
  3. Call pd.read_html
from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(htmlSource, 'html.parser')
# The stats grid lives inside this div; hand just that chunk to pandas
table = soup.find('div', class_='ngs-data-table')

df_list = pd.read_html(table.prettify())

Now, df_list contains a list of every table pandas found inside that div -

df_list[1].head()

                0    1   2    3    4     5      6   7    8      9     10  11
0    Antonio Brown  PIT  WR  4.3  2.6  13.7  45.32  99  160  61.88  1509   9
1  DeAndre Hopkins  HOU  WR  4.6  2.1  13.1  42.19  88  155  56.77  1232  11
2     Adam Thielen  MIN  WR  5.8  2.6  11.0  37.38  80  124  64.52  1161   4
3      Julio Jones  ATL  WR  5.2  2.4  14.2  43.34  73  118  61.86  1161   3
4     Keenan Allen  LAC  WR  5.4  2.6   9.5  31.30  83  129  64.34  1143   5
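Note that the scraped table comes back with integer column labels (0 through 11), since the header row isn't part of the parsed markup. A minimal sketch of assigning readable names afterwards; the toy frame and column names below are illustrative, not taken from the NGS page:

```python
import pandas as pd

# Toy frame standing in for df_list[1]; the real column names are not
# present in the scraped HTML, so these names are an assumption.
df = pd.DataFrame([["Antonio Brown", "PIT", "WR", 1509],
                   ["DeAndre Hopkins", "HOU", "WR", 1232]])
df.columns = ["player", "team", "position", "yards"]
print(df["yards"].max())  # prints 1509
```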

4 Comments

Awesome, BeautifulSoup was the missing link.
@sunspots There may be another way to do this, but as far as I know, this is by far the easiest way. All it takes is examining the data, pinpointing the table, and the rest, as they say, is history.
In a day or two I might put a bounty on this question just to see if anyone has any other tricks they'd like to share, using any of read_html's multitudinous arguments.
@COLDSPEED dryscape might be another way, but the support for Windows seems complicated: pypi.python.org/pypi/dryscrape/1.0
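To illustrate a couple of those read_html arguments on a self-contained snippet (not the NGS page itself, whose markup may differ): attrs restricts parsing to <table> elements whose attributes match, and the <th> row is picked up as the header automatically.

```python
import io
import pandas as pd

# Two tables, but attrs={"class": "stats"} matches only the first one
html = """
<table class="stats"><tr><th>player</th><th>yards</th></tr>
<tr><td>Antonio Brown</td><td>1509</td></tr></table>
<table class="other"><tr><td>ignored</td></tr></table>
"""

tables = pd.read_html(io.StringIO(html), attrs={"class": "stats"})
print(len(tables))           # 1 - only the "stats" table is returned
print(tables[0].iloc[0, 1])  # 1509
```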

As a Scrapy user, I'm used to looking at XHR requests. If you change the year on the site, you'll see the API call to https://appapi.ngs.nfl.com/statboard/receiving?season=2017&seasonType=REG

The API returns JSON, so it makes sense to use a JSON parser like read_json for the data.


Here's how you can use this in the Scrapy shell:

$ scrapy shell

In [1]: fetch("https://appapi.ngs.nfl.com/statboard/receiving?season=2017&seasonType=REG")
2017-12-15 13:11:30 [scrapy.core.engine] INFO: Spider opened
2017-12-15 13:11:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://appapi.ngs.nfl.com/statboard/receiving?season=2017&seasonType=REG> (referer: None)

In [2]: import pandas as pd

In [3]: data = pd.read_json(response.body)

In [4]: data.keys()
Out[4]: Index([u'season', u'seasonType', u'stats', u'threshold'], dtype='object')

In [5]: pd.DataFrame(list(data['stats']))

If you don't have Scrapy, you can use requests instead:

import requests
import pandas as pd

url = "https://appapi.ngs.nfl.com/statboard/receiving?season=2017&seasonType=REG"

response = requests.get(url)
# The top-level JSON has season/seasonType/stats/threshold keys;
# the per-player records live under 'stats'
data = pd.read_json(response.text)
df = pd.DataFrame(list(data['stats']))
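If the per-player records nest further dicts inside them, pandas can flatten those in one step with json_normalize (pd.json_normalize in pandas >= 1.0). The payload shape below is an assumption for illustration, not the documented API schema:

```python
import pandas as pd

# Toy payload mimicking the /statboard/receiving response; the field
# names here are assumptions, not the real API schema.
payload = {
    "season": 2017,
    "stats": [
        {"player": {"displayName": "Antonio Brown", "team": "PIT"}, "yards": 1509},
        {"player": {"displayName": "DeAndre Hopkins", "team": "HOU"}, "yards": 1232},
    ],
}

# Nested "player" dicts become dotted columns: player.displayName, player.team
df = pd.json_normalize(payload["stats"])
print(sorted(df.columns))
```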

