PYTHON: How do I use BeautifulSoup to parse a table into a pandas dataframe

Question

I am trying to scrape the CDC website for the COVID-19 cases reported in the last 7 days (https://covid.cdc.gov/covid-data-tracker/#cases_casesinlast7days). I've tried to find the table by name, id, and class, but the lookup always returns None. When I print the scraped HTML, I can't locate the table in it manually either. I'm not sure what I'm doing wrong here. Once the data is imported, I need to populate a pandas DataFrame to use later for graphing, and export the table as a CSV.

For extra information: the table appears to be generated by JavaScript, so Selenium will be needed to get this data.
– Taylor Killen, Commented Oct 17, 2020 at 19:33
What Taylor says is right. Additionally, I see there is a "download" button on the website, so you might just try that (with Selenium).
– qmeeus, Commented Oct 17, 2020 at 19:38

Hryhorii Pavlenko · Accepted Answer · 2020-10-17 20:41:03Z

1

You might as well request the data from the API directly (check the Network tab in your browser's developer tools while refreshing the page):

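import requests
import pandas as pd

# JSON endpoint found in the browser's Network tab
endpoint = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
response = requests.get(endpoint, params={"id": "US_MAP_DATA"}, timeout=30)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()

# The records sit under the key that matches the requested id
df = pd.DataFrame(data["US_MAP_DATA"])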
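The question also asks for a CSV export. A minimal sketch of that step, assuming the response holds a list of flat records; `records_to_csv` is a hypothetical helper, and the field names and values below are invented placeholders rather than the API's actual schema:

```python
import io

import pandas as pd


def records_to_csv(records, path_or_buffer):
    """Build a DataFrame from a list of dicts and write it out as CSV."""
    df = pd.DataFrame(records)
    # index=False keeps the row index out of the file
    df.to_csv(path_or_buffer, index=False)
    return df


# Placeholder records standing in for data["US_MAP_DATA"]; names and values are invented
sample_records = [
    {"region": "A", "cases_last_7_days": 10},
    {"region": "B", "cases_last_7_days": 25},
]

# Write to an in-memory buffer here; a file path works the same way
buffer = io.StringIO()
sample_df = records_to_csv(sample_records, buffer)
csv_text = buffer.getvalue()
```

With the real data, something like `records_to_csv(data["US_MAP_DATA"], "cases.csv")` would write the file to disk; the file name is arbitrary.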

EDIT: Trying to make this answer more general and useful.

How did you discern that this was how to parse the data?

First, inspect the page (Ctrl + Shift + I) and navigate to the Network tab.

Second, refresh the page to record network activity.

Where to look?

Select the XHR filter to limit the number of records (1);

Look through the records by clicking on them (2) and check their preview responses (3) to see whether they contain the data you need.
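Once a candidate URL is known, the same check on a response's preview can be done in code. A rough sketch, assuming the response is a JSON object at the top level; `table_like_keys` is a hypothetical helper and the payload is invented for illustration, not real API output:

```python
import json


def table_like_keys(payload):
    """Return top-level keys whose value is a non-empty list of dicts."""
    return [
        key
        for key, value in payload.items()
        if isinstance(value, list)
        and value
        and all(isinstance(row, dict) for row in value)
    ]


# Invented payload that mimics the shape of a JSON response
raw = '{"updated": "placeholder", "ROWS": [{"name": "A", "count": 1}, {"name": "B", "count": 2}]}'
payload = json.loads(raw)
candidates = table_like_keys(payload)
```

Any key this returns holds a list of records, which is the shape `pd.DataFrame` accepts directly.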

This doesn't always work, but when it does, reading the data from the API directly is much easier than writing a scraper with requests / bs4 / Selenium, so it should be the first thing to try.
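To tell which situation a page is in, one option is to check whether the HTML returned by a plain request contains a table at all. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup to stay dependency-free; `TableCounter`, `count_tables`, and both HTML strings are illustrative assumptions:

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this hook
        if tag == "table":
            self.tables += 1


def count_tables(html):
    """Return how many <table> elements appear in the given HTML text."""
    counter = TableCounter()
    counter.feed(html)
    counter.close()
    return counter.tables


# Invented pages: one with the table in the markup, one that builds it in JavaScript
static_page = "<html><body><table><tr><td>1</td></tr></table></body></html>"
js_page = "<html><body><div id='app'></div><script>render()</script></body></html>"
```

A count of zero for the text returned by `requests.get(url).text` suggests the table is built by JavaScript, which is when the Network tab approach or Selenium comes in.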

edited Oct 17, 2020 at 20:41

answered Oct 17, 2020 at 19:43

Hryhorii Pavlenko

3,910 reputation, 4 gold badges, 21 silver badges, 38 bronze badges


2 Comments

Taylor Killen Over a year ago

Wow! Worked like a charm! It's possible my assignment intended for me to use Beautiful Soup, but this solution is far faster and more effective than Selenium and a Chromium executable. How did you discern that this was how to parse the data?

Taylor Killen Over a year ago

Thank you! Very thorough explanation
