
I am trying to use BeautifulSoup and requests to scrape the table at the URL below into a DataFrame. I used to be able to do this with:

url = "https://www.vegasinsider.com/college-basketball/odds/las-vegas/money/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for br in soup.select("br"):
    br.replace_with("\n")
base = pd.read_html(str(soup.select_one(".frodds-data-tbl")))[0]

Sadly, the website's layout completely changed overnight. I am now getting this value error:

ValueError: No tables found

This is because I was previously looking for a <table>. Now the data is stored in a series of nested divs. I have made some headway with this code:

url = "https://www.vegasinsider.com/college-basketball/odds/las-vegas/"
page = requests.get(url).text
soup = BeautifulSoup(page, "html.parser")

divList = soup.find_all('div', attrs={"class": "bc-odds-table bc-table"})
print(divList)

I have also been able to dig through the HTML and find where the data I want to pull lives:

[Screenshot of the nested div structure in the browser inspector]

I was also able to get something by doing this:

data = [[x.text for x in y.find_all('div')] for y in divList]
df = pd.DataFrame(data)
print(df)

[1 rows x 5282 columns]

How would I be able to loop through these divs and return the data in a pandas dataframe?

When I use div.text, it returns one long string containing the data I want. I could split that string into pieces and glue them into a DataFrame where I want them to go, but that seems like a hack job at best.
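For illustration, that string-splitting hack would look something like this, assuming every game contributes the same fixed number of fields (which is exactly why it feels fragile):

FIELDS_PER_GAME = 10  # hypothetical count; would break whenever the layout shifts
flat = [line for div in divList
        for line in div.get_text("\n").split("\n") if line.strip()]
rows = [flat[i:i + FIELDS_PER_GAME] for i in range(0, len(flat), FIELDS_PER_GAME)]
df = pd.DataFrame(rows)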

  • Be aware that pandas.read_html() only reads tables, and what you inspected is not a table, it is a div. So you have to scrape it "manually" or find an API (a quick sketch of the difference follows below).
  • That makes sense. This is my first time doing anything like this. Thank you.
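To make that concrete, a minimal sketch of the difference (both HTML snippets here are made up for illustration):

import pandas as pd
from io import StringIO

# read_html happily parses <table> markup ...
print(pd.read_html(StringIO("<table><tr><td>1</td><td>2</td></tr></table>"))[0])

# ... but the same values wrapped in divs raise "ValueError: No tables found"
try:
    pd.read_html(StringIO("<div><div>1</div><div>2</div></div>"))
except ValueError as err:
    print(err)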

1 Answer


You basically need to go through all the divs, picking them out by the unique identifiers in their class names. Try this:

import pandas as pd
import requests
from bs4 import BeautifulSoup

def extract_data_from_div(div):
    # contains the names of the teams
    left_side_div = div.find('div', class_='d-flex flex-column odds-comparison-border position-relative')

    name_data = []
    for name in left_side_div.find_all('div', class_='team-stats-box'):
        name_data.append(name.text.strip())

    # to save all the extracted odds
    odds = []

    # now isolate the divs with the odds
    for row in div.find_all('div', class_='px-1'):

        # all the divs for each bookmaker
        odds_boxes = row.find_all('div', class_='odds-box')

        odds_box_data = []
        for odds_box in odds_boxes:
            # sometimes the box just holds 'N/A', so find() returns None and
            # .text raises AttributeError; catch it and fall back to ''
            try:
                pt_2 = odds_box.find('div', class_='pt-2').text.strip()
            except AttributeError:
                pt_2 = ''

            try:
                pt_1 = odds_box.find('div', class_='pt-1').text.strip()
            except AttributeError:
                pt_1 = ''

            odds_box_data.append((pt_2, pt_1))

        # append to the odds list
        odds.append(odds_box_data)

    # put the names and the odds together
    extracted_data = dict(zip(name_data, odds))

    return extracted_data

url = "https://www.vegasinsider.com/college-basketball/odds/las-vegas/"
resp = requests.get(url)

soup = BeautifulSoup(resp.text, "html.parser")

# this will give you a list of each set of match odds
div_list = soup.find_all('div', class_='d-flex flex-row hide-scrollbar odds-slider-all syncscroll tracks')

data = {}
for div in div_list:
    extracted = extract_data_from_div(div)
    data = {**data, **extracted}

# finally convert to a dataframe
df = pd.DataFrame.from_dict(data, orient='index').reset_index()
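
If it works as expected, the reset index holds the team names and the remaining positional columns hold one (pt-2, pt-1) tuple per bookmaker. A quick sanity check (renaming the reset index column is just a suggestion, not something the page dictates):

print(df.head())

# 'index' is the default name reset_index() gives the old index; the other
# columns stay positional since the bookmaker names aren't captured above
df = df.rename(columns={"index": "team"})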

2 Comments

This is perfect. If I wanted to grab the money lines and throw them into a different data frame, would it be as simple as pt_3 = odds_box.find('div', class_='pt-3').text.strip()?
In theory, yes, but it looks like that div only appears when you click the money lines button. You might need to use Selenium WebDriver to click the button and then grab the HTML.
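A rough sketch of that, assuming Chrome and a guessed locator for the money lines button (check the live page in the inspector for the real one):

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.vegasinsider.com/college-basketball/odds/las-vegas/")

# hypothetical locator for the money lines tab -- confirm against the live page
driver.find_element(By.XPATH, "//button[contains(., 'Money')]").click()
time.sleep(2)  # crude wait for the odds to re-render

# hand the rendered HTML back to BeautifulSoup and reuse the parsing above
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()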
