I'm having a problem with web scraping. I need to scrape data from a betting site and store it in a dataframe.

My code:

import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

DRIVER_PATH = 'C:\\executables\\chromedriver.exe'

options = Options()
# options.headless was deprecated and later removed in Selenium 4; pass the flag instead
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1200")

# executable_path was removed in Selenium 4.10; the driver path now goes through Service
driver = webdriver.Chrome(service=Service(DRIVER_PATH), options=options)

driver.get("https://www.nike.sk/live-stavky/futbal")

time.sleep(10)


soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()  # the browser is no longer needed once the HTML has been captured

# match time
out_1 = soup.find_all(class_='ellipsis flex fs-10 c-black-50 justify-between pr-5')
# home and away teams
out_2 = soup.find_all(class_='ellipsis f-condensed c-black-100 text-extra-bold match-opponents pr-10')
# match status
out_3 = soup.find_all(class_='flex justify-center text-right flex-col match-score-col fs-12 c-orange text-extra-bold')
# match status 2
out_4 = soup.find_all(class_='flex justify-center text-right flex-col match-score-col fs-12 text-extra-bold c-default-light')

My outputs (out_1, ..., out_4) are messy blocks of text. How can I combine them into a single, complete dataframe? Can I do it without regex?
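For the question as asked, one regex-free approach is to iterate over each match's container element, call get_text(strip=True) on its child tags, and split the resulting strings with plain str.split. The sketch below runs on hypothetical markup: the `match`, `time`, `teams` and `score` classes and the "Home - Away" / "1:1" text formats are assumptions, not the real site's structure, so the selectors would need adapting to the class names used above. Grouping per container may also help avoid misaligned rows, which can happen when zipping separate find_all lists such as out_3 and out_4 that do not each have one entry per match.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source;
# the real page's classes and text formats may differ.
html = """
<div class="match">
  <span class="time">45'</span>
  <span class="teams">Barito Putera - Makassar</span>
  <span class="score">1:1</span>
</div>
<div class="match">
  <span class="time">12'</span>
  <span class="teams">Rahmatgonj MFS - Sheikh Jamal</span>
  <span class="score">2:0</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for match in soup.find_all(class_="match"):
    # get_text(strip=True) drops the tags and the surrounding whitespace
    minute = match.find(class_="time").get_text(strip=True)
    teams = match.find(class_="teams").get_text(strip=True)
    score = match.find(class_="score").get_text(strip=True)
    # Plain str.split on a fixed separator, no regex involved
    home, away = teams.split(" - ", 1)
    score_home, score_away = score.split(":", 1)
    rows.append((minute, home, away, score_home, score_away))

df = pd.DataFrame(rows, columns=["Time", "Team 1", "Team 2", "Score 1", "Score 2"])
print(df)
```

The scores stay as strings here; pd.to_numeric can convert those columns if numeric values are needed.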

1 Answer

You can use the site's Ajax API to download the data as JSON, then build the dataframe from that data:

import json
import re

import pandas as pd
import requests

url = "https://push.nike.sk/snapshot?path=%2Fn1%2Foverview%2Ffutbal%2Ftournaments%2F"

# The page HTML embeds a security token that the API expects as a request header
html_doc = requests.get("https://www.nike.sk/live-stavky/futbal", timeout=30).text
token = re.search(r'"securityToken":"([^"]+)"', html_doc).group(1)

# The last item of the response's first element is itself a JSON string,
# so it is decoded a second time with json.loads
resp = requests.get(url, headers={"x-security-token": token}, timeout=30)
resp.raise_for_status()
data = json.loads(resp.json()[0][-1])

all_data = []
for m in data["matches"]:
    # Total (whole-match) score for each side
    s1 = m["score"]["scores"]["TOTAL"]["home"]
    s2 = m["score"]["scores"]["TOTAL"]["away"]
    all_data.append((m["home"]["en"], m["away"]["en"], s1, s2))

df = pd.DataFrame(all_data, columns=["Team 1", "Team 2", "Score 1", "Score 2"])
print(df)

Prints:

                        Team 1             Team 2 Score 1 Score 2
0                Barito Putera           Makassar       1       1
1               Rahmatgonj MFS       Sheikh Jamal       2       0
2  Stredoafrická republika SRL        Etiópia SRL       2       1
3                   Kosovo SRL       Arménsko SRL       0       0
4             Mohammedan Dhaka  Azampur FC Uttara       3       0
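The token step is the least obvious part of the answer: the regex captures whatever sits between the quotes after "securityToken":. A minimal sketch on a hypothetical HTML fragment (the `extract_token` helper and the sample token value are illustrative only) that also guards against re.search returning None if the page layout changes and the pattern no longer matches:

```python
import re


def extract_token(html_doc):
    """Return the securityToken value embedded in the page, or None if absent."""
    match = re.search(r'"securityToken":"([^"]+)"', html_doc)
    # re.search returns None when the pattern is missing, so check before .group()
    return match.group(1) if match else None


# Hypothetical page fragment; the real page's script and token differ
sample_html = '<script>var cfg = {"lang":"sk","securityToken":"abc123"};</script>'
print(extract_token(sample_html))
print(extract_token("<html></html>"))
```

With a None check in place, the scraper can report a clear error instead of failing with an AttributeError on .group(1).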

2 Comments

Thank you, your code works great. But to me it's a bit like a magic formula... :D Please, how did you get the URL (push.nike.sk...)?
@314mip Open the Web Developer Tools (Network tab) in Chrome or Firefox and reload the page. You should see this URL there, along with the data it returns. (The page uses JavaScript to fetch and render the data.)
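Once the request shows up in the Network tab, its response body can be explored to find the fields worth extracting. A small sketch, assuming the JSON has been pasted into Python: the hypothetical `key_paths` helper lists the path to every leaf value, which is one way to arrive at lookups like m["score"]["scores"]["TOTAL"]["home"]. The sample payload is made up and only mirrors the keys the answer's code reads; a real response contains many more fields.

```python
import json


def key_paths(obj, prefix=""):
    """Yield (path, value) for every leaf in nested JSON made of dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from key_paths(value, f"{prefix}[{key!r}]")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from key_paths(value, f"{prefix}[{i}]")
    else:
        yield prefix, obj


# Hypothetical payload shaped like the keys used in the answer
sample = json.loads("""
{"matches": [{"home": {"en": "Kosovo SRL"}, "away": {"en": "Arménsko SRL"},
              "score": {"scores": {"TOTAL": {"home": 0, "away": 0}}}}]}
""")

for path, value in key_paths(sample):
    print(path, "=", value)
```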
