
I have code already assembled that works on this response:

import requests
from bs4 import BeautifulSoup
response = requests.get(url).text
soup = BeautifulSoup(response, 'html.parser')

But I need to use Selenium WebDriver instead of requests. I tried to use it this way:

driver = web_driver()
response = driver.page_source
soup = BeautifulSoup(response, 'html.parser')

But the responses are different, and this causes several errors in my code. Is there a way to get Selenium WebDriver to return exactly the same response as requests?

Detail: if possible, I would like a way to always get the same result regardless of the URL used.

Example of Usage

In a scenario where I want to create a pandas DataFrame with the exact values of the table at this URL, requests delivers a different result, in terms of the elements it reads, than Selenium does.

https://int.soccerway.com/teams/egypt/pharco/38185/squad/


  • Did you use webdriver.Firefox() and driver.get(url)? driver.page_source should return the HTML. Commented Jul 4, 2024 at 3:42
  • Hi @GTK. Yes, it returns HTML, but the result is different from what requests delivers; if you compare both responses, you will see they are not equal. Example on this page: int.soccerway.com/teams/egypt/pharco/38185/squad Commented Jul 4, 2024 at 4:01
  • Well, Selenium runs JavaScript, so the HTML might be mutated by JS; requests returns the static HTML. Why are you using Selenium if you only need static HTML? (One workaround, disabling JavaScript in the browser, is sketched after these comments.) Commented Jul 4, 2024 at 4:08
  • What you're trying to do is possible, but not recommended; there are other ways of getting around rate limiting. Please clarify what data you're trying to scrape. The table, for example, can be fetched from the API. Commented Jul 4, 2024 at 15:31
  • It's in the network tab; try changing the date (season). If you post your actual problem, I may be able to give a proper answer that you can use as an example. Commented Jul 4, 2024 at 16:00
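
A minimal sketch of the JavaScript-disabling workaround mentioned in the comments, assuming Firefox and Selenium 4 (javascript.enabled is a Firefox preference; with JS disabled, page_source stays much closer to the static HTML that requests receives):

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# assumption: with JavaScript off, the DOM is not mutated client-side,
# so page_source approximates the server-sent HTML
options = Options()
options.set_preference('javascript.enabled', False)

driver = webdriver.Firefox(options=options)
driver.get('https://int.soccerway.com/teams/egypt/pharco/38185/squad/')
html = driver.page_source
driver.quit()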

1 Answer


Here's how you can get the table from the API (found by inspecting the network tab):

import requests
from bs4 import BeautifulSoup
import pandas as pd 


url = 'https://int.soccerway.com/a/block_team_squad'
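# block_id, team_id and season_id below are copied from the XHR request
# shown in the network tab; changing season_id fetches a different season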
params = {
    'block_id': 'page_team_1_block_team_squad_7',
    'callback_params': '{"team_id":"38185"}',
    'action': 'changeSquadSeason',
    'params': '{"season_id":"23909"}',
}
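
# these headers mimic the page's own XHR call (the endpoint is meant
# to be requested via AJAX)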

headers = {
    'x-requested-with': 'XMLHttpRequest',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
}

response = requests.get(url, headers=headers, params=params)
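# the JSON response nests the rendered table's HTML fragment under commands[0]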
html = response.json()['commands'][0]['parameters']['content']
table = BeautifulSoup(html, 'html.parser')

# class selectors for the columns we want to keep
selectors = [
    'shirtnumber', 
    'name', 
    'age', 
    'position', 
    'game-minutes', 
    'appearances',
    'lineups',
    'subs-in',
    'subs-out',
    'subs-on-bench',
    'goals',
    'assists',
    'yellow-cards',
    '2nd-yellow-cards',
    'red-cards'
]


# dynamically add column names
# alternatively you can use a static list or the selectors themselves
head = []
for selector in selectors:
    th = table.thead.find('th', {'class': selector})
    if txt := th.get_text(strip=True):
        head.append(txt)
    elif th.img:
        head.append(th.img['title'])
    else:
        head.append(selector.capitalize())


# creating the dataframe
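# find_all returns cells in document order, so `selectors` above must list
# the columns in the same order they appear in the table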
rows = [
    [td.text for td in tr.find_all('td', {'class': selectors})]
    for tr in table.tbody.find_all('tr')
]
df = pd.DataFrame(rows, columns=head)

Since some column names are text and the others are images, I used the 'title' attribute or the class name instead. You can skip that step and load the table into a DataFrame using pd.read_html(), but the result will not have all the header names, and it will include empty columns (photo/flag).
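
For reference, a minimal sketch of that pd.read_html() shortcut, reusing the html fragment fetched above (wrapping it in StringIO avoids the deprecation warning recent pandas versions emit for literal HTML strings):

import io

# quick alternative: image-only headers come back unnamed and the
# photo/flag columns are empty, as noted above
df_alt = pd.read_html(io.StringIO(html))[0]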
