-1

Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup


url = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/"
response = requests.get(url)
soup_job = response.text
soup = BeautifulSoup(soup_job, "html.parser")
table = soup.find_all('table', class_="data-table")
print(table)

Even when I loop over the pages like this, it still does not work. Can anyone help me, please?

page = 1
while page <= 34:
    response = requests.get(
        f"https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/page/{page}")
    page += 1

I have also tried Selenium, but that does not work either. Could someone review this and give me a hint on how to do it?

When I print the table, I expect to see the table's contents, but I get an empty list. Changing the class name gives the same result, and when I use find instead of find_all, it returns None instead of an empty list. I want to get the data inside the table so that I can then extract the head and the body of the table, but nothing has worked so far.

7
  • Welcome to Stack Overflow! In what way is your code not working as expected? Please elaborate on the specific problem you are observing and what debugging you have done. To learn more about this community and how we can help you, please start with the tour and read How to Ask and its linked resources. Commented Dec 25, 2024 at 16:45
  • You're using requests, which cannot process JavaScript content. Commented Dec 25, 2024 at 17:26
  • Do you have any idea about what to use instead of requests? Commented Dec 25, 2024 at 20:29
  • @Dave: If you’ve determined that the content being scraped needs to process JavaScript or in some way perform functionality that a browser would perform then plain HTTP requests alone wouldn’t do that. You’d need to use a “headless browser” tool like Selenium to process the page’s client-side functionality in memory so you can observe the resulting updated DOM. Is that what you are observing? The question is currently light on such details. Commented Dec 25, 2024 at 21:10
  • I have tried this, but still no result: from selenium import webdriver; from selenium.webdriver.chrome.service import Service; from selenium.webdriver.common.by import By; from webdriver_manager.chrome import ChromeDriverManager. Commented Dec 25, 2024 at 22:54

2 Answers

1

I suspect that you are being blocked by the site. You need to be a little stealthy.

import time
import undetected_chromedriver as uc
import pandas as pd
from bs4 import BeautifulSoup


URL = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/"

driver = uc.Chrome(headless=False, use_subprocess=True)

driver.get(URL)

# Wait for page to fully load. Could actually wait for a specific element.
time.sleep(5)

soup = BeautifulSoup(driver.page_source, "lxml")

data = []

for row in soup.select_one("table.data-table > tbody").select("tr"):
    data.append({
        "rank": row.select_one("td.csr-col--rank .data-table__value").text,
        "school": row.select_one("td.csr-col--school-name .data-table__value").text,
        "degree": row.select_one("td.csr-col--school-type .data-table__value").text,
        "early": row.select_one("td:nth-of-type(4) .data-table__value").text,
        "mid": row.select_one("td:nth-of-type(5) .data-table__value").text,
    })

driver.quit()

data = pd.DataFrame(data)
print(data)

This is what the output should look like:

   rank                                            school     degree     early       mid
0     1                             Petroleum Engineering  Bachelors   $98,100  $212,100
1     2      Operations Research & Industrial Engineering  Bachelors  $101,200  $202,600
2     3  Electrical Engineering & Computer Science (EECS)  Bachelors  $128,500  $192,300
3     4                                Interaction Design  Bachelors   $77,400  $178,800
4     5                                  Building Science  Bachelors   $71,100  $172,400
5     6                  Applied Economics and Management  Bachelors   $81,200  $169,300
6     7                             Actuarial Mathematics  Bachelors   $71,200  $167,500
7     8                     Optical Science & Engineering  Bachelors   $81,500  $166,400
8     9                            Quantitative Economics  Bachelors   $78,400  $165,100
9    10                               Operations Research  Bachelors   $94,900  $164,900
10   11                               Systems Engineering  Bachelors   $89,700  $163,800
11   12                    Information & Computer Science  Bachelors   $73,200  $162,900
12   13                                 Public Accounting  Bachelors   $71,500  $162,200
13   14                                 Cognitive Science  Bachelors   $80,300  $162,100
14   15                        Aeronautics & Astronautics  Bachelors   $89,800  $161,600
15   16                                 Aerospace Studies  Bachelors   $64,500  $158,400
16   17                                          Pharmacy  Bachelors   $71,500  $158,000
17   18                              Managerial Economics  Bachelors   $78,200  $157,800
18   19                                   Foreign Affairs  Bachelors   $65,200  $157,700
19   20                                 Political Economy  Bachelors   $75,800  $156,700
20   21                              Chemical Engineering  Bachelors   $87,700  $156,100
21   21                  Marine Transportation Management  Bachelors   $78,500  $156,100
22   23               Computer Science (CS) & Engineering  Bachelors   $93,500  $154,100
23   24                    Corporate Accounting & Finance  Bachelors   $79,100  $154,000
24   25                         Computer Engineering (CE)  Bachelors   $92,000  $153,800

0

BeautifulSoup is just a parser: it only sees the static HTML that requests retrieves from the server, so it can't handle JavaScript-rendered content. Selenium can, because it drives a real browser that executes the page's scripts.
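
One way to check this before reaching for a browser is to test whether the table is present in the raw HTML at all. The sketch below is only illustrative: `has_data_table` is a hypothetical helper, and the two HTML strings are made-up stand-ins for a server-rendered page and a JavaScript shell, not actual responses from the site.

```python
from bs4 import BeautifulSoup


def has_data_table(html: str) -> bool:
    """Return True if the markup already contains a <table class="data-table">."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("table", class_="data-table") is not None


# Made-up stand-ins: one page with the table in the markup, one that
# would only build it later in the browser via JavaScript.
static_page = (
    '<div id="app"><table class="data-table">'
    "<tbody><tr><td>1</td></tr></tbody></table></div>"
)
js_shell = '<div id="app"></div><script src="bundle.js"></script>'

print(has_data_table(static_page))  # True
print(has_data_table(js_shell))     # False
```

Running the same check on `requests.get(url).text` would show which case applies. If it returns `False` while the table is visible in a browser, the content is either rendered client-side or the server returned a different page to the script, so inspecting `response.status_code` as well may help tell those apart.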

2 Comments

Even when I used Selenium, I still couldn't get the result.
Do you have any idea about what to use instead of BeautifulSoup and Selenium to get that done?
