
I'm trying to use Python to scrape the name and price of each car from a car website, but the output contains only empty lists.

car website: https://www.contactcars.com/en/cars/used/toyota?page=1&sortOrder=false&sortBy=CreatedAt

My code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

web_page = requests.get("https://www.contactcars.com/en/cars/used/toyota?page=1&sortOrder=false&sortBy=CreatedAt").text
soup = BeautifulSoup(web_page)

data = []
for x in soup:
  name = soup.find_all('div', attrs={'class':'n-engine-card__model ng-star-inserted'})
  price= soup.find_all('div', attrs={'class':'n-engine-card__price'})

  data.append({
        'name':name,
        'price':price
    })
  
print(data)

Output:

[{'name': [], 'price': []}, {'name': [], 'price': []}]
  • Can you do a quick sanity check? Remove the attrs from soup.find_all and check if it produces any output. Commented Sep 28, 2022 at 15:05
  • The data is in a script block with id="serverApp-state". Commented Sep 28, 2022 at 15:12
  • @ZiadAmerr When I remove all the attrs, it returns a long list, but that list doesn't include any info related to the cars. Commented Sep 28, 2022 at 17:37
  • It looks like the .text does not contain that HTML; it is generated by JavaScript. Commented Sep 28, 2022 at 17:54
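
The two checks suggested in these comments can be sketched offline. The HTML string below is a made-up stand-in for what `requests.get(...).text` returns, and `diagnose` is a hypothetical helper name, not part of any library. It only reports whether the class name from the question and the `serverApp-state` script block occur in the raw text.

```python
import re


def diagnose(html_text, css_class, script_id):
    """Report whether a class name and a script block appear in raw HTML."""
    # If the class never occurs in the raw text, find_all() cannot match it.
    class_present = css_class in html_text
    # Look for an opening <script ...> tag carrying the given id attribute.
    script_pattern = re.compile(
        r'<script[^>]*\bid="' + re.escape(script_id) + r'"', re.IGNORECASE
    )
    script_present = script_pattern.search(html_text) is not None
    return {"class_in_html": class_present, "state_script_in_html": script_present}


# Made-up sample: no rendered cards, but an embedded state script.
sample_html = (
    "<html><body><app-root></app-root>"
    '<script id="serverApp-state" type="application/json">{}</script>'
    "</body></html>"
)

report = diagnose(sample_html, "n-engine-card__price", "serverApp-state")
print(report)
```

If the class is absent but the script block is present, the card markup is probably built in the browser, and the embedded JSON is the more promising thing to parse.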

1 Answer


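The car cards are generated by JavaScript, so they are not in the HTML that requests downloads. The data is embedded as JSON in the script block with id="serverApp-state", so you can parse that instead. Try this: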

import json

import requests
from bs4 import BeautifulSoup

from tabulate import tabulate

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
}

url = "https://www.contactcars.com/en/cars/used/toyota?page=1&sortOrder=false&sortBy=CreatedAt"
response = requests.get(url, headers=headers)
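# The listings are embedded as escaped JSON in this script block,
# with double quotes encoded as "&q;".
state_json = (
    BeautifulSoup(response.text, "html.parser")
    .find("script", {"id": "serverApp-state"})
    .text
    .replace("&q;", '"')
)
# Key under which the page stores the search API response.
search = "http_requests:https://api.live.contactcars.com/gateway/vehicles/carsSearch/used"
data = json.loads(state_json)[search]["body"]["result"]["items"]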

table = []
for item in data:
    table.append(
        [
            item['make']['nameEn'],
            item['model']['nameEn'],
            item['year'],
            item['price'],
        ]
    )

print(
    tabulate(
        table,
        headers=['Make', 'Model', 'Year', 'Price'],
        tablefmt='github',
    )
)

Output:

| Make   | Model    |   Year |   Price |
|--------|----------|--------|---------|
| Toyota | Corolla  |   2022 |  650000 |
| Toyota | Yaris    |   2007 |  225000 |
| Toyota | Corolla  |   1998 |  110000 |
| Toyota | Corolla  |   2018 |  490000 |
| Toyota | Fortuner |   2017 |  900000 |
| Toyota | Corolla  |   2020 |  600000 |
| Toyota | Corolla  |   2021 |  750000 |
| Toyota | Corolla  |   2020 |  630000 |
| Toyota | Corolla  |   1993 |   88000 |
| Toyota | Corolla  |   1995 |   90000 |
| Toyota | Corolla  |   2013 |  410000 |
| Toyota | Hiace    |   2007 |  280000 |
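
The `.replace("&q;", '"')` step in the code above only handles double quotes, and it looks up one exact key. A slightly more defensive sketch follows, using only the standard library so it runs without a network call. It assumes the page escapes five characters as `&a;`, `&q;`, `&s;`, `&l;` and `&g;`; only `&q;` is confirmed by the answer, so treat the rest of the table as an assumption. The helper names `unescape_state`, `parse_state` and `find_items` are hypothetical, and the sample payload is invented to mirror the first row of the output.

```python
import json
import re

# Assumed escape table; only "&q;" is confirmed by the answer above.
_ESCAPES = {"&a;": "&", "&q;": '"', "&s;": "'", "&l;": "<", "&g;": ">"}


def unescape_state(text):
    # Single pass, so "&a;q;" becomes "&q;" and is not unescaped a second time.
    return re.sub(r"&[aqslg];", lambda m: _ESCAPES[m.group(0)], text)


def parse_state(html_text, script_id="serverApp-state"):
    """Return the embedded JSON state as a dict, or raise if the script is missing."""
    match = re.search(
        r'<script[^>]*\bid="' + re.escape(script_id) + r'"[^>]*>(.*?)</script>',
        html_text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError(f"no <script id={script_id!r}> found")
    return json.loads(unescape_state(match.group(1)))


def find_items(state, key_fragment="carsSearch/used"):
    """Return the items list from the first state key containing key_fragment."""
    for key, value in state.items():
        if key_fragment in key:
            return value["body"]["result"]["items"]
    return []


# Invented sample payload with the same shape the answer's code relies on.
sample_html = (
    '<script id="serverApp-state" type="application/json">'
    "{&q;http_requests:https://example.invalid/carsSearch/used&q;:"
    "{&q;body&q;:{&q;result&q;:{&q;items&q;:["
    "{&q;make&q;:{&q;nameEn&q;:&q;Toyota&q;},"
    "&q;model&q;:{&q;nameEn&q;:&q;Corolla&q;},"
    "&q;year&q;:2022,&q;price&q;:650000}"
    "]}}}}</script>"
)

items = find_items(parse_state(sample_html))
rows = [
    (item["make"]["nameEn"], item["model"]["nameEn"], item["year"], item["price"])
    for item in items
]
print(rows)
```

On the real page you would pass `response.text` to `parse_state`. Matching on a key fragment rather than the full key is a design choice that tolerates a change of API host, at the cost of being less strict about which entry is read.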