I'm trying to scrape a product page from Digikala using Pyppeteer because the site is heavily JavaScript-rendered.
Here is my render class:
import asyncio
from pyppeteer import launch
from pyppeteer.errors import TimeoutError


class JsRender:
    def __init__(self, chromium_path):
        self.chromium_path = chromium_path

    async def fetch_content(self, url, headers=None):
        try:
            browser = await launch(executablePath=self.chromium_path, headless=True)
            page = await browser.newPage()
            if headers:
                await page.setExtraHTTPHeaders(headers)
            await page.goto(url, {'waitUntil': 'networkidle2', 'timeout': 60000})
            await asyncio.sleep(2)  # Added sleep just in case
            content = await page.content()
            await browser.close()
            return content
        except TimeoutError:
            print(f'Timeout while loading {url}')
            return None

    def get_page_content(self, url, headers=None):
        return asyncio.get_event_loop().run_until_complete(self.fetch_content(url, headers))
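
A side note on the class above: when page.goto raises, browser.close() is skipped, so every timed-out call leaves a Chromium process running. A minimal sketch of a variant that always attempts to close the browser is shown here. It is only an illustration, not a confirmed fix: the launcher parameter is a hypothetical addition (it is expected to be pyppeteer's launch) so the flow can be exercised without a real browser, and unlike the original it lets the timeout propagate instead of returning None.

```python
import asyncio


async def fetch_content(launcher, url, chromium_path, timeout_ms=60000):
    """Return the rendered HTML, closing the browser even if navigation fails.

    `launcher` is expected to be pyppeteer's `launch`; passing it in is an
    assumption of this sketch, not part of the original class.
    """
    browser = await launcher(executablePath=chromium_path, headless=True)
    try:
        page = await browser.newPage()
        await page.goto(url, {'waitUntil': 'networkidle2', 'timeout': timeout_ms})
        return await page.content()
    finally:
        # Runs on success and on any exception, so close() is always attempted.
        await browser.close()


# Intended usage (not run here):
#   from pyppeteer import launch
#   html = asyncio.run(fetch_content(launch, url, chromium_path))
```

Letting the exception propagate makes the failure visible at the call site, rather than surfacing later as a None passed to the parser.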
And here’s how I call it:
from browser_configure import JsRender
from bs4 import BeautifulSoup
from utility import covert_price
browser = JsRender(r'C:\Users\sama\AppData\Local\Chromium\Application\chrome.exe')
url = 'https://www.digikala.com/product/dkp-17986495/%DA%AF%D9%88%D8%B4%DB%8C-%D9%85%D9%88%D8%A8%D8%A7%DB%8C%D9%84-%D8%A7%D9%BE%D9%84-%D9%85%D8%AF%D9%84-iphone-16-ch-%D8%AF%D9%88-%D8%B3%DB%8C%D9%85-%DA%A9%D8%A7%D8%B1%D8%AA-%D8%B8%D8%B1%D9%81%DB%8C%D8%AA-128-%DA%AF%DB%8C%DA%AF%D8%A7%D8%A8%D8%A7%DB%8C%D8%AA-%D9%88-%D8%B1%D9%85-8-%DA%AF%DB%8C%DA%AF%D8%A7%D8%A8%D8%A7%DB%8C%D8%AA/'
content = browser.get_page_content(url)
soup = BeautifulSoup(content,'html.parser')
#-------------------------------------------------------------
title = soup.select_one('h1.text-h5-180').text
star = soup.select_one('p.text-body1-strong')
star = float(star.text) if star else 0
price = soup.select_one('div.w-full.flex.gap-1.item-center').text
price = covert_price(price)
colors = soup.select('span.text-body-1-180.whitespace-nowrap.text-neutral-650')
colors = [color.text for color in colors if color ]
#-------------------------------------------------------------
detail = soup.select_one('table.border-collapse')
print(detail)
No matter what I try, content is always None.
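
Reading the render class, the only path that returns None is the except TimeoutError branch, so a None result suggests that page.goto is timing out rather than that page.content() is returning nothing. The sketch below reproduces just that control flow with stand-ins. FakeTimeoutError, fake_goto and the should_time_out flag are hypothetical names used for illustration, and no browser is involved.

```python
import asyncio


class FakeTimeoutError(Exception):
    """Stand-in for pyppeteer.errors.TimeoutError (hypothetical, illustration only)."""


async def fake_goto(url, should_time_out):
    # Stand-in for page.goto: raises when navigation "times out".
    if should_time_out:
        raise FakeTimeoutError(url)


async def fetch_content(url, should_time_out):
    try:
        await fake_goto(url, should_time_out)
        return "<html>rendered</html>"
    except FakeTimeoutError:
        # Same shape as the original: the error is swallowed and None is returned.
        print(f"Timeout while loading {url}")
        return None


def get_page_content(url, should_time_out):
    return asyncio.run(fetch_content(url, should_time_out))


timed_out = get_page_content("https://example.com/", should_time_out=True)
loaded = get_page_content("https://example.com/", should_time_out=False)
```

Here timed_out is None and loaded holds the HTML string, which matches what the real class would do if navigation never reaches the networkidle2 state within 60 seconds.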
I've tried:
- Adding await asyncio.sleep(2)
- Using waitUntil: 'networkidle2'
- Making sure Chromium path is valid
- Changing selectors
Still, page.content() returns nothing or fails.
The Digikala page works in the browser, but not with Pyppeteer.
How can I properly render and scrape content from Digikala with Pyppeteer?
I tried several approaches to make sure the page content is fully loaded before extracting it:
- Added await asyncio.sleep(2) after page.goto to wait extra time for rendering.
- Used waitUntil: 'networkidle2' in page.goto to wait until network activity calms down.
- Tried await page.waitForSelector('div#ProductTopFeatures') to wait for a specific element on the page.
- Verified the Chromium executable path is correct.
- Checked if the URL is accessible and not blocked.
- Used BeautifulSoup to parse the returned HTML content.
I expected to get the full HTML content of the product page with all dynamic elements rendered, so I could scrape the product details successfully. But instead, the content is always None or empty.
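
To narrow down what headless Chromium actually receives, one option is to wrap the waitForSelector attempt so that, on failure, the current HTML is written to disk for inspection. This is a diagnostic sketch only: wait_or_dump and dump_path are hypothetical names, the page argument is assumed to be a pyppeteer Page, and the broad except is deliberate so that any failure produces a dump.

```python
import asyncio


async def wait_or_dump(page, selector, timeout_ms=15000, dump_path="debug.html"):
    """Wait for `selector`; on failure, save the page's current HTML and return False."""
    try:
        await page.waitForSelector(selector, {'timeout': timeout_ms})
        return True
    except Exception as exc:  # deliberately broad: this is a diagnostic helper
        html = await page.content()
        with open(dump_path, "w", encoding="utf-8") as fh:
            fh.write(html)
        print(f"{selector!r} not found ({type(exc).__name__}); HTML saved to {dump_path}")
        return False


# Intended usage inside fetch_content, after page.goto (not run here):
#   found = await wait_or_dump(page, 'div#ProductTopFeatures')
```

Opening the saved file shows whether the headless session received the product markup, an empty application shell, or some other page entirely.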