I'm trying to scrape a product page from Digikala using Pyppeteer because the site is heavily JavaScript-rendered.

Here is my render class:

import asyncio
from pyppeteer import launch
from pyppeteer.errors import TimeoutError

class JsRender:
    def __init__(self, chromium_path):
        self.chromium_path = chromium_path

    async def fetch_content(self, url, headers=None):
        browser = await launch(executablePath=self.chromium_path, headless=True)
        try:
            page = await browser.newPage()

            if headers:
                await page.setExtraHTTPHeaders(headers)

            await page.goto(url, {'waitUntil': 'networkidle2', 'timeout': 60000})
            await asyncio.sleep(2)  # Extra delay in case rendering finishes late
            return await page.content()
        except TimeoutError:
            # This is the only path that returns None
            print(f'Timeout while loading {url}')
            return None
        finally:
            # Close the browser on success and on timeout alike
            await browser.close()

    def get_page_content(self, url, headers=None):
        # Synchronous wrapper around the coroutine
        return asyncio.get_event_loop().run_until_complete(self.fetch_content(url, headers))

And here’s how I call it:

from browser_configure import JsRender
from bs4 import BeautifulSoup
from utility import covert_price

browser = JsRender(r'C:\Users\sama\AppData\Local\Chromium\Application\chrome.exe')

url = 'https://www.digikala.com/product/dkp-17986495/%DA%AF%D9%88%D8%B4%DB%8C-%D9%85%D9%88%D8%A8%D8%A7%DB%8C%D9%84-%D8%A7%D9%BE%D9%84-%D9%85%D8%AF%D9%84-iphone-16-ch-%D8%AF%D9%88-%D8%B3%DB%8C%D9%85-%DA%A9%D8%A7%D8%B1%D8%AA-%D8%B8%D8%B1%D9%81%DB%8C%D8%AA-128-%DA%AF%DB%8C%DA%AF%D8%A7%D8%A8%D8%A7%DB%8C%D8%AA-%D9%88-%D8%B1%D9%85-8-%DA%AF%DB%8C%DA%AF%D8%A7%D8%A8%D8%A7%DB%8C%D8%AA/'
content = browser.get_page_content(url)

soup = BeautifulSoup(content, 'html.parser')

#-------------------------------------------------------------
title = soup.select_one('h1.text-h5-180').text
star = soup.select_one('p.text-body1-strong')
star = float(star.text) if star else 0
price = soup.select_one('div.w-full.flex.gap-1.item-center').text
price = covert_price(price)
colors = soup.select('span.text-body-1-180.whitespace-nowrap.text-neutral-650')
colors = [color.text for color in colors]
#-------------------------------------------------------------

detail = soup.select_one('table.border-collapse')
print(detail)
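
Separately from the rendering problem, select_one returns None when a selector matches nothing, so the .text calls above would raise AttributeError even after a successful render. A minimal sketch of a guarded extraction, assuming the same selectors; text_or_default and extract_fields are hypothetical helper names, and soup is expected to be the BeautifulSoup object built above:

```python
def text_or_default(node, default=None):
    """Return the stripped text of a matched node, or default if nothing matched."""
    return node.get_text(strip=True) if node is not None else default


def extract_fields(soup):
    """Collect the product fields without failing on a missing element.

    soup is any object exposing BeautifulSoup's select_one/select methods.
    """
    star_text = text_or_default(soup.select_one('p.text-body1-strong'))
    return {
        'title': text_or_default(soup.select_one('h1.text-h5-180')),
        'star': float(star_text) if star_text else 0,
        # Raw price text; covert_price (my own utility) would be applied to it
        'price_text': text_or_default(
            soup.select_one('div.w-full.flex.gap-1.item-center')
        ),
        'colors': [
            node.get_text(strip=True)
            for node in soup.select(
                'span.text-body-1-180.whitespace-nowrap.text-neutral-650'
            )
        ],
    }
```

With this, a None value in the result points at a selector that did not match, which is easier to act on than an AttributeError.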

No matter what I try, get_page_content() returns None, so content is always None. In the class above, None comes only from the except TimeoutError branch, so page.goto seems to hit the 60-second timeout rather than page.content() returning an empty string.

Changing the selectors did not help either.

The Digikala page loads fine in a regular browser, but not with Pyppeteer.

How can I properly render and scrape content from Digikala with Pyppeteer?

What I tried, to make sure the page content is fully loaded before extracting it:

  • Added await asyncio.sleep(2) after page.goto to allow extra time for rendering.
  • Used waitUntil: 'networkidle2' in page.goto, which waits until there are no more than two network connections for at least 500 ms.
  • Tried await page.waitForSelector('div#ProductTopFeatures') to wait for a specific element on the page.
  • Verified that the Chromium executable path is correct.
  • Checked that the URL is accessible and not blocked.
  • Used BeautifulSoup to parse the returned HTML content.

I expected to get the full HTML of the product page with all dynamic elements rendered, so that I could scrape the product details. Instead, the content is always None or empty.
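
To narrow down where the load fails, the navigation response and the selector wait can be inspected separately. A minimal diagnostic sketch, assuming an already opened Pyppeteer page object; diagnose_page and the keys of the returned dict are hypothetical names, and the broad except is deliberate so that any failure is recorded instead of being collapsed to None:

```python
async def diagnose_page(page, url, selector, timeout_ms=60000):
    """Report how far loading gets instead of reducing every failure to None."""
    report = {'status': None, 'selector_found': False, 'html_length': 0, 'error': None}
    try:
        # 'domcontentloaded' resolves earlier than 'networkidle2', which may
        # never be reached if the page keeps connections open (an assumption
        # about this site, not something I have confirmed).
        response = await page.goto(
            url, {'waitUntil': 'domcontentloaded', 'timeout': timeout_ms}
        )
        if response is not None:
            report['status'] = response.status  # HTTP status of the main document
        await page.waitForSelector(selector, {'timeout': timeout_ms})
        report['selector_found'] = True
    except Exception as exc:  # diagnostic code: record whatever went wrong
        report['error'] = f'{type(exc).__name__}: {exc}'
    # Measure whatever HTML exists, even after a failed wait
    report['html_length'] = len(await page.content())
    return report
```

It would be called inside fetch_content after browser.newPage(), for example report = await diagnose_page(page, url, 'h1'). An error status or a missing selector together with a short html_length would suggest that the headless browser is being served a different page than a regular browser.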
