How can I handle infinite scrolling with Playwright? I have tried many solutions I found online, but none of them solved my issue.

This is what happens: the page doesn't load all of its results, and the script fails with a timeout error instead.

Below is the code that collects all the cards on the page:

from playwright.sync_api import sync_playwright
import pandas as pd

def main():
    with sync_playwright() as p:
        chromium_path = r"C:\Users\pranit\AppData\Local\ms-playwright\chromium-1076\chrome-win\chrome.exe"
        browser = p.chromium.launch(executable_path=chromium_path, headless=False)

        page_url = f'https://www.justdial.com/Mumbai/Tourist-Attraction/nct-10596038'

        page = browser.new_page()
        page.goto(page_url, timeout=600000)

        attractions = page.locator('//div[@class="jsx-a3a43292cc1e6428 results_listing_container"]/div').all()
        print(f'There are: {len(attractions)} attractions.')

        attractions_list =[]
        for attraction in attractions:
            attraction_dict = {} 
            attraction_dict['place'] = attraction.locator('//div[@class="jsx-4d407376001b01ad"]/h2/a').inner_text()


            attractions_list.append(attraction_dict)
        df = pd.DataFrame(attractions_list)
        df.to_excel('attractions_list.xlsx', index=False)
        df.to_csv('attractions_list.csv', index=False)

        browser.close()

if __name__ == '__main__':
    main()

Here's one of the approaches I tried:

for _ in range(5):  # make the range as long as needed
    page.mouse.wheel(0, 15000)
    time.sleep(2)

time.sleep(15)

I can't figure out what the problem is. I just want all the attractions to be loaded, not just the first 10. My code reports 10 attractions and then throws a timeout error.

Here's the error:

PS C:\Users\pranit\OneDrive\Desktop\BeautifulSoup tutorial> python -u "c:\Users\pranit\OneDrive\Desktop\BeautifulSoup tutorial\TravelAdvisorAttractions.py"
There are: 10 attractions.
Traceback (most recent call last):
  File "c:\Users\pranit\OneDrive\Desktop\BeautifulSoup tutorial\TravelAdvisorAttractions.py", line 31, in <module>
    main()
  File "c:\Users\pranit\OneDrive\Desktop\BeautifulSoup tutorial\TravelAdvisorAttractions.py", line 20, in main
    attraction_dict['place'] = attraction.locator('//div[@class="jsx-4d407376001b01ad"]/h2/a').inner_text()
  File "C:\Users\pranit\AppData\Local\Programs\Python\Python39\lib\site-packages\playwright\sync_api\_generated.py", line 17228, in inner_text
    self._sync(self._impl_obj.inner_text(timeout=timeout))
  File "C:\Users\pranit\AppData\Local\Programs\Python\Python39\lib\site-packages\playwright\_impl\_sync_base.py", line 109, in _sync
    return task.result()
  File "C:\Users\pranit\AppData\Local\Programs\Python\Python39\lib\site-packages\playwright\_impl\_locator.py", line 444, in inner_text
    return await self._frame.inner_text(
  File "C:\Users\pranit\AppData\Local\Programs\Python\Python39\lib\site-packages\playwright\_impl\_frame.py", line 619, in inner_text
  File "C:\Users\pranit\AppData\Local\Programs\Python\Python39\lib\site-packages\playwright\_impl\_connection.py", line 482, in wrap_api_call
    return await cb()
  File "C:\Users\pranit\AppData\Local\Programs\Python\Python39\lib\site-packages\playwright\_impl\_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for locator("xpath=//div[@class=\"jsx-a3a43292cc1e6428 results_listing_container\"]/div").first.locator("xpath=//div[@class=\"jsx-4d407376001b01ad\"]/h2/a")

1 Answer

More or less this is what you want:

import time
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()

    page = context.new_page()
    page.goto('https://www.justdial.com/Mumbai/Tourist-Attraction/nct-10596038')

    # Get all the cards (10 at the beginning)
    cards = page.locator("//*[@class='jsx-a3a43292cc1e6428 results_listing_container']/div")

    # Repeat until we have 100 cards (which is the maximum)
    while len(cards.all()) < 100:
        # Scroll to the last card currently present.
        # The first iteration scrolls to card 10, the second to card 20, and so on.
        cards.all()[-1].scroll_into_view_if_needed()
        time.sleep(2)

    # Once we have 100 cards, print the info of every card.
    # (Adapt this to extract the fields you want; this example prints everything.)
    for card in cards.all():
        print(card.inner_text())
        print("------------------------------O--------------------------------")

The comments in the code explain each step.

I noticed that the page sometimes gets stuck while loading new cards; if that happens, you will need to find a way to handle it.
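
One possible way to handle a stuck page is to stop scrolling once the card count has not grown for a few rounds, so the loop cannot run forever. The following is only a sketch under assumptions: the helper names `scroll_until_loaded` and `load_all_cards`, and the `max_stalls` and `pause` defaults, are made up for illustration and are not part of Playwright. It uses the same XPath as the code above and the `Locator.count()`, `Locator.last` and `scroll_into_view_if_needed()` calls.

```python
import time
from typing import Callable


def scroll_until_loaded(
    get_count: Callable[[], int],
    scroll_once: Callable[[], None],
    target: int = 100,
    max_stalls: int = 5,
    pause: float = 2.0,
) -> int:
    """Scroll until `target` items exist or the count stops growing.

    Hypothetical helper: it gives up after `max_stalls` consecutive
    rounds in which no new items appeared.
    """
    stalls = 0
    count = get_count()
    while count < target and stalls < max_stalls:
        scroll_once()
        time.sleep(pause)  # give the page time to fetch the next batch
        new_count = get_count()
        # A round without growth counts as a stall; any growth resets the counter.
        stalls = stalls + 1 if new_count <= count else 0
        count = new_count
    return count


def load_all_cards(page, target: int = 100):
    """Wire the helper to a Playwright page, using the XPath from the answer."""
    cards = page.locator(
        "//*[@class='jsx-a3a43292cc1e6428 results_listing_container']/div"
    )
    scroll_until_loaded(
        get_count=cards.count,
        scroll_once=lambda: cards.last.scroll_into_view_if_needed(),
        target=target,
    )
    return cards
```

Calling `load_all_cards(page)` after `page.goto(...)` would replace the `while` loop above. Whether five stalled rounds and a two-second pause are enough for this site is a guess, so tune both values.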

Still, with this you can at least get all the cards on the page.
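
If you only want the place names, as in the question, note that the log in the traceback shows `inner_text()` waiting for the title link inside the first card, which suggests that at least one child `div` of the container does not contain that link. A minimal sketch of one way around this, assuming that is the cause: check `count()` first, which returns immediately instead of waiting, and skip cards without a match. The function name `extract_place_names` and the `"h2 a"` selector in the usage note are assumptions for illustration.

```python
def extract_place_names(cards, title_selector):
    """Return the title text of every card that actually has a title link.

    `cards` is any iterable of Playwright locators, e.g. `cards.all()`.
    """
    names = []
    for card in cards:
        title = card.locator(title_selector)
        # count() does not auto-wait, so a card without a title is skipped
        # right away instead of blocking until inner_text() times out.
        if title.count() == 0:
            continue
        names.append(title.first.inner_text().strip())
    return names
```

For example, `extract_place_names(cards.all(), "h2 a")` would return a list of names that can be passed to `pd.DataFrame` as in the question. Check the selector against the real markup, since generated class names such as `jsx-4d407376001b01ad` may change between deployments.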
