I am trying to load all the comments on this site so that I can scrape them, but I can't figure out how to load them all.

When I run my code, I get this error in the console:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
  File "C:\Users\Jakub\dev\rok_quests\rok_quests\Lib\site-packages\selenium\webdriver\support\wait.py", line 95, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

Doesn't that mean it either can't find the button or can't click on it?

The URL I use:

https://www.rok.guide/buildings/lyceum-of-wisdom/

The code below is meant to load all the comments in the comments section; I will then get page_source and scrape it.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time


def scrape_comments(url):
    # Set up Chrome driver options
    chrome_options = Options()
    chrome_options.add_argument("start-maximized")
    chrome_options.add_argument('disable-infobars')
    chrome_options.add_argument("--block-notifications")
    chrome_options.add_argument("--headless")
    driver = webdriver.Chrome(options=chrome_options)
    wait = WebDriverWait(driver, 10)
    comments = []
    try:
        # Open the website
        driver.get(url)
        get_url = driver.current_url

        wait.until(EC.url_to_be(url))

        if get_url != url:
            raise Exception('Site url doesnt match')
        WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, ".wpd-comment-text")))

        while True:
            try:
                WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
                    (By.XPATH, "/html/body/div[3]/div/div[1]/main/div/div[2]/div[2]/div[3]/div[3]/div[51]/div/button"))).click()
                print("clicked")
            except TimeoutError:
                print("No more to load")
                break

        print(driver.page_source)
        return comments
    finally:
        # Close the web driver
        driver.quit()

  • "but it doesnt work" Saying that something "doesn't work" is a poor description of the problem. Instead, tell us what the code actually does, and explain what you wanted instead. Commented May 27, 2023 at 17:51
  • @JohnGordon Thanks for the suggestion. I tried to explain my problem; I hope it is understandable. Commented May 27, 2023 at 18:12

1 Answer

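The page loads its comments through an Ajax call to admin-ajax.php, so you can request them directly with requests and parse the returned HTML with BeautifulSoup instead of clicking the button in Selenium. For example: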

import requests
from bs4 import BeautifulSoup

api_url = 'https://www.rok.guide/wp-admin/admin-ajax.php'

headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0'
}

multipart_form_data = {
    'action': (None, 'wpdLoadMoreComments'),
    'sorting': (None, 'newest'),
    'offset': (None, '0'),
    'lastParentId': (None, '0'),
    'isFirstLoad': (None, '1'),
    'wpdType': (None, ''),
    'postId': (None, '3568'),
    'wpdiscuz_nonce': (None, ''),
}

ofs = 0
while True:
    # each response is JSON; data['data']['comment_list'] holds one page of comments as HTML
    data = requests.post(api_url, headers=headers, files=multipart_form_data).json()
    soup = BeautifulSoup(data['data']['comment_list'], 'html.parser')

    for c in soup.select('.comment'):
        print(c.select_one('.wpd-comment-text').get_text(strip=True, separator='\n'))
        print('-'*80)

    # the server reports whether there is another page to load
    if not data['data']['is_show_load_more']:
        break

    # advance the cursor fields for the next page
    multipart_form_data['lastParentId'] = (None, str(data['data']['last_parent_id']))
    ofs += 1
    multipart_form_data['offset'] = (None, str(ofs))

Prints:


...

Q: In RoK, the Throwing Axeman is which civilization’s special unit?
A: France
--------------------------------------------------------------------------------
Q: In Ark of Osiris, how many teleports does the first alliance to occupy an obelisk earn?
A: 8
--------------------------------------------------------------------------------
Q: Which of the following is not a natural resource?
A: Clothes
--------------------------------------------------------------------------------
Q: French National Day is on July 14 in order to coincide with which historical event?
A: Storming of the Bastille
--------------------------------------------------------------------------------
Q: In Ark of Osiris, how many teleports does the first alliance to occupy an obelisk earn?
A: 8
--------------------------------------------------------------------------------
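
If you want to reuse the loop, one option is to separate the paging logic from the HTTP call. The following is only a sketch under the same assumptions as the code above (the same endpoint, form fields, and response keys); the names iter_comment_html, fetch_with_requests, and max_pages are made up for this example and are not part of any library.

```python
from typing import Callable, Dict, Iterator

API_URL = "https://www.rok.guide/wp-admin/admin-ajax.php"  # same endpoint as above


def iter_comment_html(fetch_page: Callable[[Dict[str, str]], dict],
                      post_id: str,
                      max_pages: int = 1000) -> Iterator[str]:
    """Yield the 'comment_list' HTML of each page until the server reports no more.

    fetch_page is a hypothetical callable: it receives the form fields and
    returns the decoded JSON response as a dict.
    """
    fields = {
        "action": "wpdLoadMoreComments",
        "sorting": "newest",
        "offset": "0",
        "lastParentId": "0",
        "isFirstLoad": "1",
        "wpdType": "",
        "postId": post_id,
        "wpdiscuz_nonce": "",
    }
    # max_pages is an assumed safety cap so an unexpected response cannot loop forever
    for page in range(max_pages):
        data = fetch_page(fields)["data"]
        yield data["comment_list"]
        if not data["is_show_load_more"]:
            return
        # advance the cursor fields exactly as the loop above does
        fields["lastParentId"] = str(data["last_parent_id"])
        fields["offset"] = str(page + 1)


def fetch_with_requests(fields: Dict[str, str]) -> dict:
    """Send the fields as multipart form data, as in the example above."""
    import requests  # third-party; only needed for real network calls

    headers = {
        "X-Requested-With": "XMLHttpRequest",
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) "
                      "Gecko/20100101 Firefox/113.0",
    }
    files = {name: (None, value) for name, value in fields.items()}
    response = requests.post(API_URL, headers=headers, files=files, timeout=30)
    response.raise_for_status()
    return response.json()
```

Each chunk yielded by iter_comment_html(fetch_with_requests, post_id="3568") can then be passed to BeautifulSoup(chunk, 'html.parser') and searched for .wpd-comment-text as before. Passing the fetch function in as an argument also lets you try the paging logic against canned responses without hitting the site.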