
There's a web novel site, noveltop (.net), and every chapter page of a web novel has a select drop-down that lets you pick the chapter to jump to.

Using Selenium with Python and the Firefox driver (or Chrome), I've dumped the page source, and all the HTML shows is:

<div class="c-selectpicker selectpicker_chapter chapter-selection chapters_selectbox_holder" data-chapter="chapter-892-892-bet-limit-shocking-change" data-manga="1189459" data-type="content" data-vol="0">
</div>

So, obviously, the script that fills in the div isn't being loaded or run. I have tried various ways to wait for the page to load fully, including:

  1. WebDriverWait(self.selenium_driver, 10).until(EC.presence_of_element_located((By.XPATH, '//body')))
  2. while True:
         page_state = self.selenium_driver.execute_script('return document.readyState;')
         print("wait4js: page state is:", page_state)
         if page_state == "complete":
             break
  3. self.selenium_driver.implicitly_wait(2)

  4. NEW EDIT: I've also waited for the element's presence, locating it both by XPath/class and by its attributes, and for the expected condition that the select is clickable. The dynamic JS doesn't seem to kick in; I've tried both the Chrome and Firefox drivers.

I can't find the elements I need to gather the options. Obviously the page loads them at run time and adds the select and all its options to the div.

It should look like this:

<div class="c-selectpicker selectpicker_chapter chapter-selection chapters_selectbox_holder" data-manga="1248315" data-chapter="chapter-1-invincible-after-a-hundred-years-of-seclusion" data-vol="0" data-type="content">
    <label>
        <select class="c-selectpicker selectpicker_chapter selectpicker single-chapter-select" style="" for="volume-id-0">
            <option class="short " data-limit="40" value="chapter-1-invincible-after-a-hundred-years-of-seclusion" data-redirect="https://noveltop.net/novel/i-stayed-at-home-for-a-century-when-i-emerged-i-was-invincible/chapter-460-460-conflicts-and-chaos-part-2/">Chapter 460  -  460 Conflicts And Chaos (Part 2)</option>

Can someone teach me how to figure this out so that I can use driver.find_elements to gather all the option elements?

Is it an iframe? Do I need to click on the div, or run some JavaScript attached to an HTML element? Help... I'm deeply frustrated with this code weirdness!

Thank you in advance if you can help me. I'm new to Selenium, so please be kind.

  • What's the URL? Commented Jan 1, 2023 at 18:08
  • An example would be found on: noveltop.net/novel/… Commented Jan 1, 2023 at 19:41
  • As I'm new to this: does Selenium run the JS when the page is loaded? I.e., what triggers the JS to dynamically populate the empty div? I'm lost. Commented Jan 1, 2023 at 19:48

1 Answer


You were close with your point 1, but instead of waiting for the whole page, wait for the element you actually want, i.e.:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://noveltop.net/novel/i-stayed-at-home-for-a-century-when-i-emerged-i-was-invincible/chapter-460-460-conflicts-and-chaos-part-2/")
# wait up to 10 seconds for the <select> that the page's JavaScript injects into the div
select = WebDriverWait(driver, timeout=10).until(ec.presence_of_element_located((By.XPATH, "/html/body/div[1]/div/div/div/div/div/div/div[1]/div/div[1]/div[1]/div/div[2]/div/label/select")))
# innerHTML of the <select> contains every <option>
print(select.get_attribute("innerHTML"))
driver.quit()
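If you want the individual option elements rather than the raw innerHTML, something along these lines might work. It is only a sketch: the CSS selector is built from the single-chapter-select class shown in the expected HTML in the question, and the names Chapter, read_chapters and fetch_chapters are made up for illustration. It also assumes the page really does load the drop-down in your browser session.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    slug: str   # the option's "value" attribute
    title: str  # the option's label text
    url: str    # the option's "data-redirect" attribute


def read_chapters(option_elements):
    """Convert <option> WebElements into plain Chapter records."""
    chapters = []
    for option in option_elements:
        # textContent is read instead of .text because .text can come back
        # empty for elements that are not currently rendered.
        raw_title = option.get_attribute("textContent") or ""
        chapters.append(Chapter(
            slug=option.get_attribute("value"),
            title=" ".join(raw_title.split()),  # collapse runs of whitespace
            url=option.get_attribute("data-redirect"),
        ))
    return chapters


def fetch_chapters(url, timeout=10):
    """Open url in Chrome and return every chapter listed in the drop-down."""
    # Imported here so read_chapters() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as ec
    from selenium.webdriver.support.wait import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumed selector, derived from the class names in the question's HTML.
        locator = (By.CSS_SELECTOR, "select.single-chapter-select option")
        options = WebDriverWait(driver, timeout).until(
            ec.presence_of_all_elements_located(locator)
        )
        return read_chapters(options)
    finally:
        driver.quit()
```

Calling fetch_chapters() with the chapter URL used above should return one Chapter per option. If the wait times out instead, the select was never added to the page in that session.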

By the way, whenever you want to check whether your element is available AT ALL, you can add a delay with time.sleep(seconds). Don't use it in real code, though, only for research.
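To see why an explicit wait is preferable to a fixed sleep, here is a rough, Selenium-free sketch of the polling idea. WebDriverWait works on the same principle: it re-checks the condition at a fixed interval (0.5 seconds by default) until it succeeds or the timeout expires. The wait_until helper and the simulated page are hypothetical stand-ins, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose options only show up on the third check.
calls = {"count": 0}


def options_present():
    calls["count"] += 1
    return ["chapter-1", "chapter-2"] if calls["count"] >= 3 else []


found = wait_until(options_present, timeout=2.0, poll_interval=0.05)
print(found, "after", calls["count"], "checks")
```

Unlike time.sleep, the loop returns as soon as the element appears and raises a clear error if it never does.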

Update: stopping the site from recognizing automated scripts

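from selenium import webdriver

# placeholder values (assumptions): point these at your own Chrome user data directory and profile
options = {
    'user_data_dir': '--user-data-dir=/path/to/chrome/user-data',
    'chrome_profile': '--profile-directory=Default',
}

chrome_options = webdriver.ChromeOptions()
# we should pretend to be a human
chrome_options.add_argument('start-maximized')
chrome_options.add_argument('--disable-web-security')
chrome_options.add_argument('--allow-running-insecure-content')
# personalize chrome profile
chrome_options.add_argument(options['user_data_dir'])
chrome_options.add_argument(options['chrome_profile'])
chrome_options.add_argument('--enable-sync')
# turn off recognition of automation by browser
chrome_options.add_argument('--disable-extensions')
chrome_options.add_experimental_option('useAutomationExtension', False)
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
# start the session with these options applied
driver = webdriver.Chrome(options=chrome_options)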
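To tell whether the site served a bot-protection page instead of the real chapter, a quick check of the page title can help. This is only a heuristic sketch: the marker strings are assumptions based on titles that Cloudflare interstitial pages have commonly used, and they may change or differ on this site. The function name is made up for illustration.

```python
# Assumed title fragments of challenge/block pages; adjust if the site differs.
CHALLENGE_TITLE_MARKERS = ("just a moment", "attention required")


def looks_like_challenge_page(title):
    """Return True if the page title resembles a bot-protection interstitial."""
    normalized = (title or "").lower()
    return any(marker in normalized for marker in CHALLENGE_TITLE_MARKERS)


print(looks_like_challenge_page("Just a moment..."))
print(looks_like_challenge_page("Chapter 460 - 460 Conflicts And Chaos (Part 2)"))
```

With a live session you would pass driver.title right after driver.get(...). If it returns True, waiting longer for the select is unlikely to help.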

6 Comments

Just found out it's a Cloudflare-protected site. P.S. I have also TRIED WebDriverWait for the element with presence_of_element_located and also visibility_of_all_elements_located, using By.XPATH and a selector of '//option[@value and contains(@value, "chapter")]'. I think the culprit is either Cloudflare or the dynamic JS not loading because of Cloudflare.
With the code above I could get all the options, so it's not a coding problem.
How do you connect to the webdriver? Try creating the Chrome session under a specific Chrome profile. Sometimes being personalized helps, as do some additional Chrome options. I've updated my answer with the snippet I've used to avoid captchas; try it and see if any of it helps.
And maybe this could help you too: stackoverflow.com/questions/71518406/…
Thanks kadis, will try your suggestion