import os
from webdriver_manager.chrome import ChromeDriverManager
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--start-maximized')
options.page_load_strategy = 'eager'

driver = webdriver.Chrome(options=options)
url = "https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24#MS24"
driver.get(url)
wait = WebDriverWait(driver, 20)

I want to find the value of cash EPS (standalone as well as consolidated), but the main problem is that only 5 values are shown on the page; the other values are retrieved with the arrow button until the data ends.

How can I retrieve all of these values in one go?

  • How about pressing that button? You probably can't get them otherwise, unless they are already loaded or you have access to their database, which would render this method useless in the first place. Commented Apr 18, 2021 at 3:15
  • Yes, but how do I know when the final value has been reached? The button still exists on that page too. Commented Apr 18, 2021 at 3:17
  • @Matiiss No, I don't have access to the database, so I need to scrape from the webpage only. Commented Apr 18, 2021 at 3:17
  • Well, press the button until you don't get any values anymore. Commented Apr 18, 2021 at 3:19
  • That's what I want to know: how do I know when to stop pressing, since it won't load any page after the last value? Commented Apr 18, 2021 at 3:20

2 Answers

Taking my comment further into code. Comment: this is a paging element; its href becomes "javascript:void();" once the clicks go past the page count. If data is still there, the link has a page number in it (see the 4 in this case): moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/…. So either condition can be used for the exit!

The comments in the code refer to this suggestion.

from io import StringIO
import pandas as pd  # in addition to the imports from the question (time, By)

# read the tables on the first page through pandas
df_list = pd.read_html(StringIO(driver.page_source))
result = df_list[0]  # first table; tables from later pages are appended to it

current_page = driver.find_element(By.CLASS_NAME, 'nextpaging')  # span inside the "next" arrow link
while True:
    current_page.click()
    time.sleep(20)  # fixed 20-second wait for the next page to load
    current_page = driver.find_element(By.CLASS_NAME, 'nextpaging')
    paging_link = current_page.find_element(By.XPATH, '..')  # parent of the span, which carries the href
    next_href = paging_link.get_attribute('href')
    print(f"Current url: {driver.current_url} Next paging link: {next_href}")
    if "void" in next_href:
        print(f"Time to exit {next_href}")
        break  # exit rule

    df_list = pd.read_html(StringIO(driver.page_source))
    result = pd.concat([result, df_list[0]])  # DataFrame.append was removed in pandas 2.0
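
Once the per-page tables are collected, pulling out a single ratio such as cash EPS could look roughly like the sketch below. It keeps each page's DataFrame separately in a list rather than stacking them. The helper name `extract_row` is hypothetical, and it assumes the first column of every table holds the ratio names and the remaining columns hold one period each; the real layout returned by `pd.read_html` for this site may differ and would need checking.

```python
import pandas as pd


def extract_row(pages, label="Cash EPS"):
    """Collect the values of one ratio across all paged tables.

    `pages` is a list of DataFrames, one per page (hypothetical layout:
    first column = ratio name, remaining columns = one period each).
    """
    parts = []
    for df in pages:
        df = df.set_index(df.columns[0])  # index the table by ratio name
        # plain substring match, so "Cash EPS (Rs.)" is found as well
        mask = df.index.astype(str).str.contains(label, case=False, regex=False)
        if mask.any():
            parts.append(df[mask].iloc[0])  # first matching row on this page
    if not parts:
        raise ValueError(f"no row matching {label!r} found")
    return pd.concat(parts)  # one Series: period -> value
```

Calling `extract_row(list_of_page_tables)` would then return one Series indexed by period, instead of a tall table with the same ratio names repeated for every page.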
   


Based on looking at the URL while navigating through this site:

https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/1#MS24

It appears the arrows navigate to a new URL, incrementing a number in the URL in front of the # symbol.

So, navigating through the pages looks like this:

Page1: https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/1#MS24
Page2: https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/2#MS24
Page3: https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/3#MS24
etc...

These separate URLs can be used to navigate through this particular website. Something like this would probably work:

def get_pg_url(pgnum):
    return 'https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/{}#MS24'.format(pgnum)

Web scraping requires tuning to fit the target site. I entered pgnum=10000, which resulted in the text "Data Not Available for Key Financial Ratios" being displayed. You can probably use this text to tell you when there are no remaining pages.
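
Putting the two observations together, a loop over page URLs that stops at the "Data Not Available" text might be sketched as follows. The names `collect_pages`, `fetch_html`, `END_MARKER` and the `max_pages` safety cap are hypothetical; `fetch_html` stands for any callable that returns a page's HTML (with Selenium, a small function that calls `driver.get(url)` and returns `driver.page_source`). It also assumes the marker text never appears on pages that do have data.

```python
from typing import Callable, List

BASE_URL = 'https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/{}#MS24'
END_MARKER = 'Data Not Available'  # text seen when a page number is past the end


def get_pg_url(pgnum: int) -> str:
    return BASE_URL.format(pgnum)


def collect_pages(fetch_html: Callable[[str], str], max_pages: int = 50) -> List[str]:
    """Fetch pages 1, 2, 3, ... until the end marker shows up.

    `fetch_html` is supplied by the caller, for example:
        def fetch_html(url):
            driver.get(url)
            return driver.page_source
    `max_pages` is only a safety cap against looping forever.
    """
    pages = []
    for pgnum in range(1, max_pages + 1):
        html = fetch_html(get_pg_url(pgnum))
        if END_MARKER in html:
            break  # no more data pages
        pages.append(html)
    return pages
```

Each returned HTML string can then be handed to `pd.read_html` to get the table for that page. Passing the fetcher in as a parameter keeps the stopping logic separate from Selenium, so it can be tried out with canned HTML first.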

1 Comment

That's what my problem was: how to know that the next-page loading has ended. Will try this.
