
I am trying to scrape data from a stock website, but the problem is that the contents of the table are hidden. The website is http://www.moneycontrol.com/stocks/histstock.php

1. Select Index
2. Select S&P BSE MIDCAP
3. Filter data from Jan 2019 to Jan 2020 to get to the final page
4. I want to scrape the table contents of this page

This is what I have tried using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

link = 'http://www.moneycontrol.com/stocks/hist_index_result.php?indian_indices=25'
html = requests.get(link)
html.status_code  # 200
raw = html.content
soup = BeautifulSoup(raw, 'html.parser')  # have tried with xml and html5lib
soup.find_all('table', {'class': 'tblchart'})
#output
[<table border="0" cellpadding="0" cellspacing="0" class="tblchart">
                    </table>]

I have tried using Selenium as well, but the result is the same.

I am having difficulty trying to get the information.

Any suggestions, answers or a nudge in the right direction will be appreciated.

  • Where is your selenium code? Commented Jun 11, 2020 at 18:26
  • My code was working fine with Selenium, I just had to update the Selenium package. Commented Jun 11, 2020 at 18:47
  • see @AndrejKesely answer Commented Jun 11, 2020 at 19:00

2 Answers


A solution with just BeautifulSoup. The data is loaded dynamically via Ajax, but you can simulate the POST request with the requests module:

import requests
from bs4 import BeautifulSoup


data = {
    'mth_frm_mth': '01',
    'mth_frm_yr': '2019',
    'mth_to_mth': '01',
    'mth_to_yr': '2020',
    'hdn': 'monthly'
}

url = 'https://www.moneycontrol.com/stocks/hist_index_result.php?indian_indices=26'
soup = BeautifulSoup(requests.post(url, data=data).content, 'html.parser')

all_data = []
for tr in soup.select('.tblchart tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    all_data.append(tds)

# print on screen
print('{:<15}{:<15}{:<15}{:<15}{:<15}'.format('Date', 'Open', 'High', 'Low', 'Close'))
for row in all_data:
    print('{:<15}{:<15}{:<15}{:<15}{:<15}'.format(*row))

Prints:

Date           Open           High           Low            Close          
Jan 2020       13720.24       14946.21       13686.28       14667.96       
Dec 2019       13584.07       13716.74       13103.54       13699.37       
Nov 2019       13598.71       13729.32       13310.46       13560.57       
Oct 2019       13190.78       13583.13       12669.63       13558.05       
Sep 2019       12536.96       13648.30       12321.25       13170.76       
Aug 2019       12698.94       12755.07       11950.86       12534.70       
July 2019      14275.76       14375.47       12492.30       12692.18       
June 2019      14882.18       15022.09       13803.07       14239.33       
May 2019       14653.64       15039.53       13693.41       14867.04       
Apr 2019       15069.13       15229.85       14585.92       14624.56       
Mar 2019       13719.93       15034.53       13719.80       15027.36       
Feb 2019       13961.93       14064.51       13099.46       13689.84       
Jan 2019       14724.03       14790.99       13652.03       13926.22       
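If you would rather work with the scraped rows as a pandas DataFrame than formatted strings, a minimal sketch (pandas assumed available; two of the rows above used as sample data in place of `all_data`):

```python
import pandas as pd

# sample rows in the same shape as all_data above
rows = [
    ['Jan 2020', '13720.24', '14946.21', '13686.28', '14667.96'],
    ['Dec 2019', '13584.07', '13716.74', '13103.54', '13699.37'],
]

df = pd.DataFrame(rows, columns=['Date', 'Open', 'High', 'Low', 'Close'])
# the scraped cells are strings; convert the price columns to floats
df[['Open', 'High', 'Low', 'Close']] = df[['Open', 'High', 'Low', 'Close']].astype(float)
print(df)
```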

2 Comments

Thanks, this is what I was looking for using BeautifulSoup. I have a question: I am unable to understand what is happening here: '.tblchart tr:has(td)'
@NitinKumar '.tblchart tr:has(td)' is a CSS selector: it selects all <tr> tags that contain a <td> and sit under a tag with class="tblchart". More here: w3schools.com/cssref/css_selectors.asp
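A small self-contained illustration of that selector (the HTML snippet is made up for the demo; the `:has()` pseudo-class is handled by the soupsieve package that ships with modern BeautifulSoup):

```python
from bs4 import BeautifulSoup

html = '''
<table class="tblchart">
  <tr><th>Date</th><th>Open</th></tr>
  <tr><td>Jan 2020</td><td>13720.24</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
# the header row contains only <th> cells, so tr:has(td) skips it
rows = soup.select('.tblchart tr:has(td)')
print(len(rows))  # only the data row matches
```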

Okay, I actually solved it using Selenium; I had to update my Selenium package and it worked like a charm.

Here is how I did it:

  import pandas as pd
  from selenium import webdriver

  link='http://www.moneycontrol.com/stocks/histstock.php'

  driver=webdriver.Chrome()
  driver.get(link)

  #selecting the index in Step 1
  driver.find_element_by_xpath('//*[@id="wutabs2"]').click()

  #Selecting from the dropdown Index options in step 2
  drop=driver.find_element_by_xpath('//*[@id="indian_indices"]')
  drop.click()
  drop.send_keys('S&P BSE MIDCAP')      

  #select the month in step 3

  month=driver.find_element_by_xpath('/html/body/div[3]/div[3]/div/div[7]/div[2]/div[6]/table/tbody/tr/td[3]/form/div[2]/select[2]')
  month.click()
  month.send_keys('2019')

  #click on search 
  driver.find_element_by_xpath('/html/body/div[3]/div[3]/div/div[7]/div[2]/div[6]/table/tbody/tr/td[3]/form/div[4]/input[1]').click()

  #getting the contents
  for i in driver.find_elements_by_css_selector('table.tblchart'):
       a=i.text

  a=a.split('\n')

  #storing it as a data frame
  df=pd.DataFrame(a)

  #removing the first row as it contained the table headers
  df.drop(index=0, inplace=True)

  #splitting the columns using space and storing them separately
  df['Month']=df[0].str.split(' ', expand=True)[0]
  df['Year']=df[0].str.split(' ', expand=True)[1]
  df['Open']=df[0].str.split(' ', expand=True)[2]
  df['High']=df[0].str.split(' ', expand=True)[3]
  df['Low']=df[0].str.split(' ', expand=True)[4]
  df['Close']=df[0].str.split(' ', expand=True)[5]
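Calling `str.split` six times re-splits the same column on every pass; splitting once with `expand=True` and naming all six columns together is equivalent and cheaper. A sketch on two sample lines shaped like the table text after `split('\n')`:

```python
import pandas as pd

# sample lines in the same shape as the rows scraped from table.tblchart
lines = [
    'Jan 2020 13720.24 14946.21 13686.28 14667.96',
    'Dec 2019 13584.07 13716.74 13103.54 13699.37',
]

df = pd.DataFrame(lines, columns=['raw'])
# one split, then assign all six column names at once
parts = df['raw'].str.split(' ', expand=True)
parts.columns = ['Month', 'Year', 'Open', 'High', 'Low', 'Close']
print(parts)
```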

Comments
