Scraping a table with Python

Question

I'm trying to scrape this table (https://futures.tradingcharts.com/marketquotes/ZC.html) with Python. I've tried an approach based on this post, but when I manually inspect the page source, I don't see the table. How can I scrape it?

<div class="mq_page_wrapper">

 <script type="text/javascript">
    $(document).ready(function(){
      generateTCPSLink();
    });

    function generateTCPSLink(){
      var root = location.protocol + '//' + location.host;

      var url_param = {
        action:'tcps_logged_in',
        timestamp: (new Date()).getTime()
      };

      $.getJSON(root+'/widgets/footer_ajax/footer_common_functions.php?'+$.param(url_param),function(data){
        if(data.logged_in){
          $('span#tcps_link').html("Logout:<br>&nbsp;<a href='"+root+"/premium_subscriber/tcps_logout.php"+"'>Premium Subscriber</a><br>");
        }else{
          $('span#tcps_link').html("Login:<br>&nbsp;<a href='"+root+"/premium_subscriber/login_subscribe.php?premium_link"+"'>Premium Subscriber</a><br>");
        }
      });
    }
 </script>
 <div id="members_classic">
   <span id="tcps_link"></span>
 </div>

import time

from bs4 import BeautifulSoup
from selenium import webdriver

chrome_path = r"C:\Users\Desktop\chromedriver.exe"
# Selenium 3 style; Selenium 4 expects webdriver.Chrome(service=Service(chrome_path))
driver = webdriver.Chrome(chrome_path)
driver.get('https://futures.tradingcharts.com/marketquotes/ZC.html')
time.sleep(80)  # give the page's JavaScript time to load the data
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
print(soup)

Open your browser's inspector and watch the Network tab while loading the page. You will see that the table is actually generated from data that comes from another URL (getQuote.json). You will need to POST the same API key, which you should be able to find by inspecting the request.
– Selcuk, Commented Dec 8, 2019 at 23:55

αԋɱҽԃ αмєяιcαη · Accepted Answer · 2019-12-09 00:15:54Z

2

import requests

data = {
    'apikey': '2d8b3b803594b13e02a7dc827f4a63f8',
    'fields': 'settlement,previousClose,previousOpenInterest',
    'symbols': 'ZCY00,ZC*1,ZC*2,ZC*3,ZC*4,ZC*5,ZC*6,ZC*7,ZC*8,ZC*9,ZC*10,ZC*11,ZC*12,ZC*13,ZC*14,ZC*15,ZC*16,ZC*17,ZC*18,ZC*19,ZC*20,ZC*21,ZC*22,ZC*23,ZC*24,ZC*25,ZC*26,ZC*27,ZC*28,ZC*29,ZC*30,ZC*31,ZC*32,ZC*33,ZC*34,ZC*35,ZC*36,ZC*37,ZC*38,ZC*39,ZC*40,ZC*41,ZC*42,ZC*43,ZC*44,ZC*45,ZC*46,ZC*47,ZC*48,ZC*49,ZC*50'
}

r = requests.post(
    'https://ondemand.websol.barchart.com/getQuote.json', data=data).json()

for item in r['results']:
    print(item)
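
The long symbols string in the request above follows a regular pattern (ZCY00 followed by ZC*1 through ZC*50), so it can be generated instead of typed out. A minimal sketch, assuming that pattern is all the endpoint needs; `build_symbols` is a hypothetical helper name, not part of any library:

```python
def build_symbols(root="ZC", count=50):
    """Return the comma-separated symbol list used in the request above.

    Hypothetical helper: 'ZCY00' followed by 'ZC*1' .. 'ZC*50',
    mirroring the original payload.
    """
    symbols = [f"{root}Y00"]
    symbols += [f"{root}*{i}" for i in range(1, count + 1)]
    return ",".join(symbols)


data = {
    "apikey": "2d8b3b803594b13e02a7dc827f4a63f8",  # key copied from the answer above
    "fields": "settlement,previousClose,previousOpenInterest",
    "symbols": build_symbols(),
}
```

The resulting `data` dict can be passed to `requests.post(..., data=data)` exactly as before.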

answered Dec 9, 2019 at 0:15


2 Comments

random_dsp_guy Over a year ago

How do you see the arguments that getQuote.json gets called with in the Network tab of Chrome?

αԋɱҽԃ αмєяιcαη Over a year ago

@Seth I'm using Firefox, and the data is included in the request body.
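
To make the "request body" point concrete: when a dict is passed as `data=`, requests sends it form-encoded in the POST body, and that encoded body is what the browser's Network panel displays for the request (the exact tab name varies by browser and version). A rough sketch using only the standard library to preview what such a body looks like; the shortened symbols list is for illustration only:

```python
from urllib.parse import parse_qs, urlencode

# Shortened payload for illustration; the real request lists ZC*1 .. ZC*50.
payload = {
    "fields": "settlement,previousClose,previousOpenInterest",
    "symbols": "ZCY00,ZC*1,ZC*2",
}

# Form-encode the dict the way an HTML form submission would.
body = urlencode(payload)
print(body)

# Decoding the body recovers the original fields.
decoded = {key: values[0] for key, values in parse_qs(body).items()}
print(decoded == payload)
```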

Jack Fleeting · Accepted Answer · 2019-12-09 00:01:59Z

2

This is one way to get the data in JSON format:

import requests
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://futures.tradingcharts.com',
    'Connection': 'keep-alive',
    'Referer': 'https://futures.tradingcharts.com/futures/quotes/ZC.html',
    'Cache-Control': 'max-age=0',
    'TE': 'Trailers',
}

data = {
  'apikey': '2d8b3b803594b13e02a7dc827f4a63f8',
  'fields': 'settlement,previousClose,previousOpenInterest',
  'symbols': 'ZCY00,ZC*1,ZC*2,ZC*3,ZC*4,ZC*5,ZC*6,ZC*7,ZC*8,ZC*9,ZC*10,ZC*11,ZC*12,ZC*13,ZC*14,ZC*15,ZC*16,ZC*17,ZC*18,ZC*19,ZC*20,ZC*21,ZC*22,ZC*23,ZC*24,ZC*25,ZC*26,ZC*27,ZC*28,ZC*29,ZC*30,ZC*31,ZC*32,ZC*33,ZC*34,ZC*35,ZC*36,ZC*37,ZC*38,ZC*39,ZC*40,ZC*41,ZC*42,ZC*43,ZC*44,ZC*45,ZC*46,ZC*47,ZC*48,ZC*49,ZC*50'
}

response = requests.post('https://ondemand.websol.barchart.com/getQuote.json', headers=headers, data=data)

# Parse the response body and show the list of quotes
result = json.loads(response.text)
print(result['results'])

And you can take it from there.
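
One way to take it from there is to flatten the list under the `results` key into table rows. A minimal sketch, assuming each entry is a flat dict; the sample records and their field names below are made up for illustration and are not actual output of the API:

```python
import csv
import io


def results_to_csv(results):
    """Render a list of flat dicts as CSV text, one row per dict."""
    # Collect column names in first-seen order so rows with
    # differing keys still line up under the same header.
    columns = []
    for item in results:
        for key in item:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


# Hypothetical sample records; the real field names come from the response.
sample = [
    {"symbol": "ZCH20", "settlement": 375.5},
    {"symbol": "ZCK20", "settlement": 381.25, "previousClose": 380.0},
]
print(results_to_csv(sample))
```

Missing fields are left as empty cells, so records that lack a value do not shift the columns.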

answered Dec 9, 2019 at 0:01


2 Comments

αԋɱҽԃ αмєяιcαη Over a year ago

Why are you loading the JSON manually, and why are you sending headers? You don't need to import json, since requests already provides response.json(). Also, for calling an API with a max-age of zero, you don't need headers.

Jack Fleeting Over a year ago

@αԋɱҽԃαмєяιcαη - thanks; I'll look into these points when I have more time, but basically I just copied the API call as-is.
