How do I grab the table content from a webpage with javascript using python?

Question

I would like to grab the table content from this page. The following is my code, but I get NaN instead of the data. Why are the numbers not showing up, and how do I grab the table with the corresponding data? Thanks.

page_source is going to get you the same thing you get with requests, the HTML before any JS executes.
– SuperStew, commented Apr 9, 2019 at 15:08

chitown88 · Accepted Answer · 2019-04-09 15:37:19Z


Option 1:

You can get the data in a nice JSON format straight from the API:

import requests
import pandas as pd

url = 'https://api.blockchain.info/stats'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
params = {'cors': 'true'}

data = requests.get(url, headers=headers, params=params).json()

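# If you want it as a table: wrap items() in list(), since some pandas versions
# reject a dict view with "ValueError: DataFrame constructor not properly called!"
df = pd.DataFrame(list(data.items()))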
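
The ValueError reported in the comments for the DataFrame step can be checked offline, without calling the API. This is only a minimal sketch: it assumes the response is a flat JSON object of name/value pairs, and the keys in `sample` are made up for illustration (the values are borrowed from the output shown under Option 2), so they are not real API output.

```python
import pandas as pd

# Hypothetical stand-in for the parsed JSON response (illustrative keys only).
sample = {
    "blocks_mined": 150,
    "minutes_between_blocks": 9.05,
    "btc_mined": 1875.0,
}

# list(...) turns the dict view into a plain list of (key, value) tuples,
# which the DataFrame constructor accepts as rows.
df = pd.DataFrame(list(sample.items()), columns=["stat", "value"])

# Alternative: a Series indexed by stat name makes single lookups easy.
stats = pd.Series(sample, name="value")

print(df)
print(stats["blocks_mined"])
```

The two-column frame is convenient for printing or exporting, while the Series form is handier when you only need one or two figures by name.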

Option 2:

Let the page fully render. There is a better way to wait with Selenium (an explicit wait), but I just quickly threw a 5-second sleep in there to demonstrate:

from selenium import webdriver
import pandas as pd
import time

url = 'https://www.blockchain.com/stats'


browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
browser.get(url)
time.sleep(5)

dfs = pd.read_html(browser.page_source)
print(dfs[0])

browser.quit()  # ends the whole WebDriver session; close() only closes the current window

Output:

                    0                   1                   2   3
0         Blocks Mined                 150                 150 NaN
1  Time Between Blocks        9.05 minutes        9.05 minutes NaN
2       Bitcoins Mined  1,875.00000000 BTC  1,875.00000000 BTC NaN

edited Apr 9, 2019 at 15:37

answered Apr 9, 2019 at 15:29


8 Comments

saga Over a year ago

Option 1: it raises "ValueError: DataFrame constructor not properly called!" Option 2: why does the data show up if we wait?

chitown88 Over a year ago

Option 1. What version of pandas are you using?

chitown88 Over a year ago

Option 2. The page is dynamic, so the table is filled in by JavaScript after the initial request to the URL. You just have to wait a second or two for the table to get its data.
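
The effect described here can be reproduced without a browser. This is a small sketch with two hand-written HTML snippets, which are assumptions standing in for the page's markup before and after its scripts fill in the cell (the real page's HTML is not shown in this thread):

```python
from io import StringIO

import pandas as pd

# Hypothetical snapshots of the same table, before and after the scripts run.
before_js = "<table><tr><td>Blocks Mined</td><td></td></tr></table>"
after_js = "<table><tr><td>Blocks Mined</td><td>150</td></tr></table>"

before = pd.read_html(StringIO(before_js))[0]
after = pd.read_html(StringIO(after_js))[0]

print(before)  # the empty cell is parsed as NaN
print(after)   # the filled cell is parsed as a number
```

The table element exists in both snapshots, which is why `read_html` finds it either way. Only the cell contents differ, and an empty cell comes back as NaN.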

saga Over a year ago

Version 3.7. By the way, I think I am somehow being blocked with option 1. It says "ConnectionError: HTTPSConnectionPool(host='api.blockchain.com', port=443): Max retries exceeded with url: /stats?cors=true (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000000091BAC18>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))"

chitown88 Over a year ago

That’s Python 3.7; I was asking about your pandas version. Being blocked is a possibility, or the connection could be failing for some other reason. I’ll look at it again tomorrow, but at least the Selenium option works.
