
I would like to grab the table content from this page. Below is my code, but the table comes back filled with NaN instead of the data. Why are the numbers not showing up, and how do I grab the table with the corresponding data? Thanks.


  • page_source is going to get you the same thing you get with requests: the HTML before any JavaScript executes. Commented Apr 9, 2019 at 15:08

1 Answer

Option 1: You can get the data in a clean JSON format from the API:

import requests
import pandas as pd

url = 'https://api.blockchain.info/stats'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
params = {'cors': 'true'}

data = requests.get(url, headers=headers, params=params).json()

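# if you want it as a table
# wrap the items in list(): some pandas versions raise
# "ValueError: DataFrame constructor not properly called!" on a bare dict view
df = pd.DataFrame(list(data.items()))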

Option 2:

Let the page render fully. There is a better way to wait with Selenium, but I just threw in a quick 5-second sleep to demonstrate:

from selenium import webdriver
import pandas as pd
import time

url = 'https://www.blockchain.com/stats'


browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
browser.get(url)
time.sleep(5)

dfs = pd.read_html(browser.page_source)
print(dfs[0])

browser.close()

Output:

                    0                   1                   2   3
0         Blocks Mined                 150                 150 NaN
1  Time Between Blocks        9.05 minutes        9.05 minutes NaN
2       Bitcoins Mined  1,875.00000000 BTC  1,875.00000000 BTC NaN
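
The fixed `time.sleep(5)` in Option 2 can be swapped for a wait that polls until the content is actually there. The following is only a minimal sketch of that idea in plain Python, not the answer author's method: `wait_for` and `table_has_rows` are hypothetical helper names, and the check for a `<td` tag is an assumed, deliberately crude signal that the table has been populated.

```python
import time


def wait_for(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


def table_has_rows(html):
    """Assumed check: the rendered HTML contains at least one table cell."""
    return "<td" in html
```

With the `browser` object from Option 2, this could be called as `html = wait_for(lambda: browser.page_source if table_has_rows(browser.page_source) else None)` and the result passed to `pd.read_html(html)`. Selenium also ships its own `WebDriverWait` class in `selenium.webdriver.support.ui`, which is built for this purpose and is likely the better choice in real code.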

8 Comments

Option 1: it says "ValueError: DataFrame constructor not properly called!" Option 2: How come the data shows up if we use a wait?
Option 1: What version of pandas are you using?
Option 2: The page is dynamic, so the table is rendered after you send the request to the URL. You just have to wait a second or two for the table to be filled with data.
ver 3.7. By the way I think I am blocked by option 1 somehow. It says "ConnectionError: HTTPSConnectionPool(host='api.blockchain.com', port=443): Max retries exceeded with url: /stats?cors=true (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000000091BAC18>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))"
That’s Python 3.7. I was wondering about pandas. And that could be a possibility, or possibly you’re being blocked by other means. I’ll look at it again tomorrow, but at least the Selenium option works.