I'd like to grab the table content from this page. The following is my code, and I got NaN (without the data). Why are the numbers not showing up? How do I grab the table with the corresponding data? Thanks.
1 Answer
Option 1:
You can get the data in a nice JSON format from the API:
import requests
import pandas as pd
url = 'https://api.blockchain.info/stats'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
params = {'cors': 'true'}
data = requests.get(url, headers=headers, params=params).json()
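# if you want it as a table; wrapping items() in list() avoids
# "DataFrame constructor not properly called!" on older pandas versions
df = pd.DataFrame(list(data.items()), columns=['stat', 'value'])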
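To try the table step without hitting the network, here is a minimal sketch using a made-up payload. The keys and values in `sample` are placeholders, not the real field names returned by the API, and the column names `stat` and `value` are my own choice.

```python
import pandas as pd

# Hypothetical stand-in for the parsed JSON response (a flat dict).
sample = {
    "blocks_mined": 150,
    "minutes_between_blocks": 9.05,
    "btc_mined": 1875.0,
}

# list() turns the dict view into a plain list of (key, value) tuples,
# which the DataFrame constructor accepts on old and new pandas alike.
df = pd.DataFrame(list(sample.items()), columns=["stat", "value"])

# Alternative: build a Series from the dict, then move the keys into a column.
alt = pd.Series(sample).rename_axis("stat").reset_index(name="value")

print(df)
```

Either form gives one row per statistic. If the real response contains nested objects, those values would need flattening first.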
Option 2:
Let the page fully render. There is a better way to wait with Selenium (an explicit wait), but I just quickly threw a 5-second sleep in there to show the idea:
from selenium import webdriver
import pandas as pd
import time
url = 'https://www.blockchain.com/stats'
browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
browser.get(url)
time.sleep(5)
dfs = pd.read_html(browser.page_source)
print(dfs[0])
browser.close()
Output:
                     0                   1                   2    3
0         Blocks Mined                 150                 150  NaN
1  Time Between Blocks        9.05 minutes        9.05 minutes  NaN
2       Bitcoins Mined  1,875.00000000 BTC  1,875.00000000 BTC  NaN
8 Comments
saga
option 1: it says ValueError: DataFrame constructor not properly called! option 2: How come it's shown if we use wait?
chitown88
Option 1. What version of pandas are you using?
chitown88
Option 2. The page is dynamic, so the table is filled in by JavaScript after the initial request to the URL returns. You just have to wait a second or two for the table to get its data.
saga
ver 3.7. By the way I think I am blocked by option 1 somehow. It says "ConnectionError: HTTPSConnectionPool(host='api.blockchain.com', port=443): Max retries exceeded with url: /stats?cors=true (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000000091BAC18>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))"
chitown88
That’s Python 3.7. I was wondering about pandas. And being blocked could be a possibility. Or possibly you’re being blocked by other means. I’ll look at it again tomorrow, but at least the Selenium option works.

page_source is going to get you the same thing you get with requests: the HTML before any JS executes.
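On the ConnectionError quoted in the comments above: `getaddrinfo failed` suggests the hostname lookup itself failed, which points to a DNS or network problem rather than the server refusing the request. Note also that the host in the error is `api.blockchain.com`, whereas the Option 1 code uses `api.blockchain.info`. A quick standard-library check, with `can_resolve` being a helper name made up for this sketch:

```python
import socket


def can_resolve(host, port=443):
    """Return True if the hostname resolves to at least one address."""
    try:
        # getaddrinfo is the same lookup that failed in the traceback.
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False


for host in ("api.blockchain.info", "api.blockchain.com"):
    print(host, "resolves" if can_resolve(host) else "does not resolve")
```

If a host does not resolve from your machine, requests will fail with the same error regardless of headers or parameters, so it is worth checking the URL, proxy, and DNS settings first.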