Python table scrape returning no data

Question

This seems similar to my previous post (I'll link it at the bottom), but this is a different URL and it uses tables. When I run the following code, I can get all of the data within that div extracted:

import requests
from bs4 import BeautifulSoup

url = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"
r = requests.get(url)

soup = BeautifulSoup(r.text, "lxml")

try:
    data = soup.find('div', class_='div-col1')
    print(data)

except:
    print("You Get Nothing!")

I then changed the try block to:

try:
    data = soup.find_all('td', class_='car')
    print(data)

except:
    print("You Get Nothing!")

but I only get the cells from the thead, not the tbody.

Is there something I'm missing or doing wrong? The further in I try to narrow it down, the more I either get an error or just an empty list [].

Also, this webpage is dynamic. I tried what was given to me in my previous thread (Old Post), and I understand that the layout and code of the two pages are different, but my concern with that approach is that loading Chrome every time I run the script will be a lot, since the data will probably need to be refreshed every 30 seconds to 1 minute, 300-400 times.
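To illustrate why the second find_all only returns header cells, here is a minimal sketch. It assumes, hypothetically, that the HTML the server sends contains the table header but an empty tbody that JavaScript fills in later in the browser; the markup is invented for illustration and is not copied from the NASCAR page. It also shows that find and find_all report "nothing found" by returning None or [] instead of raising, so an except branch never runs in that case. The helper names car_cells and body_rows are hypothetical, and "html.parser" is used only to avoid needing lxml.

```python
from bs4 import BeautifulSoup

# HYPOTHETICAL markup: the header row is in the raw HTML,
# but the body rows would only be inserted later by JavaScript.
RAW_HTML = """
<table id="results">
  <thead><tr><td class="car">car</td><td class="driver">driver</td></tr></thead>
  <tbody></tbody>
</table>
"""


def car_cells(html):
    """Return the text of every <td class="car"> present in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [td.get_text(strip=True) for td in soup.find_all("td", class_="car")]


def body_rows(html):
    """Return the <tr> elements inside <tbody>, or [] if there is no tbody."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")  # returns None (no exception) when nothing matches
    if tbody is None:
        return []
    return tbody.find_all("tr")  # returns [] (no exception) when there are no rows


if __name__ == "__main__":
    print(car_cells(RAW_HTML))  # ['car']: only the header cell exists
    print(body_rows(RAW_HTML))  # []: the data rows were never in the raw HTML
```

Checking for None or an empty list explicitly, rather than relying on try/except, is what actually tells you the data is missing from the downloaded HTML.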

johnII · Accepted Answer · 2018-03-26 16:09:35Z

2

Why don't you go directly to the source? If you look at the page source of that link, it gets its data from https://www.nascar.com/live/feeds/live-feed.json. With that, you can easily get the data in JSON format and parse it however you like.

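import requests
import json  # not actually needed: response.json() parses the body by itself

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
print(res.json())  # parse the JSON body into Python dicts and lists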
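A comment below asks how to use the JSON to print something like all the driver names. The structure of live-feed.json is not shown in this thread (and the endpoint may have changed since 2018), so the key names in the usage comment (vehicles, driver, full_name) are HYPOTHETICAL placeholders; print the parsed JSON first and substitute the real keys. pluck is a hypothetical helper that walks a chain of dict keys in each record and skips records where the chain is missing.

```python
def pluck(records, *path):
    """Collect record[path[0]][path[1]]... from each dict in `records`.

    Records missing any key along the path are skipped rather than raising.
    """
    values = []
    for record in records:
        value = record
        for key in path:
            if not isinstance(value, dict) or key not in value:
                break  # path not present in this record
            value = value[key]
        else:
            values.append(value)  # only reached when every key was found
    return values


# Usage sketch (needs requests and network access; the keys are HYPOTHETICAL):
#   feed = requests.get("https://www.nascar.com/live/feeds/live-feed.json", timeout=10).json()
#   print(pluck(feed["vehicles"], "driver", "full_name"))

sample = [
    {"driver": {"full_name": "Driver A"}},
    {"driver": {}},  # no full_name: skipped
]
print(pluck(sample, "driver", "full_name"))  # ['Driver A']
```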

answered Mar 26, 2018 at 16:09

johnII


5 Comments

sbiondio Over a year ago

I should have mentioned that I'm very new at this; I didn't know I could do that, but this helps as well. Thank you!!

Keyur Potdar Over a year ago

@johnll, this is the perfect solution for the question. But I think it would help the OP understand a bit more if you showed how to use the JSON and print something, like all the names. Also, remove the import json line; it is not needed for response.json() and may confuse others.

Keyur Potdar Over a year ago

@sbiondio, as you said, the page updates the data continuously (about every 5 seconds, to be precise) by fetching it from the link johnll has shown. You can get all the table items from this JSON. Also, response.json() is much faster than any approach that uses bs4.

sbiondio Over a year ago

@KeyurPotdar Thank you for the clarification, this helps a lot!!! I'm playing around with what this is outputting now!

Keyur Potdar Over a year ago

@sbiondio, have a look at this question. Maybe it'll help you understand it better. (Just remember that you don't have to use the separate json module while using requests, which has its own built-in response.json() parser.)
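Since the page reportedly refreshes about every 5 seconds (per the comment above) and the question worries about launching Chrome 300-400 times, here is a minimal sketch of polling the JSON feed directly with no browser at all. poll_feed and handle are hypothetical names, and the 5-second default and max_polls value are illustrative choices, not anything NASCAR or requests prescribes. Any object with a requests-style get() works as the session, which also makes it easy to test without network access.

```python
import time

# URL taken from the answer above; it may have changed since 2018.
FEED_URL = "https://www.nascar.com/live/feeds/live-feed.json"


def poll_feed(session, handle, url=FEED_URL, interval=5.0, max_polls=10):
    """Fetch the JSON feed max_polls times, sleeping `interval` seconds between fetches.

    `session` can be any object with a requests-style get(url, timeout=...);
    reusing one requests.Session lets requests reuse the connection between polls.
    `handle` is called with each parsed JSON payload.
    """
    for i in range(max_polls):
        response = session.get(url, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        handle(response.json())      # requests parses the JSON; no json module needed
        if i < max_polls - 1:
            time.sleep(interval)     # no sleep after the last poll


# Usage sketch (requires the requests package and network access):
#   import requests
#   with requests.Session() as session:
#       poll_feed(session, handle=print, max_polls=3)
```

Each poll is a single small HTTP request, so running it hundreds of times is far lighter than rendering the page in a browser each time.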

SIM · Accepted Answer · 2018-03-26 16:32:41Z

0

The data you wish to fetch from that page is generated dynamically, so when you make an HTTP request with the requests library, it can't handle that. However, you can try requests-html, a newer library from the same author, which is capable of handling dynamically generated content. This is how you can use it:

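import requests_html

URL = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"

with requests_html.HTMLSession() as session:
    r = session.get(URL)
    # Run the page's JavaScript in headless Chromium, then wait 5 s after the initial render
    r.html.render(sleep=5)
    # Every row of the results table, header row included
    for items in r.html.find('#pqrStatistic tr'):
        # Text of each header (th) or data (td) cell in this row
        data = [item.text for item in items.find("th,td")]
        print(data)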

Partial results:

['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap']
['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8']
['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157']
['3', '4', 'Todd Gilliland #', '', '1.407', '161', '36.359', '20.142', '94.013', '158']
['4', '8', 'John H. Nemechek(i)', '', '2.177', '161', '36.304', '20.234', '93.585', '31']
['5', '16', 'Brett Moffitt', '', '3.268', '161', '36.145', '20.359', '93.010', '8']
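If the data lists printed by the loop above are collected into one list instead of printed, the first row is the header. A small sketch of pairing that header with each data row, so fields can be looked up by column name; rows_to_dicts is a hypothetical helper, and the sample rows are copied from the partial results above.

```python
def rows_to_dicts(rows):
    """Turn [header, row1, row2, ...] into one dict per data row."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


rows = [
    ['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap'],
    ['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8'],
    ['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157'],
]

for car in rows_to_dicts(rows):
    # prints "54 Kyle Benjamin(i) 93.752", then "98 Grant Enfinger 94.003"
    print(car['car'], car['driver'], car['best speed'])
```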

answered Mar 26, 2018 at 16:32

SIM


5 Comments

sbiondio Over a year ago

This may be just what I'm looking for! But when I try to run it, I get all kinds of errors. I installed requests_html, but the slew of errors started with: Traceback (most recent call last): File "/Users/salbiondio4/Documents/App Creation/PythonScripts/NASCAR/livefeed.py", line 68, in <module> r.html.render(sleep=5) ... It probably doesn't help, but I'll do some digging.

SIM Over a year ago

It requires Python 3.6.

sbiondio Over a year ago

I thought that might be the problem, but I'm running it in PyCharm with Python 3.6.2. I tried it in the terminal with python3 and got the same errors. The start of it looks like it's trying to download Chromium?? "[W:pyppeteer.chromium_downloader] start chromium download. Download may take a few minutes. Traceback (most recent call last):"

SIM Over a year ago

Yes, it downloads Chromium on the first run. However, on the second or third run (when you are experimenting for the first time), it should work. Did it fetch the data along with the errors, or have you only gotten errors so far?

sbiondio Over a year ago

I have only gotten errors, no data. Could it be because I already have Chromium installed from my previous project? (Just trying to come up with ideas to help.)
