This seems similar to my previous post (I'll link it at the bottom), but this is a different URL and it uses tables. When I run the following code, I can extract all of the data within that div:

import requests

from bs4 import BeautifulSoup

url = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"
r = requests.get(url)


soup = BeautifulSoup(r.text, "lxml")

try:
     data = soup.find('div', class_='div-col1')
     print(data)

except:
     print("You Get Nothing!")

I then change the try block to:

try:
     data = soup.find_all('td', class_='car')
     print(data)

except:
     print("You Get Nothing!")

and I only get the info from the thead, not the tbody.

Is there something I'm missing or doing wrong? The further in I try to drill down, I either get an error or just an empty list [ ].

Also, this webpage is dynamic. I tried what was given to me in my previous thread (Old Post), and I understand that the layout and code of the two pages are different, but my concern is that loading Chrome every time I run the script will be a lot of overhead, since the data will probably need to be refreshed every 30 seconds to 1 minute, 300-400 times.

2 Answers

Why don't you just go directly to the source? If you look at the page source of that link, you'll see it gets its data from https://www.nascar.com/live/feeds/live-feed.json. With that you can easily get the data in JSON format and parse it as you like.

import requests
import json

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
print(res.json())
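
To show what parsing the response might look like, here is a minimal sketch that pulls position, car number, and driver name out of the parsed JSON. The key names ("vehicles", "running_position", "vehicle_number", "driver", "full_name") are assumptions about the feed's layout and are not confirmed anywhere in this thread, so print the real JSON first and adjust them. The helper names extract_rows and fetch_feed are made up for illustration.

```python
from __future__ import annotations

from typing import Any

FEED_URL = "https://www.nascar.com/live/feeds/live-feed.json"


def extract_rows(feed: dict) -> list[tuple[Any, Any, Any]]:
    """Return (position, car number, driver name) tuples ordered by position.

    All key names below are ASSUMED; check them against the real feed.
    """
    rows = []
    for vehicle in feed.get("vehicles", []):
        driver = vehicle.get("driver") or {}
        rows.append((
            vehicle.get("running_position"),
            vehicle.get("vehicle_number"),
            driver.get("full_name"),
        ))
    # Entries with a missing position sort last instead of raising TypeError.
    return sorted(rows, key=lambda row: (row[0] is None, row[0] or 0))


def fetch_feed(url: str = FEED_URL) -> dict:
    """Download and decode the feed (requests is imported only when called)."""
    import requests

    res = requests.get(url, timeout=10)
    res.raise_for_status()
    return res.json()


# Usage against the live feed:
#     for position, car, name in extract_rows(fetch_feed()):
#         print(position, car, name)
```

Keeping the parsing separate from the download means you can try extract_rows on a saved copy of the JSON without hitting the site each time.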
5 Comments

I should have mentioned that I'm very new at this. I didn't know I could do that, but this helps as well. Thank you!!
@johnll, this is the perfect solution for the question. But, I guess it'll help the OP to understand a bit more if you showed how to use the JSON and print something, like, all the names. Also, remove the import json line, it is not needed for response.json() and may confuse others.
@sbiondio, as you said, the page is updating the data continuously (about every 5 seconds, to be precise) by fetching it from the link johnll has shown. You can get all the table items from this JSON. Also, response.json() is much faster than any approach that uses bs4.
@KeyurPotdar Thank you for the clarification, this helps a lot!!! I'm playing around with what this is outputting now!
@sbiondio, have a look at this question. Maybe it'll help you understand it better. (Just remember that you don't have to use the separate json module when using requests, which has its own built-in response.json() parser.)
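
For the refresh concern raised in the question (every 30 seconds to 1 minute, 300-400 times), a plain polling loop over the JSON feed avoids launching a browser at all. The following is only a sketch: poll and fetch_live_feed are hypothetical helper names, and the interval and count are the asker's rough figures, not values the site prescribes.

```python
import time
from typing import Any, Callable, Iterator

FEED_URL = "https://www.nascar.com/live/feeds/live-feed.json"


def poll(fetch: Callable[[], Any], interval: float, count: int,
         sleep: Callable[[float], None] = time.sleep) -> Iterator[Any]:
    """Yield the result of fetch() `count` times, pausing `interval` seconds
    between calls (no pause after the final call)."""
    for i in range(count):
        yield fetch()
        if i < count - 1:
            sleep(interval)


def fetch_live_feed() -> Any:
    """Download the feed once (requests is imported only when called)."""
    import requests

    res = requests.get(FEED_URL, timeout=10)
    res.raise_for_status()
    return res.json()


# Usage: refresh every 30 seconds, 400 times.
#     for feed in poll(fetch_live_feed, interval=30, count=400):
#         print(feed)
```

Because poll is a generator, each snapshot is handled as soon as it arrives, and the sleep argument can be swapped out to test the loop without waiting.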
The data you want from that page is generated dynamically, so a plain HTTP request made with the requests library can't see it. However, you can try requests-html, a newer library from the same author, which can handle dynamically generated content. This is how you could use it:

import requests_html

URL = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"

with requests_html.HTMLSession() as session:
    r = session.get(URL)
    r.html.render(sleep=5)
    for items in r.html.find('#pqrStatistic tr'):
        data = [item.text for item in items.find("th,td")]
        print(data)

Partial results:

['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap']
['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8']
['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157']
['3', '4', 'Todd Gilliland #', '', '1.407', '161', '36.359', '20.142', '94.013', '158']
['4', '8', 'John H. Nemechek(i)', '', '2.177', '161', '36.304', '20.234', '93.585', '31']
['5', '16', 'Brett Moffitt', '', '3.268', '161', '36.145', '20.359', '93.010', '8']

5 Comments

This may be just what I'm looking for! But when I try to run it, I get all kinds of errors. I installed requests_html, but the slew of errors started with: Traceback (most recent call last): File "/Users/salbiondio4/Documents/App Creation/PythonScripts/NASCAR/livefeed.py", line 68, in <module> r.html.render(sleep=5). It probably doesn't help, but I'll do some digging.
It requires Python 3.6.
I thought that might be the problem, but I'm running in PyCharm with Python 3.6.2. I tried in the terminal with python3 and got the same errors. The start of it looks like it's trying to download Chromium: "[W:pyppeteer.chromium_downloader] start chromium download. Download may take a few minutes. Traceback (most recent call last):"
Yes, it downloads Chromium on the first run. However, on the second or third run (when you experiment for the first time), it should work. Did it fetch the data along with the errors, or have you only gotten errors so far?
I have only gotten errors, no data. Could it be because I already have Chromium installed from my previous project? (Just trying to come up with thoughts to help.)
