Capturing info from console using Python

Question

I'm writing a script to download m4a files from a specific website. I'm currently using BS4 and Selenium for this.

I'm having trouble getting the information I need. The file link is not in the page's HTML source; I can only find it in the browser console. The link I'm trying to get is shown in this image (https://i.sstatic.net/5rUJH.jpg), labeled "audio_url_m4a:".

Here's some sample code I'm using:

import time

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

d = DesiredCapabilities.CHROME.copy()  # copy so the shared class-level dict isn't mutated
d['loggingPrefs'] = {'browser': 'ALL'}
driver = webdriver.Chrome(r'chromedriver path', desired_capabilities=d)

# ... lots of code doing other things not relevant to the post ...

for URL in audm_URL:  # this is referencing a line of code where I construct a list of URLs
    driver.get(URL)
    time.sleep(3)

    for entry in driver.get_log('browser'):
        print(entry)

Here is the output I get:


{'level': 'SEVERE', 'message': 'https://audm.herokuapp.com/favicon.ico - Failed to load resource: the server responded with a status of 404 (Not Found)', 'source': 'network', 'timestamp': 1611291689357}
{'level': 'SEVERE', 'message': 'https://cdn.segment.com/analytics.js/v1/5DOhLj2nIgYtQeSfn9YF5gpAiPqRtWSc/analytics.min.js - Failed to load resource: net::ERR_NAME_NOT_RESOLVED', 'source': 'network', 'timestamp': 1611291689357}

Most questions about grabbing things from the console point me towards reading the logs, but none of them explain how to grab other variables like this one. Any ideas?

Here's a link to a random audio page that I want to grab the file from: https://audm.herokuapp.com/player-embed?pub=newyorker&articleID=5fe0b9b09fabedf20ec1f70c
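The loop in the question iterates over `audm_URL`, whose construction isn't shown. Going by the pattern of the sample link above, a minimal sketch of building that list from article IDs could look like this; `build_player_urls` and its `pub` default are hypothetical, and it assumes every page follows the same `player-embed?pub=...&articleID=...` pattern.

```python
from urllib.parse import urlencode

# Base URL taken from the sample player-embed link in the question.
PLAYER_EMBED_BASE = "https://audm.herokuapp.com/player-embed"


def build_player_urls(article_ids, pub="newyorker"):
    """Return one player-embed URL per article ID (hypothetical helper)."""
    urls = []
    for article_id in article_ids:
        # urlencode keeps insertion order (pub, then articleID) and escapes values
        query = urlencode({"pub": pub, "articleID": article_id})
        urls.append(f"{PLAYER_EMBED_BASE}?{query}")
    return urls


audm_URL = build_player_urls(["5fe0b9b09fabedf20ec1f70c"])
print(audm_URL[0])
```

Using `urlencode` rather than string concatenation keeps the query string correctly escaped if an ID or publication name ever contains special characters.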

Thanks everyone!

Could you accept and upvote the answer? – PDHide, Commented Feb 2, 2021 at 14:33

PDHide · Accepted Answer · 2021-01-22 09:00:29Z

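driver.get(
    "https://audm.herokuapp.com/player-embed?pub=newyorker&articleID=5fe0b9b09fabedf20ec1f70c")

# Click the first button on the page (presumably the play button) once it is clickable
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button"))).click()

# Wait for the player's <video> element, then read the audio file URL from its src attribute
src = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".react-player video"))).get_attribute("src")

print(src)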

If you just want to get the src, you can use the code above.

You need these imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
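Since the end goal is to save the m4a file, here is a minimal stdlib sketch of downloading it once `src` has been retrieved with the code above. `filename_from_url` and `download_audio` are hypothetical helpers, and this assumes `src` is a plain https URL that can be fetched without the browser's cookies or headers; if it turns out to be a `blob:` URL or the server rejects the request, this will not work as-is.

```python
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url, default="audio.m4a"):
    """Use the last path segment of the URL (query string dropped) as the filename."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download_audio(src, filename=None):
    """Hypothetical helper: save the file at `src` and return the local path."""
    filename = filename or filename_from_url(src)
    # urlretrieve writes the response body to `filename` and returns (path, headers)
    path, _headers = urllib.request.urlretrieve(src, filename)
    return path
```

For example, `download_audio(src)` saves the file in the current directory under the last segment of the URL path.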

If you want to get it through the console log instead, use the following. (It seems to work only in headless mode; I am investigating.)

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.headless = True

capabilities = webdriver.DesiredCapabilities.CHROME.copy()
capabilities['loggingPrefs'] = {'browser': 'ALL'}
driver = webdriver.Chrome(options=options, desired_capabilities=capabilities)

driver.maximize_window()

driver.get(
    "https://audm.herokuapp.com/player-embed?pub=newyorker&articleID=5fe0b9b09fabedf20ec1f70c")

# Give the page's scripts time to write to the console before reading the log
time.sleep(3)

for entry in driver.get_log('browser'):
    print(entry)
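Each entry returned by `driver.get_log('browser')` is a dict with `level`, `message`, `source` and `timestamp` keys, as in the output in the question. Assuming the page writes the m4a URL into the console as plain text, a minimal sketch for pulling it out of the messages could look like this (`find_m4a_urls` is a hypothetical helper). If the page logs a JavaScript object instead, the log message may not include the object's fields, in which case reading the `<video>` element's `src` is the more reliable route.

```python
import re

# An http(s) URL whose path ends in .m4a, optionally followed by a query string.
# Quotes, whitespace and backslashes end the match, since they delimit strings in log text.
M4A_URL_RE = re.compile(r"""https?://[^\s"'\\]+?\.m4a(?:\?[^\s"'\\]*)?""")


def find_m4a_urls(log_entries):
    """Collect unique .m4a URLs, in order of appearance, from browser log entries."""
    found = []
    for entry in log_entries:
        for url in M4A_URL_RE.findall(entry.get("message", "")):
            if url not in found:
                found.append(url)
    return found
```

Call it as `find_m4a_urls(driver.get_log('browser'))`. As far as I know, ChromeDriver returns only the entries collected since the previous `get_log` call, so store the result if you need to inspect it more than once.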

Update

In headless mode, w3c is false, which is why it works.

For non-headless mode, add this to the options before creating the driver:

options.add_experimental_option('w3c', False)
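Another option for non-headless mode, assuming Selenium 4 and ChromeDriver 75 or newer, is to keep W3C mode on and use the W3C-style capability name `goog:loggingPrefs`; my understanding is that the older `loggingPrefs` key is not applied in W3C mode. A minimal sketch (the function names are hypothetical, and `set_capability` is the Selenium 4 `Options` method):

```python
import time

# Assumption: Selenium 4 + ChromeDriver 75+, where the W3C-compliant name of the
# logging capability is "goog:loggingPrefs" instead of "loggingPrefs".
LOGGING_CAPABILITY = "goog:loggingPrefs"
LOGGING_PREFS = {"browser": "ALL"}  # request every console level, not only SEVERE

PAGE_URL = ("https://audm.herokuapp.com/player-embed"
            "?pub=newyorker&articleID=5fe0b9b09fabedf20ec1f70c")


def make_chrome_options():
    """Build Chrome options that ask ChromeDriver to keep all browser log levels."""
    # Imported lazily so the constants above are usable without Selenium installed.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.set_capability(LOGGING_CAPABILITY, LOGGING_PREFS)
    return options


def print_browser_log(url=PAGE_URL, wait_seconds=3):
    """Load `url` in a visible Chrome window and print every browser log entry."""
    from selenium import webdriver

    driver = webdriver.Chrome(options=make_chrome_options())
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # give the page's scripts time to log
        for entry in driver.get_log("browser"):
            print(entry)
    finally:
        driver.quit()  # close the browser even if something above fails
```

Calling `print_browser_log()` should open a visible Chrome window, load the sample page and print the browser log entries at all levels, not just the SEVERE ones.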

JetJaguar124 · Accepted Answer · 2021-01-24 22:04:16Z


This did the trick. I was looking at it the wrong way and wasn't trying to get an src. Thanks for the input!

