from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys

opts = Options()
opts.set_headless()
assert opts.headless  # Operating in headless mode
browser = Chrome(executable_path=r"C:\Users\taksh\AppData\Local\Programs\Python\Python37-32\chromedriver.exe", options=opts)
browser.implicitly_wait(3)
browser.get('https://ca.finance.yahoo.com/quote/AMZN/profile?p=AMZN')

results = browser.find_elements_by_xpath('//*[@id="quote-header-info"]/div[3]/div/div/span[1]')
print(results)

And I get back:

[<selenium.webdriver.remote.webelement.WebElement (session="b3f4e2760ffec62836828e62530f082e", element="3e2741ee-8e7e-4181-9b76-e3a731cefecf")>]

What I actually want Selenium to scrape is the price of the stock. I thought I was doing it correctly, because this XPath found the element when I used Selenium on Chrome without headless mode. How can I scrape the actual data from the website in headless mode?

2 Answers


You need to further extract the data after getting all the elements back as a list.

results = browser.find_elements_by_xpath('//*[@id="quote-header-info"]/div[3]/div/div/span[1]')

for result in results:
    print(result.text)

This will print the text of every element in the list.


4 Comments

I see! So basically, if I am using Selenium in headless mode, any sort of data that I scrape will need this for loop to display it, correct?
@JackJones, exactly, you should write a loop to extract the data, no matter whether it's GUI mode or headless. find_elements returns a list of WebElements, not a list of strings; .text is what gets an individual web element's text. In your case, when you print results it prints all the WebElements present in that list, nothing else. If there is a single element, go with find_element.
I see, so basically, if for some reason you may get an error when trying to scrape the data, it isn't a bad idea to try find_element instead of find_elements, because you might have multiple elements of that type, correct?
If there is only one element, use find_element; if scraping multiple elements of the same type, use find_elements. If the element is not there, find_element will throw NoSuchElementException, while find_elements returns an empty list in that case. So to avoid the exception, you can use find_elements and check whether the list's length is greater than 0: if it is, the element is there; otherwise it isn't, and you never face the exception.
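The length-check pattern described in this comment can be sketched like this (a minimal sketch using the asker's XPath and the Selenium 3 find_elements_by_xpath API; get_price_text is a hypothetical helper name):

```python
# XPath from the question, pointing at the quote price span.
PRICE_XPATH = '//*[@id="quote-header-info"]/div[3]/div/div/span[1]'

def get_price_text(browser):
    """Return the first matching element's text, or None if absent."""
    results = browser.find_elements_by_xpath(PRICE_XPATH)
    if len(results) > 0:        # at least one element matched
        return results[0].text  # .text extracts the visible string
    return None                 # no match: no exception is raised
```

Because find_elements returns an empty list instead of raising, this same code works in both GUI and headless mode.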

The same XPath locator could be matching multiple times in the HTML, so put this code in a try/except block while checking it in headless mode.

Headless mode basically scans the HTML only, so to debug better, try a different version of the XPath, such as going to the span's parent element and then traversing down to it.
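A minimal sketch of the try/except approach this answer suggests (scrape_price is a hypothetical helper name; the import fallback is only there so the sketch runs even without Selenium installed):

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:  # stub fallback so this sketch is self-contained
    class NoSuchElementException(Exception):
        pass

def scrape_price(browser, xpath):
    """Return the element's text, or None if the locator matches nothing."""
    try:
        return browser.find_element_by_xpath(xpath).text
    except NoSuchElementException:
        return None  # the headless DOM did not contain the element
```

Catching NoSuchElementException keeps the script alive when the headless page renders differently from the GUI page, so you can fall back to an alternative XPath.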

