Scraping with xpath with requests and lxml but having problems

Question

I keep running into an issue when I scrape data with lxml using XPath. I want to scrape the Dow price, but when I print the result in Python it shows [<Element span at 0x448d6c0>]. I know that must be a memory address, but I just want the price. How can I print the price instead of its location in memory?

from lxml import html
import requests

page = requests.get('https://markets.businessinsider.com/index/realtime-chart/dow_jones')
content = html.fromstring(page.content)

#This will create a list of prices:
prices = content.xpath('//*[@id="site"]/div/div[3]/div/div[3]/div[2]/div/table/tbody/tr[1]/th[1]/div/div/div/span')

#This will create a list of volume:


print (prices)

W Stokvis · Accepted Answer · 2018-08-01 16:05:11Z

3

You're not getting generators: content.xpath() returns a list of lxml Element objects, and printing that list shows each element's repr, which includes its memory address. To get the price itself, read the element's text through its .text attribute.
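A minimal offline sketch of the difference, using a small made-up HTML snippet instead of the live page (the markup is an assumption, not Business Insider's actual HTML):

```python
from lxml import html

# Hypothetical markup standing in for the real page.
doc = html.fromstring(
    '<div id="site"><div class="price">'
    '<span class="push-data ">25,389.06</span>'
    '</div></div>'
)

matches = doc.xpath('//span')   # xpath() returns a list of Element objects
print(matches)                  # something like [<Element span at 0x...>]: the repr, not the text
print(matches[0].text)          # 25,389.06

# Alternatively, select the text nodes directly so xpath() returns strings.
texts = doc.xpath('//span/text()')
print(texts)                    # ['25,389.06']
```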

Additionally, I would highly recommend changing your XPath: it is an absolute, position-based path (div[3]/div/div[3]/...), so it breaks as soon as the page layout changes.

prices = content.xpath("//div[@id='site']//div[@class='price']//span[@class='push-data ']")
prices_holder = [i.text for i in prices]
prices_holder
 ['25,389.06',
 '25,374.60',
 '7,251.60',
 '2,813.60',
 '22,674.50',
 '12,738.80',
 '3,500.58',
 '1.1669',
 '111.7250',
 '1.3119',
 '1,219.58',
 '15.43',
 '6,162.55',
 '67.55']
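The 'push-data ' match above depends on the exact class string, including its trailing space. A sketch of a slightly sturdier variant, assuming the class names stay the same but their order or surrounding whitespace might change, uses the common XPath 1.0 class-token idiom. The HTML is again a made-up stand-in, and has_class is a hypothetical helper:

```python
from lxml import html

# Hypothetical snippet: note the trailing space and the extra class on the spans.
doc = html.fromstring(
    '<div id="site">'
    '<div class="price"><span class="push-data ">25,389.06</span></div>'
    '<div class="price"><span class="quote push-data">7,251.60</span></div>'
    '</div>'
)

def has_class(name):
    """Build an XPath predicate that matches one whitespace-separated class token."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"

xpath = f"//div[@id='site']//div[{has_class('price')}]//span[{has_class('push-data')}]/text()"
prices = [t.strip() for t in doc.xpath(xpath)]
print(prices)  # ['25,389.06', '7,251.60']

# The strings still contain thousands separators; remove them before converting.
values = [float(p.replace(',', '')) for p in prices]
print(values)  # [25389.06, 7251.6]
```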

Also note that requests fetches the page only once, so you only get the values as they were when the page loaded. If you want the prices as they change, you'd likely need to use Selenium.


Kamikaze_goldfish Over a year ago

Maybe close the request? I'm going to look it up to see if it is possible.

W Stokvis Over a year ago

requests.get() only fetches the content at load time. You could keep re-running it, but that means reloading the page over and over, which is going to be taxing. With Selenium, you can load the page once, read the price, and then wait or keep checking whether the price has changed, storing each new value in your list.
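The "keep checking and store the value when it changes" loop described in this comment can be sketched independently of how the price is read. Here read_price is a hypothetical callable; with Selenium 4 it might be something like lambda: driver.find_element(By.XPATH, xpath).text, but that wiring is an assumption and is not shown:

```python
import time
from typing import Callable, List

def track_price_changes(
    read_price: Callable[[], str],
    checks: int,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call the hypothetical read_price() `checks` times and record a value
    only when it differs from the last recorded one."""
    history: List[str] = []
    for i in range(checks):
        price = read_price()
        if not history or price != history[-1]:
            history.append(price)
        if i < checks - 1:
            sleep(interval)  # pause between reads instead of hammering the page
    return history

# Usage with a fake reader standing in for a live page (sleep disabled):
fake_ticks = iter(['25,389.06', '25,389.06', '25,374.60', '25,374.60', '25,396.03'])
print(track_price_changes(lambda: next(fake_ticks), checks=5, sleep=lambda s: None))
# ['25,389.06', '25,374.60', '25,396.03']
```

Passing sleep in as a parameter keeps the polling logic testable without actually waiting.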

Kamikaze_goldfish Over a year ago

What would you recommend for me to change the xpath to, since it is a literal path?

W Stokvis Over a year ago

@Kamikaze_goldfish If you want only the top quote: //div[@id='site']//div[@class='box table_distance realtime-pricebox']//div[@class='price']//span[@class='push-data ']. By relying on classes, you can jump from div to div within the html tree instead of going level by level.
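A sketch of the "jump from div to div" idea: locate the quote box once, then run a relative XPath (starting with .//) from that element, so only spans inside the top quote box are returned. The markup below is hypothetical and only mimics the class names quoted in the comment:

```python
from lxml import html

# Hypothetical markup: one top quote box plus another box with its own price.
doc = html.fromstring(
    '<div id="site">'
    '<div class="box table_distance realtime-pricebox">'
    '<div class="price"><span class="push-data ">25,389.06</span></div>'
    '</div>'
    '<div class="box other-quotes">'
    '<div class="price"><span class="push-data ">7,251.60</span></div>'
    '</div>'
    '</div>'
)

# Find the top quote box first (exact match on the full class string, as in the comment).
box = doc.xpath("//div[@id='site']//div[@class='box table_distance realtime-pricebox']")[0]

# The leading '.' makes the path relative to `box`; a bare '//' would search the whole document.
top_price = box.xpath(".//div[@class='price']//span[@class='push-data ']/text()")
print(top_price)  # ['25,389.06']
```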

Kamikaze_goldfish Over a year ago

I’ll check that out too.


T. Ray · Accepted Answer · 2018-08-01 15:58:46Z

1

The variable prices is a list containing an lxml element. You need to read the element's text attribute to extract the value.

print(prices[0].text)

25,396.03
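Note that prices[0] raises IndexError if the XPath matches nothing, for example after the page layout changes. A small sketch of a guard, using a hypothetical helper first_text (not part of lxml) that returns a default instead and expects an XPath that selects elements rather than text():

```python
from typing import Optional
from lxml import html

def first_text(tree, xpath: str, default: Optional[str] = None) -> Optional[str]:
    """Return the stripped text of the first element matched by `xpath`, or `default`."""
    matches = tree.xpath(xpath)
    if not matches or matches[0].text is None:
        return default
    return matches[0].text.strip()

doc = html.fromstring('<div><span class="push-data ">25,396.03</span></div>')
print(first_text(doc, "//span"))                    # 25,396.03
print(first_text(doc, "//table//span", "missing"))  # missing
```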

