2

I keep running into an issue when I scrape data with lxml by using the xpath. I want to scrape the dow price but when I print it out in python it says Element span at 0x448d6c0. I know that must be a block of memory but I just want the price. How can I print the price instead of the place in memory it is?

from lxml import html
import requests

page = requests.get('https://markets.businessinsider.com/index/realtime- 
chart/dow_jones')
content = html.fromstring(page.content)

#This will create a list of prices:
prices = content.xpath('//*[@id="site"]/div/div[3]/div/div[3]/div[2]/div/table/tbody/tr[1]/th[1]/div/div/div/span')

#This will create a list of volume:


print (prices)
0

2 Answers 2

3

You're getting generators which as you said are just memory locations. To access them, you need to call a function on them, in this case, you want the text so .text

Additionally, I would highly recommend changing your XPath since it's a literal location and subject to change.

prices = content.xpath("//div[@id='site']//div[@class='price']//span[@class='push-data ']")
prices_holder = [i.text for i in prices]
prices_holder
 ['25,389.06',
 '25,374.60',
 '7,251.60',
 '2,813.60',
 '22,674.50',
 '12,738.80',
 '3,500.58',
 '1.1669',
 '111.7250',
 '1.3119',
 '1,219.58',
 '15.43',
 '6,162.55',
 '67.55']

Also of note, you will only get the values at load. If you want the prices as they change, you'd likely need to use Selenium.

Sign up to request clarification or add additional context in comments.

11 Comments

Maybe close the request? I'm going to look it up to see if it is possible.
requests.get() is getting the content on load. You could keep running but that requires reloading the page multiple times and that is gonna be taxing. Using selenium, you can load the page once, get the price at the time and then implement a wait or continually check if the price has changed and store that value in your list.
What would you recommend for me to change the xpath to, since it is a literal path?
@Kamikaze_goldfish If you want only the top quote: //div[@id='site']//div[@class='box table_distance realtime-pricebox']//div[@class='price']//span[@class='push-data ']. By relying on classes, you can jump from div to div within the html tree instead of going level by level.
I’ll check that out too.
|
1

The variable prices is a list containing a web element. You need to call the text method to extract the value.

print(prices[0].text)

'25,396.03'

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.