1

I was wondering how I could extract a value that sits between HTML tags in some HTML source, using Python.

Say I wanted to get the price of a product on an Amazon page.

So far I've got:

import urllib

url = raw_input("Enter the url:\n")
sock = urllib.urlopen(url)  # Python 2 API
htmlsource = sock.read()
sock.close()

So now I have the HTML source as a string, but I don't know how to extract the price. I've played around with re.search but can't get the expression right.

Say the price appears as <span class="price">£79.98</span>

What would be the best way to get var1 = 79.98?

3 Answers

4

You need to use an HTML parsing library. It is more robust than regular expressions, which are easy to get wrong and hard to maintain. The Python standard library ships html.parser in Python 3 and HTMLParser in the Python 2.x series, either of which can parse the HTML and give you the contents of the tags.

You may also use the BeautifulSoup library, which many have found easy to use.

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<span class="price">79.98</span>')
t = soup.find('span', attrs={"class":"price"})
print t.renderContents()
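
For the standard-library route mentioned above, here is a minimal sketch using html.parser. It assumes Python 3, and the class name PriceSpanParser is made up for illustration. It only matches spans whose class attribute is exactly "price" and does not handle nested spans, so treat it as a starting point rather than a complete solution.

```python
from html.parser import HTMLParser


class PriceSpanParser(HTMLParser):
    """Collects the text inside every <span class="price"> element.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.in_price = False  # True while inside a matching span
        self.prices = []       # text collected from matching spans

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        # Any closing span ends the capture (nested spans are not handled)
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())


parser = PriceSpanParser()
parser.feed('<p>Now only <span class="price">£79.98</span></p>')
parser.close()
print(parser.prices)
```

The collected strings still contain the currency symbol, so a further step is needed to turn them into numbers.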

1 Comment

Thanks, any clues on how to use it for that specific example?
2

Parsing html by regex is nasty, error-prone, and generally evil.

import lxml.html

url = raw_input("Enter the url:\n")
root = lxml.html.parse(url).getroot()
res = root.xpath('//span[@class="price"]/text()') or []

print res

returns something like

['\xc2\xa379.98', '\xc2\xa389.98', '\xc2\xa399.98']

Now that we are dealing with plain strings, a regex is the right tool:

import re

def getPrice(s):
    res =  re.search(r'\d+\.\d+', s)
    if res is None:
        return 0.
    else:
        return float(res.group(0))

prices = map(getPrice, res)
print prices

results in

[79.98, 89.98, 99.98]
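
If you are on Python 3, the same idea might look like the sketch below. Note the differences from the code above, all of which are my own choices rather than anything the question requires: it returns None instead of 0.0 when no number is found, it uses Decimal instead of float to avoid rounding surprises with money, and it assumes a comma is a thousands separator. The name parse_price and the sample strings are made up for illustration.

```python
import re
from decimal import Decimal

# A digit, then any mix of digits and commas, then an optional fractional part.
# Assumption: commas are thousands separators (e.g. "1,299.00").
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first number found in text as a Decimal, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group(0).replace(",", ""))


raw = ["£79.98", "£1,299.00", "n/a"]  # sample inputs, not real data
prices = [parse_price(s) for s in raw]
print(prices)
```

Returning None makes a missing price distinguishable from a genuine price of zero, which the 0. fallback above cannot do.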


0

As an alternative to BeautifulSoup, you might try lxml. Here's a comparison of the two from the lxml website.
