
I'm trying to write a Python script that checks money.rediff.com for a particular stock price and prints it. I know this can be done easily with their API, but I want to learn how urllib2 works, so I'm trying to do it the old-fashioned way. But I'm stuck on how to use urllib2. Many tutorials online told me to use "Inspect element" on the value I need and then split the page's HTML string to get it. But all the examples in those videos have values inside HTML tags that are easy to split on, whereas mine looks something like this:

<div class="f16">
<span id="ltpid" class="bold" style="color: rgb(0, 0, 0); background: rgb(255, 255, 255);">6.66</span> &nbsp; 
<span id="change" class="green">+0.50</span> &nbsp; 

<span id="ChangePercent" style="color: rgb(130, 130, 130); font-weight: normal;">+8.12%</span>
</div>

I only need the "6.66" from the second line of that snippet. How do I go about doing this? I'm very new to urllib2 and Python. All help will be greatly appreciated. Thanks in advance.


You can certainly do this with just urllib2 and perhaps a regular expression, but I'd encourage you to use better tools, namely requests and Beautiful Soup.

Here's a complete program to fetch a quote for "Tata Motors Ltd.":

from bs4 import BeautifulSoup
import requests

html = requests.get('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').content

soup = BeautifulSoup(html, 'html.parser')
quote = float(soup.find(id='ltpid').get_text())

print(quote)

EDIT

Here's a Python 2 version just using urllib2 and re:

import re
import urllib2

html = urllib2.urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()

quote = float(re.search('<span id="ltpid"[^>]*>([^<]*)', html).group(1))

print quote
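The edit above is Python 2 only. In Python 3, `urllib2.urlopen` lives in `urllib.request`, `read()` returns bytes rather than a string, and `print` is a function. A minimal Python 3 sketch of the same regex approach follows, assuming the page decodes as UTF-8 (an assumption, not something checked against the site); `extract_ltp` and `fetch_ltp` are hypothetical helper names. Keeping the parsing in its own function lets you try it on the snippet from the question without any network access.

```python
import re
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

# Same pattern as the Python 2 answer: the text between the opening
# <span id="ltpid" ...> tag and the next '<'.
LTP_PATTERN = re.compile(r'<span id="ltpid"[^>]*>([^<]*)')


def extract_ltp(html):
    """Return the last traded price from the page's HTML as a float."""
    match = LTP_PATTERN.search(html)
    if match is None:
        raise ValueError('no <span id="ltpid"> found in page')
    return float(match.group(1))


def fetch_ltp(url):
    """Download the page and extract the price (needs network access)."""
    # read() returns bytes in Python 3; assumes UTF-8, replacing undecodable bytes.
    html = urlopen(url).read().decode('utf-8', errors='replace')
    return extract_ltp(html)


if __name__ == '__main__':
    # Offline check against the markup from the question.
    snippet = '''<div class="f16">
<span id="ltpid" class="bold" style="color: rgb(0, 0, 0); background: rgb(255, 255, 255);">6.66</span> &nbsp;
<span id="change" class="green">+0.50</span> &nbsp;
</div>'''
    print(extract_ltp(snippet))  # 6.66
```

For the live page you would call `fetch_ltp('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008')`.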

Comments

Thank you so much for that! Could you explain, like you would to a kid, how this code works? And how can I do this using only urllib2? It's OK if you can't answer the second question, but if you could point me to other sources that explain what to do in situations like this, that would be really useful. Thanks a lot again!
A BeautifulSoup tutorial might be helpful.
Is BS4 the only way to do this nicely? I'd like to know how complex the urllib2 method is. Any sources or references for that?
See my edit for an alternative just using urllib2 and a regular expression. I think Beautiful Soup is much nicer. :-)
Beautiful indeed! (Pun intended!). Thanks a lot!
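On whether BS4 is the only nice way: the standard library's `html.parser.HTMLParser` (Python 3; the Python 2 module was named `HTMLParser`) can do the same job with a bit more code and no third-party install. A minimal sketch, assuming the target `<span>` contains no nested `<span>` (true for the snippet in the question); `SpanTextById` and `extract_span_text` are hypothetical names. Because the parser hands you attributes as name/value pairs, it doesn't care about attribute order or quoting style.

```python
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text inside the first <span> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False  # between the matching <span> and its </span>
        self.done = False    # first match already closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so attribute order is irrelevant
        if tag == 'span' and not self.done and dict(attrs).get('id') == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        # Assumes no nested <span> inside the target span.
        if tag == 'span' and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)

    def text(self):
        return ''.join(self.parts).strip()


def extract_span_text(html, span_id):
    """Return the stripped text of the first <span id=span_id>, or '' if absent."""
    parser = SpanTextById(span_id)
    parser.feed(html)
    parser.close()
    return parser.text()


snippet = ('<div class="f16"><span id="ltpid" class="bold">6.66</span> &nbsp; '
           '<span id="change" class="green">+0.50</span></div>')
print(float(extract_span_text(snippet, 'ltpid')))  # 6.66
```

Combined with `urlopen(url).read().decode(...)`, this gives a urllib-plus-standard-library solution that is less fragile than a regex.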

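BeautifulSoup is good for HTML parsing:

from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen

## Use your urllib code to get the source code of the page
source = urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()
soup = BeautifulSoup(source, 'html.parser')
## This assumes the id 'ltpid' is the one you are looking for all the time
span = soup.find('span', id='ltpid')
print(float(span.text))  # prints 6.66 for the markup in the question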


Use BeautifulSoup instead of regex to parse HTML.
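One concrete reason: a regex matches the exact characters of the markup, not its structure. The sketch below reuses the pattern from the regex version earlier on this page and runs it against three spellings of the same element that a browser treats identically; `regex_price` and `VARIANTS` are illustrative names. Only the first spelling matches, so a harmless change to the site's HTML (reordered attributes, different quotes) would silently break the regex, while an HTML parser would still find the span.

```python
import re

# Pattern copied from the urllib2 + re answer: it requires id="ltpid" to be
# the first attribute and to use double quotes.
LTP_PATTERN = re.compile(r'<span id="ltpid"[^>]*>([^<]*)')

# Three spellings of the same element; an HTML parser treats them alike.
VARIANTS = {
    'original': '<span id="ltpid" class="bold">6.66</span>',
    'reordered': '<span class="bold" id="ltpid">6.66</span>',
    'single_quotes': "<span id='ltpid' class='bold'>6.66</span>",
}


def regex_price(markup):
    """Return the captured price text, or None if the regex does not match."""
    match = LTP_PATTERN.search(markup)
    return match.group(1) if match else None


for name, markup in VARIANTS.items():
    print(name, regex_price(markup))
# original 6.66
# reordered None
# single_quotes None
```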
