Python retrieving value from URL

Question

I'm trying to write a Python script that checks money.rediff.com for a particular stock price and prints it. I know that this can be done easily with their API, but I want to learn how urllib2 works, so I'm trying to do this the old-fashioned way. But I'm stuck on how to use urllib2. Many tutorials online told me to use "Inspect element" on the value I need and split the string to get it. But all the examples in the videos have values wrapped in easy-to-split HTML tags, whereas mine looks like this:

<div class="f16">
<span id="ltpid" class="bold" style="color: rgb(0, 0, 0); background: rgb(255, 255, 255);">6.66</span> &nbsp; 
<span id="change" class="green">+0.50</span> &nbsp; 

<span id="ChangePercent" style="color: rgb(130, 130, 130); font-weight: normal;">+8.12%</span>
</div>

I only need the "6.66" from the second line. How do I go about doing this? I'm very new to urllib2 and Python. All help will be greatly appreciated. Thanks in advance.

user94559 · Accepted Answer · 2016-08-26 03:56:49Z

2

You can certainly do this with just urllib2 and perhaps a regular expression, but I'd encourage you to use better tools, namely requests and Beautiful Soup.

Here's a complete program to fetch a quote for "Tata Motors Ltd.":

from bs4 import BeautifulSoup
import requests

html = requests.get('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').content

soup = BeautifulSoup(html, 'html.parser')
quote = float(soup.find(id='ltpid').get_text())

print(quote)

EDIT

Here's a Python 2 version just using urllib2 and re:

import re
import urllib2

html = urllib2.urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()

quote = float(re.search('<span id="ltpid"[^>]*>([^<]*)', html).group(1))

print quote
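For readers on Python 3, where urllib2 no longer exists and urlopen lives in urllib.request, the same idea can be sketched roughly as follows. The function names (extract_quote, fetch_quote) and the UTF-8 decoding are assumptions made for illustration, not part of the original answer; the comments also spell out what each piece of the regular expression matches.

```python
import re
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

# Same pattern as the Python 2 version above, compiled once.
#   <span id="ltpid"   literal start of the tag that holds the price
#   [^>]*>             any remaining attributes, up to the closing '>'
#   ([^<]*)            capture the text up to the next '<'
LTP_PATTERN = re.compile(r'<span id="ltpid"[^>]*>([^<]*)')


def extract_quote(html):
    """Return the price found inside the 'ltpid' span as a float."""
    match = LTP_PATTERN.search(html)
    if match is None:
        raise ValueError('no <span id="ltpid"> found in the page')
    return float(match.group(1).strip())


def fetch_quote(url):
    """Download url and extract the quote (needs network access)."""
    with urlopen(url) as response:
        # Assumption: the page decodes as UTF-8; undecodable bytes are replaced.
        html = response.read().decode('utf-8', errors='replace')
    return extract_quote(html)


# Offline check against a shortened copy of the snippet from the question.
sample = '<span id="ltpid" class="bold" style="color: rgb(0, 0, 0);">6.66</span> &nbsp;'
print(extract_quote(sample))  # 6.66
```

Calling fetch_quote('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008') would then do the download and the extraction in one step, provided the page still uses the same markup.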

edited Aug 26, 2016 at 3:56

answered Aug 26, 2016 at 3:43

user94559


7 Comments

DeA Over a year ago

Thank you so much for that! Could you explain to me, like you would to a kid, how this code works? And how can I do this using urllib2 only? It's OK if you cannot answer my second question here, but if you could direct me to other sources that explain what to do in such situations, that would be really useful. Thanks a lot again!

Lost Over a year ago

BeautifulSoup Tutorial might be helpful

DeA Over a year ago

Is BS4 the only way to do this nicely? I'd like to know how complicated or complex the urllib2 method is. Any sources/references for that?

user94559 Over a year ago

See my edit for an alternative just using urllib2 and a regular expression. I think Beautiful Soup is much nicer. :-)

DeA Over a year ago

Beautiful indeed! (Pun intended!). Thanks a lot!


Lost · Accepted Answer · 2016-08-26 03:44:10Z

1

BeautifulSoup is good for HTML parsing:

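import urllib2
from bs4 import BeautifulSoup

# Use your urllib2 code to get the source of the page; the URL here is the
# Tata Motors page from the other answer, used only as an example
source = urllib2.urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(source, 'html.parser')
# This assumes the id 'ltpid' always identifies the element you are looking for
span = soup.find('span', id="ltpid")
quote = float(span.text)  # 6.66 for the snippet in the question
print(quote)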

answered Aug 26, 2016 at 3:44

Lost


user6212423 · Accepted Answer · 2016-08-28 22:11:27Z

1

Use BeautifulSoup instead of regex to parse HTML.
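To illustrate why a parser tends to hold up better than a regex, here is a minimal sketch that uses the standard library's html.parser as a stand-in for BeautifulSoup, so it runs without installing anything. The class and function names (TextById, text_by_id) are hypothetical, and the sketch assumes the target element contains only text, as in the question's snippet.

```python
import re
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text inside the first tag whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.capturing = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, in whatever order they appear.
        if not self.done and dict(attrs).get('id') == self.target_id:
            self.capturing = True

    def handle_endtag(self, tag):
        # Assumption: the target holds only text, so the first end tag closes it.
        if self.capturing:
            self.capturing = False
            self.done = True

    def handle_data(self, data):
        if self.capturing:
            self.parts.append(data)


def text_by_id(html, target_id):
    """Return the stripped text of the element with the given id."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    if not parser.done:
        raise ValueError('no element with id %r found' % target_id)
    return ''.join(parser.parts).strip()


# Same data as the question, but with the attributes in a different order.
reordered = ('<div class="f16"><span class="bold" id="ltpid">6.66</span> &nbsp; '
             '<span id="change" class="green">+0.50</span></div>')

print(float(text_by_id(reordered, 'ltpid')))                      # 6.66
print(re.search('<span id="ltpid"[^>]*>([^<]*)', reordered))      # None
```

The regex from the other answer expects id to be the first attribute, so it stops matching as soon as the markup is reordered, while the parser-based lookup does not care about attribute order. BeautifulSoup's soup.find(id='ltpid') gives the same robustness with far less code.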

answered Aug 28, 2016 at 22:11

user6212423
