
I am having difficulty scraping the address from the following web page. Please help me scrape it.

http://www.salatomatic.com/d/Revesby+17154+Ahlus-Sunnah-Wal-Jamaah-Revesby

The relevant part of the page source is as follows:

<td width="100%"><div class="titleBM">Bankstown Masjid </div>Meredith Street, Bankstown, New South Wales 2200</td>

I am trying to scrape the text immediately after </div>.

My current code is incomplete, but it looks like this:

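from bs4 import BeautifulSoup
import urllib2
import datetime

url1 = 'http://www.salatomatic.com/d/Revesby+17154+Ahlus-Sunnah-Wal-Jamaah-Revesby'

content1 = urllib2.urlopen(url1).read()
soup1 = BeautifulSoup(content1)
div1 = soup1.find('div', {'class':'titleBM'}) #get the div where it's located
span1 = div1.find('</div>')  # does not work: find() matches tag names, not closing tags, so this returns None
pos1 = span1.text            # ...and this then raises AttributeError

print datetime.datetime.now(), 'street address:  ', pos1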

2 Answers


The text is the next sibling of the <div> element, so use next_sibling:

from bs4 import BeautifulSoup
import urllib2
import datetime

url1 = 'http://www.salatomatic.com/d/Revesby+17154+Ahlus-Sunnah-Wal-Jamaah-Revesby'

content1 = urllib2.urlopen(url1).read()
soup1 = BeautifulSoup(content1)
div1 = soup1.find('div', {'class':'titleBM'}) #get the div where it's located
pos1 = div1.next_sibling

print datetime.datetime.now(), 'street address:  ' , pos1

Run it like:

python2 script.py

It yields:

2013-12-03 12:55:41.306271 street address:   9-11 Mavis Street, Revesby, New South Wales 2212
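For reference, here is a minimal offline sketch of the same next_sibling approach, assuming Python 3 and bs4 with the built-in html.parser. It parses the HTML snippet from the question instead of fetching the live page (whose layout may have changed since), and the helper name extract_address is hypothetical. It also calls .strip() on the text node and returns None if the div is missing or is followed by a tag rather than text.

```python
from bs4 import BeautifulSoup

# HTML snippet copied from the question; parsed offline instead of fetching the live page
HTML = ('<td width="100%"><div class="titleBM">Bankstown Masjid </div>'
        'Meredith Street, Bankstown, New South Wales 2200</td>')


def extract_address(html):
    """Hypothetical helper: return the text node that directly follows the titleBM div."""
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', {'class': 'titleBM'})
    if div is None:
        return None  # div not found (e.g. the page layout changed)
    sibling = div.next_sibling
    # next_sibling may be None or a Tag; only a text node (NavigableString, a str subclass) is wanted
    if not isinstance(sibling, str):
        return None
    return sibling.strip()


print(extract_address(HTML))  # Meredith Street, Bankstown, New South Wales 2200
```

Because NavigableString subclasses str in Python 3, the isinstance check distinguishes a text node from an element without importing bs4's internal classes.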

This is happening because of JavaScript; you should use Selenium WebDriver to solve this issue:

from selenium.webdriver import Firefox

Find more here Link

1 Comment

I think you were a little quick to jump to Selenium on this one. The accepted answer shows how to do it without Selenium.
