How to scrape dynamic webpages with Python

Question

[What I'm trying to do]

Scrape the webpage below for used car data.
http://www.goo-net.com/php/search/summary.php?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1

[Issue]

I want to scrape every page of results. At the URL above, only the first 30 items are shown. I can scrape those with the code below, which I wrote. Links to the other pages are displayed like 1 2 3..., but the link addresses seem to be in JavaScript. I googled for useful information but couldn't find any.

from bs4 import BeautifulSoup
import urllib.request

html = urllib.request.urlopen("http://www.goo-net.com/php/search/summary.php?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1")

soup = BeautifulSoup(html, "lxml")
total_cars = soup.find(class_="change change_01").find('em').string
tmp = soup.find(class_="change change_01").find_all('span')
car_start, car_end = tmp[0].string, tmp[1].string

# get urls to car detail pages
car_urls = []
heading_inners = soup.find_all(class_="heading_inner")
for heading_inner in heading_inners:
    href = heading_inner.find('h4').find('a').get('href')
    car_urls.append('http://www.goo-net.com' + href)

for url in car_urls:
    html = urllib.request.urlopen(url)
    soup = BeautifulSoup(html, "lxml")
    #title
    print(soup.find(class_='hdBlockTop').find('p', class_='tit').string)
    #price of car itself
    print(soup.find(class_='price1').string)
    #price of car including tax
    print(soup.find(class_='price2').string)

    tds = soup.find(class_='subData').find_all('td')
    # year
    print(tds[0].string)
    # distance
    print(tds[1].string)
    # displacement
    print(tds[2].string)
    # inspection
    print(tds[3].string)

[What I'd like to know]

How to scrape all of the result pages. I would prefer to use BeautifulSoup4 (Python), but if that is not the appropriate tool, please show me others.

[My environment]

Windows 8.1
Python 3.5
PyDev (Eclipse)
BeautifulSoup4

Any guidance would be appreciated. Thank you.

ahmad valipour · Accepted Answer · 2015-11-19 05:56:02Z

5

You can use Selenium, as in the sample below:

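from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('http://example.com')
# Selenium 4.3+ removed find_element_by_class_name; use find_element(By..., ...).
# You can also locate by link text, CSS selector, XPath, etc.
element = driver.find_element(By.CLASS_NAME, "yourClassName")
element.click()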
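
To connect this to the question, here is a rough sketch of how the click could be combined with the asker's BeautifulSoup code to walk through the result pages. It assumes the pager links are plain `<a>` elements whose text is the page number ("2", "3", ...), which I have not verified against the site. The names `iter_page_sources`, `find_page_links`, `by_link_text`, `max_pages` and `pause` are made up for this illustration, and the fixed `time.sleep` is only a crude stand-in for a proper explicit wait.

```python
import time

START_URL = ("http://www.goo-net.com/php/search/summary.php"
             "?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1")


def iter_page_sources(driver, find_page_links, max_pages=3, pause=2.0):
    """Yield the rendered HTML of each results page, clicking through the pager.

    driver          -- a Selenium WebDriver (anything with .page_source works)
    find_page_links -- hypothetical callback: (driver, page_number) -> list of
                       clickable elements that lead to that page
    """
    page = 1
    while True:
        yield driver.page_source
        if page >= max_pages:
            return
        links = find_page_links(driver, page + 1)
        if not links:
            return  # no link to the next page: assume this was the last one
        links[0].click()
        time.sleep(pause)  # crude pause so the JavaScript can load new results
        page += 1


def main():
    # Third-party imports live here so the helper above stays browser-agnostic.
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    def by_link_text(drv, number):
        # Assumption: pager links are <a> elements whose text is "2", "3", ...
        return drv.find_elements(By.LINK_TEXT, str(number))

    driver = webdriver.Firefox()
    try:
        driver.get(START_URL)
        car_urls = []
        for html in iter_page_sources(driver, by_link_text):
            soup = BeautifulSoup(html, "lxml")
            # Same extraction as in the question, applied to each page.
            for heading_inner in soup.find_all(class_="heading_inner"):
                href = heading_inner.find("h4").find("a").get("href")
                car_urls.append("http://www.goo-net.com" + href)
        print(len(car_urls), "detail-page URLs collected")
    finally:
        driver.quit()
```

Calling `main()` opens Firefox and visits up to `max_pages` pages. Passing the link finder in as a callback keeps the paging loop independent of how the site actually marks up its pager, so only `by_link_text` needs to change if the assumption above is wrong.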



2 Comments

ChrisGuest Over a year ago

@dixhom, feel free to click the tick near any answer that comes close to answering the question. A proven history of accepting answers on StackOverflow will encourage more people to answer your subsequent questions.

Sitz Blogz Over a year ago

Hi. Do you think you can help me with stackoverflow.com/questions/43033378/…

ChrisGuest · Accepted Answer · 2015-11-19 05:28:54Z

4

The Python module splinter may be a good starting point. It drives an external browser (such as Firefox) and accesses the browser's DOM rather than dealing with the HTML only.
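
As a rough illustration of what "accessing the DOM" looks like in practice, here is a minimal sketch with splinter. It assumes the pager links can be found by their visible text ("2", "3", ...) and reuses the `heading_inner` class from the question as a CSS selector; neither has been checked against the live site. `collect_links`, `link_css` and `max_pages` are names invented for this example.

```python
START_URL = ("http://www.goo-net.com/php/search/summary.php"
             "?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1")


def collect_links(browser, link_css, max_pages=3):
    """Gather href values from every results page reachable through the pager."""
    hrefs = []
    page = 1
    while True:
        # Query the live DOM for the detail-page links on the current page.
        for element in browser.find_by_css(link_css):
            hrefs.append(element["href"])
        if page >= max_pages:
            break
        page += 1
        # Assumption: the pager shows the page number as the link text.
        pager = browser.links.find_by_text(str(page))
        if pager.is_empty():
            break  # no such link: assume the previous page was the last
        pager.first.click()
        # A real run may need an extra wait here until the new results load.
    return hrefs


def main():
    # Imported here so collect_links can be exercised without a browser.
    from splinter import Browser

    browser = Browser("firefox")
    try:
        browser.visit(START_URL)
        for href in collect_links(browser, ".heading_inner h4 a"):
            print(href)
    finally:
        browser.quit()
```

Calling `main()` launches Firefox. Each detail URL can then be fetched and parsed with BeautifulSoup exactly as in the question.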


1 Comment

dixhom Over a year ago

Thank you for your answer. This is the first time I've heard of the DOM. So I can make it do things like "select this element" and "click that element"? I'm reading the splinter website now.
