Simulating a JavaScript button click with Scrapy

Question

My intent is to run a Scrapy crawler on this web page: http://visit.rio/en/o-que-fazer/outdoors/ . However, some of the resources inside the element with id="container" load only after a JavaScript button ("VER MAIS") is clicked. I've read a bit about Selenium, but I haven't gotten anywhere with it.

Community · Accepted Answer · 2017-05-23 12:06:46Z

10

You read right: your best bet is Scrapy + Selenium, driving either a regular Firefox browser or a headless one such as PhantomJS for faster scraping.

Example adapted from https://stackoverflow.com/a/17979285/2781701

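import scrapy
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['visit.rio']
    start_urls = ['http://visit.rio/en/o-que-fazer/outdoors']

    def __init__(self, *args, **kwargs):
        # Let Scrapy finish its own spider setup before adding the browser.
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Load the page in the real browser so its JavaScript runs.
        self.driver.get(response.url)

        while True:
            try:
                # Look the button up on every pass; once it is gone, the
                # lookup raises and the loop ends.
                more = self.driver.find_element(By.XPATH, '//div[@id="show_more"]/a')
                more.click()

                # get the data and write it to scrapy items
            except WebDriverException:
                # Button missing or no longer clickable: nothing left to load.
                break

        # Shut the browser down completely, not just the current window.
        self.driver.quit()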
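
One thing to watch with the loop above: if the button never disappears, `while True` keeps clicking forever. Below is a minimal sketch of a bounded version. The helper name `click_until_gone` and its parameters are my own (hypothetical), not part of Scrapy or Selenium, and it catches plain `Exception` only so that it runs without Selenium installed; in the spider you would probably narrow that to Selenium's `WebDriverException`.

```python
import time


def click_until_gone(find_button, max_clicks=50, pause=0.0):
    """Click the element returned by find_button() until it can no longer
    be found or clicked, or until max_clicks is reached.

    find_button is any zero-argument callable returning an object with a
    .click() method. Returns the number of successful clicks.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            # Look the button up again on every pass, since the page
            # may re-render it after each click.
            find_button().click()
        except Exception:
            # Lookup or click failed: assume there is nothing left to load.
            break
        clicks += 1
        if pause:
            # Optional delay to give the page time to load new content.
            time.sleep(pause)
    return clicks


# Inside parse(), assuming the same XPath as in the answer:
#   click_until_gone(
#       lambda: self.driver.find_element(By.XPATH, '//div[@id="show_more"]/a'),
#       pause=1.0,
#   )
```

Passing the lookup in as a callable keeps the helper independent of the driver, so you can try it out with a stub object before pointing it at the real page.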

edited May 23, 2017 at 12:06

CommunityBot

answered Apr 27, 2016 at 20:56

Rafael Almeida
