
How do I scrape data with the Scrapy framework from websites that load data using JavaScript frameworks? Scrapy downloads the HTML for each page request, but some websites use JS frameworks like Angular or Vue.js that load data separately.

I have tried using a combination of Scrapy, Selenium, and ChromeDriver to retrieve the rendered HTML with its content. But with this method I am not able to retain the session cookies set for selecting country and currency, as each request is handled by a separate instance of Selenium or Chrome.

Please suggest whether there are any alternative options for scraping the dynamic content while retaining the session.

Here is the code I used to set the country and currency:

import scrapy
from selenium import webdriver

class SettingSpider(scrapy.Spider):
    name = 'setting'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com/']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # One driver instance, created once for the spider.
        self.driver = webdriver.Chrome()

    def start_requests(self):
        url = 'http://www.example.com/intl/settings'
        # Load the settings page in the browser, then hand the same URL to Scrapy.
        self.driver.get(url)  # was response.url, which is undefined here
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        # Extract the CSRF token required to submit the settings form.
        csrf = response.xpath('//input[@name="CSRFToken"]/@value').extract_first().strip()
        print('------------->' + csrf)
        url = 'http://www.example.com/intl/settings'

        form_data = {'shippingCountry': 'ARE', 'language': 'en',
                     'billingCurrency': 'USD', 'indicativeCurrency': '',
                     'CSRFToken': csrf}  # key was 'CSRFToken:' with a stray colon
        yield scrapy.FormRequest(url, formdata=form_data, callback=self.after_post)

    def after_post(self, response):
        # Placeholder callback; the original snippet referenced it without defining it.
        self.logger.info('Settings posted: %s', response.status)
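One way to keep the country/currency session after the form post is to reuse the single driver's cookies in subsequent Scrapy requests. Selenium's `driver.get_cookies()` returns a list of dicts, while Scrapy's `Request(cookies=...)` takes a flat mapping, so a small converter bridges them. A minimal sketch; the helper name and the usage comment are assumptions, not part of the original code:

```python
# Convert Selenium's cookie list into the dict Scrapy's Request accepts,
# so the session set in the browser carries over to Scrapy requests.
def selenium_cookies_to_dict(selenium_cookies):
    """Turn driver.get_cookies() output into a {name: value} dict."""
    return {c["name"]: c["value"] for c in selenium_cookies}

# Hypothetical usage inside the spider:
#   cookies = selenium_cookies_to_dict(self.driver.get_cookies())
#   yield scrapy.Request(url, cookies=cookies, callback=self.parse)
```

With the cookies forwarded explicitly, Scrapy's own cookie middleware keeps them for the rest of the crawl, so the browser only needs to be involved once.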

1 Answer


What you said,

as each request is handled by a separate instance of selenium or chrome

is not correct.

You can continue to use Selenium, and I suggest you use PhantomJS instead of Chrome. I can't help more because you didn't post your code.

One example with PhantomJS:

from selenium import webdriver

driver = webdriver.PhantomJS()  # headless browser, no visible window
driver.set_window_size(1120, 800)
driver.get("https://example.com/")
driver.close()
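Whichever browser you pick, the session only survives if the same driver instance serves every page load. A sketch of that idea; the class and its `driver_factory` hook are hypothetical, chosen so any webdriver (Chrome, PhantomJS, ...) can be plugged in:

```python
# Keep ONE browser instance alive for the whole crawl so that session
# cookies (e.g. country/currency settings) persist across page loads.
class PersistentBrowser:
    def __init__(self, driver_factory):
        # driver_factory is a zero-argument callable returning a webdriver,
        # e.g. webdriver.PhantomJS or webdriver.Chrome.
        self._factory = driver_factory
        self._driver = None

    @property
    def driver(self):
        # Create the browser lazily, exactly once; reusing it keeps cookies.
        if self._driver is None:
            self._driver = self._factory()
        return self._driver

    def get_html(self, url):
        """Load a page in the shared browser and return its rendered HTML."""
        self.driver.get(url)
        return self.driver.page_source
```

Creating the driver in the spider's `__init__` (as in the question's code) and routing every page load through it achieves the same thing; the pitfall is only spawning a fresh driver per request.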

And if you don't want to use Selenium, you can use Splash:

Splash is a javascript rendering service with an HTTP API. It's a lightweight browser with an HTTP API, implemented in Python 3 using Twisted and QT5

as @Granitosaurus said in this question

Bonus points for it being developed by the same guys who are developing Scrapy.
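Since Splash exposes its renderer over plain HTTP, a Scrapy spider can fetch JS-rendered pages by pointing ordinary requests at its `render.html` endpoint. A minimal sketch; `localhost:8050` assumes the default Splash Docker setup, and the helper name is an assumption:

```python
# Build a URL for Splash's render.html endpoint, which returns the page's
# HTML after JavaScript has executed. Assumes a Splash instance running
# locally on its default port (e.g. via `docker run -p 8050:8050 scrapinghub/splash`).
from urllib.parse import urlencode

SPLASH_RENDER = "http://localhost:8050/render.html"  # assumed local instance

def splash_url(target, wait=0.5):
    """Wrap a target URL so Splash renders it; `wait` gives JS time to run."""
    return SPLASH_RENDER + "?" + urlencode({"url": target, "wait": wait})

# Hypothetical usage in a spider:
#   yield scrapy.Request(splash_url("http://example.com/"), callback=self.parse)
```

For deeper integration (cookie handling, Lua scripts), the scrapy-splash plugin provides a `SplashRequest` class that replaces this manual URL wrapping.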
