How do I use the Scrapy framework to scrape data from websites that load their data with JavaScript frameworks? Scrapy downloads the HTML for each page request, but some websites use JS frameworks such as Angular or Vue.js, which load the data separately.
I have tried a combination of Scrapy, Selenium, and ChromeDriver, which gives me the rendered HTML with the content. With this method, however, I cannot retain the session cookies that are set when selecting the country and currency, because each request is handled by a separate instance of Selenium or Chrome.
Please suggest any alternative way to scrape the dynamic content while retaining the session.
Here is the code I used to set the country and currency:
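import scrapy
from selenium import webdriver


class SettingSpider(scrapy.Spider):
    name = 'setting'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com/']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # One browser instance for the whole spider.
        self.driver = webdriver.Chrome()

    def start_requests(self):
        url = 'http://www.example.com/intl/settings'
        # Open the settings page in the browser, then fetch it through Scrapy too.
        self.driver.get(url)
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        # Read the CSRF token from the settings form.
        csrf = response.xpath('//input[@name="CSRFToken"]/@value').extract_first().strip()
        print('------------->' + csrf)
        url = 'http://www.example.com/intl/settings'
        form_data = {'shippingCountry': 'ARE', 'language': 'en', 'billingCurrency': 'USD', 'indicativeCurrency': '',
                     'CSRFToken': csrf}
        # Post the country and currency selection.
        yield scrapy.FormRequest(url, formdata=form_data, callback=self.after_post)

    def after_post(self, response):
        # Placeholder callback; the real handling is not shown here.
        self.logger.info('Settings POST returned status %s', response.status)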
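
To make the goal concrete, what I am after is a way to carry the cookies from one browser session over to every Scrapy request. Below is a minimal sketch of that idea, not a tested solution. It assumes the cookies come from Selenium's `driver.get_cookies()` (a list of dicts) and are passed to `scrapy.Request` through its `cookies` argument. The helper name `selenium_cookies_to_scrapy` and the sample cookie names and values are hypothetical.

```python
def selenium_cookies_to_scrapy(selenium_cookies):
    """Convert Selenium-style cookie dicts into the list-of-dicts form
    that scrapy.Request(cookies=...) accepts.

    Only the keys Scrapy uses are kept; browser-only fields such as
    'secure' or 'httpOnly' are dropped.
    """
    keep = ('name', 'value', 'domain', 'path')
    return [
        {key: cookie[key] for key in keep if key in cookie}
        for cookie in selenium_cookies
    ]


# Sample data shaped like the output of driver.get_cookies().
# The cookie names and values are made up for illustration.
sample = [
    {'name': 'shippingCountry', 'value': 'ARE', 'domain': 'www.example.com',
     'path': '/', 'secure': False, 'httpOnly': True},
    {'name': 'billingCurrency', 'value': 'USD', 'domain': 'www.example.com',
     'path': '/'},
]

scrapy_cookies = selenium_cookies_to_scrapy(sample)
print(scrapy_cookies)
```

In the spider this would presumably be used as `scrapy.Request(url, cookies=selenium_cookies_to_scrapy(self.driver.get_cookies()), callback=self.parse)`, with the single `self.driver` created in `__init__` reused for every page. I have not confirmed that the site honors the settings when the cookies are passed this way.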