How to scrape a website with multiple pages that share the same URL using scrapy-playwright

Question

I am trying to scrape a website whose paginated results all live under the same URL, using scrapy-playwright. The following script returns only the data from the second page and does not continue to the remaining pages.

Can anyone suggest how I can fix it?

import scrapy
from scrapy_playwright.page import PageMethod
from scrapy.crawler import CrawlerProcess


class AwesomeSpideree(scrapy.Spider):
    name = "awesome"

    def start_requests(self):
        # GET request rendered through Playwright
        yield scrapy.Request(
            url="https://www.cia.gov/the-world-factbook/countries/",
            callback=self.parse,
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_methods={
                    # Click the "next page" arrow, then take a screenshot
                    "click": PageMethod(
                        "click",
                        selector='xpath=//div[@class="pagination-controls col-lg-6"]//span[@class="pagination__arrow-right"]',
                    ),
                    "screenshot": PageMethod("screenshot", path="step1.png", full_page=True),
                },
            ),
        )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        await page.close()
        print("-" * 80)

        country_list = response.xpath("//div[@class='col-lg-9']")

        for country in country_list:
            yield {
                "country_link": country.xpath(".//a/@href").get()
            }

Muhammad Fahim · Accepted Answer · 2022-12-27 08:01:07Z

I see you are trying to fetch the country URLs from the page mentioned above. If you inspect the Network tab, you can see that the page makes a single request to a JSON data API. You can fetch all of the country URLs from that URL.

If you then want to scrape more data from those country URLs, you can do so easily: that data is static, so there is no need to use Playwright.
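As a rough illustration of the JSON route: the endpoint URL is not reproduced here, and the payload shape used below (a `countries` list whose items carry a `path` key) is purely hypothetical, so both need to be adjusted to whatever the Network tab actually shows. Under those assumptions, turning the decoded JSON into absolute country links could look something like this:

```python
import json
from urllib.parse import urljoin

# Listing URL taken from the question; used only to resolve relative paths.
BASE_URL = "https://www.cia.gov/the-world-factbook/countries/"


def extract_country_links(payload, base_url=BASE_URL):
    """Collect absolute country URLs from a decoded JSON payload.

    ASSUMPTION: the payload looks like {"countries": [{"path": "..."}, ...]}.
    The real API may use different keys; inspect the response first.
    """
    links = []
    for entry in payload.get("countries", []):
        path = entry.get("path")
        if path:  # skip entries without a link
            links.append(urljoin(base_url, path))
    return links


# Made-up sample data standing in for the real API response.
sample = json.loads(
    '{"countries": ['
    '{"path": "/the-world-factbook/countries/afghanistan/"},'
    '{"name": "entry without a path"}'
    "]}"
)
links = extract_country_links(sample)
```

In a Scrapy callback the decoded payload would come from `response.json()`, and each resulting link can then be requested with a plain `scrapy.Request`, without Playwright.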

Have a good day :)

1 Comment

Kfir Ben simon Over a year ago

Thank you very much, that helped! But if I still want to do it with scrapy-playwright, could you tell me what I need to change in my code?
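On the scrapy-playwright route asked about in the comment above: the page methods attached to a request run once, before the response is handed to the callback, which would explain why a single `click` yields only the second page. One possible approach, sketched here under assumptions, is to drop the `click` page method, keep the page open in `parse`, and click the arrow in a loop. The helper name `collect_page_html`, the `max_pages` cap, the fixed `settle_ms` wait, and the "stop when the HTML no longer changes" rule are all illustrative choices, not something confirmed against the live site.

```python
# Selector copied from the question; whether it still matches the site is an assumption.
NEXT_ARROW = (
    'xpath=//div[@class="pagination-controls col-lg-6"]'
    '//span[@class="pagination__arrow-right"]'
)


async def collect_page_html(page, next_selector=NEXT_ARROW, max_pages=50, settle_ms=500):
    """Return the HTML of each paginated view, clicking "next" between views.

    `page` is expected to behave like a Playwright async Page. The loop stops
    when the arrow is missing, when a click no longer changes the HTML
    (assumed to mean the last page), or after `max_pages` views (safety cap).
    """
    snapshots = [await page.content()]
    while len(snapshots) < max_pages:
        arrow = page.locator(next_selector)
        if await arrow.count() == 0:
            break
        await arrow.first.click()
        await page.wait_for_timeout(settle_ms)  # crude fixed wait for the re-render
        html = await page.content()
        if html == snapshots[-1]:
            break
        snapshots.append(html)
    return snapshots


# Possible use inside the spider (playwright_include_page=True, no "click" page method):
#
#     async def parse(self, response):
#         page = response.meta["playwright_page"]
#         try:
#             for html in await collect_page_html(page):
#                 selector = scrapy.Selector(text=html)
#                 for href in selector.xpath("//div[@class='col-lg-9']//a/@href").getall():
#                     yield {"country_link": href}
#         finally:
#             await page.close()
```

Comparing whole-page HTML is a blunt end-of-pagination check; if the site marks the arrow as disabled on the last page, testing for that marker would be more reliable.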
