
I am trying to scrape a website whose pages all share the same URL, using scrapy-playwright. The following script returns only the data from the second page and does not continue to the remaining pages.

Can anyone suggest how I can fix it?

import scrapy
from scrapy_playwright.page import PageMethod
from scrapy.crawler import CrawlerProcess


class AwesomeSpideree(scrapy.Spider):
    name = "awesome"

    def start_requests(self):
        # GET request
        yield scrapy.Request(
            url="https://www.cia.gov/the-world-factbook/countries/",
            callback=self.parse,
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_methods={
                    "click": PageMethod("click", selector='xpath=//div[@class="pagination-controls col-lg-6"]//span[@class="pagination__arrow-right"]'),
                    "screenshot": PageMethod("screenshot", path="step1.png", full_page=True),
                },
            ),
        )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        await page.close()
        print("-" * 80)

        CountryLst = response.xpath("//div[@class='col-lg-9']")

        for Country in CountryLst:
            yield {
                "country_link": Country.xpath(".//a/@href").get()
            }

1 Answer


I see you are trying to fetch the URLs of the countries from the URL mentioned above. If you inspect the Network tab, you can see a single request to a JSON data API. You can fetch all the country URLs from this URL.

After that, if you still want to scrape more data from the scraped URLs, you can do so easily: that data is static, so there is no need to use Playwright.

Have a good day :)


1 Comment

Thank you very much, that helped! But if I still want to do it with scrapy-playwright, could you tell me what I need to change in my code?
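
On staying with scrapy-playwright: the response handed to parse is a snapshot taken after the page methods have run once, so a single "click" PageMethod can only ever give you the second page. One possible way around this is to drop the click from playwright_page_methods, keep playwright_include_page = True, and do the clicking yourself inside parse, taking a snapshot of the HTML after each click. The sketch below is only that, a sketch: collect_paginated_html, max_pages and settle_ms are hypothetical names, the selector is the one from the question, and the stopping rule (stop when the HTML no longer changes or the click fails) is an assumption about how the site behaves on its last page.

```python
import asyncio

# Selector copied from the question; whether it still matches the live site is an assumption.
NEXT_ARROW = (
    'xpath=//div[@class="pagination-controls col-lg-6"]'
    '//span[@class="pagination__arrow-right"]'
)


async def collect_paginated_html(page, next_selector=NEXT_ARROW, max_pages=50, settle_ms=500):
    """Return one HTML snapshot per pagination state of a Playwright page.

    Hypothetical helper: it clicks the "next" arrow until the arrow is missing,
    the click fails, the HTML stops changing, or max_pages snapshots exist.
    """
    snapshots = [await page.content()]
    while len(snapshots) < max_pages:
        arrow = page.locator(next_selector)
        if await arrow.count() == 0:
            break  # no pagination arrow on this page
        try:
            await arrow.first.click(timeout=5000)
        except Exception:
            break  # arrow not clickable any more (for example, on the last page)
        # Crude fixed wait for the list to re-render; waiting for a specific
        # element would be more robust if you know one.
        await page.wait_for_timeout(settle_ms)
        html = await page.content()
        if html == snapshots[-1]:
            break  # the click changed nothing, so assume this was the last page
        snapshots.append(html)
    return snapshots


# Hypothetical use inside the spider's parse(), with the "click" PageMethod removed:
#     page = response.meta["playwright_page"]
#     snapshots = await collect_paginated_html(page)
#     await page.close()
#     for html in snapshots:
#         for country in scrapy.Selector(text=html).xpath("//div[@class='col-lg-9']"):
#             yield {"country_link": country.xpath(".//a/@href").get()}
```

The helper only relies on page.content(), page.locator(), locator.count(), locator.first.click() and page.wait_for_timeout() from Playwright's async API, so it should work with the page object that scrapy-playwright puts in response.meta["playwright_page"]. Remember to close the page only after the loop has finished.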
