async def errback_close_page(self, failure):
        page = failure.request.meta["playwright_page"]
        await page.close()

def start_requests(self):
        if not self.start_urls and hasattr(self, "start_url"):
            raise AttributeError(
                "Crawling could not start: 'start_urls' not found "
                "or empty (but found 'start_url' attribute instead, "
                "did you miss an 's'?)"
            )
        for url in self.start_urls:
            npo = self.npos[url]
            logging.info("### crawl: %s", url)
            yield scrapy.Request(
                url,
                callback=self.my_parse,
                dont_filter=True,
                meta={
                    "playwright": True,
                    "playwright_include_page": True,
                    "start_time": datetime.utcnow(),
                },
                cb_kwargs={"npo": npo},
                errback=self.errback_close_page,
            )

Why am I getting this error and how can I fix it? I have added the code used to parse as well:

async def my_parse(self, response, npo):
        page = response.meta["playwright_page"]

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/twisted/internet/defer.py", line 1065, in adapt
    extracted = result.result()
  File "/home/ec2-user/SageMaker/xx", line 50, in errback_close_page
    page = failure.request.meta["playwright_page"]
KeyError: 'playwright_page'

1 Answer


I think you didn't pass the playwright_page object in the meta:

        for url in self.start_urls:
            npo = self.npos[url]
            logging.info("### crawl: %s", url)
            page = ...  # please assign the page object here
            yield scrapy.Request(
                url,
                callback=self.my_parse,
                dont_filter=True,
                meta={
                    "playwright": True,
                    "playwright_page": page,
                    "playwright_include_page": True,
                    "start_time": datetime.utcnow(),
                },
                cb_kwargs={"npo": npo},
                errback=self.errback_close_page,
            )

I'm not sure where you will get the page object from. If you have it in the __init__ method or as an object attribute, do this:

            page = self.page

Also, where did you get this part from?

        if not self.start_urls and hasattr(self, "start_url"):
            raise AttributeError(
                "Crawling could not start: 'start_urls' not found "
                "or empty (but found 'start_url' attribute instead, "
                "did you miss an 's'?)"
            )

It looks like you copied it from the library. In that case, you don't have to do that; just delete it. Why should your script warn you and its other users about using start_url instead of start_urls? You aren't building a library, sir.


1 Comment

I am new to scrapy, but doesn't setting "playwright_include_page": True make the page object available in meta?
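For reference, one defensive variant of the errback (a sketch, not from the original post) uses `dict.get` so that no KeyError is raised when the failure happens before scrapy-playwright attaches the page to the request meta. Pure-Python stand-ins (`FakePage`, `FakeRequest`, `FakeFailure` are hypothetical names) are used here so the snippet runs without Scrapy installed:

```python
import asyncio

class FakePage:
    """Stand-in for a Playwright page with an async close()."""
    def __init__(self):
        self.closed = False
    async def close(self):
        self.closed = True

class FakeRequest:
    """Stand-in for scrapy.Request: only the meta dict matters here."""
    def __init__(self, meta):
        self.meta = meta

class FakeFailure:
    """Stand-in for twisted.python.failure.Failure with a .request."""
    def __init__(self, request):
        self.request = request

async def errback_close_page(failure):
    # meta.get() returns None instead of raising KeyError when the
    # page was never attached (e.g. the request failed before the
    # Playwright download handler created a page).
    page = failure.request.meta.get("playwright_page")
    if page is not None:
        await page.close()
    return page

# With the page present it gets closed; without it, no KeyError.
page = FakePage()
with_page = asyncio.run(
    errback_close_page(FakeFailure(FakeRequest({"playwright_page": page})))
)
without_page = asyncio.run(errback_close_page(FakeFailure(FakeRequest({}))))
```

In a real spider the method keeps its `self` parameter; the point is only that `meta.get("playwright_page")` tolerates requests where the key was never set.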
