async def errback_close_page(self, failure):
        page = failure.request.meta["playwright_page"]
        await page.close()

def start_requests(self):
        if not self.start_urls and hasattr(self, "start_url"):
            raise AttributeError(
                "Crawling could not start: 'start_urls' not found "
                "or empty (but found 'start_url' attribute instead, "
                "did you miss an 's'?)"
            )
        for url in self.start_urls:
            npo = self.npos[url]
            logging.info("### crawl: %s", url)
            yield scrapy.Request(
                url,
                callback=self.my_parse,
                dont_filter=True,
                meta={
                    "playwright": True,
                    "playwright_include_page": True,
                    "start_time": datetime.utcnow(),
                },
                cb_kwargs={"npo": npo},
                errback=self.errback_close_page,
            )

Why am I getting this error and how can I fix it? I have added the code used to parse as well:

async def my_parse(self, response, npo):
        page = response.meta["playwright_page"]

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/twisted/internet/defer.py", line 1065, in adapt
    extracted = result.result()
  File "/home/ec2-user/SageMaker/xx", line 50, in errback_close_page
    page = failure.request.meta["playwright_page"]
KeyError: 'playwright_page'

1 Answer


I think you didn't pass the playwright_page object in the meta:

        for url in self.start_urls:
            npo = self.npos[url]
            logging.info("### crawl: %s", url)
            page = ...  # please assign the page object here
            yield scrapy.Request(
                url,
                callback=self.my_parse,
                dont_filter=True,
                meta={
                    "playwright": True,
                    "playwright_page": page,
                    "playwright_include_page": True,
                    "start_time": datetime.utcnow(),
                },
                cb_kwargs={"npo": npo},
                errback=self.errback_close_page,
            )

I'm not sure where you will get the page object from. If you have it in the __init__ method or as an object attribute, do this:

            page = self.page

Also, where did you get this part from?

        if not self.start_urls and hasattr(self, "start_url"):
            raise AttributeError(
                "Crawling could not start: 'start_urls' not found "
                "or empty (but found 'start_url' attribute instead, "
                "did you miss an 's'?)"
            )

It looks like you copied it from the library. In that case, you don't have to do that; just delete it. Why should your script warn you and its other users about using start_url instead of start_urls? You aren't building a library, sir.


1 Comment

I am new to scrapy, but doesn't setting "playwright_include_page": True make the page object available in meta?
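For reference, one defensive variant of the errback (a sketch, not from the original post) uses `dict.get` so that no KeyError is raised when the failure happens before scrapy-playwright attaches the page to the request meta. Pure-Python stand-ins (`FakePage`, `FakeRequest`, `FakeFailure` are hypothetical names) are used here so the snippet runs without Scrapy installed:

```python
import asyncio

class FakePage:
    """Stand-in for a Playwright page with an async close()."""
    def __init__(self):
        self.closed = False
    async def close(self):
        self.closed = True

class FakeRequest:
    """Stand-in for scrapy.Request: only the meta dict matters here."""
    def __init__(self, meta):
        self.meta = meta

class FakeFailure:
    """Stand-in for twisted.python.failure.Failure with a .request."""
    def __init__(self, request):
        self.request = request

async def errback_close_page(failure):
    # meta.get() returns None instead of raising KeyError when the
    # page was never attached (e.g. the request failed before the
    # Playwright download handler created a page).
    page = failure.request.meta.get("playwright_page")
    if page is not None:
        await page.close()
    return page

# With the page present it gets closed; without it, no KeyError.
page = FakePage()
with_page = asyncio.run(
    errback_close_page(FakeFailure(FakeRequest({"playwright_page": page})))
)
without_page = asyncio.run(errback_close_page(FakeFailure(FakeRequest({}))))
```

In a real spider the method keeps its `self` parameter; the point is only that `meta.get("playwright_page")` tolerates requests where the key was never set.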
