I'm trying to scrape a website using Playwright and Node.js. On hitting the base URL, the site redirects to a new URL (containing some auth values and a nonce code) and opens the login page. The scraping code works on my local system, but when deployed to AWS Lambda the page doesn't redirect to the new URL, and page screenshots show a blank white screen. We are using the playwright-aws-lambda package (https://www.npmjs.com/package/playwright-aws-lambda) to make Playwright work on AWS; it supports only a Chromium-based browser. I tried a few approaches (sketched after this list):

  • waitForLoadState()
  • waitForURL()
  • Added a custom wait time for the site to load.
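
A rough sketch of what those attempts looked like (baseUrl and the /login pattern are placeholders, not the real site):

// Rough sketch of the attempted waits using playwright-aws-lambda.
const playwright = require('playwright-aws-lambda');

(async () => {
  const browser = await playwright.launchChromium();
  const context = await browser.newContext();
  const page = await context.newPage();

  const baseUrl = 'https://example.com'; // placeholder for the real base URL
  await page.goto(baseUrl);
  await page.waitForLoadState('networkidle'); // wait for the network to go quiet
  await page.waitForURL(/login/);             // placeholder pattern for the redirected login URL
  await page.waitForTimeout(10000);           // custom fixed wait as a fallback

  await browser.close();
})();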

Some observations after debugging. I printed out the network responses using:

// Log every network response's URL and HTTP status
page.on('response', (response) => {
  console.log(`${response.url()} ,, ${response.status()}`);
});

In AWS, some of the responses that show up locally are never received. Files like jwks, openid-configuration, and app-config never arrive, all of them XHR requests. After these, the new URL should be returned, but it never is.
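
One thing that may help dig further (a suggestion, not something from the original debugging): also log requests that fail outright, which shows whether those XHR calls are being blocked or timing out:

// Log requests that never complete, with Chromium's failure reason (e.g. net::ERR_...).
page.on('requestfailed', (request) => {
  const failure = request.failure(); // null when no failure info is available
  console.log(`${request.url()} failed: ${failure ? failure.errorText : 'unknown'}`);
});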

What could be the issue causing this blockage? TIA.

1 Answer

The issue was the Chromium browser version. In AWS Lambda, the version being downloaded was v83.x, which, I guess, was not supported by the site being scraped. We therefore switched to the '@sparticuz/chromium' package for the Chromium binary, which ships v119.x, and it worked fine.
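
For reference, a minimal sketch of wiring @sparticuz/chromium into a Lambda handler with playwright-core (the URL is a placeholder, and this assumes playwright-core is installed alongside @sparticuz/chromium):

// Minimal sketch: playwright-core + @sparticuz/chromium in a Lambda handler.
const chromium = require('@sparticuz/chromium');
const { chromium: playwright } = require('playwright-core');

exports.handler = async () => {
  const browser = await playwright.launch({
    args: chromium.args,                             // Lambda-friendly Chromium flags
    executablePath: await chromium.executablePath(), // bundled, up-to-date Chromium binary
    headless: true,
  });

  try {
    const page = await browser.newPage();
    await page.goto('https://example.com'); // placeholder for the real base URL
    await page.waitForURL(/login/);         // the redirect completes with the newer Chromium
    return { statusCode: 200, body: page.url() };
  } finally {
    await browser.close();
  }
};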
