I'm trying to scrape a website using Playwright and Node.js. On hitting the base URL, the site redirects to a new URL (containing some auth values and a nonce code) and opens the login page. The scraping code works on my local system, but when deployed to AWS Lambda the page doesn't redirect to the new URL, and page screenshots show a blank white screen. We are using the playwright-aws-lambda package (https://www.npmjs.com/package/playwright-aws-lambda) to make Playwright work on AWS; it supports only a Chromium-based browser. I tried a few approaches (sketched after this list):

  • waitForLoadState()
  • waitForURL()
  • Added a custom wait time for the site to load.
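
A rough sketch of what those attempts looked like (baseUrl and the /login pattern are placeholders, not the real site):

// Rough sketch of the attempted waits using playwright-aws-lambda.
const playwright = require('playwright-aws-lambda');

(async () => {
  const browser = await playwright.launchChromium();
  const context = await browser.newContext();
  const page = await context.newPage();

  const baseUrl = 'https://example.com'; // placeholder for the real base URL
  await page.goto(baseUrl);
  await page.waitForLoadState('networkidle'); // wait for the network to go quiet
  await page.waitForURL(/login/);             // placeholder pattern for the redirected login URL
  await page.waitForTimeout(10000);           // custom fixed wait as a fallback

  await browser.close();
})();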

Some observations after debugging. I printed out the network responses using:

// Log every network response's URL and HTTP status
page.on('response', (response) => {
  console.log(`${response.url()} ,, ${response.status()}`);
});

In AWS, some of the responses that show up locally are never received. Files like jwks, openid-configuration, and app-config never arrive, all of them XHR requests. After these, the new URL should be returned, but it never is.
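
One thing that may help dig further (a suggestion, not something from the original debugging): also log requests that fail outright, which shows whether those XHR calls are being blocked or timing out:

// Log requests that never complete, with Chromium's failure reason (e.g. net::ERR_...).
page.on('requestfailed', (request) => {
  const failure = request.failure(); // null when no failure info is available
  console.log(`${request.url()} failed: ${failure ? failure.errorText : 'unknown'}`);
});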

What could be the issue causing this blockage? TIA.

1 Answer

The issue was the Chromium browser version. In AWS Lambda, the version being downloaded was v83.x, which, I guess, was not supported by the site being scraped. We therefore switched to the '@sparticuz/chromium' package for the Chromium binary, which ships v119.x, and it worked fine.
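
For reference, a minimal sketch of wiring @sparticuz/chromium into a Lambda handler with playwright-core (the URL is a placeholder, and this assumes playwright-core is installed alongside @sparticuz/chromium):

// Minimal sketch: playwright-core + @sparticuz/chromium in a Lambda handler.
const chromium = require('@sparticuz/chromium');
const { chromium: playwright } = require('playwright-core');

exports.handler = async () => {
  const browser = await playwright.launch({
    args: chromium.args,                             // Lambda-friendly Chromium flags
    executablePath: await chromium.executablePath(), // bundled, up-to-date Chromium binary
    headless: true,
  });

  try {
    const page = await browser.newPage();
    await page.goto('https://example.com'); // placeholder for the real base URL
    await page.waitForURL(/login/);         // the redirect completes with the newer Chromium
    return { statusCode: 200, body: page.url() };
  } finally {
    await browser.close();
  }
};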
