I'm trying to scrape a website using Playwright and Node.js. On hitting the base URL, the site redirects to a new URL (containing some auth values and a nonce) and opens the login page. The scraping code works on my local system, but when deployed to AWS Lambda the page never redirects to the new URL, and page screenshots show only a blank white screen. We are using the playwright-aws-lambda package (https://www.npmjs.com/package/playwright-aws-lambda) to run Playwright on AWS; it supports only Chromium-based browsers. I've tried a few approaches:
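For reference, the Lambda handler is structured roughly like this (a sketch, not my exact code; `launchChromium()` is the helper described in the playwright-aws-lambda README, and `baseUrl` is a placeholder):

```javascript
// Sketch of the Lambda-side setup, assuming the launchChromium() helper
// from playwright-aws-lambda; baseUrl is a placeholder value.
async function scrape(baseUrl) {
  const playwright = require('playwright-aws-lambda'); // assumed package entry point
  const browser = await playwright.launchChromium({ headless: true });
  try {
    const context = await browser.newContext();
    const page = await context.newPage();
    // Navigate and let the auth redirect play out before reading the URL.
    await page.goto(baseUrl, { waitUntil: 'networkidle' });
    return page.url(); // expected to be the redirected login URL
  } finally {
    await browser.close();
  }
}

module.exports = { scrape };
```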
- `waitForLoadState()`
- `waitForURL()`
- Adding a custom wait time for the site to load.
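Combining the first two waits explicitly looked like this (a sketch; the `nonce=` URL pattern and the 30 s timeout are assumptions for illustration, not values from my actual target site):

```javascript
// Sketch: wait explicitly for the auth redirect instead of a fixed delay.
// The /nonce=/ pattern and 30s timeout are illustrative assumptions.
async function waitForAuthRedirect(page) {
  // Resolves once the browser lands on a URL carrying the nonce parameter.
  await page.waitForURL(/nonce=/, { timeout: 30000 });
  // Then wait for the login page's own requests to settle.
  await page.waitForLoadState('networkidle');
}

module.exports = { waitForAuthRedirect };
```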
Some observations after debugging. I logged the network responses with:

```javascript
page.on('response', (response) => {
  console.log(`${response.url()} ,, ${response.status()}`);
});
```
On AWS, some of the responses that appear locally are never received. Files such as jwks, openid-configuration and app-config (all XHR requests) don't come back, and the redirect to the new URL that should follow them never happens either.
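To see whether those XHRs are being aborted or blocked (rather than silently missing), I also attached extra listeners; this is a sketch using standard Playwright page events (`requestfailed`, `console`, `pageerror`):

```javascript
// Attach extra diagnostics: failed requests often reveal why an XHR
// (e.g. the jwks / openid-configuration calls) never produced a response.
function attachDebugLogging(page) {
  page.on('requestfailed', (request) => {
    // request.failure() returns { errorText } for a failed request, or null.
    const failure = request.failure();
    console.log(`FAILED ${request.url()} :: ${failure ? failure.errorText : 'unknown'}`);
  });
  page.on('console', (msg) => {
    console.log(`PAGE ${msg.type()}: ${msg.text()}`);
  });
  page.on('pageerror', (err) => {
    console.log(`PAGE ERROR: ${err.message}`);
  });
}

module.exports = { attachDebugLogging };
```

With this in place, a blocked request typically shows up with an error like `net::ERR_...`, which narrows down whether the problem is networking inside the Lambda or something the site itself is doing.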
What could be the issue causing this blockage? TIA.