I have an array of two search terms: one returns no results on the website, the other returns valid results.

["sakdjlkasjda", "Assassin's Creed Origins"]

I then map over the array and pass the value to an async function

const cex = games.map((game) => cexSearch(game));

return Promise.all(cex)
  .then(function(g) {
    console.log(g);
    res.send(g);
  });

In the async function, I create a Puppeteer instance and navigate to the URL. The website has an element (without a class or id) that is only displayed when there are no results. For a search with results, noRecordsDisplay should equal "none"; for a search with no results, it should equal "". However, a few times I've noticed that noRecordsDisplay equals "none" for a search that should be invalid. It works most of the time, but not all the time, so I'm not sure where I'm going wrong. Any help would be greatly appreciated.

async function cexSearch(game) {
  const url = 'https://uk.webuy.com/search?stext=' + game;
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');
  await page.goto(url, {
    timeout: 3000000
  });
  const content = '.content-area';
  await page.waitForSelector(content);
  await page.waitForSelector('.content-area > div:not(.searchRcrd)');
  const noRecordsDisplay = await page.evaluate(() => document.querySelector('.content-area > div:not(.searchRcrd)').style.display);
  console.log("display = " + noRecordsDisplay);
  if (noRecordsDisplay === "") {
    return "No Search Results";
  } else {
    // When there is an invalid search, it sometimes reaches here; .searchRcrd does not exist, so the wait below times out
    const selector = '.searchRcrd';
    await page.waitForSelector(selector);

    // DO logic

    await browser.close();

    return records;
  }
} 

1 Answer

There are multiple ways to solve your problem and detect results more precisely.

To check whether there are results:

!!document.querySelector('.searchRcrd') // => Returns true if results are available

Usage:

const noRecordsDisplay = await page.evaluate(() => !!document.querySelector('.searchRcrd'));
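Note that this check runs once, so it reports false if the results have not rendered yet. A minimal sketch of one way to guard against that, assuming a bounded wait is acceptable and that Puppeteer's timeout rejection has the name TimeoutError. The helper name hasSearchResults and the 5-second default are illustrative, not part of the original code:

```javascript
// Hypothetical helper: resolves to true if a '.searchRcrd' element appears
// within `timeoutMs`, and to false if the wait times out.
// `page` is expected to be a Puppeteer Page.
async function hasSearchResults(page, timeoutMs = 5000) {
  try {
    await page.waitForSelector('.searchRcrd', { timeout: timeoutMs });
    return true;
  } catch (err) {
    // Assumption: a timeout means "no results". Anything else is a real error.
    if (err && err.name === 'TimeoutError') {
      return false;
    }
    throw err;
  }
}
```

The trade-off is that an invalid search always costs the full timeout, so the value needs tuning against how slowly the site renders.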

Another way is to use waitForResponse instead of waitForSelector.

For example,

  • The AJAX request used by the search has /v3/boxes?q= in its URL.
  • The result has response.data, which contains the results, or null when there are none.

Usage:

const finalResponse = await page.waitForResponse(response => response.url().includes('/v3/boxes?q=') && response.status() === 200);
const data = (await finalResponse.json()).data;
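One caveat: waitForResponse only sees responses that arrive after it has been called, so calling it once page.goto has already resolved can miss the search request and wait until the timeout. A sketch of registering the wait before navigating, assuming the same /v3/boxes?q= endpoint and data field as in the snippet above. The function name fetchSearchData and the 30-second timeout are illustrative:

```javascript
// Hypothetical helper: `page` is a Puppeteer Page, `url` is the search URL.
async function fetchSearchData(page, url, timeoutMs = 30000) {
  const isSearchResponse = (response) =>
    response.url().includes('/v3/boxes?q=') && response.status() === 200;

  // Start waiting for the response BEFORE navigating, so it cannot be missed.
  const [searchResponse] = await Promise.all([
    page.waitForResponse(isSearchResponse, { timeout: timeoutMs }),
    page.goto(url),
  ]);

  const body = await searchResponse.json();
  // Assumption: `data` sits where the snippet above reads it.
  return body.data;
}
```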

EDIT:

Your code does not wait until the page has loaded completely. To do that, pass the waitUntil option to page.goto.

Here is the full working code.

const puppeteer = require("puppeteer");

const games = ["Does not Exist", "Assassin's Creed Origins"];
const cex = games.map(game => cexSearch(game));

Promise.all(cex).then(function(g) {
  console.log(g);
});

async function cexSearch(game) {
  // Encode the search term so spaces and other special characters are safe in the URL
  const url = "https://uk.webuy.com/search?stext=" + encodeURIComponent(game);
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle0" }); // <-- wait for page to load properly

    await page.waitForSelector(".content-area > div:not(.searchRcrd)");
    // true when at least one .searchRcrd result element is present
    const hasResults = await page.evaluate(
      () => !!document.querySelector(".searchRcrd")
    );
    console.log(game, hasResults ? ">> Result Exists" : ">> No Search Results");
    return hasResults;
  } finally {
    // Always close the browser, even if one of the steps above throws
    await browser.close();
  }
}

Result:

➜ node app.js
No Search Results
Result Exists
[ false, true ]

EDIT 2:

If you pass 6 elements in that array, the app will try to open 6 Chrome instances/windows(!!) at once and will most likely hang for lack of resources.

It worked 100% fine for me on a machine with 16GB RAM though :D. You are opening 6 pages at once, which is a whole different problem. See here for an answer with concurrency.

More tests:

Quantam Break >> No Search Results
FIFA 19 >> Result Exists
asdhsuah >> No Search Results
asucinuasu >> No Search Results
No Man's Sky >> Result Exists
Overcooked 2 >> Result Exists
[ false, true, true, false, true, false ]

Notice that the final array is in a different order from the console log. Promise.all returns results in the order of the input array, while each console.log runs whenever its own search finishes.
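A small sketch of that ordering effect, with timers standing in for page loads. The fakeSearch function and its delays are made up for illustration:

```javascript
// Stand-in for cexSearch: resolves with `name` after `delayMs`
// and records the moment it finishes in `completionLog`.
function fakeSearch(name, delayMs, completionLog) {
  return new Promise((resolve) => {
    setTimeout(() => {
      completionLog.push(name); // order in which searches finish
      resolve(name);
    }, delayMs);
  });
}

async function demo() {
  const completionLog = [];
  const results = await Promise.all([
    fakeSearch('slow', 60, completionLog),
    fakeSearch('fast', 10, completionLog),
  ]);
  return { results, completionLog };
}

// results follows the input order: ['slow', 'fast']
// completionLog follows the finish order: ['fast', 'slow']
demo().then(({ results, completionLog }) => {
  console.log(results, completionLog);
});
```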

You have to see the overall picture. If you pass 6 elements, it opens 6 windows and must wait for every page to load completely; some will have navigation problems if the server or computer is not good, or the internet connection is poor.

For future attempts, you need to study async/await and queues if you want to build something that goes through 100 links and returns results. If you pass 100 elements, it will freeze instantly because it will try to open 100 Chrome windows at once. Keep that in mind.
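As a rough sketch of the queue idea, assuming no extra library is wanted, a small helper can keep at most a fixed number of searches in flight. The name mapWithLimit is hypothetical, and this is one possible approach rather than the linked answer's method:

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// The returned array keeps the order of `items`, like Promise.all.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++; // no await between check and claim, so no race
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With the code above, this would be called as mapWithLimit(games, 2, cexSearch), so only two browsers are open at any time. If one search rejects, the returned promise rejects as well, so per-item error handling belongs inside the worker.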

5 Comments

After trying both of these solutions, neither really solved it; I'm still getting very intermittent results. The first solution always returns false for me, regardless of whether the search is valid or invalid. The second solution sometimes works, but waitForResponse takes a long, long time. It goes through lots of responses, sometimes quickly, and sometimes it takes 10 seconds per response before even getting to /v3/boxes?q=, as there seem to be a lot of responses on this site. So neither solution really works for me at this point. Thanks for the effort though :)
I added an updated solution, and listed the reason for the failure. :)
I tried the above with two values in an array and it works, but with the following array of 6 values it doesn't; again, very intermittent results. ["asdhsuah","Overcooked 2", "No Man's Sky", "asucinuasu", "FIFA 19", "Quantam Break"] Most of the time I'm getting No Search Results, No Search Results, No Search Results, No Search Results, No Search Results, Result Exists in the console. Any idea why?
Check again for EDIT 2.
Thank you very much, I think I may set a limit of 5 now I know how much of a performance issue this will be. I will look into using Queue.
