
I have a small piece of JavaScript that fetches the content of HTML pages and then processes them (a crawler). The problem is that `request` executes asynchronously. I tried using Promises and async/await, but I still get the same out-of-order execution. I want to finish crawling all the pages before moving on to the next objective. Here is code similar to what I have:

const request = require('request'); // callback-based HTTP client
const log = console.log;
const rootlink = 'https://jsonplaceholder.typicode.com/posts/';

async function f (){
    await f1()
    f3()
}

async function f1(){
    return new Promise(async (resolve,reject)=>{
        log('f1 start');
         for(let i=1;i<11;i++){
            await request(rootlink+i,(err, res,html)=>{
                if(!err && res.statusCode==200){
                    log('link '+i +' done');
                    resolve();
                }
                else reject()
            })
        }
    })
}

function f3(){
    console.log('f3')
}

f()

The output should be:

f1 start, link 1 done, link 2 done, link 3 done, link 4 done, link 5 done, link 6 done, link 7 done, link 8 done, link 9 done, link 10 done, f3

instead of:

f1 start, link 1 done, f3, link 2 done, link 3 done, link 4 done, link 5 done, link 6 done, link 7 done, link 8 done, link 9 done, link 10 done

  • It seems that what you want to achieve is not really asynchronous; you might want to perform a synchronous request instead. Commented Oct 28, 2019 at 17:35
  • Yes, that's what I really want. Commented Oct 28, 2019 at 17:37
  • return new Promise(async (... is a bit weird. Why not just return (async (...) => {...})()? It probably won't change anything, but removing unnecessary code and indentation is never bad for debugging. Commented Oct 28, 2019 at 17:38
  • Take a look at this: stackoverflow.com/questions/37576685/… Commented Oct 28, 2019 at 17:42

3 Answers


NOTE: I would use an isomorphic fetch package like node-fetch to create code that could be used in multiple environments. Even if you don't plan to use this in a browser, becoming familiar with the API is very beneficial for future use. At the very least, this idea allowed me to write a snippet that you can actually run on Stack Overflow.

Promise.all() is your answer no matter what package you use, though. You can simply wait for ALL the promises to resolve, then do your logic:

// const fetch = require('node-fetch')

const fetchData = (...args) => fetch(...args).then(r => {
  if (!r.ok) throw new Error('Error!')
  return r.json()
})

const getAllPostsAsync = (postIds) => Promise.all(
  postIds.map(postId => fetchData(`https://jsonplaceholder.typicode.com/posts/${postId}`))
)

;(async () => {
  const posts = await getAllPostsAsync([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
  
  // TODO: Your logic here, after waiting for all posts to load
  console.log(posts)
})()
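One caveat worth noting: `Promise.all` rejects as soon as any single request fails, and the successful results are lost with it. If a crawler should tolerate individual failures, `Promise.allSettled` (Node 12.9+) is one option. A sketch, with a hypothetical `fakeFetch` standing in for the network call (odd ids succeed, even ids fail):

```javascript
// Hypothetical stand-in for fetchData: odd ids resolve, even ids reject.
const fakeFetch = (id) =>
  id % 2 ? Promise.resolve({ id }) : Promise.reject(new Error('post ' + id + ' failed'));

async function getAllPostsSettled(postIds) {
  // allSettled never rejects; every outcome is reported with a status
  const results = await Promise.allSettled(postIds.map(fakeFetch));
  const posts = results
    .filter((r) => r.status === 'fulfilled')
    .map((r) => r.value);
  const errors = results
    .filter((r) => r.status === 'rejected')
    .map((r) => r.reason.message);
  return { posts, errors };
}

getAllPostsSettled([1, 2, 3]).then(({ posts, errors }) => {
  console.log(posts);  // the successful results
  console.log(errors); // the failure messages
});
```

Whether dropping failed pages or aborting the whole crawl is correct depends on the use case; `Promise.all` is the right default when every page is required.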


Given that you want to do multiple asynchronous operations in parallel, you actually don't want to await them individually, as this would block your function.

First, I would say it's better to find an HTTP library that uses promises. The one you use has callbacks, but the request project also has a request-promise package that's much easier to use.

Here's a fixed version of your f1 function that uses promises more correctly. Note that this is not parallelized yet.

const request = require('request-promise');

async function f1() {
  log('f1 start');
  for (let i = 1; i < 11; i++) {
    // resolveWithFullResponse gives the full response object (with statusCode);
    // by default request-promise resolves with just the body
    const res = await request(rootlink + i, { resolveWithFullResponse: true });
    if (res.statusCode == 200) {
      log('link ' + i + ' done');
    }
  }
}

Here is another version of this function, except now it's fully parallelized.

async function f1() {
  log('f1 start');
  const promises = [];
  for (let i = 1; i < 11; i++) {
    promises.push(
      request(rootlink + i, { resolveWithFullResponse: true }).then((res) => {
        if (res.statusCode == 200) {
          log('link ' + i + ' done');
        }
      })
    );
  }

  await Promise.all(promises);
}

This can be made a bit more elegant if this is split up in multiple functions:

async function f1(){
  log('f1 start');
  const promises = [];
  for(let i=1;i<11;i++){
     promises.push(checkLink(i));
  }

  await Promise.all(promises);
}

async function checkLink(i) {
  const res = await request(rootlink + i, { resolveWithFullResponse: true });
  if (res.statusCode == 200) {
    log('link ' + i + ' done');
  }
}
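One detail that matters for a crawler: `Promise.all` resolves with the results in the same order as the input array, regardless of which request finished first, so a later step can safely match results back to links. A small sketch with simulated delays (a hypothetical `fakeRequest` in place of the real HTTP call):

```javascript
// Hypothetical stand-in for request(): later ids resolve sooner,
// so completion order is the reverse of start order.
const fakeRequest = (i) =>
  new Promise((resolve) => setTimeout(() => resolve('result ' + i), (5 - i) * 10));

async function f1() {
  const promises = [];
  for (let i = 1; i < 5; i++) {
    promises.push(fakeRequest(i));
  }
  // Results come back in input order, not completion order
  return Promise.all(promises);
}

f1().then((results) => console.log(results));
```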

2 Comments

Thank you, sir, for the answer.
The problem is that in my case I have to do it as synchronous operations in order to store the results in a file later. In reality, my second function (f2) crawls according to the results given by the first function (f1), and then there is another function that stores the results when the two previous functions finish doing their job.

The answer to my question is to use a synchronous request, 'sync-request', from https://www.npmjs.com/package/sync-request

2 Comments

This package warns "you should not be using this in a production application". I would advise against settling on sync-request as your solution.
Yes, I saw that warning. It's my only solution for the moment, until I figure out a better one (I'll think of another way to do this job while avoiding synchronous operations).
