
I'm having a control flow problem with an application loading a large array of URLs. I'm using Caolan Async and the NPM request module.

My problem is that the HTTP requests start as soon as the functions are added to the queue. Ideally I want to build my queue and only start making the HTTP requests when the queue starts. Otherwise the callbacks start firing before the queue starts, causing the queue to finish prematurely.

var request = require('request') // https://www.npmjs.com/package/request
    , async = require('async'); // https://www.npmjs.com/package/async

var myLoaderQueue = []; // passed to async.parallel
var myUrls = ['http://...', 'http://...', 'http://...'] // 1000+ urls here

for(var i = 0; i < myUrls.length; i++){
    myLoaderQueue.push(function(callback){

        // Async http request
        request(myUrls[i], function(error, response, html) {

            // Some processing is happening here before the callback is invoked
            callback(error, html);
        });
    });
}

// The loader queue has been made, now start to process the queue
async.parallel(myLoaderQueue, function(err, results){
    // Done
});

Is there a better way of attacking this?


3 Answers


Using for loops combined with asynchronous calls is problematic (with ES5) and may yield unexpected results (in your case, the wrong URL being retrieved).

Instead, consider using async.map():

async.map(myUrls, function(url, callback) {
  request(url, function(error, response, html) {
    // Some processing is happening here before the callback is invoked
    callback(error, html);
  });
}, function(err, results) {
  ...
});

Given that you have 1000+ URLs to retrieve, async.mapLimit() may also be worth considering.
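To see why a concurrency limit matters with 1000+ URLs, here is a rough, hand-rolled sketch of the idea behind async.mapLimit (not the library's actual implementation): run at most `limit` workers at once and collect results in input order. The worker below is a stand-in for the real `request` call.

```javascript
// Illustrative sketch of mapLimit-style concurrency control.
function mapLimit(items, limit, worker, done) {
  const results = new Array(items.length);
  let next = 0, active = 0, completed = 0, errored = false;

  function spawn() {
    // Launch workers until we hit the limit or run out of items
    while (active < limit && next < items.length) {
      const i = next++; // each iteration gets its own binding of i
      active++;
      worker(items[i], (err, res) => {
        active--;
        if (errored) return;
        if (err) { errored = true; return done(err); }
        results[i] = res; // store at original index, preserving order
        completed++;
        if (completed === items.length) return done(null, results);
        spawn(); // a slot freed up, launch the next item
      });
    }
  }
  spawn();
}

// Usage: in real code the worker would call request(url, ...)
mapLimit(['http://a', 'http://b', 'http://c'], 2, function (url, cb) {
  cb(null, 'response for ' + url);
}, function (err, results) {
  console.log(results);
});
```

With the real library this collapses to `async.mapLimit(myUrls, 10, worker, done)`; the point is that only a bounded number of sockets are open at any time.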


7 Comments

Works perfectly. Thanks.
How would that be less problematic in ES6?
@xShirase ES6 has block scoping using let, which would scope i to the for block; in the existing code, with var, you'd run into scoping issues when referencing myUrls[i]
damn, you're right, and that's awesome! thanks for taking the time to answer
@Jonny depends on your exact requirements, but I guess any Promise-based HTTP library (got, http.min, axios) combined with Promise.allSettled() (or a Promise-based rate limiter/queue)

If you're willing to start using Bluebird and Babel to utilize promises and ES7 async / await you can do the following:

let Promise = require('bluebird');
let request = Promise.promisify(require('request'));

let myUrls = ['http://...', 'http://...', 'http://...'] // 1000+ urls here

async function load() {
  try {
    // map myUrls array into array of request promises
    // wait until all request promises in the array resolve
    let results = await Promise.all(myUrls.map(request));
    // don't know if Babel await supports syntax below
    // let results = await* myUrls.map(request));
    // print array of results or use forEach 
    // to process / collect them in any other way
    console.log(results)
  } catch (e) {
    console.log(e);
  }
}

4 Comments

Can't use this answer since my project is ES5 based and don't want to rewrite the entire program
...but, kudos for posting Node.js ES7!
You can also do await* myUrls.map(request), instead of Promise.all.
@MichaelVayvala That syntax might still work with Babel, but it is no longer part of the spec.

I'm pretty confident you're experiencing the result of a different error. By the time your queued functions actually run, the loop has finished and i has been incremented past the end of the array, so every request sees the same value of myUrls[i]. Try a little closure when you are queuing the functions.

var request = require('request') // https://www.npmjs.com/package/request
    , async = require('async'); // https://www.npmjs.com/package/async

var myLoaderQueue = []; // passed to async.parallel
var myUrls = ['http://...', 'http://...', 'http://...'] // 1000+ urls here

for(var i = 0; i < myUrls.length; i++){
    (function(URLIndex){
       myLoaderQueue.push(function(callback){

           // Async http request
           request(myUrls[URLIndex], function(error, response, html) {

               // Some processing is happening here before the callback is invoked
               callback(error, html);
           });
       });
    })(i);
}

// The loader queue has been made, now start to process the queue
async.parallel(myLoaderQueue, function(err, results){
    // Done
});
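The IIFE above exists only to capture a fresh copy of i per iteration. A compact way to see the underlying var-vs-let difference (and why ES6 `let` removes the need for the IIFE entirely):

```javascript
// With var, every closure shares the single loop variable,
// which has already reached 3 by the time the functions run.
var withVar = [];
for (var i = 0; i < 3; i++) {
  withVar.push(function () { return i; });
}

// With let, each iteration gets its own binding.
var withLet = [];
for (let j = 0; j < 3; j++) {
  withLet.push(function () { return j; });
}

console.log(withVar.map(function (f) { return f(); })); // [ 3, 3, 3 ]
console.log(withLet.map(function (f) { return f(); })); // [ 0, 1, 2 ]
```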

2 Comments

My answer may be correct, but robertklep's is better. Wish he had posted 5 minutes earlier or I was 5 minutes slower.
Yeah familiar problem :-) BTW, myUrls[i] should be myUrls[URLIndex].
