
Hello,

I use Node.js to provide an API for storing data in a MongoDB database.

I ran multiple tests on a read method, which takes ids and returns the corresponding documents. The point is that I must return these documents in the specified order. To ensure that, I use the following code:

// Sequentially fetch every element
function read(ids, callback) {
    var i = 0;
    var results = [];
    function next() {
        db.findOne(ids[i], function (err, doc) {
            results.push(err ? null : doc);
            if (ids.length > ++i) {
                return next();
            }
            callback(results);
        });
    }
    next();
}

This way, documents are fetched one-by-one, in the right order. It takes about 11s on my laptop to retrieve 27k documents.

However, I thought that it was possible to improve this method:

// Asynchronously map the whole array
var async = require('async');

function read(ids, callback) {
    async.map(ids, db.findOne.bind(db), callback);
}

After running a single test, I was quite pleased to see that the 27k documents were retrieved in only 8s using simpler code.

The problem appears when I repeat the same request: the response time keeps growing, proportionally to the number of elements retrieved: 9s, 10s, 11s, 12s… This does not happen in the sequential version.

I tried two versions of Node.js, v6.2.0 and v0.10.29; the problem is the same in both. What causes this latency, and how can I eliminate it?

5 Comments

  • Why don't you use find? Commented Jun 7, 2016 at 16:28
  • See here: stackoverflow.com/questions/8303900/… Commented Jun 7, 2016 at 16:30
  • Try async.mapLimit to prevent overload. But find({_id: {$in: list}}) is always better, because it makes a single database request instead of many. Commented Jun 7, 2016 at 16:30
  • @vp_arth async.mapLimit is exactly what I was looking for! I would like to just find(), but I must provide the documents in the input order (Mongo's find returns them sorted by their id). Thanks a lot! Commented Jun 7, 2016 at 18:10
  • A JS-side order restore may well be much more efficient; just try it. I think you can achieve sub-second times for 30k documents. Commented Jun 7, 2016 at 18:17

1 Answer


Try async.mapLimit to prevent overload. You will need some testing to tune the limit value for your environment.
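To illustrate the idea, here is a minimal self-contained sketch of the concurrency-limiting pattern behind async.mapLimit. The mapLimit helper here is hand-rolled for illustration, not the library implementation; results are stored by index, so the input order is preserved no matter which requests finish first.

```javascript
// Sketch of a concurrency-limited map in the spirit of async.mapLimit.
function mapLimit(items, limit, iteratee, callback) {
  var results = new Array(items.length);
  var inFlight = 0, next = 0, done = 0, failed = false;
  if (items.length === 0) return callback(null, results);
  function launch() {
    // Start new tasks until we hit the concurrency limit or run out of items.
    while (inFlight < limit && next < items.length) {
      (function (i) {
        inFlight++;
        iteratee(items[i], function (err, res) {
          if (failed) return;
          if (err) { failed = true; return callback(err); }
          results[i] = res; // slot i preserves the input order
          inFlight--;
          done++;
          if (done === items.length) return callback(null, results);
          launch(); // start the next pending item, staying under the limit
        });
      })(next++);
    }
  }
  launch();
}
```

With the async library itself, the equivalent call would be something like `async.mapLimit(ids, 50, db.findOne.bind(db), callback)`, where 50 is a limit you would tune experimentally.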


But find({_id: {$in: list}}) is always better, because it makes a single database request instead of many.

I suggest restoring the original order client-side.
Something like this:

function read(ids, cb) {
  db.find(
    {_id: {$in: ids.map(id => mongoose.Types.ObjectId(id))}},
    process
  );

  function process(err, docs) {
    if (err) return cb(err);
    return cb(null, docs.sort(ordering));
  }
  function ordering(a, b) {
    // sort ascending by position in the original ids array
    return ids.indexOf(a._id.toString()) - ids.indexOf(b._id.toString());
  }
}

The find query may need adjusting; I can't know which exact MongoDB driver you use.

This code is a first attempt; more careful manual sorting can improve performance a lot, since [].indexOf is itself O(n).
But I'm almost sure that, even as-is, it will run much faster.


Possible ordering replacement:

var idHash = {};
for(var i = 0; i < ids.length; i++)
  idHash[ids[i]] = i;
function ordering(a, b) {
  // ascending by original position, via the precomputed index map
  return idHash[a._id.toString()] - idHash[b._id.toString()];
}

Any comparison sort is O(n log n) at best, but we already know the final position of each found document, so we can restore the original order in O(n):

var idHash = ids.reduce((c, id, i) => (c[id] = i, c), {});
function process(err, docs) {
  if (err) return cb(err);
  return cb(null,
    docs.reduce(
      (c, doc) => (c[idHash[doc._id.toString()]] = doc, c),
      ids.map(id => null))); // documents that were not found stay null
}

Functional style makes the code more flexible. For example, this code can easily be modified to use async.reduce so it blocks the event loop less.


6 Comments

  • Thanks! I used your find solution with the idHash object, but filling a new array (more time-efficient than in-place sorting, and I need to fill in the missing values). It is not as memory-efficient as the async.mapLimit solution, but gives ~1s response times instead of ~7s (with the test array of 27k ids).
  • Yeah, you actually don't need any sorting when you already know each element's position.
  • @GnxR, take a look at how to implement it in functional style :)
  • Sorry for the off-topic question, but what kind of assignment is this: (c[id] = i, c)? Destructuring? I think I've never seen it. Thanks in advance!
  • No, it's just the comma operator; it means literally (c, id, i) => { c[id] = i; return c; }
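For readers unfamiliar with it, a tiny self-contained illustration of that comma-operator accumulator (the sample ids are made up for the demo):

```javascript
// The comma operator evaluates both operands and yields the right-hand one,
// so (c[id] = i, c) stores the index and then returns the accumulator c.
var idHash = ['a', 'b', 'c'].reduce((c, id, i) => (c[id] = i, c), {});
console.log(idHash); // { a: 0, b: 1, c: 2 }
```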
