
I need to remove all documents from my MongoDB database that don't exist in a new array of objects. I have an array of objects like this:

var items = [
    {product_id: 15, pr_name: 'a', description: 'desc'},
    {product_id: 44, pr_name: 'b', description: 'desc2'},
    {product_id: 32, pr_name: 'c', description: 'desc3'}
];

And I have an array of database documents, which I get by calling Model.find({}). Right now I do it the 'straightforward' way:

async.each(products, function (dbProduct, callback) { // cycle for products removing
    var equals = false;

    // inner loop; callback renamed so it does not shadow the outer one
    async.each(items, function (product, innerCallback) {
        if (dbProduct.product_id === product.product_id) {
            product.description = dbProduct.description; // I need to save desc from db product to new product
            equals = true;
        }
        innerCallback();
    });

    if (!equals) {
        log.warn("REMOVE PRODUCT " + dbProduct.product_id);
        Product.remove({ _id: dbProduct._id }, function (err) {
            if (err) return updateDBCallback(err);
            callback();
        });
    } else {
        callback(); // product is kept: move on to the next one
    }
});

But it blocks the whole app and is very slow, because I have around 5000 values in my items array and in the database too, so the number of loop iterations is huge. Is there a faster way?

UPDATE 1: Using the code below, from TbWill4321's answer:

var removeIds = [];

// cycle for products removing
async.each(products, function (dbProduct, callback) {
    for (var i = 0; i < items.length; i++) {
        var product = items[i];
        if (dbProduct.product_id === product.product_id) {
            // I need to save desc from db product to new product
            product.description = dbProduct.description;
            // Return early for performance
            return callback();
        }
    }

    // Mark product to remove.
    removeIds.push( dbProduct._id );
    log.warn("REMOVE PRODUCT " + dbProduct.product_id);
    return callback();
}, function() {
    Product.remove({ _id: { $in: removeIds } }, function (err) {
        if (err) return updateDBCallback(err);
        // Continue Here.
        // TODO
    });
});

It takes around 11 seconds (blocking the whole web app) and runs 12,362,878 loop iterations for me. Can anybody advise me further?
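A rough count shows where that many iterations come from. Assume n = 5000 on both sides and that most database products have a match at a roughly uniform position in items; the early return then stops the inner scan after about (n+1)/2 comparisons on average instead of n:

$$
\text{nested scan, worst case: } n^2 = 5000^2 = 25\,000\,000
$$

$$
\text{nested scan with early return, average: } n \cdot \frac{n+1}{2} = \frac{5000 \cdot 5001}{2} = 12\,502\,500
$$

$$
\text{hash lookup (index items by product\_id, then one lookup per product): } n + n = 10\,000
$$

The middle estimate is of the same order as the 12,362,878 iterations measured here; indexing by product_id turns the quadratic count into a linear one.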

  • You have two arrays. One option would be to turn the items array into a hash where every key is the id. Then, instead of iterating over the whole items array for every dbProduct, you would do something like items[dbProduct.product_id] and get the item with that id. – Commented Feb 23, 2016 at 19:25
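A minimal sketch of that suggestion, using a JavaScript Map keyed by product_id. It assumes the database products are already loaded (e.g. via Model.find({})) and expose product_id, description and _id; diffProducts and the sample data are hypothetical. Building the index costs one pass over items, after which each database product needs a single lookup instead of a full scan.

```javascript
// Hypothetical helper: returns the _ids of db products that have no matching item,
// and copies the stored description onto matching items (as the question requires).
function diffProducts(dbProducts, items) {
  // Index the new items by product_id once: one pass over items.
  const itemsById = new Map();
  for (const item of items) {
    itemsById.set(item.product_id, item);
  }

  const removeIds = [];
  for (const dbProduct of dbProducts) {
    const item = itemsById.get(dbProduct.product_id); // average O(1) lookup
    if (item) {
      // Keep the description that is already stored in the database.
      item.description = dbProduct.description;
    } else {
      // No matching item: mark this document for deletion.
      removeIds.push(dbProduct._id);
    }
  }
  return removeIds;
}

// Usage with small in-memory sample data (no database needed):
const items = [
  { product_id: 15, pr_name: 'a', description: 'desc' },
  { product_id: 44, pr_name: 'b', description: 'desc2' },
  { product_id: 32, pr_name: 'c', description: 'desc3' }
];
const dbProducts = [
  { _id: 'id1', product_id: 15, description: 'old desc' },
  { _id: 'id2', product_id: 99, description: 'stale' }
];
const removeIds = diffProducts(dbProducts, items);
console.log(removeIds);            // [ 'id2' ]
console.log(items[0].description); // old desc
```

Map keys are compared essentially like ===, so product_id must have the same type (number vs. string) in both arrays, just as in the original loop. The returned ids can then go into one Product.remove({ _id: { $in: removeIds } }) call.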

2 Answers


The Async library does not execute synchronous code in an asynchronous fashion.

5000 items is not a huge number for JavaScript; I've worked on big data sets with 5 million+ points and it doesn't take long. You can get better performance by structuring it like this:

var removeIds = [];

// cycle for products removing
async.each(products, function (dbProduct, callback) {
    for (var i = 0; i < items.length; i++) {
        var product = items[i];
        if (dbProduct.product_id === product.product_id) {
            // I need to save desc from db product to new product
            product.description = dbProduct.description;
            // Return early for performance
            return callback();
        }
    }

    // Mark product to remove.
    removeIds.push( dbProduct._id );
    log.warn("REMOVE PRODUCT " + dbProduct.product_id);
    return callback();
}, function() {
    Product.remove({ _id: { $in: removeIds } }, function (err) {
        if (err) return updateDBCallback(err);
        // Continue Here.
        // TODO
    });
});

9 Comments

Sorry, at first glance I didn't notice the return callback();. Yep, it works well.
But will it still block the whole web app for some time?
I can't check what it does on your system. If you test it and it's still slow, come back with some timings (which commands take how long, etc.).
So, I ran some checks: it's faster than before, but it still takes around 11 seconds and does 12,362,878 iterations, so it blocks my app for 11 seconds :(
Is that time spent in the loop itself, or in the Product.remove call?

Among the many problems you may have, off the top of my head you may want to start off by changing this bit:

Product.remove({ _id: dbProduct._id }, function (err) {
    if (err) return updateDBCallback(err);
    callback();
});

Because this sits inside an .each() call, you make one database call for each element you want to delete. It's better to store all the ids in one array and then make a single query that deletes every element whose _id is in that array, like this:

Product.remove({ _id: { $in: myArrayWithIds } }, function (err) {
    if (err) return updateDBCallback(err);
    callback();
});
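A sketch of the same batching idea with a promise-based call. It assumes a Mongoose model (v5 or later), where Model.deleteMany(filter) deletes every matching document and Model.remove() is deprecated in later versions; removeProductsById and FakeProduct are hypothetical names, and FakeProduct only records the filter so the example runs without a database.

```javascript
// Collect the ids first, then issue ONE delete for the whole batch.
async function removeProductsById(Model, removeIds) {
  if (removeIds.length === 0) {
    return null; // nothing to delete: skip the database round trip entirely
  }
  // Single query: MongoDB matches every document whose _id is in the array.
  return Model.deleteMany({ _id: { $in: removeIds } });
}

// Hypothetical in-memory stand-in for a model, for illustration only.
const calls = [];
const FakeProduct = {
  deleteMany(filter) {
    calls.push(filter); // record the filter instead of touching a database
    return Promise.resolve({ deletedCount: filter._id.$in.length });
  }
};

removeProductsById(FakeProduct, ['id2', 'id7']).then((result) => {
  console.log(calls.length, result.deletedCount); // 1 2
});
```

However many ids are collected, the database sees one query, which is the point of the $in approach.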

On another note, since async runs synchronous code synchronously, Node.js offers setImmediate() (documented in the Node.js timers API), which schedules a function to run on a later turn of the event loop. So you can "pause" between batches of elements and serve any incoming requests in the meantime, simulating "non-blocking" processing.
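A minimal sketch of the setImmediate idea: walk the array in chunks and schedule each next chunk with setImmediate, so the event loop can handle other pending work (such as incoming requests) between chunks. processInChunks and chunkSize are hypothetical; the chunk size would need tuning by measurement. This spreads the work out over time but does not reduce it.

```javascript
// Process `array` in slices of `chunkSize` (a positive integer),
// yielding to the event loop between slices.
function processInChunks(array, chunkSize, processItem, done) {
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, array.length);
    for (; index < end; index++) {
      processItem(array[index], index);
    }
    if (index < array.length) {
      // Schedule the next slice; other pending callbacks (e.g. request I/O) can run first.
      setImmediate(runChunk);
    } else {
      done();
    }
  }

  // Start asynchronously so the caller is not blocked by the first slice either.
  setImmediate(runChunk);
}

// Usage: multiply numbers in chunks of 2.
const results = [];
processInChunks([1, 2, 3, 4, 5], 2, (n) => results.push(n * 10), () => {
  console.log(results); // [ 10, 20, 30, 40, 50 ]
});
```

Applied to the question, processItem would be the per-product check and done would issue the batched delete.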

1 Comment

@MeetJoeBlack added a minor note so you know how you can simulate non-blocking execution so it doesn't halt processing of new requests
