
I have read this doc, which states that an index can optimize update operations. So I am adding an index to my collection to optimize the update operation I am using.

Records in the collection have an object as _id, plus a timestamp:

{_id: {userId: "sample"}, firstTimestamp: 123, otherField: "abc"}

What I want to do is run an update using the query below:

db.userFirstTimestamp.update(
{_id: {userId: "sample"}, firstTimestamp: {$gt: 100}},
{_id: {userId: "sample"}, firstTimestamp: 100, otherField2: "efg"})

I want to store the 'first document' based on 'firstTimestamp'. The fields of the old and new documents can differ, so this cannot be a $set query; it has to rewrite the whole document instead. In the sample above, "otherField" should no longer exist; the document should have "otherField2" instead.
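To make the difference between the two update styles concrete, here is a minimal in-memory sketch in plain JavaScript. The helpers `applyReplacement` and `applySet` are hypothetical names used only for illustration; they are not MongoDB APIs, and they only mimic top-level fields (no dotted paths).

```javascript
// In-memory sketch only: these helpers are hypothetical, not MongoDB APIs.

// A replacement-style update swaps the whole document but keeps the stored _id.
function applyReplacement(oldDoc, replacement) {
  const { _id, ...rest } = replacement; // ignore any _id in the replacement
  return { _id: oldDoc._id, ...rest };
}

// A $set-style update merges the given top-level fields into the stored document.
function applySet(oldDoc, fields) {
  return { ...oldDoc, ...fields };
}

const stored = { _id: { userId: "sample" }, firstTimestamp: 123, otherField: "abc" };
const incoming = { firstTimestamp: 100, otherField2: "efg" };

const replaced = applyReplacement(stored, incoming);
const merged = applySet(stored, incoming);

console.log(replaced); // otherField is gone, otherField2 is present
console.log(merged);   // otherField is still there next to otherField2
```

With the replacement form the old "otherField" disappears, while the merge form keeps it, which is the behaviour the question is trying to avoid.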

Based on my understanding of the MongoDB doc and this article, I created the index below:

db.sample.createIndex({_id:1, timestamp:1})

Then I tried to benchmark the query on an isolated experimental node running MongoDB 3.0.4, with the specs below:

  • MongoDB 3.0.4
  • Machine is empty, no other operation, only mongo
  • RAM ~30GB
  • Disk is RAID 0 striped
  • Collection has 60 million records
  • Average object size 1001 bytes
  • Index size 5.34 GB

When I check the log, many update queries take more than 100 ms, and when I run mongotop, the top entry is a write that takes ~1000 ms. That seems slow for a single query.

When I run mongostat, throughput is only 400-500 queries per second.

Then I tried to explain the query using a find query (since update does not support explain):

  • When I do not use a projection, it uses the default index {_id:1}.
  • When I use a projection for _id and timestamp only, it uses the {_id:1, timestamp:1} index.

My questions are:

  1. Does the index I created help this update query?
  2. If it does not help, what should the index look like?
  3. Is there any other way to optimize this update query?

1 Answer

  1. Somewhat. But not optimally.

  2. It should really be this, an index on the "element" of the object in the _id key:

    db.sample.createIndex({ "_id.userId": 1, "timestamp": 1 })
    
  3. Use the $set operator and stop overwriting your documents:

    db.sample.update(
        { 
            "_id.userId": "sample", 
            "firstTimestamp": { "$gt": 100 }
        },
        {
            "$set": { "otherfield": "cfg"  }
        }
    )
    

But really your data "should" look like this:

{
    "_id": "sample", 
    "firstTimestamp": 200,
    "otherfield2": "sam"
}

And update like:

    db.sample.update(
        { 
            "_id": "sample", 
            "firstTimestamp": { "$gt": 100 }
        },
        {
            "$set": { 
                "firstTimestamp": 100,
                "otherfield2": "efg"
            }
        }
    )

Or if you insist that fields other than "_id" and "firstTimestamp" are going to change a lot, then rather do this:

{
    "_id": "sample", 
    "firstTimestamp": 200,
    "data": {
        "otherfield2": "sam"
    }
}

Then, if you just want to replace data, do:

    db.sample.update(
        { 
            "_id": "sample", 
            "firstTimestamp": { "$gt": 100 }
        },
        {
            "$set": { 
                "firstTimestamp": 100,
                "data": {
                   "overwritingField": "efg"
                }
            }
        }
    )

Since "data" can be replaced as an entire object if you wish, or you can update just a single key:

    db.sample.update(
        { 
            "_id": "sample", 
            "firstTimestamp": { "$gt": 100 }
        },
        {
            "$set": { 
                "firstTimestamp": 100,
                "data.newfield": "efg"
            }
        }
    )

In all cases, try to use the operators rather than replacing the whole object, since a full replacement typically works out as more traffic and more load on the server.
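If the set of fields really does change between the old and new document, one way to stay with operators is to pair $set with $unset. The sketch below is a hypothetical helper (`buildUpdate` is not part of any MongoDB driver) and it assumes the caller already holds the stored document, which may not be true in your write path.

```javascript
// Hypothetical helper: derive a $set/$unset update from the stored and desired documents.
// Assumes both documents are plain JSON-like objects; only top-level fields are compared.
function buildUpdate(oldDoc, newDoc) {
  const set = {};
  const unset = {};
  for (const [key, value] of Object.entries(newDoc)) {
    if (key === "_id") continue; // _id cannot be changed by an update
    // JSON comparison is key-order sensitive, so it may $set an equal nested object again.
    if (JSON.stringify(oldDoc[key]) !== JSON.stringify(value)) set[key] = value;
  }
  for (const key of Object.keys(oldDoc)) {
    // Fields missing from the new document are removed with $unset.
    if (key !== "_id" && !(key in newDoc)) unset[key] = "";
  }
  const update = {};
  if (Object.keys(set).length > 0) update.$set = set;
  if (Object.keys(unset).length > 0) update.$unset = unset;
  return update;
}

const oldDoc = { _id: "sample", firstTimestamp: 200, otherField: "abc" };
const newDoc = { _id: "sample", firstTimestamp: 100, otherField2: "efg" };
const updateDoc = buildUpdate(oldDoc, newDoc);
console.log(JSON.stringify(updateDoc));
```

The resulting object could be passed as the second argument to `update`, with the same query on "firstTimestamp" as above so the write still only applies when the stored timestamp is later.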

But overall, what makes sense here is that the "userId" part "should" be the portion of the index that narrows down the results the most. So it definitely goes before the timestamp, of which there should be a lot more possible values.
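As a rough illustration of why the equality field goes first, the following plain JavaScript sketch counts how many index keys a query would have to walk under each ordering. It is a simplified model, not MongoDB code: the data set, the `keysExamined` function, and the field name `ts` are all made up for the example, and a real B-tree scan has more tricks available.

```javascript
// In-memory sketch; an array of documents stands in for the index keys. Not MongoDB code.
// Assumed data: 1000 users, one document each, with a handful of distinct timestamps.
const docs = [];
for (let u = 0; u < 1000; u++) {
  docs.push({ userId: "user" + u, ts: (u % 10) * 50 });
}

// Count the keys inside the range the query would have to walk for a given leading field.
function keysExamined(leadingField, query) {
  if (leadingField === "userId") {
    // Keys sorted by (userId, ts): equality on userId pins one small block,
    // and the range on ts trims that block further.
    return docs.filter(d => d.userId === query.userId && d.ts > query.minTs).length;
  }
  // Keys sorted by (ts, userId): the range on ts comes first, so every key
  // above minTs is walked and userId is checked key by key.
  return docs.filter(d => d.ts > query.minTs).length;
}

const query = { userId: "user7", minTs: 100 };
const userFirst = keysExamined("userId", query);
const tsFirst = keysExamined("ts", query);
console.log(userFirst, tsFirst); // 1 700
```

Under these assumptions the userId-first ordering touches a single key, while the timestamp-first ordering walks most of the index before the userId check can discard anything.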

Compound primary keys are fine, but make sure you actually use them. A singular value would not make any sense and could just be assigned to _id. If you can just query on one field of the key as you are here, then you probably don't need a compound object as the primary key.

Your _id in the update suggests that you are getting exact matches for the _id, therefore it is not a compound field with other keys. With this being the case, it should just be a value in the _id itself.

Also, a "range" is okay, but again consider that you are trying to match a single document (well, you don't mention "multi" anywhere), so again question why it is needed, and then either go for an exact match or at "least" an upper limit.

The $set will "only" update the fields that you specify. I think you made a mistake in typing your question though, as the syntax for the "update" portion would not be valid. But use update operators anyway, as they send less traffic by sending a single field, or just the fields you intend to update.


7 Comments

This downvote campaign is getting a bit old. Whoever you are, find something better to do.
Hi, thank you for your reply! I forgot to mention that it must be in place; it cannot be $set since new and old data might have different fields. The range, in this case, I think is inevitable, since I have to store "firstTimestamp" and so have to check the condition during the update.
@rendybjunior The main things I am saying here are 1. See your "object as an _id", It appears to only have the userId key and nothing else. If that is the case then you don't need that. 2. I don't think you understand $set here given your reply. What data needs to change in your updates? Your question just lists what looks like another "query" portion where the "update" statement should be.
1. I think I will optimize that part as well, but for now I am trying to add the index you suggested to see the improvement. 2. I am saying that all the fields need to be replaced, so I can't $set only certain fields.
@rendybjunior $set can use multiple fields. I was really asking you to at least "correct" the update statement in your question, as right now it just looks like you did a copy/paste of the query argument and added a field. If the "fields to be replaced" are really that fluid, then that part should be the embedded object like this (in brief) { "_id": 1, "ts": 10, "data": { "k1": 1, "k2" }} where the keys in "data" are changing. If anything, I personally don't like changing keys all the time in documents, and prefer arrays for such things.