4

I have a large set of JSON documents which I want to store in MongoDB.

However, since I search and retrieve against only a few fields, I was wondering which approach would be better performance-wise.

One option is to store the large object as JSON/BSON so the doc will look like:

{
    "key_1": "Value1",
    "key_2": "Value2",
    "external_data": {
        "large": {
            "data": [
                "comes",
                "here"
            ]
        }
    }
}

Or alternatively,

{
    "key_1": "Value1",
    "key_2": "Value2",
    "external_data": '{"large":{"data":["comes","here"]}}'
}
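
To make the comparison concrete, here is roughly how I would write the two variants with pymongo (just a sketch on my side; the connection and collection names are placeholders):

    import json
    from pymongo import MongoClient

    coll = MongoClient().tests.docs  # placeholder database/collection names

    external = {"large": {"data": ["comes", "here"]}}

    # Option 1: keep the payload as a nested BSON sub-document
    coll.insert_one({"key_1": "Value1", "key_2": "Value2", "external_data": external})

    # Option 2: serialize the payload to a JSON string first
    coll.insert_one({"key_1": "Value1", "key_2": "Value2", "external_data": json.dumps(external)})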
  • Or store it as minified JSON? Commented Feb 13, 2013 at 14:04
  • JSON is a "string" - it's a serialization, just like XML. If you're concerned about performance, measure. Commented Feb 13, 2013 at 14:05
  • @MattBall Come on, don't wake him up from his nice dream! :P Commented Feb 13, 2013 at 14:05
  • @MattBall, that's correct, yet I am not familiar with the internal processes when data is transformed from one format to another, i.e. BSON, JSON string, and, for instance, a Python dict. My question is whether loading a deeply nested BSON document into a Python dict will be better or worse. Another aspect: whether, internally, Mongo is better off storing the data as a "flat string" rather than as a BSON object. Commented Feb 13, 2013 at 14:21

3 Answers

4
Interesting question, so I took the trouble to check it.


Short answer: there is no significant performance difference in writes.
Here is the code I used to test it with the pymongo driver, along with the results:

    import cProfile

    # dbcl.client below is an existing MongoClient (or equivalent connection);
    # tests.test2 is just the collection used for the benchmark
    docdict = dict(zip(["key" + str(i) for i in range(1, 101)], ["a" * i for i in range(1, 101)]))
    docstr = str(docdict)

    def addIdtoStr(s, id): return {'_id': id, 'payload': s}
    def addIdtoDict(d, id): d.update({'_id': id}); return d

    cProfile.run("for i in range(0,100000): x = dbcl.client.tests.test2.insert(addIdtoDict(docdict, i), w=0, j=0)")
    # 12301152 function calls (12301128 primitive calls) in 56.089 seconds
    dbcl.client.tests.test2.remove({}, multi=True)
    cProfile.run("for i in range(0,100000): x = dbcl.client.tests.test2.insert(addIdtoStr(docstr, i), w=0, j=0)")
    # 12201194 function calls (12115631 primitive calls) in 54.665 seconds


2 Comments

Your benchmark is, at the moment, language-dependent. You will want to run the database profiler on that if you want any hope of understanding which is faster.
Sure, it is language- and driver-dependent, but the original poster mentioned using Python, and it still gives a fair idea of the execution speed; a look at the mongo logs confirms that. The actual data stored is a factor, as is the mongo configuration, so I would suggest benchmarking with your actual data and driver of choice.
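
Regarding the database profiler suggestion above, a minimal sketch of how it could be switched on from pymongo (level 2 profiles every operation; the database name is a placeholder):

    from pymongo import MongoClient

    db = MongoClient().tests                  # placeholder database
    db.command("profile", 2)                  # profile all operations
    # ... run the inserts being benchmarked ...
    for op in db.system.profile.find().sort("millis", -1).limit(5):
        print(op["op"], op.get("ns"), op["millis"])
    db.command("profile", 0)                  # switch profiling off again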
3

I believe that storing the data as BSON is both performance- and space-efficient. You also "invest" in the future: if you store the data as BSON now, it will be possible to query it in the database later if such a requirement appears.

But in any case, if your concern is performance, you have to profile it in your production environment; there is no way to say in advance whether it will be faster or not.
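
As a rough illustration of what "query it later" buys you, assuming the document layout from the question (collection name is a placeholder):

    import json
    from pymongo import MongoClient

    coll = MongoClient().tests.docs  # placeholder

    # Dot notation reaches into the payload, but only if external_data
    # was stored as a BSON sub-document
    coll.find_one({"external_data.large.data": "comes"})

    # With the string variant the server cannot look inside the payload;
    # the document has to be fetched by another key and decoded client-side
    doc = coll.find_one({"key_1": "Value1"})
    payload = json.loads(doc["external_data"]) if doc else None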


0

MongoDB is a BSON document store, not a JSON one. MongoDB cannot directly query JSON.

This is a fundamental flaw in the idea here: if you wish to query anything in that document in a performant manner that can use indexes etc., you will want to store it as a BSON sub-document and not as a JSON string inside a BSON document.

However if you were to use:

{
    "key_1": "Value1",
    "key_2": "Value2",
    "external_data": '{"large":{"data":["comes","here"]}}'
}

And if you would only ever need to query against key_1 and key_2, you could actually find that the JSON string is not only more space-conservative here but also easier to store, being just a string (so long as there is no index on that field).
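
A small sketch of that setup, assuming pymongo and placeholder names (only the two top-level keys get indexes, so either storage variant serves the query equally well):

    from pymongo import MongoClient

    coll = MongoClient().tests.docs          # placeholder database/collection

    # Index only the fields that are actually queried
    coll.create_index("key_1")
    coll.create_index("key_2")

    # This lookup is answered from the indexes whether external_data is a
    # BSON sub-document or an opaque JSON string
    coll.find_one({"key_1": "Value1", "key_2": "Value2"})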

