We are currently investigating MongoDB as a possible solution for a highly distributed database for scientific data. Given our query requirements we have chosen to go for a single collection consisting of documents, each document representing an object and its properties which number ~450. A typical document would be structured as follows:

d = {'patch': '12345-1,1',
     'X': { (120 key-value pairs) },
     'A': { (64 key-value pairs) },
     ...
     (4 more such embedded documents)
    }

Within X, there is an integer flag field. It is a 32-bit integer in which each bit represents a Boolean flag, a common way of storing Boolean flags when there are many of them. A lookup table maps each bit position to its Boolean property. The 15th bit is the one relevant to our specific set of queries. There are 600,000 documents in total, sharded across 3 desktops (8 GB RAM and i7 CPU, standard 5400 RPM spinning hard drives).
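To make the bit numbering concrete: bit positions are counted from 0 at the least significant bit, so the 15th bit sits at position 14. A minimal sketch in plain JavaScript; the helper name isBitSet and the sample values are made up for illustration and are not part of our data.

```javascript
// Hypothetical helper: true when the bit at zero-based `position` is set.
function isBitSet(flag, position) {
  return ((flag >>> position) & 1) === 1;
}

// The 15th bit, counted from the least significant bit (position 0).
const BIT_15 = 14;

// Made-up sample value with bits 3 and 14 set.
const sampleFlag = (1 << 14) | (1 << 3);

console.log(isBitSet(sampleFlag, BIT_15)); // bit 14 is set in sampleFlag
console.log(isBitSet(1 << 13, BIT_15));    // only bit 13 is set here
```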

The query is simple: we want a count of all documents for which the 15th bit of a particular flag integer is set to 1.

db.coll.find(
    {'X.flag1': {$bitsAllSet: [14]}}
).count()

The average time taken for this query is 19,783 ms. This is not an acceptable time for us. We tried to improve this using an aggregation instead of a standard find() based query.

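db.coll.aggregate([
    // Stage 1: keep only documents whose flag has bit 14 set
    {'$match': {
        'X.flag1': {$bitsAllSet: [14]}
    }},
    // Stage 2: collapse all matches into a single count
    {'$group': {
        _id: 0,
        count: {$sum: 1}
    }}
])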

This takes about 10,000 ms. While this is an improvement (which I think is because of the highly efficient C++ implementation of the Aggregation framework), it is still well short of the performance we want. The next step was to isolate the flag stored in the 15th bit and make it a separate key in the document. The queries stay the same as above, except that instead of $bitsAllSet: [14] we match on X.is_primary: 1. For find() and aggregate(), the times were 19,000 ms and 8,500 ms respectively, so there is very little improvement.
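For clarity, this is roughly what isolating the flag means. A minimal sketch in plain JavaScript, assuming the new key is simply the value of bit 14 of X.flag1; the helper name withIsPrimary and the sample document are hypothetical, and this is not necessarily how the key was populated in our collection.

```javascript
// Hypothetical helper: return a copy of the document with X.is_primary
// derived from bit 14 (the 15th bit) of X.flag1.
function withIsPrimary(doc) {
  const isPrimary = (doc.X.flag1 >>> 14) & 1; // 1 if the bit is set, else 0
  return { ...doc, X: { ...doc.X, is_primary: isPrimary } };
}

// Made-up sample document; 16384 === 1 << 14, so only bit 14 is set.
const doc = { patch: '12345-1,1', X: { flag1: 16384 } };
const updated = withIsPrimary(doc);

console.log(updated.X.is_primary);
```

The original document is left untouched; only the returned copy carries the new key.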

So, my two questions which I hope people may help with are:

  • Is this the best performance I can expect from MongoDB Community Edition? I am aware that the Enterprise Edition comes with an In-Memory Engine, but my question is specific to the Community Edition. Is there any trick that I could use to speed up the query?
  • I am slowly finding that, at least for the complex server-side analytics and querying we need, MongoDB is proving hard to use, both in the complexity of the queries we are writing and in the performance bottlenecks. Any advice on what other databases I might consider?

Edit: As suggested, I am sharing the output of .explain(). The output is for a collection where is_primary is not indexed. But as discussed in the comments section, for a Boolean value, the presence of an index should not make a difference to the performance of a query based on Boolean flags.

Pastebin Link (expiry of 2 weeks)

  • You can try explain() to get more in-depth details of your query. Also, have you tried creating an index on X.is_primary to speed up the $match in your pipeline? Commented Sep 26, 2019 at 5:34
  • Thanks. We did try to go through the explain() command but I'm not sure how to make it better. Indexing has no effect. This is not surprising to me because a Boolean quantity (a low cardinality quantity) cannot really benefit from indexing. Commented Sep 26, 2019 at 7:02
  • That is true, I forgot that it was just a boolean, can you add the explain() result into your question as well? Might be useful Commented Sep 26, 2019 at 7:03
  • Thanks for your interest in helping out. Modified my post with a Pastebin link to the output from explain(). Commented Sep 26, 2019 at 7:42
  • Can you rerun the explain()? It seems it works differently with the aggregate method. Commented Sep 26, 2019 at 8:30
