
I'm learning MongoDB and am considering moving a very data-heavy (MySQL) app of mine to it. I've worked hard to improve performance of the MySQL side.

With roughly 90k records in each database, I've run the same query in MySQL and MongoDB. I've read up on MongoDB indexes and since they work very similarly, I've added the same "main" index to MongoDB as we use in MySQL.

However, MongoDB is almost three times slower. I've heard a ton about how MongoDB is generally a lot faster, and even if it's not, it shouldn't be three times slower.

Is there something I'm missing?

This query in MongoDB returns 1000 records in 0.085 seconds:

db.prismData.find(
{
   "x":{
      "$gt":306,
      "$lt":366
   },
   "y":{
      "$gt":35,
      "$lt":95
   },
   "z":{
      "$gt":122,
      "$lt":182
   },
   "epoch":{
      "$gte":1396226195
   },
   "world":"world"
})
.sort( { "epoch" : -1 , "x" : 1 , "z" : 1 , "y" : 1 , "id" : -1} )
.limit(1000);

The explain for the above query:

{
    "cursor" : "BtreeCursor world_1_x_1_z_1_y_1_epoch_1_action_1",
    "isMultiKey" : false,
    "n" : 1000,
    "nscannedObjects" : 7773,
    "nscanned" : 8041,
    "nscannedObjectsAllPlans" : 7881,
    "nscannedAllPlans" : 8149,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 84,
    "indexBounds" : {
        "world" : [ 
            [ 
                "world", 
                "world"
            ]
        ],
        "x" : [ 
            [ 
                306, 
                366
            ]
        ],
        "z" : [ 
            [ 
                122, 
                182
            ]
        ],
        "y" : [ 
            [ 
                35, 
                95
            ]
        ],
        "action" : [ 
            [ 
                {
                    "$minElement" : 1
                }, 
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "removed"
}

MySQL runs the entire query and returns 1000 records in 0.03 seconds:

SELECT id,
       epoch,
       action_id,
       player,
       world_id,
       x,
       y,
       z,
       block_id,
       block_subid,
       old_block_id,
       old_block_subid,
       DATA
FROM prism_data
INNER JOIN prism_players p ON p.player_id = prism_data.player_id
LEFT JOIN prism_data_extra ex ON ex.data_id = prism_data.id
WHERE world_id =
    (SELECT w.world_id
     FROM prism_worlds w
     WHERE w.world = 'world')
  AND (prism_data.x BETWEEN 427 AND 487)
  AND (prism_data.y BETWEEN 36 AND 96)
  AND (prism_data.z BETWEEN -14 AND 46)
  AND prism_data.epoch >= 1396225265
ORDER BY prism_data.epoch DESC,
         x ASC,
         z ASC,
         y ASC,
         id DESC LIMIT 1000;

An EXPLAIN for this SQL:

+----+-------------+------------+--------+----------------+----------+---------+----------------------------------+-------+----------------------------------------------------+
| id | select_type | table      | type   | possible_keys  | key      | key_len | ref                              | rows  | Extra                                              |
+----+-------------+------------+--------+----------------+----------+---------+----------------------------------+-------+----------------------------------------------------+
|  1 | PRIMARY     | prism_data | ref    | epoch,location | location | 4       | const                            | 43925 | Using index condition; Using where; Using filesort |
|  1 | PRIMARY     | p          | eq_ref | PRIMARY        | PRIMARY  | 4       | prism_daily.prism_data.player_id |     1 | NULL                                               |
|  1 | PRIMARY     | ex         | ref    | data_id        | data_id  | 4       | prism_daily.prism_data.id        |     1 | NULL                                               |
|  2 | SUBQUERY    | w          | const  | world          | world    | 767     | const                            |     1 | Using index                                        |
+----+-------------+------------+--------+----------------+----------+---------+----------------------------------+-------+----------------------------------------------------+

The only difference in schema is that some repetitive data like people names, and event names are being stored in the document rather than normalized out using foreign keys in mysql. Based on what I've read this is not really needed in mongo, unless there's more of a many-to-many relationship.
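For illustration, the denormalized document shape might look something like the following (a hypothetical sketch; the player name, action value, and any field names beyond those visible in the queries above are assumptions):

```javascript
// Hypothetical prismData document: values that MySQL normalizes into
// prism_players / prism_worlds / action lookup tables are embedded
// directly, so reads need no joins.
const sampleDoc = {
  epoch: 1396226195,
  world: "world",        // stored as the name, not a world_id foreign key
  player: "Notch",       // stored inline instead of a player_id lookup
  action: "block-break", // stored inline instead of an action_id lookup
  x: 310, y: 40, z: 130,
  id: 123456
};
```

The trade-off is repeated strings on disk in exchange for join-free reads, which is the usual MongoDB guidance unless the relationship is genuinely many-to-many.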

1 Answer

It's hard to provide anything very objective since we don't have your data, and with a match size that large it's going to be hard to share, even as a sample.

The two things that spring to mind are indicated in your sort and selection. It would seem that the biggest possible reducers of the set are the "world" and "epoch" fields. As such they should be first in the index as in:

db.prismData.ensureIndex({
    "epoch" -1,
    "world": 1,
    "x": 1,
    "z": 1,
    "y": 1,
    "id": -1
})

Then your query should more or less reflect that order, along with the sort, though possibly not even required given the index order:

db.prismData.find(
{
   "epoch":{
      "$gte":1396226195
   },
   "world":"world",
   "x":{
      "$gt":306,
      "$lt":366
   },
   "z":{
      "$gt":122,
      "$lt":182
   },
   "y":{
      "$gt":35,
      "$lt":95
   }
})
.sort( { "epoch" : -1 , "world": 1, "x" : 1 , "z" : 1 , "y" : 1 , "id" : -1 } )
.limit(1000);

So really you are trying to constrain this to using the "smallest" set of data in the index, so looking for things after a specific timestamp "first" makes sense, then constrain by your next logical key, being "world", and then scan the remainder of the set for the ranges.

I would hope at least that the "epoch" field then actually shows up in the indexBounds, since (I could be wrong with your data) it does seem to be the most likely constraint needed.
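As a rough sketch of why this helps (plain JavaScript simulating index scans in memory, not MongoDB itself; the dataset and match conditions are made up), an index that leads with the sort key lets the scan stop at the limit, while any other order forces a scanAndOrder-style buffer-and-sort of every match:

```javascript
// Fake dataset: 1000 docs with distinct epoch values and an x coordinate.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ id: i, epoch: 1396226000 + (i * 7919) % 1000, x: i % 50 });
}

const matches = d => d.epoch >= 1396226195 && d.x > 10 && d.x < 40;

// Index NOT led by the sort key (like world_x_z_y_epoch): every match must
// be buffered and sorted in memory before the limit applies (scanAndOrder).
function scanAndOrder(data, limit) {
  const idx = [...data].sort((a, b) => a.x - b.x || a.epoch - b.epoch);
  const buffered = idx.filter(matches);        // all matches buffered
  buffered.sort((a, b) => b.epoch - a.epoch);  // in-memory sort
  return { out: buffered.slice(0, limit), buffered: buffered.length };
}

// Index led by epoch descending: entries already come out in the requested
// sort order, so the scan can stop as soon as `limit` matches are found.
function indexOrdered(data, limit) {
  const idx = [...data].sort((a, b) => b.epoch - a.epoch);
  const out = [];
  let scanned = 0;
  for (const d of idx) {
    scanned++;
    if (matches(d)) out.push(d);
    if (out.length === limit) break;           // early termination
  }
  return { out, scanned };
}

const a = scanAndOrder(docs, 100);
const b = indexOrdered(docs, 100);
console.log("buffered for sort:", a.buffered, "vs scanned with order:", b.scanned);
```

Both strategies return the same documents, but the epoch-led scan examines far fewer index entries and never sorts in memory, which is roughly what reordering the index buys you.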


5 Comments

Perfect. I didn't realize that the order of the conditions was this important, but that's a big improvement. I've given this a try with the new index and, per the explain, the 1000 results return in 6ms.
@helion3 Thanks for the comment, I was intrigued. I also presume the scanned numbers were down a lot as well. The index should be listing entries in the desired order, so the sort should not be required.
They are down, around 1300-1600. That's great. The complete query (from the java app) takes 31ms, whereas the mysql query takes 303ms, that's an excellent increase in speed and I'm excited to continue migrating.
So with 100k records this index was amazing, queries returned in 8ms or so. However, with a database of 2.7 million records, it falls over - 5 seconds for the query. Looks like I'll need to analyze the explain a bit and see what improvements I can make.
Nevermind, the query I tried didn't use the epoch in the conditions so the index couldn't even be used.
