
My current aggregation is:

db.group_members.aggregate({
  $match: { user_id: { $in: [1,2,3] } }
}, {
  $group: { _id: "$group_id" }
}, {
  $sort: { last_post_at: -1 }
}, {
  $limit: 5
})

For a document structure of:

{
  _id: '...',
  user_id: '...',
  group_id: '...',
  last_post_at: Date,
}

I've also got an index on {user_id: 1, last_post_at: -1}

Since my index already includes last_post_at, is the sort redundant? I'm not 100% sure how the ordering works here.

My end goal is to replicate this SQL:

SELECT DISTINCT ON (group_id) *
FROM group_members
WHERE user_id IN (1, 2, 3)
ORDER BY last_post_at DESC
LIMIT 5

I'm wondering how to make it performant for a very large group_members collection and still return the results in the right order.

UPDATE: I'm hoping to find a solution that will limit the number of documents loaded into memory. This will be a fairly large collection and accessed very frequently.

  • you're missing the grouping operation in your $group phase - you want last_post:{$max:"$last_post_at"} or something like that. Commented Jun 11, 2013 at 21:40
  • Wouldn't that still require that the entire subset of user_id: { $in: [1,2,3] } be stored in memory? Commented Jun 11, 2013 at 22:08
  • The group has to go through all the matched documents. Since your sort and limit are based on an aggregated value, they can't be applied before the group. Conceivably an optimization could sort and limit for each user_id value before the group, but that's not currently implemented in MongoDB 2.4. Commented Jun 11, 2013 at 22:19
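
A minimal sketch of the question's pipeline with the accumulator that the first comment asks for. The output field name `last_post_at` inside `$group` is a choice made here for illustration. Without an accumulator, the documents leaving `$group` contain only `_id`, so the following `$sort` has nothing to order by.

```javascript
// The question's stages, written as an array, with a $max accumulator added.
const pipeline = [
  // Keep only the memberships of the users of interest.
  { $match: { user_id: { $in: [1, 2, 3] } } },
  // One output document per group, carrying that group's newest post date.
  { $group: { _id: "$group_id", last_post_at: { $max: "$last_post_at" } } },
  // Sort on the aggregated value; this field now exists on the grouped documents.
  { $sort: { last_post_at: -1 } },
  { $limit: 5 }
];

// In the mongo shell (assumes the group_members collection from the question):
// db.group_members.aggregate(pipeline)
```

As the comments note, `$group` still has to read every matched document, so this fixes the ordering but not the memory concern.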

1 Answer


Put the $sort before the $group, otherwise MongoDB can't use the index to help with sorting.
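
A rough sketch of what "sort before group" could look like, assuming the collection and fields from the question. The use of `$first` and the second `$sort` are additions made here, not something the answer spells out: `$group` does not guarantee the order of its output, so the grouped results are sorted again before the limit.

```javascript
// Variant with $sort placed ahead of $group.
const sortFirstPipeline = [
  { $match: { user_id: { $in: [1, 2, 3] } } },
  // Sorting the raw documents here is the step an index on last_post_at can assist.
  { $sort: { last_post_at: -1 } },
  // After a descending sort, $first is each group's newest last_post_at.
  { $group: { _id: "$group_id", last_post_at: { $first: "$last_post_at" } } },
  // $group does not preserve order, so sort the grouped documents again.
  { $sort: { last_post_at: -1 } },
  { $limit: 5 }
];

// In the mongo shell:
// db.group_members.aggregate(sortFirstPipeline)
```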

However, in your query it looks like you want to query for a relatively small number of user_ids compared to the total size of your group_members collection. So I recommend an index on user_id only. In that case MongoDB will have to sort your results in memory by last_post_at, but this is worthwhile in exchange for using an index for the initial lookup by user_id.
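
For reference, the single-field index described above might be declared as follows. This is only a sketch: the variable name `userIdIndex` is made up for illustration, and the shell calls are shown as comments because they need a live database.

```javascript
// Key specification for an ascending single-field index on user_id.
const userIdIndex = { user_id: 1 };

// In the mongo shell (older shells, including the 2.4 era, call this ensureIndex):
// db.group_members.createIndex(userIdIndex)

// To check whether the initial lookup uses the index:
// db.group_members.find({ user_id: { $in: [1, 2, 3] } }).explain()
```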


3 Comments

Doesn't sorting first load the entire collection into memory?
No, not if you have an index on the sorted field. If you do have such an index, MongoDB will just iterate it in the sorted order. Otherwise it'll try to sort everything in memory and abort if it uses more than 10% (I think) of RAM.
I ended up going with the second option. After some benchmarking, it turned out to be much faster than I'd expected.
