7

I have one big MongoDB collection (3 million docs, 50 GB), and it is very slow to query the data even though I have created indexes.

db.collection.find({"C123":1, "C122":2})

For example, the query will time out or be extremely slow (10 s at least), even though I have created separate indexes for C123 and C122.

Should I create more indexes or increase the physical memory to speed up the query?

4
  • 1
    What's up with the aggregation-framework tag? The query in the question does not use it. Commented Feb 17, 2017 at 9:24
  • 1
    Sorry, I assumed an aggregation $match is the same as find(). Commented Feb 17, 2017 at 9:27
  • It is indexes rather than indexs. Commented Feb 12, 2021 at 8:54
  • This seems to be an old question, but did you find a good enough answer? Creating indexes doesn't seem to be a solution because, in the case of an Advanced Search feature, the search can be applied to a lot of fields. Commented Dec 1, 2024 at 10:45

4 Answers

9

For such a query you should create a compound index, one on both fields; then the query should be very efficient. Creating separate indexes won't help you much, because the MongoDB engine will use the first one to satisfy the first part of the query, and the second, if it is used at all, adds little (in some cases it can even slow the query down because of the extra lookup in the index and then in the actual documents again). You can confirm which indexes are used by running .explain() on your query in the shell.

See compound indexes:

https://docs.mongodb.com/manual/core/index-compound/

Also consider the sort directions of both fields when creating the index.
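A minimal shell sketch, using the field names from the question:

// Create one compound index covering both query fields
// (field names taken from the question).
db.collection.createIndex({ C123: 1, C122: 1 })

// Verify the planner actually uses it: look for an IXSCAN stage
// instead of a COLLSCAN in the winning plan.
db.collection.find({ C123: 1, C122: 2 }).explain("executionStats")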


5 Comments

MongoDB has been able to merge indexes for a couple of years now, I think. Still, a compound index should be better.
Good point @SergioTulentsev, I've made an edit. I knew about index merging, but in my experience it doesn't help much in most cases. To be honest, though, we should mention it.
It seems I should put more thought into designing the compound indexes, as there are more than 400 keys in this collection.
@ppn029012: 400 keys?! In this case, don't bother with compound indexes. Only maybe for the most frequent combinations (if that's a thing in your app, some combinations being significantly more frequent than others). Just get more hardware.
You should index at most the fields you actually use in your queries, not more. If you're querying on 2 fields, don't create 400 indexes. The maximum number of indexes per collection is 64 anyway. You can create indexes in the background if you don't want to take down your servers.
2

The answer is really simple.

  1. You don't need to create more indexes, you need to create the right indexes. An index on field c124 won't help queries on field c123, so there is no point in creating it.

  2. Use better/more hardware: more RAM, more machines (sharding); see the sketch below.
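A hedged sketch of the sharding route in the shell (the database name "mydb" and the shard key choice are assumptions for illustration, not from the question):

// Enable sharding for the database, then shard the collection on a key
// that matches the common query pattern (key choice is illustrative only).
sh.enableSharding("mydb")
sh.shardCollection("mydb.collection", { C123: 1, C122: 1 })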

4 Comments

The problem is that MongoDB cannot finish the query even though I have created the right index for each key. Do I have to buy better hardware to run this statement?
@ppn029012: The best index to serve this exact query is a compound index on the two keys, as mentioned in Alan's answer. But it is quite likely that, even with it, your current hardware is just not up to the task.
How much RAM do I need to operate on this 50 GB collection?
@ppn029012: Ideally, 50 GB plus whatever is needed for indexes, say another 10-15 GB. Depending on the specifics of your application, you could do with less.
2
  • Create the right indexes and use compound indexes carefully. (You can have a maximum of 64 indexes per collection and 31 fields in a compound index.)
  • Use server-side (mongo) pagination.
  • Try to find the most frequently used queries and build compound indexes around them.
  • Compound indexes strictly follow field order, so read the documentation and run trials.
  • Also try covered queries for 'summary'-style queries (see the sketch below).

Learned it the hard way.
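A small sketch of a covered query, assuming a hypothetical orders collection (collection and field names are illustrative, not from the question):

// The index contains every field the query filters on and returns.
db.orders.createIndex({ status: 1, total: 1 })

// Excluding _id and projecting only indexed fields lets MongoDB answer
// the query from the index alone, without fetching the documents.
db.orders.find({ status: "shipped" }, { _id: 0, status: 1, total: 1 })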

Comments

1

Use skip and limit. Run a loop that processes 50,000 documents at a time.

https://docs.mongodb.com/manual/reference/method/cursor.skip/

https://docs.mongodb.com/manual/reference/method/cursor.limit/

Example:

db.collection.aggregate(
  [
    {
      $group: {
        _id: "$myDoc.homepage_domain",
        count: { $sum: 1 },
        entry: {
          $push: {
            location_city: "$myDoc.location_city",
            homepage_domain: "$myDoc.homepage_domain",
            country: "$myDoc.country",
            employee_linkedin: "$myDoc.employee_linkedin",
            linkedin_url: "$myDoc.linkedin_url",
            homepage_url: "$myDoc.homepage_url",
            industry: "$myDoc.industry",
            read_at: "$myDoc.read_at"
          }
        }
      }
    },
    // $skip must come before $limit, otherwise this batch would be empty
    { $skip: 50000 },
    { $limit: 50000 }
  ],
  {
    allowDiskUse: true   // allow the $group stage to spill to disk
  }
).forEach(function (myDoc) {
  // _id holds the grouped homepage_domain; take the city from the first entry
  print(
    db.Or9.insert({
      "HomepageDomain": myDoc._id,
      "location_city": myDoc.entry[0].location_city
    })
  );
});

1 Comment

Advising the use of skip and limit on a large collection without proper indexing is a very bad idea.
