I am working on a Golang project backed by MongoDB. I am executing the query below, but it takes too long to return data. It pulls data from several collections across multiple stages.

db.getCollection('Collection1').aggregate([
{
    "$lookup": {
        "localField": "uid",
        "from": "collection2",
        "foreignField": "_id",
        "as": "user_info"
    }
},
{
    "$unwind": "$user_info"
},
{
    "$lookup": {
        "localField": "cid",
        "from": "collection3",
        "foreignField": "_id",
        "as": "cust_info"
    }
},
{
    "$lookup": {
        "from": "logs",
        "let":  {"id": "$_id"},
        "pipeline": [
                {"$match": {"$expr": {"$eq": ["$$id", "$log_id"]}}},
                {"$sort": {"log_type": 1}}],
        "as": "logs_data"
    }
},
{
    "$sort": {"logs_data.logged_on":-1}
},
{
    "$skip": 1
},
{
    "$limit": 2
},
])

My requirement is to apply two sorts within the same query:

  1. Within the logs array: {"$sort": {"log_type": 1}}
  2. On the end result: {"$sort": {"logs_data.logged_on": -1}}

For this I have tried the following indexes:

{"logged_on" : -1}
{"log_id":1, "log_type":1}

But the query still takes 6-7 seconds to execute.

If I remove "$sort": {"logs_data.logged_on": -1} the query runs fast, but with that sort it takes too much time.

What can I do to improve the response time?

Comments:

  • Will the logs collection contain any documents that don't match an _id from collection1? – Joe
  • @Joe No, it is not possible. logs does not contain any document that doesn't match an _id from collection1.

1 Answer


What that aggregation is doing:

  1. retrieve all documents from collection1
  2. for each document in collection1, find a single document in collection2
  3. for each document in collection1, find a single document in collection3
  4. for each document in collection1, find all related documents in logs
  5. for each document in collection1, perform an in-memory sort of the documents retrieved from logs
  6. perform an in-memory sort of the entire result set by logs_data.logged_on
  7. keep 2 of these documents and discard the rest

For each document in collection1, that is 3 document fetches (plus an unknown number of fetches in #4), 2 index scans, and an in-memory sort.

If there is a non-trivial number of documents in collection1, that is a ton of work, all of which is wasted for all but 2 of the documents.
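
You can confirm where the time goes by running the same pipeline through explain with execution stats; a minimal sketch, eliding the stages already shown in the question:

db.getCollection('Collection1').explain("executionStats").aggregate([
    // ...the exact stages from the question...
])

Note that the final {"$sort": {"logs_data.logged_on": -1}} can never use an index: logs_data is produced by a $lookup stage inside the pipeline, and indexes only apply to fields as they exist in the collection. That is why the {logged_on: -1} index does not help.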

If it is safe to assume that every document in logs contains a log_id that maps back to collection1 (which the comments above confirm), you could (see the sketch after this list):

  • create an index on {logged_on:1, log_id:1}
  • start the aggregation on the logs collection
  • sort by logged_on: 1
  • project {logged_on:1, log_id:1, _id:0} (this makes the first part of the aggregation fully covered by the above index)
  • group by log_id, taking the $first value of logged_on
  • sort by logged_on: 1 (grouping disturbs the sort order)
  • skip and limit as desired
  • lookup from collection1 with local log_id foreign _id
  • replaceRoot with the newRoot being the looked up document
  • execute the existing pipeline stages you were using - this time they will only be fetching/sorting for the 2 documents you want to return.
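
A minimal sketch of that reordered pipeline in the shell. All collection and field names come from the question; the "doc" alias and the inline comments are my own:

db.getCollection('logs').aggregate([
    // covered by the {logged_on: 1, log_id: 1} index
    {"$sort": {"logged_on": 1}},
    {"$project": {"logged_on": 1, "log_id": 1, "_id": 0}},
    {"$group": {"_id": "$log_id", "logged_on": {"$first": "$logged_on"}}},
    {"$sort": {"logged_on": 1}},
    {"$skip": 1},
    {"$limit": 2},
    {
        "$lookup": {
            "from": "Collection1",
            "localField": "_id",    // after $group, _id holds the original log_id
            "foreignField": "_id",
            "as": "doc"
        }
    },
    {"$unwind": "$doc"},
    {"$replaceRoot": {"newRoot": "$doc"}}
    // ...followed by the original $lookup/$unwind/$sort stages,
    // which now operate on only the 2 selected documents
])

If you need the original descending order, flip both $sort stages to {"logged_on": -1}; the {logged_on: 1, log_id: 1} index can be traversed in reverse, so the first sort and project remain covered.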