0

Would like to query the following to obtain all item documents such that the last sale (ordered by soldDate) has a status of 2.

db.items.insertMany([
    { item: 1,
      sales: [
          { soldDate: ISODate("2021-10-04"), status: 1 },
          { soldDate: ISODate("2021-10-05"), status: 2 }
      ]
    },
    { item: 2,
      sales: [
          { soldDate: ISODate("2021-09-29"), status: 3 },
          { soldDate: ISODate("2021-09-24"), status: 1 }
      ]
    },
    { item: 3,
      sales: [
          { soldDate: ISODate("2021-06-01"), status: 3 },
          { soldDate: ISODate("2021-06-12"), status: 2 },
          { soldDate: ISODate("2021-06-07"), status: 1 }
      ]
    }
]);

So in this example, the query would return the following two documents:

    { item: 1,
      sales: [
          { soldDate: ISODate("2021-10-04"), status: 1 },
          { soldDate: ISODate("2021-10-05"), status: 2 }   // triggered by this
      ]
    },
    { item: 3,
      sales: [
          { soldDate: ISODate("2021-06-01"), status: 3 },
          { soldDate: ISODate("2021-06-12"), status: 2 },  // triggered by this
          { soldDate: ISODate("2021-06-07"), status: 1 }
      ]
    }

Thanks for any help.

1 Answer 1

1

You stated: ordered by soldDate which can actually mean two things. Perhaps you want the documents sorted by the array, or perhaps you mean the array is sorted. I assumed the later.

Solution (Array sorted)

db.items.aggregate([
    { $match: { "sales.status": 2} },
    { $unwind: "$sales" },
    { $sort: { "item": 1, "sales.soldDate": 1} },
    { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } }
])

Results

Enterprise replSet [primary] barrydb> db.items.aggregate([
...     { $match: { "sales.status": 2} },
...     { $unwind: "$sales" },
...     { $sort: { "item": 1, "sales.soldDate": 1} },
...     { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } }
... ])
[
  {
    _id: ObjectId("617064519be05d9f1cbab346"),
    item: 1,
    sales: [
      { soldDate: ISODate("2021-10-04T00:00:00.000Z"), status: 1 },
      { soldDate: ISODate("2021-10-05T00:00:00.000Z"), status: 2 }
    ]
  },
  {
    _id: ObjectId("617064519be05d9f1cbab348"),
    item: 3,
    sales: [
      { soldDate: ISODate("2021-06-01T00:00:00.000Z"), status: 3 },
      { soldDate: ISODate("2021-06-07T00:00:00.000Z"), status: 1 },
      { soldDate: ISODate("2021-06-12T00:00:00.000Z"), status: 2 }
    ]
  }
]

But, to be complete here is a solution if you want the documents sorted (and the array not necessarily sorted).

Solution (Documents sorted)

db.items.aggregate([
    { $match: { "sales.status": 2} },
    { $sort: { "sales.soldDate": 1} }
])

Results

Enterprise replSet [primary] barrydb> db.items.aggregate([
...     { $match: { "sales.status": 2} },
...     { $sort: { "sales.soldDate": 1} }
... ])
[
  {
    _id: ObjectId("617064519be05d9f1cbab348"),
    item: 3,
    sales: [
      { soldDate: ISODate("2021-06-01T00:00:00.000Z"), status: 3 },
      { soldDate: ISODate("2021-06-12T00:00:00.000Z"), status: 2 },
      { soldDate: ISODate("2021-06-07T00:00:00.000Z"), status: 1 }
    ]
  },
  {
    _id: ObjectId("617064519be05d9f1cbab346"),
    item: 1,
    sales: [
      { soldDate: ISODate("2021-10-04T00:00:00.000Z"), status: 1 },
      { soldDate: ISODate("2021-10-05T00:00:00.000Z"), status: 2 }
    ]
  }
]

EDIT - After re-reading I believe you want only where the record having a status of 2 is also has the greatest date in the array

Solution (Only last having status of value 2 - docs and array unsorted)

db.items.aggregate([
    { $unwind: "$sales" },
    { $sort: { "item": 1, "sales.soldDate": -1} },
    { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } },
    { $match : { "sales.0.status" : 2 } }
])

Results

Enterprise replSet [primary] barrydb> db.items.aggregate([
...     { $unwind: "$sales" },
...     { $sort: { "item": 1, "sales.soldDate": -1} },
...     { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } },
...     { $match : { "sales.0.status" : 2 } }
... ])
[
  {
    _id: ObjectId("617064519be05d9f1cbab346"),
    item: 1,
    sales: [
      { soldDate: ISODate("2021-10-05T00:00:00.000Z"), status: 2 },
      { soldDate: ISODate("2021-10-04T00:00:00.000Z"), status: 1 }
    ]
  },
  {
    _id: ObjectId("617064519be05d9f1cbab348"),
    item: 3,
    sales: [
      { soldDate: ISODate("2021-06-12T00:00:00.000Z"), status: 2 },
      { soldDate: ISODate("2021-06-07T00:00:00.000Z"), status: 1 },
      { soldDate: ISODate("2021-06-01T00:00:00.000Z"), status: 3 }
    ]
  }
]

EDIT - Add Self Referencing Lookup

db.items.aggregate([
    { $unwind: "$sales" },
    { $sort: { "item": 1, "sales.soldDate": -1} },
    { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } },
    { $match : { "sales.0.status" : 2 } },
    { $lookup : {
            from: "items",
            localField: "_id",
            foreignField: "_id",
            as: "results"
        }
    },
    { $unwind: "$results" },
    { $replaceRoot: { "newRoot": "$results" } }
])

With the self-referencing lookup we are treating MongoDB as a relational database. We find the documents that meet our requirements, but in doing so we have destroyed the original shape and content. By performing a lookup on the same records we can restore the shape but at a performance penalty.

Retain Copy

Rather than performing a lookup, which has a performance concern, a different approach is to leverage memory on the server. Keep a copy of the original while moving through the pipeline and manipulating the original to identify desired records...

db.items.aggregate([
    { $addFields: { "_original": "$$ROOT" } },
    { $unwind: "$sales" },
    { $sort: { "item": 1, "sales.soldDate": -1} },
    { $group: { "_id": "$_id", "_original": { $first: "$_original" }, "sales_status": { $push: "$sales.status" } } },
    { $match : { "sales_status.0" : 2 } },
    { $replaceRoot: { "newRoot": "$_original" } }
])

In this example we keep a copy of the original in the field _original then once we have identified the records we want we pivot the root back to _original. This may put pressure on the WiredTiger cache as we are keeping a duplicate of all selected records in memory during the execution of the pipeline. A $lookup approach also has this memory concern. Two queries would eliminate the cache pressure issues, but behaves like a $lookup and would not perform as well.

Sign up to request clarification or add additional context in comments.

5 Comments

Yes, where the record having a status of 2 is also has the greatest date in the array. Your answer works great, thank you!
But how would I get all fields in the final result? In this example, there were just two fields: item and sales. But what if there 100 fields, and sales as well? How would I get the final result to just look like the full document, but with the filter applied as above?
@RedTailedHawk - I can see 3 different options for getting all the fields. The first approach is to construct the grouping to include all the fields, the second is to perform a lookup on the same collection after identifying the participating records. There are performance implications with the latter. The 3rd option is to try to retain the original documents intact as a payload in the pipeline while manipulating data to identify participating docs. If you can provide some sample data I can try to play with it and see what I come up with. Another idea is simply use 2 queries.
The last is working great (the one that retains the original) but I'm wondering if there's not a way to just use items.find() with a predicate that does similar logic for the qualifying documents. I'm following how the aggregate is working and how it's retaining the original document, but find() works on the original documents themselves. Since I just need the original document (and not a projection or aggregation), I'm wondering if it's possible to apply a more complex predicate to find() and get the same result.
@RedTailedHawk - the challenge using find() is that you want documents where a sorted array element by last date has a related specific value. find() does not incorporate a $unwind to decompose an array and thus the challenge to sort the contents of the array. It may be possible using find() but I am not clear on how to achieve finding the latest element to determine if it has a status of 2. If you could guarantee the array was always sorted descending on field sold_date then you could use find() - db.collection.find({"sales.[0].status": 2}).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.