Group, sort and select first

Question

Let's say I have a collection like:

{
  couponId: "abc",
  state: "valid",
  date: "2015-11-01"
}
{
  couponId: "abc",
  state: "expired",
  date: "2015-10-01"
}
{
  couponId: "abc",
  state: "invalid",
  date: "2015-09-01"
}
{
  couponId: "xyz",
  state: "invalid",
  date: "2015-11-01"
}
{
  couponId: "xyz",
  state: "expired",
  date: "2015-10-01"
}
{
  couponId: "xyz",
  state: "expired",
  date: "2015-09-01"
}
...

A coupon can be valid, invalid, or expired. Now I want to fetch a list of coupons in which each coupon is selected by this logic:

if a "valid" coupon exists, use that
else if an "expired" coupon exists, use that
else get the "invalid" coupon.

Applying this logic to the above list should yield:

    {
      couponId: "abc", /* for "abc" "valid" exists */
      state: "valid",
      date: "2015-11-01"
    },
    {
      couponId: "xyz", /* for "xyz" "valid" does not exist, use the next best "expired" */
      state: "expired",
      date: "2015-10-01"
    }

Basically: valid > expired > invalid.
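To pin down the selection rule itself, here is a minimal in-memory sketch in plain JavaScript (the mongo shell's language), completely outside MongoDB. The helper name `pickBestCoupons` and the tiebreak that prefers the newest date among coupons with the same state are assumptions for illustration, not part of the question. It assumes every `state` is one of the three listed values.

```javascript
// Lower number = preferred state (valid > expired > invalid).
const PRIORITY = { valid: 0, expired: 1, invalid: 2 };

// Hypothetical helper: keep, for each couponId, the coupon with the best state.
// Assumed tiebreak: among equal states, the later date wins
// ("YYYY-MM-DD" strings compare correctly as plain strings).
function pickBestCoupons(coupons) {
  const best = new Map();
  for (const c of coupons) {
    const current = best.get(c.couponId);
    if (
      current === undefined ||
      PRIORITY[c.state] < PRIORITY[current.state] ||
      (PRIORITY[c.state] === PRIORITY[current.state] && c.date > current.date)
    ) {
      best.set(c.couponId, c);
    }
  }
  return [...best.values()];
}

const coupons = [
  { couponId: "abc", state: "valid", date: "2015-11-01" },
  { couponId: "abc", state: "expired", date: "2015-10-01" },
  { couponId: "abc", state: "invalid", date: "2015-09-01" },
  { couponId: "xyz", state: "invalid", date: "2015-11-01" },
  { couponId: "xyz", state: "expired", date: "2015-10-01" },
  { couponId: "xyz", state: "expired", date: "2015-09-01" },
];

console.log(pickBestCoupons(coupons));
```

This single pass needs no sorting at all; the aggregation answers below get the same effect by sorting on a derived rank and taking `$first`.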

I thought of using an aggregation to emulate a SQL GROUP BY + sort + select-first:

db.xyz.aggregate([
  {$sort : { couponId: 1, state: -1 } },
  {$group : { _id : "$couponId", document: {$first : "$$ROOT"} }}
])

Obviously this doesn't work: sorting "state" in descending order is alphabetical (valid, invalid, expired), whereas I need the custom order valid > expired > invalid. So, can custom sorting be achieved in an aggregation?

Or is there a better way of doing what I'm trying to do here?
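To see why `{ state: -1 }` gives the wrong order: without a collation, MongoDB compares strings by simple binary comparison, which for these lowercase ASCII values orders them the same way as JavaScript's default sort. The plain-JavaScript check below (not MongoDB code) reproduces the descending order the `$sort` stage produces for the three states.

```javascript
// Default string comparison orders the states alphabetically:
// "expired" < "invalid" < "valid".
const states = ["invalid", "valid", "expired"];

// Ascending sort, then reverse, emulates { state: -1 } for these plain strings.
const descending = [...states].sort().reverse();
console.log(descending); // [ 'valid', 'invalid', 'expired' ]
```

"invalid" lands ahead of "expired", which is exactly the order the question needs to avoid.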

chridam · Accepted Answer · 2015-10-29 20:02:17Z

A better way, though not that clean/pretty (it seems a bit hacky to me :P), is to nest some $cond operator expressions that encode the above criteria and turn each state into a numeric rank (valid = 0, expired = 1, invalid = 2), then $sort on that rank before the $group stage so that $first picks the best document per coupon. This is best explained by running the following aggregation pipeline:

db.xyz.aggregate([
    {
        "$project": {
            "couponId": 1,
            "state": 1,
            "date": 1,
            "rank": {
                "$cond": [
                    { "$eq": ["$state", "valid"] },
                    0,
                    {
                        "$cond": [
                            { "$eq": ["$state", "expired"] },
                            1,
                            2
                        ]
                    }
                ]
            }
        }
    },
    // best rank first; among equal ranks, the newest date first
    { "$sort": { "couponId": 1, "rank": 1, "date": -1 } },
    {
        "$group": {
            "_id": "$couponId",
            "state": { "$first": "$state" },
            "date": { "$first": "$date" }
        }
    }
])

Sample Output

/* 0 */
{
    "result" : [ 
        {
            "_id" : "xyz",
            "state" : "expired",
            "date" : "2015-10-01"
        }, 
        {
            "_id" : "abc",
            "state" : "valid",
            "date" : "2015-11-01"
        }
    ],
    "ok" : 1
}
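The valid > expired > invalid priority boils down to a nested conditional that yields a number, the same nesting shape as the $cond expressions in this answer. A plain-JavaScript sketch (not MongoDB code; `stateRank` and `byDateDesc` are hypothetical helpers) shows that sorting on such a rank, with the newest date as an assumed tiebreak, puts the preferred document first, which is the document `$first` would then pick.

```javascript
// Hypothetical helper: a ternary chain with the same shape as
// { $cond: [ { $eq: ["$state", "valid"] }, 0,
//            { $cond: [ { $eq: ["$state", "expired"] }, 1, 2 ] } ] }
function stateRank(state) {
  return state === "valid" ? 0 : state === "expired" ? 1 : 2;
}

// Newest date first; "YYYY-MM-DD" strings compare correctly as plain strings.
function byDateDesc(a, b) {
  return a.date < b.date ? 1 : a.date > b.date ? -1 : 0;
}

const xyzDocs = [
  { couponId: "xyz", state: "invalid", date: "2015-11-01" },
  { couponId: "xyz", state: "expired", date: "2015-10-01" },
  { couponId: "xyz", state: "expired", date: "2015-09-01" },
];

// Sort by rank, then by date, so the preferred document comes first.
const sorted = [...xyzDocs].sort(
  (a, b) => stateRank(a.state) - stateRank(b.state) || byDateDesc(a, b)
);
console.log(sorted[0]);
```

Note that the innermost fallback maps every state other than "valid" and "expired" to 2, so an unexpected state is treated like "invalid".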

3 Comments

Rnet Over a year ago

Thanks! I was just looking up the same thing: how to translate values on the fly during a query. I tried this, and while it works, there is a significant impact on performance. My db has around 1.5M docs, and the actual documents are much bigger. Without the 'translation', with default sorting, the query takes around 0.6 s; with the translation/flipping of the fields, it takes about 4x as long. As a last resort I may need to change the names or introduce another field...

chridam Over a year ago

I guess the pipeline could do with some optimisation; running aggregate() with the explain option shows how it is executed, which helps when reshaping it for better performance. Throw in a $match stage at the beginning to filter out unwanted documents, e.g. { $match: { state: { $in: ["expired", "invalid", "valid"] } } }, etc.

Woods Chen Over a year ago

I really don't understand how this solution solves the problem. What's the difference between the solution script and just db.xyz.aggregate([{"$group":{"_id":"$couponId","state":{"$first":"$state"},"date":{"$first":"$date"}}}])?
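The point in this comment can be checked with a small in-memory emulation (plain JavaScript, not MongoDB; `groupFirst` is a hypothetical helper): a `$group` with `$first` keeps whichever document reaches the group first, so unless a preceding `$sort` fixes that order, the selected state depends on input order.

```javascript
// Hypothetical in-memory stand-in for
// { $group: { _id: "$couponId", state: { $first: "$state" }, date: { $first: "$date" } } }:
// the first document seen for each couponId wins.
function groupFirst(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.couponId)) {
      groups.set(doc.couponId, { _id: doc.couponId, state: doc.state, date: doc.date });
    }
  }
  return [...groups.values()];
}

const input = [
  { couponId: "xyz", state: "invalid", date: "2015-11-01" },
  { couponId: "xyz", state: "expired", date: "2015-10-01" },
];

// Same documents, different arrival order, different "first" state.
console.log(groupFirst(input)[0].state); // invalid
console.log(groupFirst([...input].reverse())[0].state); // expired
```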

Nate Anderson · Answer · 2024-05-17 21:59:58Z

Inspired by this comment and other answers. I don't know whether the OP considered this approach.

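OP asked "So, can custom sorting be achieved in an aggregation?", but $first only gives a meaningful result when the documents reaching the $group are already sorted. In other words, the custom ordering has to be applied in a $sort stage before $first, not inside the $group accumulator itself.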

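If you must $sort anyway, it's just a matter of sorting on custom logic, which is nicely achieved with $addFields and $indexOfArray ($addFields and $indexOfArray require MongoDB 3.4 or later; the $unset stage requires 4.2 or later). Example runs on mongodb Playground, here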

db.collection.aggregate([
{
    "$addFields": {
        sortorder: {
            "$indexOfArray": [
                ["valid", "expired", "invalid"],
                "$state"
            ]
        }
    }
},
{
    $sort: { sortorder: 1, date: -1 }  // newest first among documents with the same state
},
{
    $group: {
        _id: { couponId: "$couponId" },
        document: { $first: "$$ROOT" }
    }
},
{
    $unset: "document.sortorder"
}
])

Output (note coupon "abc" is valid and coupon "xyz" is expired):

[
  {
    "_id": {
      "couponId": "abc"
    },
    "document": {
      "_id": ObjectId("..."),
      "couponId": "abc",
      "date": "2015-11-01",
      "state": "valid"
    }
  },
  {
    "_id": {
      "couponId": "xyz"
    },
    "document": {
      "_id": ObjectId("..."),
      "couponId": "xyz",
      "date": "2015-10-01",
      "state": "expired"
    }
  }
]
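For readers without a MongoDB instance handy, here is a plain-JavaScript sketch that walks the question's sample documents through the same four stages in memory. It is not MongoDB code: `emulatePipeline` and `safeSortorder` are hypothetical helpers, and the secondary `date: -1` sort key is an assumption added so that the pick among documents with the same state is deterministic. MongoDB's `$group` does not guarantee output order; the `Map` here simply preserves first-seen order.

```javascript
// Plain-JavaScript stand-in for the pipeline; it does not talk to MongoDB.
const ORDER = ["valid", "expired", "invalid"];

// Like $indexOfArray, indexOf returns -1 for a value that is not in ORDER,
// so an unexpected state would sort ahead of "valid". This guarded variant
// (hypothetical) pushes unknown states to the end instead.
function safeSortorder(state) {
  const i = ORDER.indexOf(state);
  return i === -1 ? ORDER.length : i;
}

function emulatePipeline(collection) {
  // $addFields: { sortorder: { $indexOfArray: [ORDER, "$state"] } }
  const staged = collection.map((doc) => ({ ...doc, sortorder: ORDER.indexOf(doc.state) }));

  // $sort: { sortorder: 1, date: -1 } (date tiebreak assumed)
  staged.sort(
    (a, b) => a.sortorder - b.sortorder || (a.date < b.date ? 1 : a.date > b.date ? -1 : 0)
  );

  // $group: { _id: { couponId: "$couponId" }, document: { $first: "$$ROOT" } }
  const groups = new Map();
  for (const doc of staged) {
    if (!groups.has(doc.couponId)) {
      groups.set(doc.couponId, { _id: { couponId: doc.couponId }, document: doc });
    }
  }

  // $unset: "document.sortorder"
  return [...groups.values()].map(({ _id, document }) => {
    const { sortorder, ...rest } = document;
    return { _id, document: rest };
  });
}

const collection = [
  { couponId: "abc", state: "valid", date: "2015-11-01" },
  { couponId: "abc", state: "expired", date: "2015-10-01" },
  { couponId: "abc", state: "invalid", date: "2015-09-01" },
  { couponId: "xyz", state: "invalid", date: "2015-11-01" },
  { couponId: "xyz", state: "expired", date: "2015-10-01" },
  { couponId: "xyz", state: "expired", date: "2015-09-01" },
];

console.log(JSON.stringify(emulatePipeline(collection), null, 2));
```

If unexpected states can occur in real data, the same guard could presumably be expressed in the pipeline with a $cond around the $indexOfArray result.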
