Let's say I have a collection like:

{
  couponId: "abc",
  state: "valid",
  date: "2015-11-01"
}
{
  couponId: "abc",
  state: "expired",
  date: "2015-10-01"
}
{
  couponId: "abc",
  state: "invalid",
  date: "2015-09-01"
}
{
  couponId: "xyz",
  state: "invalid",
  date: "2015-11-01"
}
{
  couponId: "xyz",
  state: "expired",
  date: "2015-10-01"
}
{
  couponId: "xyz",
  state: "expired",
  date: "2015-09-01"
}
...

A coupon can be valid, invalid, or expired. Now I want to fetch a list of coupons, where each coupon is selected based on this logic:

  • if a "valid" coupon exists, use that
  • else if an "expired" coupon exists, use that
  • else get the "invalid" coupon.

applying this logic to the above list should yield:

    {
      couponId: "abc", /* for "abc" "valid" exists */
      state: "valid",
      date: "2015-11-01"
    },
    {
      couponId: "xyz", /* for "xyz" "valid" does not exist, use the next best "expired" */
      state: "expired",
      date: "2015-10-01"
    }

basically, valid > expired > invalid
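
To make the rule concrete, here is a minimal in-memory sketch in plain JavaScript (not MongoDB). The names `STATE_RANK` and `pickBestPerCoupon` are made up for illustration, and the sketch assumes that when several documents share the best state, the first one encountered wins.

```javascript
// Sample documents copied from the collection above.
const coupons = [
  { couponId: "abc", state: "valid",   date: "2015-11-01" },
  { couponId: "abc", state: "expired", date: "2015-10-01" },
  { couponId: "abc", state: "invalid", date: "2015-09-01" },
  { couponId: "xyz", state: "invalid", date: "2015-11-01" },
  { couponId: "xyz", state: "expired", date: "2015-10-01" },
  { couponId: "xyz", state: "expired", date: "2015-09-01" },
];

// Lower rank means more preferred: valid > expired > invalid.
const STATE_RANK = { valid: 0, expired: 1, invalid: 2 };

// Hypothetical helper: keep, per couponId, the document with the best state.
// On a tie, the document seen first is kept (an assumption of this sketch).
function pickBestPerCoupon(docs) {
  const best = new Map();
  for (const doc of docs) {
    const current = best.get(doc.couponId);
    if (current === undefined || STATE_RANK[doc.state] < STATE_RANK[current.state]) {
      best.set(doc.couponId, doc);
    }
  }
  return [...best.values()];
}

const picked = pickBestPerCoupon(coupons);
console.log(picked);
// "abc" resolves to its "valid" document,
// "xyz" resolves to its first "expired" document (2015-10-01).
```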

I have thought of using an aggregate operation that emulates a SQL groupby+sort+selectFirst:

db.xyz.aggregate([
  {$sort : { couponId: 1, state: -1 } },
  {$group : { _id : "$couponId", document: {$first : "$$ROOT"} }}
])

Obviously this doesn't work, because "state" is sorted alphabetically, whereas I need a custom order where valid > expired > invalid. So, can custom sorting be achieved in an aggregation?
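
A quick way to see the mismatch, sketched in plain JavaScript and assuming the strings are compared lexicographically (as with a default string sort and no special collation):

```javascript
const states = ["valid", "expired", "invalid"];

// Descending lexicographic order, analogous to sorting with { state: -1 }.
const descending = [...states].sort().reverse();
console.log(descending); // → ["valid", "invalid", "expired"]

// Ascending lexicographic order, analogous to sorting with { state: 1 }.
const ascending = [...states].sort();
console.log(ascending); // → ["expired", "invalid", "valid"]
```

Neither direction produces valid > expired > invalid, so a plain sort on "state" cannot express the desired priority.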

Or is there a better way of doing what I'm trying to do here?

2 Answers

A better way, though not that clean (it seems a bit hacky to me), is to nest $cond operator expressions that encode the above criteria in your $group pipeline stage. This is best explained by running the following aggregation pipeline:

db.xyz.aggregate([
    {
        "$group": {
            "_id": "$couponId",
            "state": {
                "$first": {
                    "$cond": [
                        { "$eq": ["$state", "valid"]},
                        "$state",
                        {
                           "$cond": [
                                { "$eq": ["$state", "invalid"]},
                                "$state",
                                "expired"
                            ]
                        }
                    ]
                }
            },
            "date": {
                "$first": {
                    "$cond": [
                        { "$eq": ["$state", "valid"]},
                        "$date",
                        {
                           "$cond": [
                                { "$eq": ["$state", "invalid"]},
                                "$date",
                                {
                                   "$cond": [
                                        { "$eq": ["$state", "expired"]},
                                        "$date", 
                                        "1970-01-01"
                                    ]
                                }
                            ]
                        }
                    ]
                }
            }
        }
    }
])

Sample Output

/* 0 */
{
    "result" : [ 
        {
            "_id" : "xyz",
            "state" : "invalid",
            "date" : "2015-11-01"
        }, 
        {
            "_id" : "abc",
            "state" : "valid",
            "date" : "2015-11-01"
        }
    ],
    "ok" : 1
}
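
To see why "xyz" comes back as "invalid" here, the nested $cond on "state" can be emulated in plain JavaScript. This is only a sketch: `condState` is a made-up name, and it assumes the documents of a group reach $first in insertion order, which MongoDB does not promise without a preceding $sort.

```javascript
// Emulates the nested expression:
// $cond: [ state == "valid", state, $cond: [ state == "invalid", state, "expired" ] ]
function condState(state) {
  return state === "valid" ? state : (state === "invalid" ? state : "expired");
}

// States of coupon "xyz" in the order they appear in the question.
const xyzStates = ["invalid", "expired", "expired"];

// For the three known states, the expression returns its input unchanged.
const mapped = xyzStates.map(condState);
console.log(mapped); // → ["invalid", "expired", "expired"]

// $first then takes whichever document the group sees first.
const firstState = mapped[0];
console.log(firstState); // → "invalid"
```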

3 Comments

Thanks! I was just looking up the same thing: how to translate values on the fly during a query. I tried this, and while it works, there is a significant impact on performance. My db has around 1.5m docs, and the actual document is much bigger. Without the 'translation', with default sorting, the query takes about 0.6s; with the translation/flipping of the fields, it takes 4x as long. As a last resort I may need to change the names or introduce another field...
I guess you could optimise by reshaping the pipeline, using the explain option in the aggregate() method to check performance. Throw in some $match steps at the beginning to filter out unwanted documents, e.g. { $match: { state: { $in: ["expired", "invalid", "valid"] } } } etc.
I really don't understand how this solution solves the problem, what's the difference between the solution script and just db.xyz.aggregate([{"$group":{"_id":"$couponId","state":{"$first":"$state"},"date":{"$first":"$date"}}}]).

Inspired by this comment, and other answers. Also I don't know if OP considered this approach.

OP asked "So, can custom sorting be achieved in an aggregation?", but in fact $first requires the documents to already be sorted. In other words, the sorting happens in a $sort stage before $first, not inside the $group accumulator.

If you must $sort anyway, it's just a matter of sorting on custom logic, which is nicely achieved with $addFields and $indexOfArray. The following example runs on MongoDB Playground:

db.collection.aggregate([
{
    "$addFields": {
        sortorder: {
            "$indexOfArray": [
                ["valid", "expired", "invalid"],
                "$state"
            ]
        }
    }
},
{
    $sort: { sortorder: 1 }
},
{
    $group: {
        _id: { couponId: "$couponId" },
        document: { $first: "$$ROOT" }
    }
},
{
    $unset: "document.sortorder"
}
])

Output (note coupon "abc" is valid and coupon "xyz" is expired):

[
  {
    "_id": {
      "couponId": "abc"
    },
    "document": {
      "_id": ObjectId("..."),
      "couponId": "abc",
      "date": "2015-11-01",
      "state": "valid"
    }
  },
  {
    "_id": {
      "couponId": "xyz"
    },
    "document": {
      "_id": ObjectId("..."),
      "couponId": "xyz",
      "date": "2015-10-01",
      "state": "expired"
    }
  }
]
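
For readers who want to trace the stages by hand, here is a rough plain-JavaScript emulation of the same four steps on the question's sample data. It is a sketch, not MongoDB itself: the variable names are illustrative, and it relies on JavaScript's stable `Array.prototype.sort` (ES2019+) to keep insertion order among documents with the same rank.

```javascript
const docs = [
  { couponId: "abc", state: "valid",   date: "2015-11-01" },
  { couponId: "abc", state: "expired", date: "2015-10-01" },
  { couponId: "abc", state: "invalid", date: "2015-09-01" },
  { couponId: "xyz", state: "invalid", date: "2015-11-01" },
  { couponId: "xyz", state: "expired", date: "2015-10-01" },
  { couponId: "xyz", state: "expired", date: "2015-09-01" },
];
const ORDER = ["valid", "expired", "invalid"];

// Stage 1 ($addFields + $indexOfArray): attach a numeric rank.
// Like $indexOfArray, indexOf returns -1 for a value missing from the list,
// so an unexpected state would sort ahead of "valid".
const ranked = docs.map((d) => ({ ...d, sortorder: ORDER.indexOf(d.state) }));

// Stage 2 ($sort): ascending rank, best state first.
ranked.sort((a, b) => a.sortorder - b.sortorder);

// Stage 3 ($group + $first): keep the first document seen per couponId.
const firstPerCoupon = new Map();
for (const d of ranked) {
  if (!firstPerCoupon.has(d.couponId)) firstPerCoupon.set(d.couponId, d);
}

// Stage 4 ($unset): drop the helper field again.
const result = [...firstPerCoupon.values()].map(({ sortorder, ...rest }) => rest);
console.log(result);
// "abc" keeps its "valid" document, "xyz" keeps an "expired" one.
```

Since several "xyz" documents share the state "expired", which of them wins depends on the order among equal sort keys. In the real pipeline, adding a secondary key such as date to the $sort stage would make that choice explicit.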
