0

I have thousands of items in a table. Each item has a numeric attribute score. What I want to do:

  1. Given some targetScore, select all items where targetScore - n <= score <= targetScore + n (I got this part working).
  2. Return only the first j elements where j := ABS(targetScore - score) in ASC order.

Since the table is somewhat large, and will only grow over time, I'm hoping to have a way to delegate step #2 above to the DynamoDB engine. I'm currently selecting all item IDs from step #1, doing the computation for step #2 in process, then selecting the resulting items on a subsequent query.

1 Answer 1

1

You don't really need step 1 to get step 2. I'm assuming that score is a sort key in your setup.

In step 2, you want to sort the items by abs(targetScore - score) and take the j lowest ones.

Although DynamoDB doesn't have any direct way to do that, the definition of the absolute value makes it quite easy to do. First, observe that `abs(targetScore - score) is defined as follows: It is:

  1. targetScore - score if score <= targetScore
  2. score - targetScore if score >= targetsScore

So the items with the smallest abs are either items with a score higher than targetScore but as low as possible, or items with a score lower than targetScore, but as high as possible.

So, you can do two separate DynamoDB queries:

  1. Query for the first j items with score >= targetScore, sorted in increasing score.
  2. Query for the first j items with score < targetScore, sorted in decreasing score.

DynanoDB can do both queries for you efficiently and cheaply - it involves a Query with a KeyConditions parameter and a Limit, and ScanIndexForward to do decreasing sorting. But no expensive and inefficient FilterExpression!

Once you have the j smallest items of each type 1 and type 2, all that remains is to pick the overall j smallest items from these 2*j items we got.

Sign up to request clarification or add additional context in comments.

6 Comments

How can I do a Query without a primary key? Also, score in my case is not a sort key, just some arbitrary attribute.
If score is not a sort key you can do what I said with FilterExpression but pay for reading the entire database... You need it to be a sort key for cheap requests...
Gotcha. Thanks. About Query, though. Don't I need to specify a specific partition key in order to Query? What I want to do is fetch all possible items (across many partitions) where score is within a range.
Well, you have Scan for that, but that will be even more expensive - it's basically reading the entire database! I don't know if that's really what you want...
Exactly my question. I'm currently doing a Scan because I can't think of a practical way to achieve what I want. Should I have a secondary index where everything has the same partition key so I can do the two queries you suggest?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.