1

This article (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html) talks about a technique for sharding global secondary index values across multiple partitions, by introducing a random integer as the partition key.

That makes sense to me, but the article does not clearly explain how to then query that index. Let's say I'm using a random integer from 1-10 as the partition key, and a number as the sort key, and I want to fetch the 3 records with the highest sort key value (from all partitions).

Would I need to do 10 separate queries, sorting each one, with a limit of 3 items, then do an in-memory sort of the resulting 30 items and pick the first 3? That seems needlessly complicated, and not very efficient for the client.

Is there some way to do a single DynamoDB operation that queries all 10 partitions, does the sorting, and just returns the 3 records with the highest vavlue?

1 Answer 1

3

Would I need to do 10 separate queries

Yes. This is called a scatter read in the Dynamo docs...

Normally the client would do so with multiple threads...so while it adds complexity, efficiency is usually good.

Why the limit 3? That requirement seems to be the bigger cause of inefficiency.

Is there some way to do a single DynamoDB operation that queries all 10 partitions, does the sorting, and just returns the 3 records with the highest vavlue?

The only way to query all partitions is with a full table Scan. But that doesn't provide sorting & ordering. You'd still need to do it in your app. The scan would be a lot less efficient than the scatter read.

If this is a "Top 3 sellers" type list...I believe the recommended practice to to (periodically) calculate & store the results. Rather than having to constantly derive the results. Take a look here: Using Global Secondary Indexes for Materialized Aggregation Queries

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks, that answers my question. This is for generating the top three most recent blog articles, thus the need to limit to 3 items per query. In practical terms, I can easily store all blog articles in a single partition, but I'd like to know how to solve this problem if I needed to scale larger.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.