How to query a DynamoDB global secondary index across multiple shards?

Question

This article (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html) talks about a technique for sharding global secondary index values across multiple partitions, by introducing a random integer as the partition key.

That makes sense to me, but the article does not clearly explain how to then query that index. Let's say I'm using a random integer from 1 to 10 as the partition key and a number as the sort key, and I want to fetch the 3 records with the highest sort key value across all partitions.
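For reference, the write side of the sharding technique described above could look roughly like this. It is a minimal sketch that builds an item in the low-level DynamoDB attribute-value format. The attribute names `id`, `gsi_shard` and `published_at`, the function `build_sharded_item`, and the shard count of 10 are all hypothetical, chosen to match the 1-to-10 example in the question.

```python
import random

NUM_SHARDS = 10  # hypothetical shard count, matching the 1-10 range in the question


def build_sharded_item(article_id: str, published_at: int, rng=random) -> dict:
    """Build an item whose GSI partition key is a random shard number.

    Attribute names are hypothetical. Pass the result to a boto3 client, e.g.
    client.put_item(TableName="Articles", Item=item)  # table name is hypothetical
    """
    shard = rng.randint(1, NUM_SHARDS)  # randint is inclusive on both ends
    return {
        "id": {"S": article_id},
        "gsi_shard": {"N": str(shard)},            # GSI partition key (random shard)
        "published_at": {"N": str(published_at)},  # GSI sort key
    }
```

Because each write picks a shard at random, items spread roughly evenly over the 10 GSI partitions, which is exactly why a read for "the highest values overall" has to visit every shard.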

Would I need to do 10 separate queries, sorting each one, with a limit of 3 items, then do an in-memory sort of the resulting 30 items and pick the first 3? That seems needlessly complicated, and not very efficient for the client.

Is there some way to do a single DynamoDB operation that queries all 10 partitions, does the sorting, and just returns the 3 records with the highest value?

Charles · Accepted Answer · 2019-01-04 18:27:27Z


Would I need to do 10 separate queries

Yes. This is called a scatter read in the Dynamo docs...

Normally the client issues the queries in parallel on multiple threads, so while this adds complexity, efficiency is usually good.
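A scatter read like the one described above might be sketched as follows: one `Query` per shard, run in parallel on a thread pool, each returning at most `n` items in descending sort-key order (`ScanIndexForward=False`, `Limit=n`), followed by an in-memory merge that keeps the overall top `n`. This is a sketch under assumptions, not an official pattern: `client` is expected to be a boto3 low-level DynamoDB client (or anything with the same `query` method), and the function name and the GSI attribute names `gsi_shard` and `published_at` are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from decimal import Decimal


def top_n_across_shards(client, table_name, index_name, n=3, num_shards=10):
    """Scatter read over a write-sharded GSI, then merge the results in memory.

    Assumes (hypothetically) a GSI whose partition key is the numeric attribute
    'gsi_shard' (values 1..num_shards) and whose sort key is 'published_at'.
    """

    def query_shard(shard):
        # Each shard returns at most n items, highest sort key first.
        resp = client.query(
            TableName=table_name,
            IndexName=index_name,
            KeyConditionExpression="gsi_shard = :s",
            ExpressionAttributeValues={":s": {"N": str(shard)}},
            ScanIndexForward=False,  # descending by sort key
            Limit=n,
        )
        return resp.get("Items", [])

    # Issue all shard queries concurrently; map() preserves shard order.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        per_shard = list(pool.map(query_shard, range(1, num_shards + 1)))

    # At most num_shards * n candidates; keep the n with the largest sort key.
    candidates = (item for items in per_shard for item in items)
    return heapq.nlargest(n, candidates, key=lambda it: Decimal(it["published_at"]["N"]))
```

The client-side cost stays small: with 10 shards and `n=3`, the merge only ever looks at 30 items, and the wall-clock time is roughly that of the slowest single query rather than the sum of all 10.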

Why the limit of 3? That requirement seems to be the bigger cause of inefficiency.

Is there some way to do a single DynamoDB operation that queries all 10 partitions, does the sorting, and just returns the 3 records with the highest value?

The only way to query all partitions is with a full table Scan. But that doesn't provide sorting & ordering. You'd still need to do it in your app. The scan would be a lot less efficient than the scatter read.

If this is a "Top 3 sellers" type list, I believe the recommended practice is to (periodically) calculate and store the results, rather than constantly deriving them. Take a look here: Using Global Secondary Indexes for Materialized Aggregation Queries
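A heavily simplified version of the "calculate and store" idea above: whatever job computes the top list (on a schedule, or whenever an article is published) writes it into a single aggregate item, and readers fetch that item with one `GetItem` instead of fanning out across shards. This is only a sketch, not the design from the linked page. `client` is assumed to be a boto3 low-level DynamoDB client, and the key attribute `pk`, the key value `TOP_RECENT_ARTICLES`, the attribute `article_ids`, and both function names are hypothetical.

```python
# Hypothetical primary key of the single aggregate item holding the top list.
TOP_KEY = {"pk": {"S": "TOP_RECENT_ARTICLES"}}


def store_top_articles(client, table_name, article_ids):
    """Overwrite the precomputed top-N list (run periodically or on publish)."""
    client.put_item(
        TableName=table_name,
        Item={**TOP_KEY, "article_ids": {"L": [{"S": a} for a in article_ids]}},
    )


def load_top_articles(client, table_name):
    """Read the precomputed list with a single GetItem; [] if not yet computed."""
    resp = client.get_item(TableName=table_name, Key=TOP_KEY)
    item = resp.get("Item")  # GetItem omits "Item" when the key does not exist
    if item is None:
        return []
    return [v["S"] for v in item["article_ids"]["L"]]
```

The trade-off is freshness: readers see the list as of the last refresh, in exchange for every read being a single-item lookup.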


1 Comment

Jesse Barnum Over a year ago

Thanks, that answers my question. This is for generating the top three most recent blog articles, thus the need to limit to 3 items per query. In practical terms, I can easily store all blog articles in a single partition, but I'd like to know how to solve this problem if I needed to scale larger.
