
I have a DynamoDB table with a primary partition key column, foo_id, and no sort key. I have a list of foo_id values and want to get the observations associated with this list of ids.

I figured the best way to do this (?) is to use batch_get_item(), but it's not working out for me.

    # python code
    import boto3
    client = boto3.client('dynamodb')

    # ppk_values = list of `foo_id` values (strings) (< 100 in this example)
    x = client.batch_get_item(
        RequestItems={
            'my_table_name':
                {'Keys': [{'foo_id': {'SS': [id for id in ppk_values]}}]}
        })

I'm using SS because I'm passing a list of strings (list of foo_id values), but I'm getting:

    ClientError: An error occurred (ValidationException) when calling the
    BatchGetItem operation: The provided key element does not match the
    schema

So I assume that means DynamoDB thinks foo_id contains list values instead of string values, which is wrong.

Is that interpretation right? What's the best way to batch query for a bunch of primary partition key values?

5 Answers


Boto3 now has a version of batch_get_item that lets you pass in the keys in a more natural Pythonic way without specifying the types.

You can find a complete and working code example in https://github.com/awsdocs/aws-doc-sdk-examples. That example deals with some additional nuances around retries, but here's a digest of the parts of the code that answer this question:

import logging
import boto3

dynamodb = boto3.resource('dynamodb')
logger = logging.getLogger(__name__)

movie_table = dynamodb.Table('Movies')
actor_table = dynamodb.Table('Actors')

# movie_list holds (year, title) pairs; actor_list holds name strings
batch_keys = {
    movie_table.name: {
        'Keys': [{'year': movie[0], 'title': movie[1]} for movie in movie_list]
    },
    actor_table.name: {
        'Keys': [{'name': actor} for actor in actor_list]
    }
}

response = dynamodb.batch_get_item(RequestItems=batch_keys)

for response_table, response_items in response['Responses'].items():
    logger.info("Got %s items from %s.", len(response_items), response_table)
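One nuance the full AWS example handles that this digest omits: BatchGetItem accepts at most 100 keys per request, so a longer key list has to be split into chunks first. A minimal pure-Python sketch of such a splitter (the helper name is illustrative, not part of boto3):

```python
def chunk_keys(keys, size=100):
    """Split a list of key dicts into sublists of at most `size` entries,
    matching the BatchGetItem per-request limit of 100 items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]
```

Each chunk would then be sent as its own batch_get_item call, and any keys reported back under UnprocessedKeys in a response should be re-requested.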

3 Comments

Thank you, I was looking for a resource-style query using the table.name feature.
What are some reasonable read/write provisioned parameters if you expected to read 1k-10k items per API request and need low latency but only expect <1000 requests per day. (No write operations in production.)
Is there such a thing for the client?

The keys should be given as shown below; they can't be specified as 'SS'.

The DynamoDB String data type corresponds to a plain string (i.e. 'S', not 'SS', which is a string set). Each item is requested by its own key; this is not like an IN clause in an SQL query.

'Keys': [
            {
                'foo_id': key1
            },
            {
                'foo_id': key2
            }
], 

Sample code:

You may need to change the table name and key values.

from __future__ import print_function # Python 2/3 compatibility
import boto3
import json
import decimal
from boto3.dynamodb.conditions import Key, Attr
from botocore.exceptions import ClientError

# Helper class to convert a DynamoDB item to JSON.
class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            if o % 1 > 0:
                return float(o)
            else:
                return int(o)
        return super(DecimalEncoder, self).default(o)

dynamodb = boto3.resource("dynamodb", region_name='us-west-2', endpoint_url="http://localhost:8000")

email1 = "[email protected]"
email2 = "[email protected]"

try:
    response = dynamodb.batch_get_item(
        RequestItems={
            'users': {
                'Keys': [
                    {
                        'email': email1
                    },
                    {
                        'email': email2
                    },
                ],            
                'ConsistentRead': True            
            }
        },
        ReturnConsumedCapacity='TOTAL'
    )
except ClientError as e:
    print(e.response['Error']['Message'])
else:
    item = response['Responses']
    print("BatchGetItem succeeded:")
    print(json.dumps(item, indent=4, cls=DecimalEncoder))

3 Comments

The above answer works only after we change dynamodb.batch_get_item to dynamodb.meta.client.batch_get_item, as the method batch_get_item exists only on a client not on a resource.
The above answer actually no longer works at all. I tried the equivalent and got an error. The inner payloads need type information: {'Keys': [{'email': {'S': email1}}, {'email': {'S': email2}}]}
For anyone struggling with this, there is an important point I had missed: "primary key" means the full key, so if your primary key consists of a partition key and a sort key, you have to provide both (which sadly rendered my use case useless). Otherwise you'll get a ValidationException: The provided key element does not match the schema.

The accepted answer no longer works.

For me the working call format was like so:

import boto3
client = boto3.client('dynamodb')

# ppk_values = list of `foo_id` values (strings) (< 100 in this example)
x = client.batch_get_item(
    RequestItems={
        'my_table_name': {
            'Keys': [{'foo_id': {'S': id}} for id in ppk_values]
        }
    }
)

The type information was required. For me it was "S" for string keys. Without it I got an error saying the libraries found a str but expected a dict. That is, they wanted {'foo_id': {'S': id}} instead of the simpler {'foo_id': id} that I tried first.
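Building those typed keys from a plain list of strings is a one-liner worth isolating. A minimal helper (the name is illustrative, not part of boto3) that wraps plain string values in the low-level {'S': ...} descriptor:

```python
def typed_string_keys(attr_name, values):
    """Wrap plain string values in the low-level DynamoDB {'S': ...}
    type descriptor expected by the boto3 client API."""
    return [{attr_name: {'S': v}} for v in values]
```

For attributes of mixed types, boto3 also ships boto3.dynamodb.types.TypeSerializer and TypeDeserializer, which convert between plain Python values and these descriptors.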

1 Comment

That is because that answer uses the boto3 resource while you're using the client.

If you have a Primary Key which consists of a partition key and a sort key, you will need to provide both. This code works for me:

keys = [{'review_id':  id, 'place_id': place_id} for id in review_ids]
print(keys)
# Set up the batch_get_item request
request_items = {
    table_name: {
        'Keys': keys,
        'ConsistentRead': True
    }
}
response = dynamodb.batch_get_item(RequestItems=request_items)
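One pitfall when building keys from a denormalized list like this: BatchGetItem rejects a request whose key list contains duplicates with a ValidationException. A small order-preserving dedup helper (illustrative, not part of boto3) avoids that:

```python
def dedup_keys(keys):
    """Drop duplicate key dicts while preserving first-seen order,
    since BatchGetItem rejects requests containing duplicate keys."""
    seen = set()
    unique = []
    for key in keys:
        marker = tuple(sorted(key.items()))  # hashable fingerprint of the dict
        if marker not in seen:
            seen.add(marker)
            unique.append(key)
    return unique
```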



Here is a Java solution using DynamoDB SDK version 2.15.0 (with static imports of java.util.Collections.singletonMap and java.util.stream.Collectors.toList), assuming foo_id is a string and there are fewer than 100 keys. You can break a longer list into batches of the required size.

private void queryTable(List<String> keys){

    List<Map<String, AttributeValue>> keysBatch = keys.stream()
            .map(key -> singletonMap("foo_id", AttributeValue.builder().s(key).build()))
            .collect(toList());
    KeysAndAttributes keysAndAttributes = KeysAndAttributes.builder()
            .keys(keysBatch)
            .build();
    Map<String, KeysAndAttributes> requestItems = new HashMap<>();
    requestItems.put("tableName", keysAndAttributes);
    BatchGetItemRequest batchGet = BatchGetItemRequest.builder()
            .requestItems(requestItems)
            .build();
    Map<String, List<Map<String, AttributeValue>>> responses = dbClient.batchGetItem(batchGet).responses();
    responses.entrySet().stream().forEach(entry -> {
        System.out.println("Table : " + entry.getKey());
        entry.getValue().forEach(v -> {
            System.out.println("value: "+v);
        });
    });
}

