
I have written some Python code and I want to query DynamoDB data by sort key. I remember being able to use the following code successfully:

 table.query(KeyConditionExpression=Key('event_status').eq(event_status))

My table structure:

primary (partition) key: event_id
sort key: event_status

1 Comment

I believe this can be done by creating a Global Secondary Index in the form of an inverted index. (Commented Nov 5, 2021 at 6:43)

5 Answers


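A Query always needs an equality condition on the partition key, so to query on the sort key alone you have to create a global secondary index (GSI) that uses the sort key attribute as its partition key.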
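To see why, here is a toy in-memory model in plain Python. It is not boto3 and not how DynamoDB is implemented internally; the class and method names are hypothetical. Items are grouped by partition key, so a lookup by `event_status` alone would have to visit every partition, while a second structure keyed on `event_status` (playing the role of a GSI, i.e. an inverted index) answers it directly.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a table keyed by event_id (partition) and event_status (sort)."""

    def __init__(self):
        # Base table: partition key -> {sort key -> item}
        self.partitions = defaultdict(dict)
        # Toy "GSI": event_status -> {event_id -> item}, an inverted index
        self.status_index = defaultdict(dict)

    def put_item(self, item):
        self.partitions[item["event_id"]][item["event_status"]] = item
        # Like a GSI, the index is updated on every write to the table
        self.status_index[item["event_status"]][item["event_id"]] = item

    def query(self, event_id):
        # A query names exactly one partition and never looks at the others
        return list(self.partitions.get(event_id, {}).values())

    def query_index(self, event_status):
        # Querying the inverted index uses event_status as its partition key
        return list(self.status_index.get(event_status, {}).values())


toy = ToyTable()
toy.put_item({"event_id": "e1", "event_status": "new"})
toy.put_item({"event_id": "e2", "event_status": "done"})
toy.put_item({"event_id": "e3", "event_status": "new"})

print([i["event_id"] for i in toy.query_index("new")])  # ['e1', 'e3']
```

The extra index costs memory and work on every write, which mirrors the storage and write cost of a real GSI.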


1 Comment

Would it be a good solution to make event_status the primary key to avoid creating the GSI?

If you don't want to scan (and you probably shouldn't), you will need to create a GSI (Global Secondary Index) for that, with event_status as the GSI's partition key (GSIPK).

So your table config would be:

    import boto3

    dynamodb = boto3.resource("dynamodb")

    table = dynamodb.create_table(
        TableName="your_table",
        KeySchema=[
            {"AttributeName": "event_id", "KeyType": "HASH"},  # Partition key
            {"AttributeName": "event_status", "KeyType": "RANGE"},  # Sort key
        ],
        AttributeDefinitions=[
            {"AttributeName": "event_id", "AttributeType": "S"},
            {"AttributeName": "event_status", "AttributeType": "S"},
            # Separate attributes that serve as the GSI's keys
            {"AttributeName": "gsi_event_status", "AttributeType": "S"},
            {"AttributeName": "gsi_event_id", "AttributeType": "S"},
        ],
        GlobalSecondaryIndexes=[
            {
                "IndexName": "gsiIndex",
                "KeySchema": [
                    {"AttributeName": "gsi_event_status", "KeyType": "HASH"},  # GSI partition key
                    {"AttributeName": "gsi_event_id", "KeyType": "RANGE"},  # GSI sort key
                ],
                "Projection": {"ProjectionType": "ALL"},
            },
        ],
        BillingMode="PAY_PER_REQUEST",
    )

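Be mindful that GSIs can be expensive: every write to the table is also written to the index, and projected attributes are stored again. You might want to change the ProjectionType to KEYS_ONLY or INCLUDE if you don't need all attributes.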
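Because `gsiIndex` is keyed on `gsi_event_status` and `gsi_event_id`, every item you write has to carry copies of those two attributes; DynamoDB only adds an item to a GSI when the item has the index's key attributes. A small helper like the hypothetical `with_gsi_keys` below (plain Python, no AWS calls) is one way to keep the copies in sync before each `put_item`.

```python
def with_gsi_keys(item: dict) -> dict:
    """Return a copy of item with the gsi_* key attributes used by gsiIndex.

    Hypothetical helper; the attribute names match the create_table call.
    """
    return {
        **item,
        "gsi_event_status": item["event_status"],  # GSI partition key
        "gsi_event_id": item["event_id"],  # GSI sort key
    }


item = with_gsi_keys({"event_id": "e1", "event_status": "new", "name": "demo"})
print(item)
# With boto3 (not executed here): table.put_item(Item=item)
```

If the status changes, the copies must be updated too, otherwise the item stays under its old status in the index.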

Now you can query by the partition key (pk):

    from boto3.dynamodb.conditions import Key

    table.query(KeyConditionExpression=Key('event_id').eq(event_id))

or by the GSI partition key, which holds a copy of your sort key:

    table.query(
        IndexName="gsiIndex",
        KeyConditionExpression=Key("gsi_event_status").eq(event_status),
    )

2 Comments

The event_id and event_status are listed twice in the AttributeDefinitions. Is there a reason for that or doesn't it work the same with the duplicate entries removed?
I think they were meant to define the attributes for the main table and for the GSI; I'll update the answer with different names so it's more obvious.
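The separate `gsi_*` attributes are optional. A GSI's key schema may reuse the base table's key attributes, so an inverted index can use `event_status` as its partition key and `event_id` as its sort key, with each attribute listed once in `AttributeDefinitions`. Below is a sketch of the `create_table` arguments as a plain dict, assuming the same table as above; the index name `status-index` is hypothetical. You would pass it as `dynamodb.create_table(**params)`.

```python
params = {
    "TableName": "your_table",
    "KeySchema": [
        {"AttributeName": "event_id", "KeyType": "HASH"},
        {"AttributeName": "event_status", "KeyType": "RANGE"},
    ],
    # Each attribute is defined once, even though both key schemas use it
    "AttributeDefinitions": [
        {"AttributeName": "event_id", "AttributeType": "S"},
        {"AttributeName": "event_status", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "status-index",  # hypothetical index name
            # Inverted index: the table's sort key becomes the index partition key
            "KeySchema": [
                {"AttributeName": "event_status", "KeyType": "HASH"},
                {"AttributeName": "event_id", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# Sanity check: defined attributes and key-schema attributes are the same set
defined = {a["AttributeName"] for a in params["AttributeDefinitions"]}
used = {k["AttributeName"] for k in params["KeySchema"]}
for gsi in params["GlobalSecondaryIndexes"]:
    used |= {k["AttributeName"] for k in gsi["KeySchema"]}
print(defined == used)  # True
```

With this layout the query becomes `table.query(IndexName="status-index", KeyConditionExpression=Key("event_status").eq(event_status))`, and no attributes need to be copied on write.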

Use the Scan API if you want to get data from DynamoDB without specifying the partition (hash) key value.

Example:

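    import json
    from decimal import Decimal

    import boto3
    from boto3.dynamodb.conditions import Attr


    class DecimalEncoder(json.JSONEncoder):
        # DynamoDB returns numbers as Decimal, which json.dumps can't serialize by default
        def default(self, o):
            if isinstance(o, Decimal):
                return int(o) if o % 1 == 0 else float(o)
            return super().default(o)


    table = boto3.resource("dynamodb").Table("your_table")  # placeholder table name
    fe = Attr("event_status").eq("new")

    response = table.scan(FilterExpression=fe)
    for i in response["Items"]:
        print(json.dumps(i, cls=DecimalEncoder))

    # Keep scanning until DynamoDB stops returning LastEvaluatedKey
    while "LastEvaluatedKey" in response:
        response = table.scan(
            FilterExpression=fe,
            ExclusiveStartKey=response["LastEvaluatedKey"],
        )
        for i in response["Items"]:
            print(json.dumps(i, cls=DecimalEncoder))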

1 Comment

Note that scanning is not the same as querying. You will be iterating over all the values so it is much less efficient than a query. If you really need to efficiently query on event_status you should consider creating a GSI for that field.
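The cost difference can be made concrete with read capacity units (RCUs). For Query and Scan, DynamoDB sums the sizes of the items it reads (not just the ones the filter keeps) and rounds up to the next 4 KB; one RCU covers 4 KB strongly consistent, and eventually consistent reads (the default, and the only option on a GSI) cost half. The sketch below ignores per-page rounding and uses made-up sizes (a hypothetical 1 GB table with 10 MB of matching items, and a GSI with `ALL` projection).

```python
import math

KB = 1024


def read_units(bytes_read: int, strongly_consistent: bool = False) -> float:
    """Approximate RCUs for reading bytes_read bytes with Query or Scan.

    Assumption: sizes are summed and rounded up to the next 4 KB once;
    real requests round per page. Eventually consistent reads cost half.
    """
    units = math.ceil(bytes_read / (4 * KB))
    return units if strongly_consistent else units / 2


table_bytes = 1024 * 1024 * KB  # hypothetical 1 GB table
matching_bytes = 10 * 1024 * KB  # hypothetical 10 MB of items with status "new"

# Scan + FilterExpression is charged for everything it reads
print(read_units(table_bytes))  # 131072.0
# Query on a GSI keyed by event_status reads only the matching items
print(read_units(matching_bytes))  # 1280.0
```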

By design, the sort key orders the items that share a partition key, and in a Query it can only narrow down the items within the partition you specify. So there is no way to search on the sort key alone, without the partition key, unless you define a global secondary index on the sort key.
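One way to picture this is a toy model (plain Python, hypothetical data, and only an analogy for DynamoDB's storage): within one partition, sort-key values are kept in order, so a sort-key condition such as `between` (inclusive on both ends) is a binary search inside the single partition named by the partition key. There is no ordering across partitions that such a search could use.

```python
from bisect import bisect_left, bisect_right

# Toy model: partition key -> sort key values, kept in sorted order
partitions = {
    "event-1": ["2021-11-01", "2021-11-03", "2021-11-05", "2021-11-09"],
    "event-2": ["2021-10-30", "2021-11-04"],
}


def query_between(partition_key, low, high):
    """Inclusive sort-key range lookup inside one partition."""
    keys = partitions.get(partition_key, [])
    # Binary search is possible only because keys within a partition are ordered
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


print(query_between("event-1", "2021-11-02", "2021-11-05"))
# ['2021-11-03', '2021-11-05']
```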


Using a FilterExpression, you can scan the table on the sort key.

Note: here LastUpdated is the sort key.

Example:

    import boto3
    from boto3.dynamodb.conditions import Attr

    from_date = "fromdate"  # placeholders, in the same format as LastUpdated
    to_date = "todate"

    dynamodb = boto3.resource("dynamodb", region_name="ap-south-1")
    table = dynamodb.Table("your-tablename")
    response = table.scan(
        FilterExpression=Attr("LastUpdated").between(from_date, to_date)
    )
    result = response["Items"]  # first page only

1 Comment

Be careful: each Scan request reads at most 1 MB of data, and the filter is applied only to what that request read, so a single call can miss matching items. From the docs: "A filter expression is applied after a Query finishes, but before the results are returned"
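To cover the whole table, keep calling while the response contains `LastEvaluatedKey`, passing it back as `ExclusiveStartKey`; the same loop works for Query. Below is a minimal generic sketch: `paginate` is a hypothetical helper, and `fake_scan` is a stand-in for `table.scan` so the example runs without AWS.

```python
from typing import Any, Callable, Dict, Iterator


def paginate(call: Callable[..., Dict[str, Any]], **kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated Scan or Query call.

    `call` would be table.scan or table.query; kwargs are passed through.
    """
    while True:
        response = call(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return
        # Resume the next request where the previous page stopped
        kwargs["ExclusiveStartKey"] = last_key


def fake_scan(**kwargs):
    # Stand-in for table.scan that returns two pages
    if "ExclusiveStartKey" not in kwargs:
        return {"Items": [{"id": 1}], "LastEvaluatedKey": {"id": 1}}
    return {"Items": [{"id": 2}]}


print(list(paginate(fake_scan)))  # [{'id': 1}, {'id': 2}]
```

With boto3 this would be called as `paginate(table.scan, FilterExpression=Attr("LastUpdated").between(from_date, to_date))`. Pages can come back empty while still carrying `LastEvaluatedKey`, which is why the loop checks the key rather than the item count.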
