
I am trying to load a big JSON file (over 8k transactions) with the structure below into DynamoDB using a Lambda function.

{
    "transactions": [
        {
            "customerId": "abc",
            "transactionId": "123",
            "transactionDate": "2020-09-01",
            "merchantId": "1234",
            "categoryId": "3",
            "amount": "5",
            "description": "McDonalds"
        },
        {
            "customerId": "def",
            "transactionId": "456",
            "transactionDate": "2020-09-01",
            "merchantId": "45678",
            "categoryId": "2",
            "amount": "-11.70",
            "description": "Tescos"
        },
        {
            "customerId": "jkl",
            "transactionId": "gah",
            "transactionDate": "2020-09-01",
            "merchantId": "9081",
            "categoryId": "3",
            "amount": "-139.00",
            "description": "Amazon"
        },
    ...

The Lambda function is triggered when the JSON file is uploaded to the S3 bucket, and should then automatically load the data into DynamoDB. It currently has the following code:

import json
import boto3  # this import was missing; without it the function fails with a NameError

s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    json_file_name = event['Records'][0]['s3']['object']['key']
    print(bucket)
    print(json_file_name)
    print(str(event))
    json_object = s3_client.get_object(Bucket=bucket, Key=json_file_name)
    jsonFileReader = json_object['Body'].read()
    jsonDict = json.loads(jsonFileReader)
    table = dynamodb.Table('CustomerEvents')
    table.put_item(Item=jsonDict)
    return 'Hello from Lambda'

This works fine if I upload a single transaction into DynamoDB, i.e., if the file contains just one object:

{
            "customerId": "abc",
            "transactionId": "123",
            "transactionDate": "2020-09-01",
            "merchantId": "1234",
            "categoryId": "3",
            "amount": "5",
            "description": "McDonalds"
 }

How can I tweak the Lambda function to load all the transactions (> 8k) into DynamoDB from a file structured as above?

4 Comments

  • You want to run in a loop? Commented Sep 4, 2020 at 0:51
  • @Marcin Yes please, how can I go about doing this? Commented Sep 4, 2020 at 0:55
  • Try batches of 25 records; this is the maximum number of records per DynamoDB request. Commented Sep 4, 2020 at 7:37
  • Hi @TraychoIvanov how can I set the max number of records to 25 using the code below from @Marcin? Commented Sep 4, 2020 at 8:26
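For context on the 25-record limit mentioned above: boto3's `batch_writer` (used in the answer below) already splits writes into 25-item `BatchWriteItem` requests automatically, so you normally don't need to set it yourself. If you wanted to chunk manually, for example to call the low-level `batch_write_item` client API directly, a minimal sketch of the splitting logic is:

```python
def chunks(items, size=25):
    """Yield successive fixed-size chunks of a list. DynamoDB's
    BatchWriteItem accepts at most 25 put/delete requests per call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage with the low-level client (not run here):
# client = boto3.client('dynamodb')
# for batch in chunks(jsonDict['transactions']):
#     ...build a RequestItems payload from `batch` and call
#     client.batch_write_item(RequestItems=...)
```

Note that unlike `batch_writer`, the low-level API also requires you to retry any `UnprocessedItems` returned in the response.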

1 Answer


You can use batch_writer to write multiple transactions from your file.

An example is:

import json
import boto3

s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

table = dynamodb.Table('CustomerEvents')

def lambda_handler(event, context):

    bucket = event['Records'][0]['s3']['bucket']['name']
    json_file_name = event['Records'][0]['s3']['object']['key']

    print(bucket)
    print(json_file_name)
    print(str(event))

    json_object = s3_client.get_object(Bucket=bucket,Key=json_file_name)
    jsonFileReader = json_object['Body'].read()
    jsonDict = json.loads(jsonFileReader)
    
    with table.batch_writer() as batch:
        for transaction in jsonDict['transactions']:
            print(transaction)
            batch.put_item(Item=transaction)

    return 'Hello from Lambda'

8 Comments

Thank you I’ll try this out. I see you commented out the line “#table = dynamodb.Table('CustomerEvents')” however how does the Lambda know which table from DynamoDB it should pick to load the data into?
@ERR Sorry. Just corrected. I was testing it on my own lambda function, so I had to change the table to my one. Forgot to uncomment it back to yours. I see there was also my test bucket name. Also changed that.
No worries at all. Also this line “bucket='my-bucket-for-custom-objects361'” - is that your testing bucket? Can I remove that and simply keep “bucket = event['Records'][0]['s3']['bucket']['name']”? Or you recommend assigning the exact name of the bucket to the variable bucket?
@ERR Yes, it was my test bucket. Forgot to remove it as well. Already modified answer to rectify this.
I came across this issue on CloudWatch when trying to run the code above: An error occurred (ValidationException) when calling the BatchWriteItem operation: Provided list of item keys contains duplicates. Do you know how I can get around this? Thank you
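Regarding the `ValidationException` above: a single `BatchWriteItem` request may not contain two items with the same primary key. `batch_writer` accepts an `overwrite_by_pkeys` argument that drops within-batch duplicates, e.g. `table.batch_writer(overwrite_by_pkeys=['customerId', 'transactionId'])` — note the key attribute names here are assumptions; use your table's actual partition/sort keys. Alternatively, deduplicate the list before writing. A sketch, assuming the same hypothetical key attributes:

```python
def dedupe(transactions, key_attrs=('customerId', 'transactionId')):
    """Keep only the last item seen for each primary key. This mirrors
    DynamoDB's last-write-wins behaviour for put_item, so the result of
    writing the deduplicated list matches writing items one at a time.
    key_attrs is an assumption; pass your table's real key attribute names."""
    seen = {}
    for t in transactions:
        seen[tuple(t[a] for a in key_attrs)] = t
    return list(seen.values())

# Hypothetical usage inside the handler, before the batch_writer loop:
# for transaction in dedupe(jsonDict['transactions']):
#     batch.put_item(Item=transaction)
```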
