
I am trying to load a big JSON file (over 8k transactions) with the structure below into DynamoDB using a Lambda function.

{
    "transactions": [
        {
            "customerId": "abc",
            "transactionId": "123",
            "transactionDate": "2020-09-01",
            "merchantId": "1234",
            "categoryId": "3",
            "amount": "5",
            "description": "McDonalds"
        },
        {
            "customerId": "def",
            "transactionId": "456",
            "transactionDate": "2020-09-01",
            "merchantId": "45678",
            "categoryId": "2",
            "amount": "-11.70",
            "description": "Tescos"
        },
        {
            "customerId": "jkl",
            "transactionId": "gah",
            "transactionDate": "2020-09-01",
            "merchantId": "9081",
            "categoryId": "3",
            "amount": "-139.00",
            "description": "Amazon"
        },
    ...

The Lambda function is triggered when the JSON file is uploaded to the S3 bucket, and should then automatically load the data into DynamoDB. It currently has the following code:

import json
import boto3  # this import was missing; without it the function fails with a NameError

s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    json_file_name = event['Records'][0]['s3']['object']['key']
    print(bucket)
    print(json_file_name)
    print(str(event))
    json_object = s3_client.get_object(Bucket=bucket, Key=json_file_name)
    jsonFileReader = json_object['Body'].read()
    jsonDict = json.loads(jsonFileReader)
    table = dynamodb.Table('CustomerEvents')
    table.put_item(Item=jsonDict)
    return 'Hello from Lambda'

This works fine if I upload a single transaction into DynamoDB, i.e., if the file contains just one object:

{
            "customerId": "abc",
            "transactionId": "123",
            "transactionDate": "2020-09-01",
            "merchantId": "1234",
            "categoryId": "3",
            "amount": "5",
            "description": "McDonalds"
 }

How can I tweak the Lambda function to load all the transactions (> 8k) into DynamoDB from a file structured as above?

4 Comments

  • You want to run in a loop? Commented Sep 4, 2020 at 0:51
  • @Marcin Yes please, how can I go about doing this? Commented Sep 4, 2020 at 0:55
  • Try batches of 25 records; this is the maximum number of records per DynamoDB request. Commented Sep 4, 2020 at 7:37
  • Hi @TraychoIvanov how can I set the max number of records to 25 using the code below from @Marcin? Commented Sep 4, 2020 at 8:26
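For context on the 25-record limit mentioned above: boto3's `batch_writer` (used in the answer below) already splits writes into 25-item `BatchWriteItem` requests automatically, so you normally don't need to set it yourself. If you wanted to chunk manually, for example to call the low-level `batch_write_item` client API directly, a minimal sketch of the splitting logic is:

```python
def chunks(items, size=25):
    """Yield successive fixed-size chunks of a list. DynamoDB's
    BatchWriteItem accepts at most 25 put/delete requests per call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage with the low-level client (not run here):
# client = boto3.client('dynamodb')
# for batch in chunks(jsonDict['transactions']):
#     ...build a RequestItems payload from `batch` and call
#     client.batch_write_item(RequestItems=...)
```

Note that unlike `batch_writer`, the low-level API also requires you to retry any `UnprocessedItems` returned in the response.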

1 Answer


You can use batch_writer to write multiple transactions from your file.

An example is:

import json
import boto3

s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

table = dynamodb.Table('CustomerEvents')

def lambda_handler(event, context):

    bucket = event['Records'][0]['s3']['bucket']['name']
    json_file_name = event['Records'][0]['s3']['object']['key']

    print(bucket)
    print(json_file_name)
    print(str(event))

    json_object = s3_client.get_object(Bucket=bucket,Key=json_file_name)
    jsonFileReader = json_object['Body'].read()
    jsonDict = json.loads(jsonFileReader)
    
    with table.batch_writer() as batch:
        for transaction in jsonDict['transactions']:
            print(transaction)
            batch.put_item(Item=transaction)

    return 'Hello from Lambda'

8 Comments

Thank you I’ll try this out. I see you commented out the line “#table = dynamodb.Table('CustomerEvents')” however how does the Lambda know which table from DynamoDB it should pick to load the data into?
@ERR Sorry. Just corrected. I was testing it on my own lambda function, so I had to change the table to my one. Forgot to uncomment it back to yours. I see there was also my test bucket name. Also changed that.
No worries at all. Also this line “bucket='my-bucket-for-custom-objects361'” - is that your testing bucket? Can I remove that and simply keep “bucket = event['Records'][0]['s3']['bucket']['name']”? Or you recommend assigning the exact name of the bucket to the variable bucket?
@ERR Yes, it was my test bucket. Forgot to remove it as well. Already modified answer to rectify this.
I came across this issue on CloudWatch when trying to run the code above: An error occurred (ValidationException) when calling the BatchWriteItem operation: Provided list of item keys contains duplicates. Do you know how I can get around this? Thank you
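Regarding the `ValidationException` above: a single `BatchWriteItem` request may not contain two items with the same primary key. `batch_writer` accepts an `overwrite_by_pkeys` argument that drops within-batch duplicates, e.g. `table.batch_writer(overwrite_by_pkeys=['customerId', 'transactionId'])` — note the key attribute names here are assumptions; use your table's actual partition/sort keys. Alternatively, deduplicate the list before writing. A sketch, assuming the same hypothetical key attributes:

```python
def dedupe(transactions, key_attrs=('customerId', 'transactionId')):
    """Keep only the last item seen for each primary key. This mirrors
    DynamoDB's last-write-wins behaviour for put_item, so the result of
    writing the deduplicated list matches writing items one at a time.
    key_attrs is an assumption; pass your table's real key attribute names."""
    seen = {}
    for t in transactions:
        seen[tuple(t[a] for a in key_attrs)] = t
    return list(seen.values())

# Hypothetical usage inside the handler, before the batch_writer loop:
# for transaction in dedupe(jsonDict['transactions']):
#     batch.put_item(Item=transaction)
```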
