
In my application, I am using AWS S3 to upload and store files. Whenever a file is uploaded to S3, an event is created which triggers a specific lambda function λ. Then, my lambda function λ should perform an SQL INSERT (with the event data of the S3 event) to my running AWS Aurora instance. I expect that λ will be invoked about 10 - 50 times per second.

Summarising: S3 EVENT → TRIGGERS λ → AURORA INSERT
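For illustration, the metadata that λ would INSERT can be pulled straight out of the S3 notification event. A minimal sketch in Python (the field names follow the documented S3 event notification structure; which fields to keep is my own assumption):

```python
def extract_rows(event):
    """Collect (bucket, key, size, event_time) tuples from an S3
    notification event; these become the parameters of the INSERT."""
    rows = []
    for rec in event.get("Records", []):
        s3 = rec["s3"]
        rows.append((
            s3["bucket"]["name"],         # bucket the file landed in
            s3["object"]["key"],          # file path within the bucket
            s3["object"].get("size", 0),  # object size in bytes
            rec.get("eventTime", ""),     # ISO-8601 upload timestamp
        ))
    return rows
```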

I found various posts claiming that accessing Aurora (or RDS in general) from a Lambda function can cause problems due to the lack of connection pooling and the stateless container architecture of AWS Lambda (e.g. AWS Lambda RDS Connection Pooling).

My λ can be written in any language, so the question is: which language/framework should I use to avoid the AWS Lambda connection pooling problem? In other words, is it possible to perform 10 - 50 inserts per second to Aurora with an Aurora MySQL-compatible db.t2.small instance? Or is there an alternative for performing INSERTs into Aurora from a service other than Lambda (e.g. SNS), without writing and running my own EC2 instance?

Update 2017-12-10: AWS recently announced Serverless AWS Aurora as a preview, which looks promising for serverless architectures.

  • What kind of files are you uploading? Do they really have to be INSERTed into a database? If you store JSON files in S3, you can use AWS Athena to query them in an SQL-like manner. Commented Nov 9, 2017 at 15:17
  • I don't insert the file in Aurora directly. I only insert some metadata and the filepath on S3, etc. There is no way around that. Commented Nov 9, 2017 at 17:24

1 Answer


The connection pooling problem is not language-specific. It is caused by the approach your code takes to connect to and disconnect from your database.

Basically, the best way to avoid it is to connect to and disconnect from the database within each Lambda invocation. This is not optimal from a performance perspective, but it is the least error-prone approach.
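A minimal sketch of that pattern in Python, using PyMySQL as the driver. The table name, column names, and environment variable names are all assumptions, and `connect` is injectable only so the logic can be exercised without a live database:

```python
import os

def handler(event, context, connect=None):
    """Per-invocation connect -> INSERT -> disconnect against Aurora."""
    if connect is None:
        import pymysql  # assumed to be packaged with the function

        def connect():
            return pymysql.connect(
                host=os.environ["DB_HOST"],
                user=os.environ["DB_USER"],
                password=os.environ["DB_PASSWORD"],
                database=os.environ["DB_NAME"],
                connect_timeout=5,
            )

    conn = connect()  # open a fresh connection for this invocation
    try:
        with conn.cursor() as cur:
            for rec in event.get("Records", []):
                s3 = rec["s3"]
                # Hypothetical table/columns: only the metadata and the
                # S3 path are stored, not the file itself.
                cur.execute(
                    "INSERT INTO s3_uploads (bucket, file_path, size_bytes)"
                    " VALUES (%s, %s, %s)",
                    (s3["bucket"]["name"],
                     s3["object"]["key"],
                     s3["object"].get("size", 0)),
                )
        conn.commit()
    finally:
        conn.close()  # always disconnect before the invocation ends
```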

It is possible to reuse a database connection (for performance reasons), but whether this causes connection problems depends on how your database is configured to handle idle connections. This requires some trial and error and some database configuration tweaking. On top of that, tweaks that work in development may not work in production (since production traffic is different).
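If you do go the reuse route, the usual sketch caches the connection at module scope, since that scope survives between invocations while the Lambda container stays warm. The two driver hooks below are assumptions (with PyMySQL they would be `pymysql.connect` and a wrapper around `conn.ping`):

```python
# Module scope: survives between invocations while the container is warm.
_conn = None

def get_connection(open_conn, is_usable):
    """Return the cached connection if it is still usable, otherwise
    open a new one (e.g. after the server's idle timeout killed it)."""
    global _conn
    if _conn is not None and is_usable(_conn):
        return _conn
    _conn = open_conn()
    return _conn
```

This is exactly where the trial and error comes in: `is_usable` has to reliably detect connections the database side has already dropped, which depends on the idle-timeout settings mentioned above.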

Now, to your questions:

Is it possible to perform 10 - 50 inserts per second to Aurora with an Aurora MySQL compatible db.t2.small instance?

I don't see why not. 50 inserts per second isn't really high.

Are there any alternatives to perform INSERTS to Aurora with another service than Lambda (e.g. SNS) without writing and running my own EC2 instance?

I don't think there is. SQL INSERTs target a schema, so whatever performs the INSERT has to be aware of that schema, which means you have to code it yourself using Lambda.


1 Comment

Thanks for your answer! The connection pooling problem is of course not language-specific, but I was wondering if there exists some abstraction that could reduce that pain ;) Regarding connect/disconnect: that was also my first thought, so I will give it a try.
