
In my application, I am using AWS S3 to upload and store files. Whenever a file is uploaded to S3, an event is created which triggers a specific lambda function λ. Then, my lambda function λ should perform an SQL INSERT (with the event data of the S3 event) to my running AWS Aurora instance. I expect that λ will be invoked about 10 - 50 times per second.

Summarising: S3 EVENT → TRIGGERS λ → AURORA INSERT
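For illustration, the metadata that λ would INSERT can be pulled straight out of the S3 notification event. A minimal sketch in Python (the field names follow the documented S3 event notification structure; which fields to keep is my own assumption):

```python
def extract_rows(event):
    """Collect (bucket, key, size, event_time) tuples from an S3
    notification event; these become the parameters of the INSERT."""
    rows = []
    for rec in event.get("Records", []):
        s3 = rec["s3"]
        rows.append((
            s3["bucket"]["name"],         # bucket the file landed in
            s3["object"]["key"],          # file path within the bucket
            s3["object"].get("size", 0),  # object size in bytes
            rec.get("eventTime", ""),     # ISO-8601 upload timestamp
        ))
    return rows
```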

I found various posts claiming that accessing Aurora (or RDS in general) from a Lambda function can cause problems due to the lack of connection pooling and the stateless container architecture of AWS Lambda (e.g. AWS Lambda RDS Connection Pooling).

My λ can be written in any language, so the question is: which language/framework should I use to avoid the AWS Lambda connection pooling problem? In other words, is it possible to perform 10 - 50 inserts per second to Aurora with an Aurora MySQL-compatible db.t2.small instance? Or is there an alternative for performing INSERTs into Aurora from a service other than Lambda (e.g. SNS), without writing and running my own EC2 instance?

Update 2017-12-10: AWS recently announced Serverless AWS Aurora as a preview, which looks promising for serverless architectures.

  • What kind of files are you uploading? Do they really have to be INSERTed into a database? If you store JSON files in S3, you can use AWS Athena to query them in an SQL-like manner. Commented Nov 9, 2017 at 15:17
  • I don't insert the file in Aurora directly. I only insert some metadata and the filepath on S3, etc. There is no way around that. Commented Nov 9, 2017 at 17:24

1 Answer


The connection pooling problem is not language-specific. It is caused by the approach your code takes to connect to and disconnect from your database.

Basically, the best way to avoid it is to connect to and disconnect from the database within each Lambda invocation. This is not optimal from a performance perspective, but it is the least error-prone approach.
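A minimal sketch of that pattern in Python, using PyMySQL as the driver. The table name, column names, and environment variable names are all assumptions, and `connect` is injectable only so the logic can be exercised without a live database:

```python
import os

def handler(event, context, connect=None):
    """Per-invocation connect -> INSERT -> disconnect against Aurora."""
    if connect is None:
        import pymysql  # assumed to be packaged with the function

        def connect():
            return pymysql.connect(
                host=os.environ["DB_HOST"],
                user=os.environ["DB_USER"],
                password=os.environ["DB_PASSWORD"],
                database=os.environ["DB_NAME"],
                connect_timeout=5,
            )

    conn = connect()  # open a fresh connection for this invocation
    try:
        with conn.cursor() as cur:
            for rec in event.get("Records", []):
                s3 = rec["s3"]
                # Hypothetical table/columns: only the metadata and the
                # S3 path are stored, not the file itself.
                cur.execute(
                    "INSERT INTO s3_uploads (bucket, file_path, size_bytes)"
                    " VALUES (%s, %s, %s)",
                    (s3["bucket"]["name"],
                     s3["object"]["key"],
                     s3["object"].get("size", 0)),
                )
        conn.commit()
    finally:
        conn.close()  # always disconnect before the invocation ends
```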

It is possible to reuse a database connection (for performance reasons), but whether this causes connection problems depends on how your database is configured to handle idle connections. This requires some trial and error and some database configuration tweaking. On top of that, tweaks that work in development may not work in production (since production traffic is different).
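If you do go the reuse route, the usual sketch caches the connection at module scope, since that scope survives between invocations while the Lambda container stays warm. The two driver hooks below are assumptions (with PyMySQL they would be `pymysql.connect` and a wrapper around `conn.ping`):

```python
# Module scope: survives between invocations while the container is warm.
_conn = None

def get_connection(open_conn, is_usable):
    """Return the cached connection if it is still usable, otherwise
    open a new one (e.g. after the server's idle timeout killed it)."""
    global _conn
    if _conn is not None and is_usable(_conn):
        return _conn
    _conn = open_conn()
    return _conn
```

This is exactly where the trial and error comes in: `is_usable` has to reliably detect connections the database side has already dropped, which depends on the idle-timeout settings mentioned above.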

Now, to your questions:

Is it possible to perform 10 - 50 inserts per second to Aurora with an Aurora MySQL compatible db.t2.small instance?

I don't see why not. 50 inserts per second isn't really high.

Are there any alternatives to perform INSERTS to Aurora with another service than Lambda (e.g. SNS) without writing and running my own EC2 instance?

I don't think there is. SQL INSERTs target a schema, so whatever performs the INSERT has to be aware of that schema, which means you have to code it yourself using Lambda.


1 Comment

Thanks for your answer! The connection pooling problem is of course not language-specific, but I was wondering if there exists some abstraction that could reduce that pain ;) Regarding connect/disconnect: that was also my first thought, so I will give it a try.
