Is there a way to have the DynamoDB rows for each user backed up to S3 as a CSV file?

Then, using DynamoDB Streams, when a row is mutated, update that row in the CSV file in S3.

The CSV readers that are currently out there are geared towards parsing the CSV for use within the Lambda.

Whereas I would like to find a specific row, identified by the stream record, and replace it with another row without having to load the whole file into memory, as it may be quite big. The reason I would like a backup on S3 is that in the future I will need to do batch processing on it, and reading 300k items from DynamoDB within a short period of time is not preferable.
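
For illustration, a minimal sketch of what such a stream-triggered Lambda handler could look like (the bucket name, object key, and attribute name below are assumptions, not part of the question):

BUCKET = 'my-backup-bucket'   # hypothetical bucket holding the CSV backup
KEY = 'users.csv'             # hypothetical object key

def handler(event, context):
    # A DynamoDB Streams event delivers a batch of records, one per mutated item.
    for record in event['Records']:
        if record['eventName'] in ('INSERT', 'MODIFY'):
            new_image = record['dynamodb']['NewImage']
            user_id = new_image['userId']['S']   # assumed attribute name
            # find this user's row in s3://BUCKET/KEY and replace it
            # (see the read/overwrite pattern in the answer below)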

  • You could use a Lambda triggered on DynamoDB updates: docs.aws.amazon.com/amazondynamodb/latest/developerguide/… Commented Mar 11, 2018 at 1:43
  • @avigil The problem I am having is getting that Lambda to update the file, i.e. a way to read it from S3, find the line, and update it. I have used fast-csv, for example, and it only allowed me to parse the line, not update it. Commented Mar 11, 2018 at 1:45
  • You will need to read in the contents of the S3 object, parse it and update as necessary, then overwrite the object with your updated version. See the boto3 documentation for S3 put or upload_fileobj. Commented Mar 11, 2018 at 2:14
  • @avigil I was hoping to avoid reading the whole file into the Lambda just to update one line. Commented Mar 11, 2018 at 13:32
  • You sadly can't do that if you are using S3. Consider switching to a database for easy incremental updates. Commented Mar 11, 2018 at 15:55

1 Answer

Read the data from S3, parse it as CSV using your favorite library, update it, then write it back to S3:

import io
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')

with io.BytesIO() as data:
    # download the existing object into an in-memory buffer
    bucket.download_fileobj('my_key', data)

    # parse csv data and update as necessary
    # then write back to s3

    # rewind the buffer before uploading, otherwise the uploaded body would be empty
    data.seek(0)
    bucket.upload_fileobj(data, 'my_key')
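
As a rough sketch of the parse-and-update step, using Python's standard csv module (the column layout and the assumption that the first column holds the user id are illustrative, not part of the answer):

import csv
import io
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')

def replace_row(key, user_id, new_row):
    # S3 has no partial update, so read the whole object into memory
    with io.BytesIO() as data:
        bucket.download_fileobj(key, data)
        text = data.getvalue().decode('utf-8')

    # replace the matching row; assumes the first column holds the user id
    rows = list(csv.reader(io.StringIO(text)))
    updated = [new_row if row and row[0] == user_id else row for row in rows]

    # serialize and overwrite the object
    out = io.StringIO()
    csv.writer(out).writerows(updated)
    bucket.upload_fileobj(io.BytesIO(out.getvalue().encode('utf-8')), key)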

Note that S3 does not support appending to or updating objects in place, if that was what you were hoping for. You can only read and overwrite whole objects, so you may want to take this into account when designing your system.

2 Comments

With this approach, if the file is large, I would need to read and rewrite the whole file back to S3?
Yes, but that's the only way to do it in S3. Make many small objects and it won't be a problem.
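
If you follow that suggestion, one possible layout is a separate small CSV object per user, so each stream event only overwrites that user's object (the key scheme and bucket name below are assumptions):

import csv
import io
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')   # hypothetical bucket

def write_user_row(user_id, row):
    # one small CSV object per user; overwriting it on each stream event is cheap
    out = io.StringIO()
    csv.writer(out).writerow(row)
    bucket.put_object(Key=f'backups/{user_id}.csv', Body=out.getvalue().encode('utf-8'))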
