# presumably iterating over groups, e.g.:
for k, g in df.groupby(key_column):
    g.to_csv(f'/tmp/{k}.csv')

This example writes to /tmp/. When /tmp/ is not used in g.to_csv(f'/tmp/{k}.csv'), it raises a "Read-only file system" error (see https://stackoverflow.com/a/42002539/13016237). So the question is: does AWS Lambda clear /tmp/ on its own, or does it have to be done manually? Is there any workaround for this within the scope of boto3? Thanks!

  • what should the lambda do? save a csv (coming from pandas) to s3? Commented Sep 14, 2020 at 7:39
  • To ensure the file is cleaned up, you can use the tempfile module, either NamedTemporaryFile or TemporaryDirectory. Commented Sep 14, 2020 at 7:47
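The tempfile suggestion above can be sketched as follows (a minimal sketch; the DataFrame contents and file name are illustrative):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# TemporaryDirectory removes the directory and everything in it on exit,
# so nothing lingers in /tmp between (warm) invocations
with tempfile.TemporaryDirectory(dir="/tmp") as tmpdir:
    path = os.path.join(tmpdir, "out.csv")
    df.to_csv(path, index=False)
    existed_inside = os.path.exists(path)
    # ... upload `path` to S3 here, before the directory is removed ...

# After the with-block, both the file and the directory are gone
```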

2 Answers


/tmp, as the name suggests, is only temporary storage. It should not be relied upon for any long-term data storage. Files in /tmp persist for as long as the Lambda execution context is kept alive; how long that is is not defined and varies.

To overcome the size limitation (512 MB) and to ensure long-term data storage, two solutions are commonly employed: Amazon EFS and Amazon S3.

Using EFS is easier (but not cheaper): it presents a regular filesystem to your function, which you can read and write directly. You can also re-use the same filesystem across multiple Lambda functions, instances, containers, and more.

S3 is cheaper, but some extra work is required on your part to use it seamlessly from Lambda. pandas does support S3 paths, but for seamless integration you have to include s3fs in your deployment package (or layer) if it is not already present. S3 can likewise be accessed from other functions, instances, and containers.


3 Comments

Thanks @Marcin, so is there a way to clear /tmp manually? Is it recommended?
@pc_pyr Yes, you can use regular Python tools for that, for example shutil.rmtree to remove folders you create in /tmp. If you have confidential data, you can shred the files yourself, or not store them in /tmp at all.
@pc_pyr Yes, you can upload to S3, but you need s3fs for seamless integration. I don't know the specifics of pandas, so I can't give more details on that.
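A minimal sketch of the manual shutil.rmtree cleanup mentioned above (the scratch directory name is a hypothetical example):

```python
import os
import shutil

# Hypothetical scratch directory under Lambda's writable /tmp
scratch = "/tmp/csv-scratch"
os.makedirs(scratch, exist_ok=True)

with open(os.path.join(scratch, "data.csv"), "w") as f:
    f.write("a,b\n1,2\n")

# Remove the whole directory before the handler returns, so a warm
# execution context does not accumulate files toward the 512 MB limit
shutil.rmtree(scratch, ignore_errors=True)
```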

g.to_csv('s3://my_bucket/my_data.csv') should work if you package s3fs with your Lambda.

Another option is to serialize the CSV in memory and use boto3 to create the object in S3 directly.
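A sketch of that in-memory route (bucket and key names are placeholders, and the upload assumes standard boto3 credentials are configured):

```python
import io

import pandas as pd


def upload_df_to_s3(df, bucket, key):
    """Serialize a DataFrame to CSV in memory and upload it with boto3."""
    import boto3  # imported lazily so the serialization step is testable offline
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=buf.getvalue().encode("utf-8")
    )


# Building the CSV in memory avoids writing to /tmp entirely:
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
# upload_df_to_s3(df, "my_bucket", "my_data.csv")  # placeholder names
```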

Comments
