# presumably iterating over groups, e.g.:
for k, g in df.groupby(key_column):
    g.to_csv(f'/tmp/{k}.csv')

This example writes to /tmp/. When /tmp/ is not used in g.to_csv(f'/tmp/{k}.csv'), it raises a "Read-only file system" error (see https://stackoverflow.com/a/42002539/13016237). So the question is: does AWS Lambda clear /tmp/ on its own, or does it have to be done manually? Is there any workaround for this within the scope of boto3? Thanks!

  • what should the lambda do? save a csv (coming from pandas) to s3? Commented Sep 14, 2020 at 7:39
  • To ensure the file is cleaned up, you can use the tempfile module, either NamedTemporaryFile or TemporaryDirectory. Commented Sep 14, 2020 at 7:47
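The tempfile suggestion above can be sketched as follows (a minimal sketch; the DataFrame contents and file name are illustrative):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# TemporaryDirectory removes the directory and everything in it on exit,
# so nothing lingers in /tmp between (warm) invocations
with tempfile.TemporaryDirectory(dir="/tmp") as tmpdir:
    path = os.path.join(tmpdir, "out.csv")
    df.to_csv(path, index=False)
    existed_inside = os.path.exists(path)
    # ... upload `path` to S3 here, before the directory is removed ...

# After the with-block, both the file and the directory are gone
```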

2 Answers


/tmp, as the name suggests, is only temporary storage. It should not be relied upon for any long-term data storage. Files in /tmp persist for as long as the Lambda execution context is kept alive; how long that is is not defined and varies.

To overcome the size limitation (512 MB) and to ensure long-term data storage, two solutions are commonly employed: Amazon EFS and Amazon S3.

Using EFS is easier (but not cheaper): it presents a regular filesystem to your function, which you can read and write directly. You can also re-use the same filesystem across multiple Lambda functions, instances, containers, and more.

S3 is cheaper, but some extra work is required on your part to use it seamlessly from Lambda. pandas does support S3 paths, but for seamless integration you have to include s3fs in your deployment package (or layer) if it is not already present. S3 can likewise be accessed from other functions, instances, and containers.


3 Comments

Thanks @Marcin, so is there a way to clear /tmp manually? Is it recommended?
@pc_pyr Yes, you can use regular Python tools for that, for example shutil.rmtree to remove folders you create in /tmp. If you have confidential data, you can shred the files yourself, or not store them in /tmp at all.
@pc_pyr Yes, you can upload to S3, but you need s3fs for seamless integration. I don't know the specifics of pandas, so I can't give more details on that.
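A minimal sketch of the manual shutil.rmtree cleanup mentioned above (the scratch directory name is a hypothetical example):

```python
import os
import shutil

# Hypothetical scratch directory under Lambda's writable /tmp
scratch = "/tmp/csv-scratch"
os.makedirs(scratch, exist_ok=True)

with open(os.path.join(scratch, "data.csv"), "w") as f:
    f.write("a,b\n1,2\n")

# Remove the whole directory before the handler returns, so a warm
# execution context does not accumulate files toward the 512 MB limit
shutil.rmtree(scratch, ignore_errors=True)
```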

g.to_csv('s3://my_bucket/my_data.csv') should work if you package s3fs with your Lambda.

Another option is to serialize the CSV in memory and use boto3 to create the object in S3 directly.
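A sketch of that in-memory route (bucket and key names are placeholders, and the upload assumes standard boto3 credentials are configured):

```python
import io

import pandas as pd


def upload_df_to_s3(df, bucket, key):
    """Serialize a DataFrame to CSV in memory and upload it with boto3."""
    import boto3  # imported lazily so the serialization step is testable offline
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=buf.getvalue().encode("utf-8")
    )


# Building the CSV in memory avoids writing to /tmp entirely:
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
# upload_df_to_s3(df, "my_bucket", "my_data.csv")  # placeholder names
```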

Comments
