I have a large CSV file stored in S3. I would like to download, edit, and re-upload this file without it ever touching my hard drive, i.e. read it straight into memory from S3. I am using the Python library boto3; is this possible?
2 Answers
You should look into the io module. Depending on whether you want to work with text or bytes, you can create a StringIO() or BytesIO() object and download your file into that stream.
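A minimal sketch of that approach, editing the CSV entirely in memory with the stdlib csv module. The bucket/key names and the add_column helper are illustrative placeholders, not part of boto3:

```python
import csv
from io import BytesIO, StringIO

def add_column(csv_bytes: bytes, header: str, value: str) -> bytes:
    """Append a constant column to CSV data held in memory."""
    rows = list(csv.reader(StringIO(csv_bytes.decode("utf-8"))))
    out = StringIO()
    writer = csv.writer(out)
    writer.writerow(rows[0] + [header])                   # extend the header row
    writer.writerows(row + [value] for row in rows[1:])   # extend each data row
    return out.getvalue().encode("utf-8")

# With boto3 the round trip would look like (bucket/key are placeholders):
# s3 = boto3.client("s3")
# buf = BytesIO()
# s3.download_fileobj("my-bucket", "data.csv", buf)
# edited = add_column(buf.getvalue(), "source", "s3")
# s3.upload_fileobj(BytesIO(edited), "my-bucket", "data.csv")
```

Note that csv.writer emits `\r\n` line endings by default, which S3 (and most consumers) handle fine.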
I have an S3 helper class that does exactly what you require:

```python
import boto3
from io import BytesIO

class S3Helper:
    @staticmethod
    def download_fileobj(bucket: str, key: str) -> BytesIO:
        """Download an S3 object into an in-memory buffer."""
        s3_client = boto3.client("s3")
        file_obj = BytesIO()
        s3_client.download_fileobj(bucket, key, file_obj)
        file_obj.seek(0)  # rewind so the caller can read from the start
        return file_obj

    @staticmethod
    def upload_fileobj(bucket: str, key: str, fileobj: BytesIO):
        """Upload an in-memory buffer to S3."""
        if fileobj is None:
            raise ValueError("fileobj cannot be None.")
        s3_client = boto3.client("s3")
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=fileobj.getvalue(),
        )
```
Example usage:

```python
from io import BytesIO
from s3_helper import S3Helper

bucket_name = "your-bucket-name"
download_key = "path/to/your/file.txt"
upload_key = "path/to/your/modified_file.txt"

# Download into memory, edit the contents, and re-upload -- no disk involved.
file_obj = S3Helper.download_fileobj(bucket=bucket_name, key=download_key)
file_content = file_obj.getvalue().decode("utf-8")
file_content += "\nAppended text"
file_obj = BytesIO(file_content.encode("utf-8"))
S3Helper.upload_fileobj(bucket=bucket_name, key=upload_key, fileobj=file_obj)
```
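Since the original question is about a CSV, the edit step can also go through pandas if it is available. This is a sketch under that assumption; the "price" column and the increment are made-up examples:

```python
import pandas as pd
from io import BytesIO, StringIO

def edit_csv(csv_bytes: bytes) -> bytes:
    """Parse CSV bytes, modify a column, and serialize back to bytes in memory."""
    df = pd.read_csv(BytesIO(csv_bytes))
    df["price"] = df["price"] + 1  # hypothetical edit: increment a 'price' column
    out = StringIO()
    df.to_csv(out, index=False)
    return out.getvalue().encode("utf-8")

# Plugged into the helper above (same placeholder bucket/key variables):
# file_obj = S3Helper.download_fileobj(bucket=bucket_name, key=download_key)
# S3Helper.upload_fileobj(bucket=bucket_name, key=upload_key,
#                         fileobj=BytesIO(edit_csv(file_obj.getvalue())))
```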
`to_csv` will do that?