I have a large CSV file stored in S3. I would like to download, edit, and re-upload this file without it ever touching my hard drive, i.e. read it straight into memory from S3. I am using the Python library boto3; is this possible?
2 Answers
You should look into the io module. Depending on whether you want to work with text or bytes, you can create a StringIO() or BytesIO() object and download your file into that stream.
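A minimal sketch of that approach, editing the CSV entirely in memory with the stdlib csv module. The bucket/key names and the add_column helper are illustrative placeholders, not part of boto3:

```python
import csv
from io import BytesIO, StringIO

def add_column(csv_bytes: bytes, header: str, value: str) -> bytes:
    """Append a constant column to CSV data held in memory."""
    rows = list(csv.reader(StringIO(csv_bytes.decode("utf-8"))))
    out = StringIO()
    writer = csv.writer(out)
    writer.writerow(rows[0] + [header])                   # extend the header row
    writer.writerows(row + [value] for row in rows[1:])   # extend each data row
    return out.getvalue().encode("utf-8")

# With boto3 the round trip would look like (bucket/key are placeholders):
# s3 = boto3.client("s3")
# buf = BytesIO()
# s3.download_fileobj("my-bucket", "data.csv", buf)
# edited = add_column(buf.getvalue(), "source", "s3")
# s3.upload_fileobj(BytesIO(edited), "my-bucket", "data.csv")
```

Note that csv.writer emits `\r\n` line endings by default, which S3 (and most consumers) handle fine.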
I have an S3 helper class that does exactly what you require:

```python
import boto3
from io import BytesIO

class S3Helper:
    @staticmethod
    def download_fileobj(bucket: str, key: str) -> BytesIO:
        """Download an S3 object into an in-memory buffer."""
        s3_client = boto3.client("s3")
        file_obj = BytesIO()
        s3_client.download_fileobj(bucket, key, file_obj)
        file_obj.seek(0)  # rewind so the caller can read from the start
        return file_obj

    @staticmethod
    def upload_fileobj(bucket: str, key: str, fileobj: BytesIO):
        """Upload an in-memory buffer to S3."""
        if fileobj is None:
            raise ValueError("fileobj cannot be None.")
        s3_client = boto3.client("s3")
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=fileobj.getvalue(),
        )
```
Example usage:

```python
from io import BytesIO
from s3_helper import S3Helper

bucket_name = "your-bucket-name"
download_key = "path/to/your/file.txt"
upload_key = "path/to/your/modified_file.txt"

# Download into memory, edit the contents, and re-upload -- no disk involved.
file_obj = S3Helper.download_fileobj(bucket=bucket_name, key=download_key)
file_content = file_obj.getvalue().decode("utf-8")
file_content += "\nAppended text"
file_obj = BytesIO(file_content.encode("utf-8"))
S3Helper.upload_fileobj(bucket=bucket_name, key=upload_key, fileobj=file_obj)
```
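Since the original question is about a CSV, the edit step can also go through pandas if it is available. This is a sketch under that assumption; the "price" column and the increment are made-up examples:

```python
import pandas as pd
from io import BytesIO, StringIO

def edit_csv(csv_bytes: bytes) -> bytes:
    """Parse CSV bytes, modify a column, and serialize back to bytes in memory."""
    df = pd.read_csv(BytesIO(csv_bytes))
    df["price"] = df["price"] + 1  # hypothetical edit: increment a 'price' column
    out = StringIO()
    df.to_csv(out, index=False)
    return out.getvalue().encode("utf-8")

# Plugged into the helper above (same placeholder bucket/key variables):
# file_obj = S3Helper.download_fileobj(bucket=bucket_name, key=download_key)
# S3Helper.upload_fileobj(bucket=bucket_name, key=upload_key,
#                         fileobj=BytesIO(edit_csv(file_obj.getvalue())))
```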
`to_csv` will do that?