2

I have a large CSV file stored in S3. I would like to download, edit, and re-upload this file without it ever touching my hard drive, i.e. read it straight into memory from S3. I am using the Python library boto3. Is this possible?

  • Whenever any program runs on a machine, it implicitly or explicitly "touches" the hard drive, so I am pretty sure this must be a hypothetical question. You can use the Pandas library to read the CSV file into memory, process it there, and then save it back to the file system. Commented Nov 20, 2019 at 5:19
  • @MantoshKumar I think the approach you suggested loads the file into RAM and won't save it to disk, so how would it touch the hard drive? Do you mean to_csv will do that? Commented Nov 20, 2019 at 5:43

2 Answers

2

You should look into the io module.

Depending on how you want to read the file, you can create a StringIO() or BytesIO() object and download your file to this stream.

You should check out these answers:

  1. How to read image file from S3 bucket directly into memory?
  2. How to read a csv file from an s3 bucket using Pandas in Python
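
As a rough illustration of the BytesIO/StringIO idea above, here is a minimal sketch. It assumes you pass in a client created with boto3.client("s3"); the function names read_s3_object_to_memory and as_text_stream are made up for this example. It uses get_object and reads the response body into memory, so nothing is written to disk by this code.

```python
import io


def read_s3_object_to_memory(s3_client, bucket, key):
    """Fetch an S3 object and return its bytes in a seekable BytesIO.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a streaming body; read() pulls it all into RAM.
    return io.BytesIO(response["Body"].read())


def as_text_stream(buffer, encoding="utf-8"):
    """Decode a BytesIO into a StringIO for text-oriented readers.

    The utf-8 default is an assumption; use the file's real encoding.
    """
    return io.StringIO(buffer.getvalue().decode(encoding))
```

Use the BytesIO directly for binary data, and the StringIO when a library expects text (for example the csv module).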

1

I have an S3 helper class that does exactly what you require:

import boto3
from io import BytesIO


class S3Helper:
    @staticmethod
    def download_fileobj(bucket: str, key: str) -> BytesIO:
        """Download an S3 object into an in-memory buffer."""
        s3_client = boto3.client("s3")
        file_obj = BytesIO()
        s3_client.download_fileobj(bucket, key, file_obj)
        # Rewind so callers can read from the start of the buffer.
        file_obj.seek(0)
        return file_obj

    @staticmethod
    def upload_fileobj(bucket: str, key: str, fileobj: BytesIO) -> None:
        """Upload the full contents of an in-memory buffer to S3."""
        if fileobj is None:
            raise ValueError("fileobj cannot be None.")

        s3_client = boto3.client("s3")
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=fileobj.getvalue(),
        )
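
A side note on the upload method above: getvalue() makes a full copy of the buffer as a bytes object before sending it. If that extra copy matters for a large file, one possible alternative is the client's upload_fileobj method, which reads from the file-like object itself. This is only a sketch; the function name upload_buffer and passing the client as an argument are my own choices for illustration, not part of the class.

```python
import io


def upload_buffer(s3_client, bucket, key, fileobj):
    """Upload an in-memory binary buffer without calling getvalue().

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # Rewind first: the upload reads from the buffer's current position.
    fileobj.seek(0)
    s3_client.upload_fileobj(fileobj, bucket, key)
```

The buffer must be opened in binary mode, so a BytesIO works here but a StringIO would need encoding first.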

Example usage of the S3Helper class:

from io import BytesIO
from s3_helper import S3Helper  # assumes the class above is saved as s3_helper.py

bucket_name = 'your-bucket-name'
download_key = 'path/to/your/file.txt'
upload_key = 'path/to/your/modified_file.txt'

# Download the object into memory.
file_obj = S3Helper.download_fileobj(bucket=bucket_name, key=download_key)

# Decode the bytes, modify the text, and wrap the result in a new buffer.
file_content = file_obj.getvalue().decode('utf-8')
file_content += "\nAppended text"
file_obj = BytesIO(file_content.encode('utf-8'))

# Upload the modified content under a new key.
S3Helper.upload_fileobj(bucket=bucket_name, key=upload_key, fileobj=file_obj)
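
Since the question is about a CSV file, the "edit" step in the middle might look something like the sketch below, using only the standard library. It assumes a UTF-8 file with a header row; edit_csv_in_memory and transform_row are hypothetical names for this example. It takes a BytesIO such as the one returned by download_fileobj and returns a new BytesIO ready to be uploaded.

```python
import csv
import io


def edit_csv_in_memory(source, transform_row, encoding="utf-8"):
    """Apply transform_row to every data row of a CSV held in a BytesIO.

    Assumes the first row is a header, which is copied through unchanged.
    Returns a new BytesIO positioned at the start.
    """
    # newline="" lets the csv module handle line endings itself.
    text_in = io.StringIO(source.getvalue().decode(encoding), newline="")
    text_out = io.StringIO(newline="")

    reader = csv.reader(text_in)
    writer = csv.writer(text_out, lineterminator="\n")

    header = next(reader, None)
    if header is not None:
        writer.writerow(header)
    for row in reader:
        writer.writerow(transform_row(row))

    return io.BytesIO(text_out.getvalue().encode(encoding))
```

Note that this holds both the original and the edited copy in RAM at the same time, so for a very large file you need roughly twice its size in free memory.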
