How to zip files on s3 using lambda and python

Question

I need to archive multiple files that exist on S3 and then upload the archive back to S3. I am trying to use Lambda and Python. As some of the files are larger than 500MB, downloading them to '/tmp' is not an option. Is there any way to stream the files one by one and add them to the archive?

Yes, there is. What did you search for, and what did you find? What did you try, and how did it fail? If 500MB is too much for your /tmp, increasing the space there seems like the easiest way forward; if you don't have a lot of disk, what are the chances you have enough memory to keep the file in RAM entirely?
– tripleee, Commented Jun 21, 2021 at 10:00
Since this question was written, AWS Lambda has added the ability to request larger /tmp/ storage.
– John Rotenstein, Commented Jan 11, 2023 at 22:46

Anilkumar Kalyane · Accepted Answer · 2021-06-21 14:34:46Z

7

Do not write to disk, stream to and from S3

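Stream the zip file from the source bucket, then read its contents and write them on the fly to another S3 bucket using Python.

This method does not use up disk space and therefore is not limited by the size of /tmp. Note that the code below reads the whole zip file into a BytesIO buffer, so the function needs enough memory to hold it.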

The basic steps are:

1. Read the zip file from S3 using the Boto3 S3 resource Object into a BytesIO buffer object.
2. Open the object using the zipfile module.
3. Iterate over each file in the zip file using the namelist method.
4. Write the file back to another bucket in S3 using the resource meta.client.upload_fileobj method.

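The code (Python 3.6, using Boto3):

import zipfile
from io import BytesIO

import boto3

zip_key = "path/to/archive.zip"  # placeholder: key of the source zip file
bucket = "destination_bucket_here"  # placeholder: bucket that receives the extracted files

s3_resource = boto3.resource('s3')
zip_obj = s3_resource.Object(bucket_name="bucket_name_here", key=zip_key)
# Read the whole zip file into memory
buffer = BytesIO(zip_obj.get()["Body"].read())

z = zipfile.ZipFile(buffer)
for filename in z.namelist():
    # Stream each member of the archive to the destination bucket
    s3_resource.meta.client.upload_fileobj(
        z.open(filename),
        Bucket=bucket,
        Key=filename
    )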

Note: AWS Lambda has a maximum execution time of 15 minutes, so can you process your HUGE files in that amount of time? You can only know by testing.
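
When testing against the time limit, it can help to stop cleanly before Lambda kills the function. The following is a minimal sketch, assuming the handler's `context` argument is available (its `get_remaining_time_in_millis()` method is provided by the Lambda Python runtime). The names `process_until_deadline`, `handle`, and `safety_margin_ms` are hypothetical and chosen only for illustration.

```python
def process_until_deadline(items, handle, context, safety_margin_ms=30_000):
    """Call handle(item) for each item until the time budget is nearly used up.

    Returns the items that were NOT processed, so the caller can log them
    or hand them to a follow-up invocation.
    """
    remaining = list(items)
    while remaining:
        # Stop early if less than the safety margin is left before the timeout
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        handle(remaining.pop(0))
    return remaining
```

Inside a handler this could be called as `leftover = process_until_deadline(keys, copy_one_file, context)`, where `copy_one_file` is whatever per-file work the function does.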

answered Jun 21, 2021 at 14:34

Anilkumar Kalyane


1 Comment

Quest Mar 25 at 8:28

I think the OP asked for the other way around: they need to zip files, not unzip them.

macieks · Accepted Answer · 2021-09-10 17:47:10Z

6

AWS Lambda code: create a zip from the files with a given extension under bucket/filePath.

import zipfile
from io import BytesIO

import boto3

s3 = boto3.resource('s3')


def createZipFileStream(bucketName, bucketFilePath, jobKey, fileExt, createUrl=False):
    response = {}
    bucket = s3.Bucket(bucketName)
    filesCollection = bucket.objects.filter(Prefix=bucketFilePath)
    zipKey = bucketFilePath + '/' + jobKey + '.zip'
    # The archive is built in memory, so it must fit in the function's RAM
    archive = BytesIO()

    with zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED) as zip_archive:
        for file in filesCollection:
            if file.key.endswith('.' + fileExt):
                # Each object is read fully into memory before being compressed
                with zip_archive.open(file.key, 'w') as file1:
                    file1.write(file.get()['Body'].read())

    # Rewind the buffer and upload the finished archive
    archive.seek(0)
    s3.Object(bucketName, zipKey).upload_fileobj(archive)
    archive.close()

    response['fileUrl'] = None

    if createUrl:
        # Optionally return a pre-signed download URL that is valid for one hour
        s3Client = boto3.client('s3')
        response['fileUrl'] = s3Client.generate_presigned_url(
            'get_object',
            Params={'Bucket': bucketName, 'Key': zipKey},
            ExpiresIn=3600)

    return response
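
If the finished archive is too large to hold in a BytesIO buffer, one possible variation is to hand `zipfile` a write-only object that forwards the bytes to an S3 multipart upload. This is a rough sketch under stated assumptions, not part of the answer above: `S3MultipartWriter` and `zip_s3_objects` are hypothetical names, `s3_client` is assumed to be a `boto3.client('s3')`, and the sketch relies on `zipfile` being able to write to a non-seekable stream. Memory use should stay near `part_size` plus one read chunk.

```python
import shutil
import zipfile


class S3MultipartWriter:
    """Write-only, non-seekable file object backed by an S3 multipart upload.

    Hypothetical helper for illustration; it is not part of boto3.
    """

    def __init__(self, s3_client, bucket, key, part_size=8 * 1024 * 1024):
        self._s3, self._bucket, self._key = s3_client, bucket, key
        self._part_size = part_size  # S3 requires >= 5 MiB for every part but the last
        self._buffer = bytearray()
        self._parts = []
        created = s3_client.create_multipart_upload(Bucket=bucket, Key=key)
        self._upload_id = created["UploadId"]

    def write(self, data):
        # Collect bytes and send a part each time a full part is available
        self._buffer += data
        while len(self._buffer) >= self._part_size:
            self._upload_part(bytes(self._buffer[:self._part_size]))
            del self._buffer[:self._part_size]
        return len(data)

    def flush(self):
        # zipfile calls flush(); partial parts are only sent in complete()
        pass

    def _upload_part(self, body):
        number = len(self._parts) + 1
        result = self._s3.upload_part(Bucket=self._bucket, Key=self._key,
                                      UploadId=self._upload_id,
                                      PartNumber=number, Body=body)
        self._parts.append({"ETag": result["ETag"], "PartNumber": number})

    def complete(self):
        # Send whatever is left as the final part, then finish the upload
        if self._buffer or not self._parts:
            self._upload_part(bytes(self._buffer))
            self._buffer.clear()
        self._s3.complete_multipart_upload(
            Bucket=self._bucket, Key=self._key, UploadId=self._upload_id,
            MultipartUpload={"Parts": self._parts})

    def abort(self):
        self._s3.abort_multipart_upload(
            Bucket=self._bucket, Key=self._key, UploadId=self._upload_id)


def zip_s3_objects(s3_client, bucket, keys, zip_key, part_size=8 * 1024 * 1024):
    """Stream the given objects one by one into a zip stored at zip_key."""
    writer = S3MultipartWriter(s3_client, bucket, zip_key, part_size)
    try:
        with zipfile.ZipFile(writer, "w", zipfile.ZIP_DEFLATED) as archive:
            for key in keys:
                body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
                # Entries of 2 GiB or more would need force_zip64=True here
                with archive.open(key, "w") as entry:
                    shutil.copyfileobj(body, entry, 1024 * 1024)
        writer.complete()
    except Exception:
        # Abort so the incomplete parts do not linger in the bucket
        writer.abort()
        raise
```

A call might look like `zip_s3_objects(boto3.client('s3'), 'my-bucket', ['a.csv', 'b.csv'], 'archives/job.zip')`, with all names being placeholders. The Lambda time limit still applies, since every byte is downloaded, compressed, and uploaded within one invocation.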

answered Sep 10, 2021 at 17:47

macieks


1 Comment

rholdberh Over a year ago

Just saw your answer. I have implemented it in the same way and it works fine for my case

John Rotenstein · Accepted Answer · 2021-06-21 10:31:32Z

0

The /tmp/ directory is limited to 512MB by default for AWS Lambda functions.

If you search Stack Overflow, you'll see some code from people who have created zip files on the fly without saving files to disk. It becomes pretty complicated.

An alternative would be to attach an EFS filesystem to the Lambda function. It takes a bit of effort to set up, but the cost would be practically zero if you delete the files after use, and you'll have plenty of disk space, so your code will be more reliable and easier to maintain.
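
With a filesystem available, the zip step can be ordinary file handling. The following is a minimal sketch, assuming the EFS access point is mounted at `/mnt/efs` (a placeholder path) and that `s3_client` is a `boto3.client('s3')`; the function name `zip_via_filesystem` is hypothetical. The same approach would apply to an enlarged /tmp by passing `work_dir="/tmp"`.

```python
import os
import shutil
import zipfile


def zip_via_filesystem(s3_client, bucket, keys, zip_key, work_dir="/mnt/efs"):
    """Build the archive on disk under work_dir, upload it, then delete it."""
    zip_path = os.path.join(work_dir, os.path.basename(zip_key))
    try:
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for key in keys:
                body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
                # Copy in 1 MiB chunks so a large object is never held in memory whole
                # (entries of 2 GiB or more would need force_zip64=True)
                with archive.open(key, "w") as entry:
                    shutil.copyfileobj(body, entry, 1024 * 1024)
        # Upload the finished archive from disk
        s3_client.upload_file(zip_path, bucket, zip_key)
    finally:
        # Delete the temporary archive so the filesystem does not fill up
        if os.path.exists(zip_path):
            os.remove(zip_path)
```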

answered Jun 21, 2021 at 10:31

John Rotenstein


1 Comment

Crashalot Over a year ago

thanks for the response. is there a reason why the answer from @Anilkumar S.K is too complicated or insufficient? stackoverflow.com/a/68069842/144088

Sandip Wankhede · Accepted Answer · 2022-11-28 11:40:42Z

# The code below worked for me in a Glue job: it takes a single .txt file from AWS S3, zips it, and uploads the archive back to AWS S3.
import logging
import zipfile
from io import BytesIO

import boto3

logger = logging.getLogger()

s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')


# S3 object delete function definition
def _delete_object(bucket: str, key: str) -> None:
    try:
        logger.info(f"Deleting: {bucket}/{key}")
        s3_client.delete_object(
            Bucket=bucket,
            Key=key
        )
    except Exception as e:
        logger.error(f"Failed to delete {bucket}/{key}: {e}")


# ZipFileStream function definition
def _createZipFileStream(bucketName: str, bucketFilePath: str, bucketfileobject: str, zipKey: str) -> None:
    try:
        obj = s3_resource.Object(bucket_name=bucketName, key=bucketfileobject)
        archive = BytesIO()

        # Build the archive in memory; zipKey is used as the entry name inside the zip
        with zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED) as zip_archive:
            with zip_archive.open(zipKey, 'w') as file1:
                file1.write(obj.get()['Body'].read())

        # Rewind the buffer and upload the archive
        archive.seek(0)
        s3_client.upload_fileobj(archive, bucketName, bucketFilePath + '/' + zipKey + '.zip')
        archive.close()

        # If you would like to delete the .txt from AWS S3 after it has been zipped, the call below will do it.
        _delete_object(bucket=bucketName, key=bucketfileobject)

    except Exception as e:
        logger.error(f"Failed to zip the txt file for {bucketName}/{bucketfileobject}: {e}")


# ZipFileStream function call (placeholder values)
_createZipFileStream(
    bucketName="My_AWS_S3_bucket_name",
    bucketFilePath="My_txt_object_prefix",
    bucketfileobject="My_txt_Object_prefix + txt_file_name",
    zipKey="My_zip_file_prefix")

Please consider adding some explanation to the source code explaining how it solves the problem.
