
I have a few hundred PDFs in an S3 bucket and I want a Lambda function that creates a zip file of all my PDFs.

Doing this locally in Python is obviously easy enough, and I had assumed the logic would transfer over to AWS Lambda in a pretty straightforward way. But so far I haven't managed to get this working.

I have been using the zipfile Python library, as well as boto3. My logic is as simple as finding all the files, appending them to a 'files_to_zip' list, and then iterating through that list, writing each one to the new zip file.

This, however, has kicked up a number of issues, and I think this is due to my shortfalls in understanding how calling and loading files works in Lambda.

Here is the code I have tried so far:

    import os
    import boto3
    from io import BytesIO, StringIO
    from zipfile import ZipFile, ZIP_DEFLATED

    def zipping_files(event, context):
        s3 = boto3.resource('s3')

        BUCKET = 'BUCKET NAME'
        PREFIX_1 = 'KEY NAME'
        new_zip = r'NEW KEY NAME' 
        s3_client = boto3.client('s3')
        files_to_zip = []
        response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)

        all = response['Contents']     
        for i in all:
            files_to_zip.append(str(i['Key']))



        with ZipFile(new_zip, 'w',  compression=ZIP_DEFLATED, allowZip64=True) as new_zip:
            for file in files_to_zip:
                new_zip.write(file) 

I am getting error messages such as my new_zip path not existing (FileNotFoundError) and this being a read-only filesystem.

2 Answers


Here is how we can solve this:

    import os
    import tempfile
    import boto3
    import botocore
    from zipfile import ZipFile, ZIP_DEFLATED

    def zipping_files(event, context):
        s3 = boto3.resource('s3')

        BUCKET = 'BUCKET NAME'
        PREFIX_1 = 'KEY NAME'
        s3_client = boto3.client('s3')
        files_to_zip = []
        response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)

        for obj in response['Contents']:
            files_to_zip.append(str(obj['Key']))

        # Download all files to Lambda's /tmp directory. For that, we recreate
        # the S3 key structure (subdirectories) under /tmp first.
        for KEY in files_to_zip:
            try:
                local_file_name = '/tmp/' + KEY
                os.makedirs(os.path.dirname(local_file_name), exist_ok=True)
                s3.Bucket(BUCKET).download_file(KEY, local_file_name)
            except botocore.exceptions.ClientError as e:
                print(e.response)

        # Now create the zip file in the /tmp directory (use the .zip suffix).
        with tempfile.NamedTemporaryFile('w', suffix='.zip', delete=False) as f:
            with ZipFile(f.name, 'w', compression=ZIP_DEFLATED, allowZip64=True) as zipf:
                for file in files_to_zip:
                    # arcname keeps the /tmp prefix out of the archive
                    zipf.write('/tmp/' + file, arcname=file)

        # Once zipped in /tmp, copy it to your preferred S3 location.
        s3_client.upload_file(f.name, BUCKET, 'DESTINATION KEY e.g. out/filename.zip')
        print('All files zipped successfully!')

1 Comment

I am getting NameError: name 'botocore' is not defined

This code sample attempts to create a local file NEW KEY NAME on the local filesystem of the Lambda function's container, in the default working directory (which is /var/task, afaik, and is read-only).

Step 1: Make a proper file path in the /tmp directory, i.e. os.path.join('/tmp', target_filename).

Step 2: Your code is not uploading the zipfile to S3. Add a call to s3_client.put_object.
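Putting both steps together, a minimal sketch (the helper name `build_zip` and the `entries` dict of archive-name-to-bytes are illustrative assumptions, not from the question's code):

```python
import os
from zipfile import ZipFile, ZIP_DEFLATED

def build_zip(entries, target_filename, tmp_dir='/tmp'):
    """Write entries (a dict of archive name -> bytes) into a zip file
    under tmp_dir -- the only writable path in a Lambda container --
    and return the resulting path."""
    zip_path = os.path.join(tmp_dir, target_filename)
    with ZipFile(zip_path, 'w', compression=ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            # writestr avoids touching the filesystem for each member
            zf.writestr(name, data)
    return zip_path
```

The finished archive can then be uploaded with something like `s3_client.put_object(Bucket=BUCKET, Key='out/archive.zip', Body=open(zip_path, 'rb'))`, where the bucket and key are placeholders.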

