
The problem I have is similar to this SO question, but the answer doesn't work for me. I am trying to import a Python library (let's say xgboost) from the /tmp folder in AWS Lambda.

The requests library is added to a Lambda layer, and what I did is:

import json
import io
import os
import zipfile
import requests
import sys

sys.path.insert(0, '/tmp/')
sys.path.append('/tmp/')

os.environ["PYTHONPATH"] = "/var/task"

def get_pkgs(url):
    print("Getting Packages...")
    re = requests.get(url)
    z = zipfile.ZipFile(io.BytesIO(re.content))
    print("Extracting Packages...")
    z.extractall("/tmp/")
    print("Packages are downloaded and extracted.")
    
def attempt_import():
    print("="*50)
    print("ATTEMPT TO IMPORT DEPENDENCIES...")
    print("="*50)
    import xgboost
    print("IMPORTING DONE.")
    
def main():
    URL = "https://MY_BUCKET.s3.MY_REGION.amazonaws.com/MY_FOLDER/xgboost/xgboost.zip"

    get_pkgs(URL)
    attempt_import()
    
def lambda_handler(event, context):
    main()
    return "Hello Lambda"

The error I get is [ERROR] ModuleNotFoundError: No module named 'xgboost'. I gave my S3 bucket all the necessary permissions, and I am positive that Lambda can access the .zip file, since requests.get works and the variable z is:

<zipfile.ZipFile file=<_io.BytesIO object at 0x7fddaf31c400> mode='r'>
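One thing that may help narrow this down (a minimal debugging sketch, not part of the original question code): list what actually ends up under /tmp after extraction, to confirm that the xgboost package directory sits directly in /tmp rather than inside a nested wrapper folder.

import os

def show_tmp_layout():
    # Print the top-level entries in /tmp so we can see whether
    # /tmp/xgboost exists, or whether everything landed in a nested folder.
    for name in sorted(os.listdir("/tmp")):
        print(name)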

  • Downloading packages during the Lambda execution is wasting your $. Instead, you should either package your dependencies into the deployment package or build a Lambda layer. Commented Mar 10, 2021 at 17:00
  • @jellycsc The issue is that I already have multiple packages across 5 layers, close to the 260 MB limit. The temporary Lambda folder has an additional 512 MB of space, so this solution can work for me. Commented Mar 10, 2021 at 17:12
  • Ok, I see. You can try the EFS integration then. Commented Mar 10, 2021 at 17:14
  • @jellycsc Do you mean SageMaker? Can you send some reference/material? Commented Mar 10, 2021 at 17:16
  • No, here is what I mean (see the sketch after these comments): aws.amazon.com/blogs/compute/… Commented Mar 10, 2021 at 17:32
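For context on the EFS suggestion above, a hedged sketch of how it is typically used (the mount path /mnt/efs/lib is a placeholder; it assumes the function has an EFS access point configured and that packages were installed there beforehand, e.g. with pip install --target):

import sys

# Hypothetical EFS mount path configured on the Lambda function.
EFS_PACKAGES = "/mnt/efs/lib"

# Make the EFS-hosted packages importable, then import as usual.
if EFS_PACKAGES not in sys.path:
    sys.path.insert(0, EFS_PACKAGES)

import xgboost  # noqa: F401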

2 Answers


You could try using the boto3 library to download the file from S3 to the /tmp directory, as explained in https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file

import boto3
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
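Building on that, a minimal sketch of how the full download-extract-import flow could look with boto3 (the bucket name, key, and target path here are placeholders, not from the original post):

import sys
import zipfile

import boto3

def load_xgboost_from_s3(bucket, key):
    # Download the zipped package from S3 into /tmp, extract it there,
    # and make /tmp importable before importing the module.
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, "/tmp/xgboost.zip")
    with zipfile.ZipFile("/tmp/xgboost.zip") as z:
        z.extractall("/tmp/")
    if "/tmp/" not in sys.path:
        sys.path.insert(0, "/tmp/")
    import xgboost
    return xgboost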

2 Comments

This could be the solution, but the issue is that when you try to import a large Python package (such as xgboost), the downloaded .zip file and its extracted folder in the /tmp directory are larger than 500 MB, which results in: [ERROR] OSError: [Errno 28] No space left on device
The /tmp directory (ephemeral storage) of a Lambda function can now be expanded up to 10 GB; a sketch of how to configure that follows.
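As an illustration (assuming a reasonably recent boto3 version; the function name is a placeholder), the ephemeral storage size could be raised like this:

import boto3

lambda_client = boto3.client("lambda")

# Raise /tmp (ephemeral storage) from the default 512 MB to 10 GB.
# "my-function" is a placeholder for the actual function name.
lambda_client.update_function_configuration(
    FunctionName="my-function",
    EphemeralStorage={"Size": 10240},  # size in MB, 512-10240
)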

Actually, my code above works, and I had a rather silly error. Instead of zipping the xgboost package folders (xgboost, xgboost.libs and xgboost.dist-info) directly, I zipped their parent folder, which I had named package-xgboost, and that didn't work in AWS Lambda. Be sure that you actually zip those 3 folders directly; a quick way to check the zip layout is sketched below.

Also, make sure your xgboost library is up to date. Previously I used version 1.2.1, which didn't work either. Upgrading the library and zipping the newest xgboost version (in my case 1.3.3) finally worked.
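As an illustration (not from the original answer), a small check that could be run locally before uploading, to confirm the archive's top-level entries are the package folders themselves and not a single wrapping directory:

import zipfile

def top_level_entries(zip_path):
    # Collect the first path component of every member in the archive.
    # For a correctly built archive this should contain entries like
    # "xgboost", not one wrapper folder such as "package-xgboost".
    with zipfile.ZipFile(zip_path) as z:
        return sorted({name.split("/")[0] for name in z.namelist()})

print(top_level_entries("xgboost.zip"))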
