I have been struggling with this for a few days while following this page: https://docs.aws.amazon.com/lambda/latest/dg/python-image.html

What am I trying to accomplish?

I want an AWS Lambda function that converts text into vectors so that I can store them in a vector database. To do this, I would like to use the all-MiniLM-L6-v2 model from sentence-transformers (if there is an easier way, I'm all ears). Note: I can't ship this library as a Lambda layer because it is too big.

What do I want?

I want to install sentence-transformers in the /tmp folder, as this seems to be the only writable folder in AWS Lambda. Otherwise I get errors because the package tries to write inside the packages folder, even after setting the TRANSFORMERS_CACHE environment variable.

Why am I stuck?

When I test this locally by running docker run -p 9000:8080 image:tag, it works well. But once deployed, I get the following error:

Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'sentence_transformers'

The right folder /tmp/packages is in the path as print(sys.path) gives:

['/var/task', '/var/runtime', '/var/task', '/tmp/packages', '/var/lang/lib/python311.zip', '/var/lang/lib/python3.11', '/var/lang/lib/python3.11/lib-dynload', '/var/lang/lib/python3.11/site-packages']

Dockerfile

FROM public.ecr.aws/lambda/python:3.11

# get the dependencies, sentence-transformers for instance
COPY requirements.txt ${LAMBDA_TASK_ROOT}

# The actual lambda function
COPY lambda_function.py ${LAMBDA_TASK_ROOT}
    
# Install the specified packages, target should be /tmp as it is the only writable directory
RUN pip install -r requirements.txt --target=/tmp/packages

# Add the new packages folder to the PYTHONPATH so it can be imported in the script
ENV PYTHONPATH "${PYTHONPATH}:/tmp/packages"

# set the handler function as the starting point for the lambda function
CMD [ "lambda_function.handler" ]

lambda_function.py

import sys
print(sys.path)
import os
# important to change the cache folder to a writable folder (only the tmp folder is writable on AWS Lambda)
# this must be before import SentenceTransformer
os.environ['TRANSFORMERS_CACHE'] = '/tmp/cache/huggingface/models'
os.environ['HF_DATASETS_CACHE'] = '/tmp/cache/huggingface/datasets'
os.environ['HF_HOME'] = '/tmp/cache/huggingface/home'
import json
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')


def handler(event, context):
    return {
        "statusCode": 200,
        "body": "res"
    }

requirements.txt

sentence-transformers

What am I missing?

Edit: When I list folders in /tmp/packages, locally, I get all the expected dependencies, but on AWS, the folder does not exist.

1 Answer

When I list the folders in /tmp/packages locally, I get all the expected dependencies, but on AWS the folder does not exist. I'm not 100% sure why, but most likely Lambda provides its own empty /tmp at runtime, so anything installed there while building the image is not visible to the function. (TBC)
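To compare the local container with the deployed function, a small diagnostic like the one below can be logged from both environments. This is only a sketch: `describe_dir` and `package_origin` are hypothetical helper names, not part of any library, and they rely solely on the standard library.

```python
import importlib.util
import os


def describe_dir(path):
    """Report whether `path` exists, is writable, and how many entries it holds."""
    exists = os.path.isdir(path)
    return {
        "path": path,
        "exists": exists,
        "writable": exists and os.access(path, os.W_OK),
        "entries": len(os.listdir(path)) if exists else 0,
    }


def package_origin(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Folder used in the question; expected to be missing once deployed.
    print(describe_dir("/tmp/packages"))
    # None means the import would fail with "No module named ...".
    print(package_origin("sentence_transformers"))
```

Printing both results at the top of lambda_function.py, before the failing import, shows in the logs whether the folder is missing or whether the package simply is not on the path.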

So I had to back-pedal and install the packages in the default location again. The actual issue was a missing environment variable, XDG_CACHE_HOME, which led to errors about read-only folders.

So here is what worked:

lambda_function.py

import os
# important to change the cache folder to a writable folder (only the tmp folder is writable on AWS Lambda)
# this must be before import SentenceTransformer
os.environ['TRANSFORMERS_CACHE'] = '/tmp/cache/huggingface/models'
os.environ['HF_DATASETS_CACHE'] = '/tmp/cache/huggingface/datasets'
os.environ['HF_HOME'] = '/tmp/cache/huggingface/home'
os.environ['XDG_CACHE_HOME'] = '/tmp/cache/huggingface/xdghome'
import json
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

All of these environment variables need to be defined, and they must be set before sentence_transformers is imported.
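If the list of variables grows, the assignments can be grouped in one place. The sketch below sets the same four variables to the same subfolders as above and also creates the folders. `configure_cache_dirs` and `CACHE_VARS` are hypothetical names introduced for illustration, and creating the directories up front is an extra step that the original code does not perform.

```python
import os

# Variable name -> subfolder, mirroring the values used in lambda_function.py.
CACHE_VARS = {
    "TRANSFORMERS_CACHE": "models",
    "HF_DATASETS_CACHE": "datasets",
    "HF_HOME": "home",
    "XDG_CACHE_HOME": "xdghome",
}


def configure_cache_dirs(base="/tmp/cache/huggingface"):
    """Point each cache variable at a subfolder of `base` and create that folder."""
    paths = {}
    for var, subdir in CACHE_VARS.items():
        path = os.path.join(base, subdir)
        os.makedirs(path, exist_ok=True)  # /tmp is writable on Lambda
        os.environ[var] = path
        paths[var] = path
    return paths


# Usage: call this before importing sentence_transformers, for example
#   configure_cache_dirs()
#   from sentence_transformers import SentenceTransformer
```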

Dockerfile

FROM public.ecr.aws/lambda/python:3.11

# get the dependencies, sentence-transformers for instance
COPY requirements.txt ${LAMBDA_TASK_ROOT}

# The actual lambda function
COPY lambda_function.py ${LAMBDA_TASK_ROOT}

# Install the specified packages in the default location (no --target, nothing goes to /tmp)
RUN pip install -r requirements.txt

# set the handler function as the starting point for the lambda function
CMD [ "lambda_function.handler" ]

This is a normal Dockerfile that installs the packages in the default location.
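For completeness, here is a rough sketch of how the handler could turn text into a vector once the import works. It is a variation rather than the code from this answer: the event shape `{"text": "..."}` is an assumption, `embed` and `get_model` are hypothetical helper names, and the model is loaded lazily on first use instead of at module level.

```python
import json

_model = None  # cached so warm invocations reuse the loaded model


def get_model():
    """Load the model on first use; later calls return the cached instance."""
    global _model
    if _model is None:
        # Imported here so the cache environment variables are already set.
        from sentence_transformers import SentenceTransformer
        _model = SentenceTransformer("all-MiniLM-L6-v2")
    return _model


def embed(text, model):
    """Encode one string and return a JSON-serialisable list of floats."""
    # encode() returns a NumPy array by default; tolist() makes it plain Python.
    return model.encode(text).tolist()


def handler(event, context):
    # Assumption: the caller sends the input as {"text": "..."}.
    text = event.get("text")
    if not isinstance(text, str) or not text:
        return {"statusCode": 400, "body": json.dumps({"error": "missing 'text'"})}
    vector = embed(text, get_model())
    return {"statusCode": 200, "body": json.dumps({"vector": vector})}
```

Passing the model into `embed` keeps that function easy to exercise with a stand-in object, without downloading the real model.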
