I have been struggling with this for a few days. I have been following this page: https://docs.aws.amazon.com/lambda/latest/dg/python-image.html
What do I try to accomplish?
I want an AWS Lambda function that converts text into vectors so I can store them in a vector database. To do this, I would like to use the all-MiniLM-L6-v2 model from sentence-transformers (if there is an easier way, I'm all ears). Note: I can't package this library as a Lambda layer because it is too big.
What do I want?
I want to install sentence-transformers in the /tmp folder, as this seems to be the only writable folder in AWS Lambda. Otherwise I get errors because the package tries to write inside the packages folder, even after defining the TRANSFORMERS_CACHE environment variable.
Why am I stuck?
When I test this locally by running docker run -p 9000:8080 image:tag, it works well. But once deployed, I get the following error:
Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'sentence_transformers'
The right folder, /tmp/packages, is in the path, as print(sys.path) gives:
['/var/task', '/var/runtime', '/var/task', '/tmp/packages', '/var/lang/lib/python311.zip', '/var/lang/lib/python3.11', '/var/lang/lib/python3.11/lib-dynload', '/var/lang/lib/python3.11/site-packages']
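For anyone reproducing the local test, here is a minimal sketch of invoking the running container from Python. It assumes the default runtime interface emulator endpoint described on the linked AWS page and the 9000:8080 port mapping used above. The helper names build_request and invoke are hypothetical, and the empty event is only a placeholder.

```python
import json
import urllib.request

# Assumption: the container was started with `-p 9000:8080`, so the runtime
# interface emulator is reachable at this URL (as in the linked AWS guide).
LOCAL_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_request(event, url=LOCAL_URL):
    """Build the POST request that carries the event as a JSON body."""
    data = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(event, url=LOCAL_URL, timeout=30):
    """Send the event to the local container and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(event, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires the container to be running):
#     print(invoke({}))
```

This only exercises the image as built on the local machine, so it cannot by itself show what the filesystem looks like after deployment.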
Dockerfile
FROM public.ecr.aws/lambda/python:3.11
# get the dependencies, sentence-transformers for instance
COPY requirements.txt ${LAMBDA_TASK_ROOT}
# The actual lambda function
COPY lambda_function.py ${LAMBDA_TASK_ROOT}
# Install the specified packages, target should be /tmp as it is the only writable directory
RUN pip install -r requirements.txt --target=/tmp/packages
# Add the new packages folder to the PYTHONPATH so it can be imported in the script
ENV PYTHONPATH "${PYTHONPATH}:/tmp/packages"
# set the handler function as the starting point for the lambda function
CMD [ "lambda_function.handler" ]
lambda_function.py
import sys
print(sys.path)
import os
# important to change the cache folder to a writable folder (only the tmp folder is writable on AWS Lambda)
# this must be before import SentenceTransformer
os.environ['TRANSFORMERS_CACHE'] = '/tmp/cache/huggingface/models'
os.environ['HF_DATASETS_CACHE'] = '/tmp/cache/huggingface/datasets'
os.environ['HF_HOME'] = '/tmp/cache/huggingface/home'
import json
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
def handler(event, context):
    return {
        "statusCode": 200,
        "body": "res"
    }
requirements.txt
sentence-transformers
What am I missing?
Edit: When I list the contents of /tmp/packages locally, I get all the expected dependencies, but on AWS the folder does not exist.
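The check behind that edit can be sketched as a small diagnostic that reports, for each entry on sys.path, whether it is an existing directory and how many items it contains. This is only an illustration. The function name describe_path_entries is hypothetical, and zip entries such as python311.zip will be reported as not being directories.

```python
import os
import sys


def describe_path_entries(entries):
    """Return a list of (entry, is_dir, item_count) tuples for the given paths."""
    report = []
    for entry in entries:
        is_dir = os.path.isdir(entry)
        # Only list the contents when the directory actually exists.
        count = len(os.listdir(entry)) if is_dir else 0
        report.append((entry, is_dir, count))
    return report


if __name__ == "__main__":
    # Print one line per sys.path entry so the output can be compared
    # between the local container and the deployed function.
    for entry, is_dir, count in describe_path_entries(sys.path):
        print(f"{entry}: is_dir={is_dir}, items={count}")
```

Running the same snippet at the top of lambda_function.py in both environments makes it easy to see which entries are present in the path but missing on disk.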