To anyone curious, this is how I ended up solving this issue:
I ran a Jupyter notebook locally to create the model artifact. Once it finished, I packed the saved files into a tar.gz archive:
from transformers import AutoModel, AutoTokenizer
from os import makedirs
saved_model_dir = 'saved_model_dir'
makedirs(saved_model_dir, exist_ok=True)
# models were obtained from https://huggingface.co/models
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/multi-qa-MiniLM-L6-cos-v1')
model = AutoModel.from_pretrained('sentence-transformers/multi-qa-MiniLM-L6-cos-v1')
tokenizer.save_pretrained(saved_model_dir)
model.save_pretrained(saved_model_dir)
cd saved_model_dir && tar czvf ../model.tar.gz *
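If you would rather stay inside the notebook instead of shelling out to `tar`, the same packaging step can be sketched with Python's standard `tarfile` module. This is only a sketch: the helper name `make_model_archive` is hypothetical, and unlike the `*` glob in the shell command it also picks up dotfiles. As in the shell version, the files are stored at the root of the archive rather than under a `saved_model_dir/` folder.

```python
import tarfile
from pathlib import Path


def make_model_archive(model_dir, archive_path):
    """Pack every file under model_dir into a gzip-compressed tar archive.

    Files are stored relative to model_dir, so they sit at the archive root.
    Returns the list of archive member names that were written.
    Hypothetical helper; keep archive_path outside model_dir.
    """
    root = Path(model_dir)
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(root).as_posix()
                tar.add(path, arcname=arcname)
                names.append(arcname)
    return names


if __name__ == "__main__":
    # Same directory and archive names as in the snippet above.
    print(make_model_archive("saved_model_dir", "model.tar.gz"))
```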
I included a script in my pipeline that uploads that artifact to S3:
aws s3 cp path/to/model.tar.gz s3://bucket-name/prefix
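For pipelines that are already written in Python, the upload could also be done with boto3 instead of the CLI. The following is a rough sketch, not what my pipeline actually runs: `upload_artifact` is a hypothetical helper, and the key `prefix/model.tar.gz` is an assumption. One thing to watch with the CLI form is that a destination without a trailing slash is treated as the object key itself, so spelling out the full key avoids surprises.

```python
def upload_artifact(local_path, bucket, key, s3_client=None):
    """Upload local_path to s3://bucket/key and return the resulting S3 URI.

    Hypothetical helper. s3_client is injectable so the function can be
    exercised with a stub; by default a real boto3 S3 client is created.
    """
    if s3_client is None:
        import boto3  # imported lazily so a stub client needs no AWS setup
        s3_client = boto3.client("s3")
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    # Placeholder bucket and path from the command above; the key is assumed.
    print(upload_artifact("path/to/model.tar.gz", "bucket-name", "prefix/model.tar.gz"))
```

The returned URI is the value that would go into `ModelDataUrl` in the template.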
I also created a CloudFormation template to stand up my SageMaker resources. The tricky part was finding a container image to use. A colleague pointed me to the aws/deep-learning-containers repo on GitHub, which contains a massive list of AWS-maintained container images for deep learning and inference. From there, I just needed to select the one that fit my needs.
Resources:
  SageMakerModel:
    Type: AWS::SageMaker::Model
    Properties:
      PrimaryContainer:
        # image found at https://github.com/aws/deep-learning-containers/blob/master/available_images.md
        Image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.12.0-cpu-py38-ubuntu20.04-sagemaker
        Mode: SingleModel
        ModelDataUrl: s3://path/to/model.tar.gz
      ExecutionRoleArn: # IAM role ARN was left blank in the original; supply your own
      ModelName: inference-model
  SageMakerEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      EndpointConfigName: endpoint-config-name
      ProductionVariants:
        # referencing the model resource makes CloudFormation create it first
        - ModelName: !GetAtt SageMakerModel.ModelName
          InitialInstanceCount: 1
          InstanceType: ml.t2.medium
          VariantName: dev
  SageMakerEndpoint:
    Type: AWS::SageMaker::Endpoint
    Properties:
      EndpointName: endpoint-name
      EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName
Once the PyTorch model is created locally, this setup automates provisioning and deploying a SageMaker endpoint for inference. To switch models, I rerun the notebook code, which overwrites the existing model artifact. When I redeploy, the pipeline re-uploads the artifact to S3 and updates the existing SageMaker resources, and the endpoint starts serving the new model.
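After a redeploy it can be useful to confirm that the endpoint has actually finished updating before sending traffic to it. Here is a minimal polling sketch built on the SageMaker `describe_endpoint` call. The helper name `wait_for_endpoint`, the poll interval, and the retry limit are all assumptions for illustration and not part of my pipeline.

```python
import time


def wait_for_endpoint(endpoint_name, sm_client=None, poll_seconds=30,
                      max_polls=60, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint reports InService.

    Hypothetical helper. Raises RuntimeError if the endpoint reports Failed
    and TimeoutError if it is still not InService after max_polls checks.
    """
    if sm_client is None:
        import boto3  # imported lazily so a stub client needs no AWS setup
        sm_client = boto3.client("sagemaker")
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")


if __name__ == "__main__":
    # Endpoint name taken from the template above.
    print(wait_for_endpoint("endpoint-name"))
```

boto3 also ships an `endpoint_in_service` waiter on the SageMaker client, which may be enough on its own if custom handling of the failure case is not needed.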
This may not be the most elegant solution out there, so any suggestions or pointers would be much appreciated!