0

I want to deploy my trained tensorflow model to the amazon sagemaker, I am following the official guide here: https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/ to deploy my model using jupyter notebook.

But when I try to use code:

predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

It gives me the following error message:

ValueError: Error hosting endpoint sagemaker-tensorflow-2019-08-07-22-57-59-547: Failed Reason: The image '520713654638.dkr.ecr.us-west-1.amazonaws.com/sagemaker-tensorflow:1.12-cpu-py3 ' does not exist.

I think the tutorial does not tell me to create an image, and I do not know what to do.

import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()

# make a tar ball of the model data files
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)

# create a new s3 bucket and upload the tarball to it
import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role = role,
                                  framework_version = '1.12',
                                  entry_point = 'train.py',
                                  py_version='py3')

%%time
#here I fail to deploy the model and get the error message
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

1
  • For future readers, In my case I had to mention py_version='py2' to get it to work. Commented Dec 12, 2019 at 6:22

1 Answer 1

2

https://github.com/aws/sagemaker-python-sdk/issues/912#issuecomment-510226311

As mentioned in the issue

Python 3 isn't supported using the TensorFlowModel object, as the container uses the TensorFlow serving api library in conjunction with the GRPC client to handle making inferences, however the TensorFlow serving api isn't supported in Python 3 officially, so there are only Python 2 versions of the containers when using the TensorFlowModel object.

If you need Python 3 then you will need to use the Model object defined in #2 above. The inference script format will change if you need to handle pre and post processing. https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you very much! After I removed the py_version='py3', the deployment can now start, but I get this error then: UnexpectedStatusException: Error hosting endpoint sagemaker-tensorflow-2019-08-09-16-29-57-306: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.
That's a step in the right direction, but as the error says, I would try checking the logs in your CloudWatch. If you cannot discern what is going on, paste the logs here and maybe we can help. Or try reaching out to support...

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.