
We are using the AWS SageMaker bring-your-own-container feature, with an inference model written in R. As I understand it, a batch transform job runs the container as follows:

docker run image serve

In the container we also have logic to determine which function to invoke:

args <- commandArgs()
if (any(grepl("train", args))) {
  train()
}
if (any(grepl("serve", args))) {
  serve()
}

Is there a way to override the default container invocation so we can pass some additional parameters?

  • Why don't you pass the additional parameters as hyperparameters? Commented Sep 7, 2020 at 7:43
  • Could you please elaborate your response in more detail? Commented Sep 7, 2020 at 7:52
  • In the docker container your code has access to a SageMaker-created file named hyperparameters.json (docs.aws.amazon.com/sagemaker/latest/dg/…). This contains the hyperparameter values you give to the SDK when launching a training job, so you could use that placeholder to pass parameters needed at training time. Commented Sep 7, 2020 at 8:56
  • You need to use ENTRYPOINT, if I understand what you mean exactly in your post. Commented Sep 7, 2020 at 8:56
  • To provide more information regarding our case: we create a Batch Transform Job with CreateTransformJobRequest from a Lambda, and specify the model for inference there. This way we have several models pointing to several different images on ECR, and we just provide the model name when creating the batch transform job. The idea is to have one SageMaker model, pointing to one image, that contains all inference models in the container, and then somehow choose at runtime which one to trigger. The initial idea is to check whether we can pass an additional parameter. @OlivierCruchant I will look into hyperparameters.json. Commented Sep 7, 2020 at 9:03

1 Answer


As you said, and as indicated in the AWS documentation, SageMaker runs your container with the following command:

docker run image serve

By issuing this command, SageMaker overrides any CMD you provide in your container's Dockerfile, so you cannot use CMD to pass dynamic arguments to your program.

We considered using the Dockerfile ENTRYPOINT to consume some environment variables, but the AWS documentation states that it is preferable to use the exec form of ENTRYPOINT, something like:

ENTRYPOINT ["/usr/bin/Rscript", "/opt/ml/mars.R", "--no-save"]

I think that, by analogy with model training, they require this kind of container execution so that the container can receive termination signals:

The exec form of the ENTRYPOINT instruction starts the executable directly, not as a child of /bin/sh. This enables it to receive signals like SIGTERM and SIGKILL from SageMaker APIs.

To allow variable expansion, we need the shell form of ENTRYPOINT, or an exec form that wraps the whole command in a single sh -c string:

ENTRYPOINT ["sh", "-c", "/usr/bin/Rscript /opt/ml/mars.R --no-save $ENV_VAR1"]

If you try to do the same with the plain exec form, the variables will be treated as literals and will not be substituted with their actual values.
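The pitfall can be seen with plain sh, outside of Docker. With sh -c, only the first argument after -c is parsed as the command; any further arguments become positional parameters ($0, $1, ...) rather than arguments to the program, which is why the whole command must go into a single string:

```shell
# Everything after the command string becomes $0, $1, ... -- NOT arguments
# to echo -- so this prints only an empty line:
sh -c 'echo' 'lost-as-$0' 'lost-as-$1'

# Put the full command in one string and variables expand as expected:
GREETING=hello sh -c 'echo "$GREETING" "$0"' world   # prints: hello world
```

The same rule applies to the ENTRYPOINT list above: splitting the command across several list elements after "-c" silently discards them.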

Please see the accepted answer to this Stack Overflow question for a great explanation of this subject.

But one thing you can do is read the values of these variables in your R code, much as you process commandArgs:

ENV_VAR1 <- Sys.getenv("ENV_VAR1")

To pass environment variables to the container, as indicated in the AWS documentation, you can use the Environment parameter of the CreateModel and CreateTransformJob requests.
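For instance, the CreateTransformJob request body accepts an Environment map alongside the model name. A sketch of such a request follows; the job, model, and bucket names are illustrative, not from the original question:

```json
{
  "TransformJobName": "r-inference-batch",
  "ModelName": "r-inference-model",
  "Environment": { "ENV_VAR1": "value1" },
  "TransformInput": {
    "DataSource": {
      "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://my-bucket/input" }
    }
  },
  "TransformOutput": { "S3OutputPath": "s3://my-bucket/output" },
  "TransformResources": { "InstanceType": "ml.m5.large", "InstanceCount": 1 }
}
```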

You will probably need to include in your Dockerfile an ENV definition for every environment variable your container requires, giving each a default value through an ARG:

ARG ENV_VAR1_DEFAULT_VALUE=VAL1
ENV ENV_VAR1=$ENV_VAR1_DEFAULT_VALUE
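Putting the pieces together, a minimal Dockerfile sketch might look like the following. The base image is an assumption; the script path and variable names match those used above:

```dockerfile
FROM r-base

# Build-time default, overridable with --build-arg
ARG ENV_VAR1_DEFAULT_VALUE=VAL1
ENV ENV_VAR1=$ENV_VAR1_DEFAULT_VALUE

COPY mars.R /opt/ml/mars.R

# Single sh -c string inside the exec form, so $ENV_VAR1 expands at run time
ENTRYPOINT ["sh", "-c", "/usr/bin/Rscript /opt/ml/mars.R --no-save $ENV_VAR1"]
```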

4 Comments

In the end I did this with environment variables passed to the container, using CreateTransformJobRequest and its Environment property. In the R script I read them with env_var1 <- Sys.getenv("ENV_VAR1"). This approach worked fine.
I am very happy to know that the answer was helpful.
@jccampanero Do you know of any way to add the --privileged flag to the docker run command?
Hi Austin. Honestly, no, and I'm afraid you can't run the container in privileged mode in SageMaker, for two reasons. First, they run your container and you have no control over that invocation, which is the point of this question. More importantly, they provide a managed environment for your container to run in: if they let you run in privileged mode, then unless they remapped your container's root user (I think they don't), you would have full control over the host the container runs on, with all the security implications that entails.
