There are several strategies I can think of:
Option A: Single stage inside the Dockerfile:
ADD ssh-private-key /root/.ssh/id_rsa
RUN git clone git@host:repo/path.git
This has several significant downsides:
- Your private key is inside the docker image.
- The step will be cached from a previous build on later builds, even when your repo changes, unless you break the cache on an earlier step. That's because the RUN line is unchanged.
Option B: Multi-stage inside the Dockerfile:
FROM base-image as clone
ADD ssh-private-key /root/.ssh/id_rsa
RUN git clone git@host:repo/path.git
RUN rm -rf /path/.git
FROM base-image as build
COPY --from=clone /path /path
...
With a multi-stage build, your ssh credentials stay on the build host as long as you never push your "clone" stage layers anywhere. This is slightly better, but it still has the caching issue (see the tip at the end). The rm step removes the .git directory, so the later COPY --from does not copy the git internals. Since the build stage or later should be all you ship, inefficient layers in the clone stage are less of a concern.
Option C: From your CI server:
Typically, the Dockerfile is in the code repo, and people tend to clone this first, before running the build (though it is possible to skip this by using a git repo as a build context). Therefore you'll often see CI servers perform the clone and update rather than the Dockerfile itself. The resulting Dockerfile is then just:
COPY path /path
This has several advantages:
- The credentials never get added to the docker image layers.
- Updating the repo doesn't require rerunning the clone from scratch; the previous clone is already there and you can run a git pull instead, which is much faster.
- You can list .git in the .dockerignore to exclude all of the git internals when copying files into the image. You then add only the final state of the repo to your docker image, resulting in a much smaller image.
Admittedly, this option is saying "don't do that" to your question, but it's also the most popular option I've seen from people facing this challenge, for good reason.
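As a rough sketch of the CI-side flow described above (not taken from any particular CI system), the helper below clones on the first run and pulls on later runs. The function name sync_repo and its two arguments are hypothetical; the repo URL and the path directory are the placeholders already used in this answer.

```shell
#!/bin/sh
# Hypothetical helper: clone the repo on the first run, pull on later runs.
# $1 = repo URL, $2 = target directory (both supplied by the CI job).
sync_repo() {
  if [ -d "$2/.git" ]; then
    # A previous clone exists, so only fetch the new changes.
    git -C "$2" pull
  else
    # No clone yet, so do the full clone once.
    git clone "$1" "$2"
  fi
}
```

A CI job would then call sync_repo with git@host:repo/path.git and path as arguments, and follow it with the usual docker build, with .git listed in the .dockerignore.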
Option D: With BuildKit:
BuildKit has several experimental features that may be useful. These require newer versions of Docker that may not be on every build host, and the syntax to inject the options is not backwards compatible. The main two options are secrets or ssh credential injection, and cache directories. Both of these can inject a file or directory into the build step that is not saved into the resulting image layers. Here's what that could look like (this is untested):
# syntax=docker/dockerfile:experimental
FROM base-image
ARG CACHE_BUST
RUN --mount=type=cache,target=/git-cache,id=git-cache,sharing=locked \
--mount=type=secret,id=ssh,target=/root/.ssh/id_rsa \
if [ ! -d /git-cache/path/.git ]; then \
git clone git@host:repo/path.git /git-cache/path; \
else \
(cd /git-cache/path && git pull --force); \
fi; \
mkdir -p /path; \
tar -cC /git-cache/path --exclude .git . | tar -xC /path
And then the build would look like:
DOCKER_BUILDKIT=1 docker build \
--secret id=ssh,src=$HOME/.ssh/id_rsa \
--build-arg "CACHE_BUST=$(date +%s)" \
-t img:tag \
.
This is fairly convoluted, but has a few advantages:
- The cache directory keeps the git repo from the last build, saving a large clone for every build, only pulling the changes.
- The tar command is effectively a copy that excludes the .git directory from the final image, making your image smaller. This copy is needed since the cache directory is not saved into the resulting image layers.
- The ssh credentials are injected as a secret that appears similar to a single-file, read-only volume mount for that specific RUN step, and the contents of that secret are not saved to the resulting image layer.
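To see what the tar pipe does in isolation, here is a small sketch that runs it against throwaway directories outside of any docker build. The temporary directories and file names are made up for the demonstration; only the tar command itself comes from the Dockerfile above.

```shell
#!/bin/sh
set -e
# Throwaway source tree standing in for /git-cache/path (hypothetical layout).
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/.git" "$src/app"
echo "ref: refs/heads/master" > "$src/.git/HEAD"
echo "hello" > "$src/app/main.txt"

# Same pipe as the RUN step: archive everything except .git, unpack into dest.
tar -cC "$src" --exclude .git . | tar -xC "$dest"

# List what was copied; the .git directory should be absent.
ls -A "$dest"
```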
To read more about BuildKit's experimental features, see: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
Tip: Cache busting a specific line:
To bust the docker build cache on a specific line, you can inject a build arg that changes on every build right before the RUN line that you want to rerun. In the BuildKit example, there was the:
ARG CACHE_BUST
before the RUN line that I did not want to cache, and the build included:
--build-arg "CACHE_BUST=$(date +%s)"
to inject a unique value for each build. This ensures the build always runs that step, even though the command is otherwise unchanged. The build arg is injected as an environment variable into the RUN step, so docker sees that the command has changed and cannot reuse it from the cache.
Ideally, you would clone a specific tag or commit id, which allows you to cache builds that use that same git clone from previous builds. However, if you are cloning master, this cache busting technique will be needed.
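One way to combine both cases is to let the build script pick the CACHE_BUST value from the ref being cloned. This is only a sketch: the cache_bust_value function is hypothetical, and treating master and main as the only moving refs is an assumption you would adjust to your own branch names.

```shell
#!/bin/sh
# Hypothetical helper: choose a CACHE_BUST value for the ref being cloned.
# $1 = branch, tag, or commit id.
cache_bust_value() {
  case "$1" in
    master|main)
      # Moving branch: a fresh timestamp forces the clone step to rerun.
      date +%s
      ;;
    *)
      # Pinned tag or commit: a stable value lets docker reuse the cache.
      printf '%s\n' "$1"
      ;;
  esac
}
```

The build would then pass the function's output as the CACHE_BUST build arg in place of the raw timestamp, and the Dockerfile would still need to check out that same ref.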