The situation is the following: I have developed a simple ETL process locally that extracts data from a remote location and writes the raw, unprocessed data into a MongoDB container on my local Windows machine. Now I want to schedule this process with Apache Airflow, using the DockerOperator for every task, i.e. I want to build a Docker image of my source code and then execute that source code inside a container based on this image via the DockerOperator. Since I am working on a Windows machine, I can only run Airflow itself from within a Docker container. Both the Airflow container (called webserver below) and the Mongo container (called mongo below) are specified in a docker-compose.yml file, which you can find at the end of this post.
To the best of my knowledge, every time my simple ETL DAG is triggered and a DockerOperator task is executed, the webserver container spins up a new "sibling" container for that task, the source code inside this new container is executed, and once the task has finished, the new container is removed again. If my understanding is correct up to this point, the webserver container needs access to the Docker daemon on the host, so that it can issue commands like docker run to create these sibling containers.
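For reference, here is a minimal sketch of what such a DAG could look like in Airflow 1.10 — the DAG id, task id, image name and command are hypothetical placeholders, not my actual code:

from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator

# Hypothetical DAG: one DockerOperator task that runs the ETL image as a
# "sibling" container by talking to the host's Docker daemon via the socket.
with DAG("simple_etl", start_date=datetime(2020, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    etl_task = DockerOperator(
        task_id="extract_and_load",
        image="my-etl:latest",                    # hypothetical image built from the ETL source code
        command="python /app/etl.py",             # hypothetical entry point inside that image
        docker_url="unix://var/run/docker.sock",  # the socket mounted into the webserver container
        network_mode="mynet",                     # hypothetical: the network shared with the mongo container
        auto_remove=True,                         # remove the sibling container once the task finishes
        api_version="auto",
    )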
To test this theory, I added the volumes /var/run/docker.sock:/var/run/docker.sock and /usr/bin/docker:/usr/bin/docker to the definition of the webserver container in the docker-compose.yml file, so that the webserver container can use the Docker daemon of my host (Windows) machine. Then I started the webserver and mongo containers with docker-compose up -d, entered the webserver container with docker exec -it <name_of_webserver_container> /bin/bash, and tried the simple command docker ps --all. However, the output was bash: docker: command not found. So it seems Docker was not installed correctly inside the webserver container. How can I make sure Docker is installed inside the webserver container, so that sibling containers can be created?
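As a side note, the daemon behind the mounted socket may well be reachable even though the docker CLI binary is missing, because the Python Docker SDK talks to the socket directly. A minimal sketch to check this, assuming the docker package from PyPI is installed in the container:

import docker

# Connect to the daemon through the socket mounted from the host; this works
# independently of whether the `docker` CLI binary exists in the container.
client = docker.DockerClient(base_url="unix://var/run/docker.sock")
print(client.containers.list(all=True))  # the equivalent of `docker ps --all`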
Below you can find the relevant parts of the docker-compose.yml file and the Dockerfile used for the webserver container.
docker-compose.yml located in the project root directory:
webserver:
  build: ./docker-airflow
  restart: always
  privileged: true
  depends_on:
    - postgres # some other service I cut out of this post
    - mongo
    - mongo-express # some other service I cut out of this post
  environment:
    - LOAD_EX=n
    - EXECUTOR=Local
    - POSTGRES_USER=some_user
    - POSTGRES_PASSWORD=some_pw
    - POSTGRES_DB=airflowdb
  volumes:
    # DAG folder
    - ./docker-airflow/dags:/usr/local/airflow/dags
    # Add path for external python modules
    - ./src:/home/python_modules
    # Add path for airflow workspace folder
    - ./docker-airflow/workdir:/home/workdir
    # Mount the Docker socket from the host (currently my laptop) into the webserver container
    - //var/run/docker.sock:/var/run/docker.sock # the double slash is necessary on a Windows host
  ports:
    # Changed port to 8081 to avoid conflicts with Jupyter
    - 8081:8080
  command: webserver
  healthcheck:
    test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
    interval: 30s
    timeout: 30s
    retries: 3
  networks:
    - mynet
Dockerfile for the webserver container located in the docker-airflow folder:
FROM puckel/docker-airflow:1.10.4
# Add the DAG folder and external modules to the PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/usr/local/airflow/dags"
# Install the optional packages and change the user back to airflow afterwards
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt
# Install the Docker SDK for Python (note: this installs only the Python bindings, not the Docker CLI)
RUN pip install -U pip && pip install docker
ENV SHARE_DIR /usr/local/share
# Install a simple text editor for debugging
RUN apt-get update && apt-get install -y vim
USER airflow
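Side note: the pip install docker line in this Dockerfile installs only the Docker SDK for Python, not the docker CLI binary — which is consistent with the command not found error above. A quick sketch to see the distinction from inside the container:

import shutil

import docker  # the pip-installed Python SDK; importable even without the CLI

print("Python SDK version:", docker.__version__)
print("docker CLI on PATH:", shutil.which("docker"))  # None unless the CLI was installed separately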
EDIT/Update:
After incorporating Noe's comments, I changed the Dockerfile of the webserver container to the following:
FROM puckel/docker-airflow:1.10.4
# Add the DAG folder and external modules to the PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/usr/local/airflow/dags"
# Install the optional packages and change the user back to airflow afterwards
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt
# Install Docker (including the CLI) inside the webserver container
RUN curl -sSL https://get.docker.com/ | sh
ENV SHARE_DIR /usr/local/share
# Install a simple text editor for debugging
RUN apt-get update && apt-get install -y vim
USER airflow
and I added docker==4.1.0 to the requirements.txt file (referenced in the Dockerfile above), which contains all packages to be installed inside the webserver container.
Now, however, when I start the services with docker-compose up --build -d, enter the webserver container with docker exec -it <name_of_webserver_container> /bin/bash and run the simple command docker ps --all, I get the following output:
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1: dial unix /var/run/docker.sock: connect: permission denied
So it seems I still need to grant some rights/privileges, which I find confusing, because I have already set privileged: true in the webserver section of the docker-compose.yml file. Does anyone know the cause of this problem?
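To narrow this down, note that the permission check happens on the socket file itself: privileged: true grants the container extra kernel capabilities, but it does not change which in-container user may open /var/run/docker.sock. A small diagnostic sketch (run inside the webserver container) to see who owns the socket and whether the current user may use it:

import grp
import os
import pwd

SOCK = "/var/run/docker.sock"

st = os.stat(SOCK)
print("socket owner:", pwd.getpwuid(st.st_uid).pw_name)     # typically root
print("socket group:", grp.getgrgid(st.st_gid).gr_name)
print("current user:", pwd.getpwuid(os.geteuid()).pw_name)  # `airflow` in this image
print("can read/write:", os.access(SOCK, os.R_OK | os.W_OK))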
EDIT/UPDATE/ANSWER
After removing USER airflow from the Dockerfile of the webserver container, I am able to run docker commands inside the webserver container! The reason is that the Docker socket mounted from the host is owned by root, so the non-root airflow user had no permission to access it; without the USER airflow instruction, the container runs as root, which can use the socket.