
The situation is the following: I have developed, locally, a super simple ETL process which extracts data from some remote location and then writes that unprocessed data into a MongoDB container on my local Windows machine. Now I want to schedule this process with Apache Airflow using the DockerOperator for every task, i.e. I want to create a Docker image of my source code and then execute the source code in that image using the DockerOperator. Since I am working on a Windows machine, I can only use Airflow from within a Docker container to actually trigger an Airflow DAG. Both the Airflow container (called webserver below) and the Mongo container (called mongo below) are specified in a docker-compose.yml file, which you can see at the end.

To the best of my knowledge, every time my simple ETL DAG is triggered and the DockerOperator is executed, the webserver container creates a new "sibling" container for every ETL task, the source code inside that new container is executed, and after the task finishes the new container is deleted again. If my understanding is correct up to this point, the webserver container needs to be able to execute Docker commands such as docker build ... in order to create these sibling containers.
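For reference, the kind of DAG I have in mind would look roughly like the sketch below (placeholder names only; the image, command and network are assumptions, not my actual source code):

# dags/simple_etl.py -- minimal sketch for Airflow 1.10.x with placeholder names
from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator  # 1.10.x import path

with DAG(
    dag_id="simple_etl",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    extract_and_load = DockerOperator(
        task_id="extract_and_load",
        image="my-etl-image:latest",              # hypothetical image built from the ETL source code
        command="python etl.py",                  # hypothetical entrypoint inside that image
        docker_url="unix://var/run/docker.sock",  # the socket mounted into the webserver container
        network_mode="mynet",                     # network of the mongo container (compose may prefix the name)
        auto_remove=True,                         # remove the sibling container once the task finishes
    )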

To test this theory, I added the volumes /var/run/docker.sock:/var/run/docker.sock and /usr/bin/docker:/usr/bin/docker to the definition of the webserver container in the docker-compose.yml file, so that the webserver container can use the Docker daemon on my host (Windows) machine. Then I started the webserver and mongo containers using docker-compose up -d, entered the webserver container using docker exec -it <name_of_webserver_container> /bin/bash, and tried the simple command docker ps --all. However, the output of this command was bash: docker: command not found. So it seems Docker was not installed correctly inside the webserver container. How can I make sure Docker is installed inside the webserver container, so that the sibling containers can be created?

Below you can find the relevant aspects of the docker-compose.yml file and the Dockerfile used for the webserver container.

docker-compose.yml located in the project root directory:

webserver:
        build: ./docker-airflow
        restart: always
        privileged: true
        depends_on:
            - postgres  # some other service I cut out from this post
            - mongo
            - mongo-express  # some other service I cut out from this post
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
            - POSTGRES_USER=some_user
            - POSTGRES_PASSWORD=some_pw
            - POSTGRES_DB=airflowdb
        volumes:
            # DAG folder
            - ./docker-airflow/dags:/usr/local/airflow/dags
            # Add path for external python modules
            - ./src:/home/python_modules
            # Add path for airflow workspace folder
            - ./docker-airflow/workdir:/home/workdir
            # Mount the docker socket from the host (currently my laptop) into the webserver container
            - //var/run/docker.sock:/var/run/docker.sock  # double // are necessary for windows host
        ports:
            # Change port to 8081 to avoid Jupyter conflicts
            - 8081:8080
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
        networks:
            - mynet

Dockerfile for the webserver container located in the docker-airflow folder:

FROM puckel/docker-airflow:1.10.4

# Add the DAG folder and external Python modules to the PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/usr/local/airflow/dags"

# Install the optional packages and change the user to airflow again
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt

# Install docker inside the webserver container
RUN pip install -U pip && pip install docker
ENV SHARE_DIR /usr/local/share

# Install simple text editor for debugging
RUN ["apt-get", "update"]
RUN ["apt-get", "-y", "install", "vim"]

USER airflow

EDIT/Update:

After incorporating Noe's comments, I changed the Dockerfile of the webserver container to the following:

FROM puckel/docker-airflow:1.10.4

# Add the DAG folder and external Python modules to the PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/usr/local/airflow/dags"

# Install the optional packages and change the user to airflow again
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt

# Install docker inside the webserver container
RUN curl -sSL https://get.docker.com/ | sh
ENV SHARE_DIR /usr/local/share

# Install simple text editor for debugging
RUN ["apt-get", "update"]
RUN ["apt-get", "-y", "install", "vim"]

USER airflow

and I added docker==4.1.0 to the requirements.txt file (referenced in the above Dockerfile) which contains all to-be-installed packages inside the webserver container.

Now, however, when I first start the services with docker-compose up --build -d, then enter the webserver container with docker exec -it <name_of_webserver_container> /bin/bash and run the simple command docker ps --all, I get the following output:

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1: dial unix /var/run/docker.sock: connect: permission denied

So it seems I still need to grant some rights/privileges, which I find confusing because in the webserver section of the docker-compose.yml file I have already set privileged: true. Does anyone know the cause of this problem?

EDIT/UPDATE/ANSWER

After removing USER airflow from the Dockerfile of the webserver container, I am able to run docker commands inside the webserver container!
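For anyone checking the same thing: a quick way to confirm from inside the webserver container that the Docker SDK for Python (the docker==4.1.0 package from requirements.txt) can reach the mounted socket is a small sketch like this (not part of my actual code):

# check_docker_socket.py -- minimal sketch, run inside the webserver container
import docker

# from_env() uses the default unix socket at /var/run/docker.sock,
# i.e. the socket mounted from the host in docker-compose.yml
client = docker.from_env()

print(client.ping())  # True if the daemon is reachable

# rough equivalent of `docker ps --all`
for container in client.containers.list(all=True):
    print(container.name, container.status)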

3 Answers


What you're trying to do is called docker in docker.

You need to do these things:

  • install the docker client in the container

Add RUN curl -sSL https://get.docker.com/ | sh

  • mount the docker socket

You already did this correctly: mount //var/run/docker.sock:/var/run/docker.sock

  • run your container in privileged mode

Add privileged: true to your container

In your specific case you need to do these things:

  • Remove RUN pip install -U pip && pip install docker because we already installed it
  • Remove USER airflow, you need to use the default user or root user
  • Add docker==4.1.0 to the requirements.txt

5 Comments

Hi @Noe, in the Dockerfile for the webserver container I replaced RUN pip install -U pip && pip install docker with RUN curl -sSL https://get.docker.com/ | sh, and I added docker==4.1.0 to the requirements.txt file, which contains all packages to be installed inside the webserver container (see the edited question for details). However, I already had privileged: true in the definition of the webserver container in the docker-compose.yml, so I do not understand what you meant when you said "Add privileged: true to your container". Could you elaborate?
With the following changes (including your help) I got it to work: replacing RUN pip install -U pip && pip install docker with RUN curl -sSL https://get.docker.com/ | sh, adding a second / to the docker socket volume, i.e. //var/run/docker.sock:/var/run/docker.sock, removing USER airflow from the Dockerfile of the webserver container, and adding docker==4.1.0 to the requirements.txt file. I will mark your answer as accepted now, because it really helped.
Okay, nice that it helps. I will accept your modification as soon as you publish it.
Hi Noe, I put some updates into the question. Is that what you meant?
I thought you were editing the answer. But I will do it, thanks.

The approach from @Noe worked for me as well. I also had to upgrade my WSL distribution for Ubuntu from V1 to V2 with wsl --set-version Ubuntu-20.04 2.

Here is the Dockerfile + Docker Compose file for Airflow 2.1.1.

Dockerfile

FROM apache/airflow:2.1.1

ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/opt/airflow/dags"

COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt

RUN curl -sSL https://get.docker.com/ | sh
ENV SHARE_DIR /usr/local/share

Docker Compose

---
    version: '3'
    x-airflow-common:
      &airflow-common
      build: .
      # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.1}
      # 
      # group_add:
      #   - 0
      environment:
        &airflow-common-env
        AIRFLOW__CORE__EXECUTOR: CeleryExecutor
        AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
        AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
        AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
        AIRFLOW__CORE__FERNET_KEY: ''
        AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
        AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
        # Need as env var otherwise container crashes while exiting. Airflow Issue # 13487
        AIRFLOW__CORE__ENABLE_XCOM_PICKLING: 'true'
        AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 5 # Just to have a fast load in the front-end. Do not use in prod w/ config 
        # Enable the Airflow API
        AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
        # _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-snowflake-connector-python==2.3.10 boto3==1.15.18 botocore==1.18.18 paramiko==2.6.0 docker==5.0.0}
        # PYTHONPATH: "${PYTHONPATH}:/home/python_modules:/opt/airflow/dags"
      volumes:
        - ./dags:/opt/airflow/dags
        - ./logs:/opt/airflow/logs
        - ./plugins:/opt/airflow/plugins
        # Pass the Docker Daemon as a volume to allow the webserver containers to start docker images
        # Windows requires a leading double slash (//) to address the Docker socket on the host
        - //var/run/docker.sock:/var/run/docker.sock
      #user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}" 
      #user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}" 
      depends_on:
        redis:
          condition: service_healthy
        postgres:
          condition: service_healthy
    
    services:
      postgres:
        image: postgres:13
        environment:
          POSTGRES_USER: airflow
          POSTGRES_PASSWORD: airflow
          POSTGRES_DB: airflow
        volumes:
          - postgres-db-volume:/var/lib/postgresql/data
        healthcheck:
          test: ["CMD", "pg_isready", "-U", "airflow"]
          interval: 5s
          retries: 5
        restart: always
    
      redis:
        image: redis:latest
        ports:
          - 6379:6379
        healthcheck:
          test: ["CMD", "redis-cli", "ping"]
          interval: 5s
          timeout: 30s
          retries: 50
        restart: always
    
      airflow-webserver:
        <<: *airflow-common
        # Give extended privileges to the container
        command: webserver
        ports:
          - 8080:8080
        healthcheck:
          test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
          interval: 10s
          timeout: 10s
          retries: 5
        restart: always
    
      airflow-scheduler:
        <<: *airflow-common
        command: scheduler
        healthcheck:
          test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
          interval: 10s
          timeout: 10s
          retries: 5
        restart: always
    
      airflow-worker:
        <<: *airflow-common
        # Give extended privileges to the container
        command: celery worker
        healthcheck:
          test:
            - "CMD-SHELL"
            - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
          interval: 10s
          timeout: 10s
          retries: 5
        restart: always
      
      # Runs airflow-db-init and airflow-db-upgrade
      # Creates a new user airflow/airflow
      airflow-init:
        <<: *airflow-common
        command: version
        environment:
          <<: *airflow-common-env
          _AIRFLOW_DB_UPGRADE: 'true'
          _AIRFLOW_WWW_USER_CREATE: 'true'
          _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
          _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    
      flower:
        <<: *airflow-common
        command: celery flower
        ports:
          - 5555:5555
        healthcheck:
          test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
          interval: 10s
          timeout: 10s
          retries: 5
        restart: always
    
    volumes:
      postgres-db-volume:
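With this setup, a DAG can then launch sibling containers via the DockerOperator from the Docker provider package. A minimal sketch (the image name is a placeholder, and apache-airflow-providers-docker is assumed to be installed, e.g. via requirements.txt):

# minimal sketch for Airflow 2.x with placeholder names
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator  # 2.x import path

with DAG(
    dag_id="docker_sibling_example",
    start_date=datetime(2021, 7, 1),
    schedule_interval=None,
    catchup=False,
) as dag:

    run_etl = DockerOperator(
        task_id="run_etl",
        image="my-etl-image:latest",              # placeholder image name
        docker_url="unix://var/run/docker.sock",  # the socket mounted in the compose file above
        auto_remove=True,
    )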



I just mounted the docker binary itself into the container, not sure if this is recommended:

  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - /usr/bin/docker:/usr/bin/docker

In this case I don't need to build a new image just to install the Docker CLI.

