I am trying to build a Docker image containing Apache Spark. It is built upon the openjdk-8-jre official image.
The goal is to execute Spark in cluster mode, thus having at least one master (started via sbin/start-master.sh) and one or more slaves (sbin/start-slave.sh). See spark-standalone-docker for my Dockerfile and entrypoint script.
The build itself goes through; the problem is that when I run the container, it starts and stops shortly afterwards. The cause is that the Spark master launch script starts the master in daemon mode and exits. The container then terminates, because there is no process running in the foreground anymore.
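Roughly, the entrypoint boils down to this (a simplified sketch, not the full script from my repo; `SPARK_HOME` is assumed to point at the Spark installation):

```bash
#!/bin/bash
# start-master.sh forks the master as a daemon and returns immediately...
"${SPARK_HOME}/sbin/start-master.sh"
# ...so this script (PID 1 in the container) falls off the end and exits,
# taking the container down with it.
```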
The obvious solution is to run the Spark master process in the foreground, but I could not figure out how (and Google did not turn up anything either). My "workaround solution" is to run tail -f on the Spark log files.
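That is, something along these lines (the log path is assumed to be Spark's default of `$SPARK_HOME/logs`; adjust if yours differs):

```bash
#!/bin/bash
# Workaround entrypoint: start the daemonized master, then block on its
# log output so that PID 1 never exits and the container stays alive.
"${SPARK_HOME}/sbin/start-master.sh"
# The master writes its log under $SPARK_HOME/logs by default; the exact
# file name contains the user and host name, hence the glob.
exec tail -f "${SPARK_HOME}/logs/"*.out
```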
Thus, my questions are:
- How can you run the Apache Spark master in the foreground?
- If the first is not possible / feasible / whatever, what is the preferred (i.e. best-practice) way to keep a container "alive" (I really don't want to use an infinite loop with a sleep command)?