Docker Container with Apache Spark in standalone cluster mode

I am trying to construct a Docker image containing Apache Spark. It is built upon the openjdk-8-jre official image.
The goal is to execute Spark in cluster mode, thus having at least one master (started via sbin/start-master.sh) and one or more slaves (sbin/start-slave.sh). See spark-standalone-docker for my Dockerfile and entrypoint script.
The build itself actually goes through; the problem is that when I want to run the container, it starts and then stops shortly after. The cause is that the Spark master launch script starts the master in daemon mode and exits. Thus the container terminates, as there is no process running in the foreground anymore.
The obvious solution is to run the Spark master process in the foreground, but I could not figure out how (Google did not turn up anything either). My "workaround" solution is to run tail -f on the Spark log directory.
Thus, my questions are:
How can you run Apache Spark Master in foreground?
If the first is not possible / feasible / whatever, what is the preferred (i.e. best-practice) solution to keeping a container "alive" (I really don't want to use an infinite loop with a sleep command)?

UPDATED ANSWER (for Spark 2.4.0):
To start the Spark master in the foreground, just set the environment variable
SPARK_NO_DAEMONIZE=true in your environment before running ./start-master.sh,
and you are good to go.
For more info, check $SPARK_HOME/sbin/spark-daemon.sh:
# Runs a Spark command as a daemon.
#
# Environment Variables
#
# SPARK_CONF_DIR Alternate conf dir. Default is ${SPARK_HOME}/conf.
# SPARK_LOG_DIR Where log files are stored. ${SPARK_HOME}/logs by default.
# SPARK_MASTER host:path where spark code should be rsync'd from
# SPARK_PID_DIR The pid files are stored. /tmp by default.
# SPARK_IDENT_STRING A string representing this instance of spark. $USER by default
# SPARK_NICENESS The scheduling priority for daemons. Defaults to 0.
# SPARK_NO_DAEMONIZE If set, will run the proposed command in the foreground. It will not output a PID file.
##
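For example, a minimal Dockerfile sketch putting this together; the install path and Spark version here are assumptions, not taken from the question:
# Minimal sketch; assumes a Spark 2.4.x distribution is present in the build context.
FROM openjdk:8-jre
COPY spark-2.4.0-bin-hadoop2.7 /opt/spark
ENV SPARK_HOME=/opt/spark
# SPARK_NO_DAEMONIZE keeps start-master.sh in the foreground, so it stays PID 1
# and the container does not exit.
ENV SPARK_NO_DAEMONIZE=true
CMD ["/opt/spark/sbin/start-master.sh"]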

How can you run Apache Spark Master in foreground?
You can use spark-class with Master.
bin/spark-class org.apache.spark.deploy.master.Master
and the same thing for workers:
bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_URL
If you're looking for a production-ready solution, you should consider using a proper supervisor like dumb-init or tini.
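As an illustration, a hedged Dockerfile fragment for the tini variant; the /opt/spark path is an assumption, and tini must be present in the image (alternatively, docker run --init injects a tini-style init without changing the image):
# Sketch: tini as PID 1 forwards signals and reaps children;
# spark-class itself keeps the Master process in the foreground.
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/opt/spark/bin/spark-class", "org.apache.spark.deploy.master.Master"]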

Related

spark standalone running on docker cleanup not running

I'm running Spark in standalone mode as a Docker service, with one master node and one Spark worker. I followed the instructions in the Spark documentation:
https://spark.apache.org/docs/latest/spark-standalone.html
to add the properties that make the Spark cluster clean up after itself, and I set those in my docker_entrypoint:
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=900 -Dspark.worker.cleanup.appDataTtl=900"
and verified that cleanup was enabled by following the logs of the worker node service.
My question is: should all directories under the SPARK_WORKER_DIR directory be cleaned, or does it only clean the application files? I ask because I still see some empty directories remaining there.
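For reference, the same settings can also live in conf/spark-env.sh; a sketch with the values from the question (note that, per the Spark standalone documentation, this cleanup only removes the work directories of stopped applications):
# conf/spark-env.sh (sketch)
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=900 \
  -Dspark.worker.cleanup.appDataTtl=900"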

Starting multiple workers on a master node in Standalone mode

I have a machine with 80 cores. I'd like to start a Spark server in standalone mode on this machine with 8 executors, each with 10 cores. But, when I try to start my second worker on the master, I get an error.
$ ./sbin/start-master.sh
Starting org.apache.spark.deploy.master.Master, logging to ...
$ ./sbin/start-slave.sh spark://localhost:7077 -c 10
Starting org.apache.spark.deploy.worker.Worker, logging to ...
$ ./sbin/start-slave.sh spark://localhost:7077 -c 10
org.apache.spark.deploy.worker.Worker running as process 64606. Stop it first.
In the documentation, it clearly states "you can start one or more workers and connect them to the master via: ./sbin/start-slave.sh <master-spark-URL>". So why can't I do that?
A way to get the same parallelism is to start many workers.
You can do this by adding to the ./conf/spark-env.sh file:
SPARK_WORKER_INSTANCES=8
SPARK_WORKER_CORES=10
SPARK_EXECUTOR_CORES=10
On a single machine this is quite complicated, but you can try Docker or Kubernetes.
Create multiple Docker containers for the Spark workers.
Just specify a new identity for every new worker/master and then launch start-worker.sh:
export SPARK_IDENT_STRING=worker2
./spark-node2/sbin/start-worker.sh spark://DESKTOP-HSK5ETQ.localdomain:7077
thanks to https://stackoverflow.com/a/46205968/1743724
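Extending the same pattern, a sketch for a third worker on the same host; the identity, install path, web UI port, and core count are placeholders:
# Each worker needs its own ident string (so it gets separate pid/log files)
# and a free web UI port.
export SPARK_IDENT_STRING=worker3
./spark-node3/sbin/start-worker.sh spark://DESKTOP-HSK5ETQ.localdomain:7077 --webui-port 8083 -c 10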

docker stop spark container from exiting

I know Docker only watches PID 1, and if that process exits (or turns into a daemon), it considers the program finished and shuts the container down.
When Apache Spark is started with the ./start-master.sh script, how can I keep the container running?
I do not think while true; do sleep 1000; done is an appropriate solution.
E.g. I used the command sbin/start-master.sh to start the master, but it keeps shutting down.
How to keep it running when started with docker-compose?
As mentioned in "Use of Supervisor in docker", you could use phusion/baseimage-docker as a base image in which you can register scripts as "services".
The my_init script included in that image will take care of managing the exit signals.
And the processes launched by start-master.sh would still be running.
Again, that supposes you are building your apache-spark image starting from phusion/baseimage-docker.
As commented by thaJeztah, using an existing image works too, e.g. gettyimages/spark; its default CMD will keep the container running.
Both options are cleaner than relying on a tail -f trick, which won't handle the kill/exit signals gracefully.
Here is another solution.
Create a file spark-env.sh with the following contents and copy it into the spark conf directory.
SPARK_NO_DAEMONIZE=true
If your CMD in the Dockerfile looks like this:
CMD ["/spark/sbin/start-master.sh"]
the container will not exit.
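For the docker-compose case, a minimal sketch along the same lines; the image name, paths, and ports are assumptions:
# docker-compose.yml (sketch)
version: "3"
services:
  spark-master:
    image: my-spark                  # hypothetical image with Spark under /spark
    environment:
      - SPARK_NO_DAEMONIZE=true      # keep start-master.sh in the foreground
    command: /spark/sbin/start-master.sh
    ports:
      - "7077:7077"
      - "8080:8080"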
tail -f -n 50 /path/to/spark/logfile
This will keep the container alive and also provide useful info if you run it with -it (interactive mode). You can run it with -d (detached) and it will stay alive.

what happens when I kill a yarn's node manager

Suppose there are 10 containers running on this machine (5 are MapReduce tasks and 5 are Spark-on-YARN executors).
And if I kill the node manager, what happens to these 10 container processes?
Before I restart the node manager, what should I do first?
Killing the NodeManager only affects the containers on that particular node. All running containers are lost on restart/kill; they will be relaunched once the node comes up or the NodeManager process is started again (if the application/job is still running).
NOTE: the job's ApplicationMaster should not be running on this node.
What happens when the node with the ApplicationMaster dies?
In that case YARN launches a new ApplicationMaster on some other node, and all the containers are relaunched.
Answering according to the Hadoop 2.7.x distribution; check this article: http://hortonworks.com/blog/resilience-of-yarn-applications-across-nodemanager-restarts/
If you don't have yarn.nodemanager.recovery.enabled set to true, then your containers will be KILLED (Spark or MapReduce or anything else). However, your job will most likely continue to run.
You need to check this property in your environment using hadoop conf | grep yarn.nodemanager.recovery.dir. If it is false, which it is by default for me, then in my opinion there is nothing you can do to prevent those containers from being killed upon restart. However, you can modify the flag and set the other required properties for future cases if you want containers to be recovered.
Look at this one too: http://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_ha_yarn_work_preserving_recovery.html
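For illustration, the yarn-site.xml settings described in those articles look roughly like this; the recovery directory and port below are examples, not values from the question:
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
<property>
  <!-- A fixed NodeManager port is needed so recovered containers can reconnect after restart. -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>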

Hadoop daemons can't stop using proper command

A running Hadoop system has several daemons like the namenode, journalnode, etc. I will use the namenode as an example.
When we start the namenode we can use the command: hadoop-daemon.sh start namenode
When we stop the namenode we can use the command: hadoop-daemon.sh stop namenode
But here comes the question: if I started the namenode yesterday or a couple of hours ago, the stop command works fine. But if the namenode has been running for, say, a month, then when I use the stop command it shows:
no namenode to stop.
But I can still see the NameNode daemon running using the jps command, and then I have to use the kill command to kill the process.
Why would this happen? Any way to make sure the stop command can always work?
Thanks
The reason hadoop-daemon.sh stops working after some time is that hadoop-env.sh contains the parameters:
export HADOOP_PID_DIR
export HADOOP_SECURE_DN_PID_DIR
which point to the directory where the pid files of those daemons are stored. The default location is /tmp. The problem is that the /tmp folder is automatically cleaned up after some time (e.g. on Red Hat Linux). In that case the pid file is deleted, so
when we run the stop command, it can't find the process id stored in that file.
The same applies to the yarn-daemon.sh command.
Pointing the following variables to directories other than the default /tmp folder should solve the problem.
In hadoop-env.sh:
HADOOP_PID_DIR
HADOOP_SECURE_DN_PID_DIR
In yarn-env.sh:
YARN_PID_DIR
In mapred-env.sh:
HADOOP_MAPRED_PID_DIR
After the modification, restart all the related processes.
Also, for security reasons, the folder containing the pid files should not be accessible to non-admin users.
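A sketch of those changes, with example paths (any persistent directory writable by the Hadoop users and not readable by others will do):
# hadoop-env.sh
export HADOOP_PID_DIR=/var/run/hadoop
export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop
# yarn-env.sh
export YARN_PID_DIR=/var/run/hadoop-yarn
# mapred-env.sh
export HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapred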
