Gracefully killing containers and pods in Kubernetes on a Spark exception - apache-spark

What is the recommended way to gracefully kill a container and driver pods in Kubernetes when an application fails or throws an exception? Currently, when my application runs into an exception, my driver pod and executors continue to run, and I noticed that my container doesn't get killed unless an explicit exit 1 is used. For some reason my Spark application doesn't cause an exit status of 1 or a SIGTERM signal to be sent to the container or pod.
I tried adding the following to the YAML spec based on recommendations, but the driver pod and executors still do not terminate:
spec:
  terminationGracePeriodSeconds: 0
  driver:
    lifecycle:
      preStop:
        exec:
          command:
            - /bin/bash
            - -c
            - touch /var/run/killspark && sleep 65

The preStop lifecycle hook you added won't have any effect, since it is only triggered before a container is terminated due to an API request or a management event such as liveness probe failure, preemption, resource contention and others:
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
I suspect that what you really have to figure out is why your container's main process keeps running despite the exception.
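One possible cause (an assumption, not something confirmed by the question) is that the exception is caught and logged, or only kills one thread, so the driver process never exits with a non-zero status. A minimal sketch for a PySpark driver follows. `run_job` and its two arguments are hypothetical names, not a Spark API. It converts any exception into exit code 1 and always releases the session so the pod can reach a terminal state.

```python
import traceback


def run_job(job, stop):
    """Run job(), always call stop(), and return a process exit code.

    job  -- zero-argument callable holding the Spark logic (hypothetical)
    stop -- zero-argument callable that releases resources, e.g. spark.stop
    """
    try:
        job()
        return 0
    except Exception:
        # Print the stack trace so it appears in the driver pod's log.
        traceback.print_exc()
        return 1
    finally:
        # Runs on success and on failure, so executors are released either way.
        stop()
```

In a real driver this would be called as `sys.exit(run_job(lambda: main(spark), spark.stop))`, where `main` and `spark` stand for your own entry point and SparkSession.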

Related

How can I inspect per executor/node memory usage metrics of a pyspark job on Dataproc?

I'm running a PySpark job in Google Cloud Dataproc, in a cluster with half the nodes being preemptible, and seeing several errors in the job output (the driver output) such as:
...spark.scheduler.TaskSetManager: Lost task 9696.0 in stage 0.0 ... Python worker exited unexpectedly (crashed)
...
Caused by java.io.EOFException
...
...YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 177 for reason Container marked as failed: ... Exit status: -100. Diagnostics: Container released on a *lost* node
...spark.storage.BlockManagerMasterEndpoint: Error try to remove broadcast 3 from block manager BlockManagerId(...)
Perhaps by coincidence, the errors mostly seem to be coming from preemptible nodes.
My suspicion is that these opaque errors are coming from the node or executors running out of memory, but there don't seem to be any granular memory related metrics exposed by Dataproc.
How can I determine why a node was considered lost? Is there a way I can inspect memory usage per node or executor to validate whether these errors are being caused by high memory usage? If YARN is the one which is killing containers / determining nodes are lost, then hopefully there's a way to introspect why?
This happens because you are using preemptible VMs, which are short-lived: they run for at most 24 hours and can be reclaimed at any time. When GCE shuts down a preemptible VM, you see errors like this:
YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 177 for reason Container marked as failed: ... Exit status: -100. Diagnostics: Container released on a lost node
To inspect the node state, open a secure shell from your machine to the cluster's master node. This requires the gcloud SDK to be installed.
gcloud compute ssh ${HOSTNAME}-m --project=${PROJECT}
Then run the following commands in the cluster.
List all nodes in the cluster:
yarn node -list
Then use ${NodeID} to get a report on the node's state:
yarn node -status ${NodeID}
You could also set up local port forwarding via SSH to the YARN web UI server instead of running commands directly in the cluster:
gcloud compute ssh ${HOSTNAME}-m \
--project=${PROJECT} -- \
-L 8088:${HOSTNAME}-m:8088 -N
Then go to http://localhost:8088/cluster/apps in your browser.
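The same information can be read programmatically from the ResourceManager's REST API. The sketch below assumes the SSH tunnel above is open on localhost:8088 and uses the Cluster Nodes endpoint (`/ws/v1/cluster/nodes`) with its `id`, `state`, `healthReport` and `usedMemoryMB` fields. The function names are hypothetical. Note that `usedMemoryMB` is the memory YARN has allocated to containers on the node, not the memory the processes actually consume, so it only partly answers the memory question.

```python
import json
from urllib.request import urlopen


def fetch_nodes(rm_url="http://localhost:8088"):
    """Return the list of node records from the Cluster Nodes API."""
    with urlopen(rm_url + "/ws/v1/cluster/nodes") as resp:
        payload = json.load(resp)
    # The API wraps the list as {"nodes": {"node": [...]}}.
    return (payload.get("nodes") or {}).get("node", [])


def problem_nodes(nodes):
    """Keep only nodes that are not in the RUNNING state (e.g. LOST, UNHEALTHY)."""
    return [
        (n.get("id"), n.get("state"), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") != "RUNNING"
    ]


def allocated_memory(nodes):
    """Map node id to the memory (MB) YARN has allocated on that node."""
    return {n.get("id"): n.get("usedMemoryMB") for n in nodes}
```

Calling `problem_nodes(fetch_nodes())` lists the nodes YARN no longer considers healthy, together with whatever health report it recorded for them.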

Why does my worker node fail after submitting a topology?

Here is my minimal multi-container (Storm 1.1.0) setup on Kubernetes on Azure, using Azure Container Service:
- 1 zookeeper container
- 1 nimbus container (UI is running on the same container)
- 1 worker container
All containers are connected properly; I can see that in the Storm UI and the ZooKeeper logs.
But when I deploy the test topology, the worker node fails (the supervisor process waits for the worker to start and then restarts). It seems the worker process is not being started by the supervisor.
Here is my storm.yaml config for the worker node:
storm.zookeeper.servers:
- "10.0.xx.xxx"
storm.zookeeper.port: 2181
nimbus.seeds: ["10.0.xxx.xxx"]
storm.local.dir: "/storm/datadir"
Could you please help me out?

How to enable a Spark-Mesos job to be launched from inside a Docker container?

Summary:
Is it possible to submit a Spark job on Mesos from inside a Docker container with 1 Mesos master (no Zookeeper) and 1 Mesos agent also each running in separate Docker containers (on the same host for now)? The Mesos containerizer described at http://mesos.apache.org/documentation/latest/container-image/ seems to apply to the case where the Mesos application is simply encapsulated in a Docker container and run. My Docker application is more interactive with multiple PySpark Mesos jobs being instantiated at run-time based on user input. The driver program in the Docker container is not itself run as a Mesos app. Only the user-initiated job requests are handled as PySpark Mesos apps.
Specifics:
I have 3 Docker containers based on centos:7 linux, and running on the same host machine for now:
Container "Master" running a Mesos Master.
Container "Agent" running a Mesos Agent.
Container "Test" with Spark and Mesos installed where I run a bash shell and launch the following PySpark test program from the command line.
from pyspark import SparkContext, SparkConf
from operator import add
# Configure Spark
sp_conf = SparkConf()
sp_conf.setAppName("spark_test")
sp_conf.set("spark.scheduler.mode", "FAIR")
sp_conf.set("spark.dynamicAllocation.enabled", "false")
sp_conf.set("spark.driver.memory", "500m")
sp_conf.set("spark.executor.memory", "500m")
sp_conf.set("spark.executor.cores", 1)
sp_conf.set("spark.cores.max", 1)
sp_conf.set("spark.mesos.executor.home", "/usr/local/spark-2.1.0")
sp_conf.set("spark.executor.uri", "file://usr/local/spark-2.1.0-bin-without-hadoop.tgz")
sc = SparkContext(conf=sp_conf)
# Simple computation
x = [(1.5,100.),(1.5,200.),(1.5,300.),(2.5,150.)]
rdd = sc.parallelize(x,1)
tot = rdd.foldByKey(0,add).collect()
cnt = rdd.countByKey()
time = [t[0] for t in tot]
avg = [t[1]/cnt[t[0]] for t in tot]
print 'tot=', tot
print 'cnt=', cnt
print 't=', time
print 'avg=', avg
The relevant software versions I am using are as follows:
Hadoop: 2.7.3
Spark: 2.1.0
Mesos: 1.2.0
Docker: 17.03.1-ce, build c6d412e
The following works fine:
I can run the simple PySpark test program above from inside the Test container with Spark's MASTER=local[N] for N=1 or N=4.
I can see in the Mesos logs and in the Mesos user interface (UI) that the Mesos agent and master come up fine. The Mesos UI shows that the agent is connected with plenty of resources (cpu, memory, disk).
I can run the Mesos Python tests successfully from inside the Test container with /usr/local/mesos-1.2.0/build/src/examples/python/test-framework 127.0.0.1:5050. This seems to confirm that the Mesos containers can be accessed from within my Test container, but these tests are not using Spark.
This is the failure:
With Spark's MASTER=mesos://127.0.0.1:5050, when I launch my PySpark test program from inside the Test container there is activity in the logs of both the Mesos Master and Agent, and in the couple of seconds before failure, the Mesos UI shows resources assigned for the job that are well within what is available. However, the PySpark test program then fails with: WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
The steps I followed are as follows.
Start Mesos Master:
docker run -it --net=host -p 5050:5050 the_master
Relevant excerpts from the master's log show:
I0418 01:05:08.540192 27 master.cpp:383] Master 15b354eb-6a20-4bc9-a13b-6533b1e91bd2 (localhost) started on 127.0.0.1:5050
I0418 01:05:08.540210 27 master.cpp:385] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/mesos-1.2.0/build/../src/webui" --work_dir="/var/lib/mesos" --zk_session_timeout="10secs"
Start Mesos Agent:
docker run -it --net=host -e MESOS_AGENT_PORT=5051 the_agent
The agent's log shows:
I0418 01:42:00.234244 40 slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_mesos_image="spark-mesos-agent-test" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/usr/local/mesos-1.2.0/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" --systemd_enable_support="false" 
--systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
I get the following warning for both the Mesos Master and Agent, but ignore it because I am running everything on the same host for now:
Master/Agent bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
In fact, my tests with assigning a routable IP address instead of 127.0.0.1 failed to change any of the behavior I describe here.
Start Test Container (with bash shell for testing):
docker run -it --net=host the_test /bin/bash
Some relevant environment variables set inside all three containers (Master, Agent, and Test):
HADOOP_HOME=/usr/local/hadoop-2.7.3
HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop
SPARK_HOME=/usr/local/spark-2.1.0
SPARK_EXECUTOR_URI=file:////usr/local/spark-2.1.0-bin-without-hadoop.tgz
MASTER=mesos://127.0.0.1:5050
PYSPARK_PYTHON=/usr/local/anaconda2/bin/python
PYSPARK_DRIVER_PYTHON=/usr/local/anaconda2/bin/python
PYSPARK_SUBMIT_ARGS=--driver-memory=4g pyspark-shell
MESOS_PORT=5050
MESOS_IP=127.0.0.1
MESOS_WORKDIR=/var/lib/mesos
MESOS_HOME=/usr/local/mesos-1.2.0
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
MESOS_MASTER=mesos://127.0.0.1:5050
PYTHONPATH=:/usr/local/spark-2.1.0/python:/usr/local/spark-2.1.0/python/lib/py4j-0.10.1-src.zip
Run Mesos (non-Spark) tests from inside the Test container:
/usr/local/mesos-1.2.0/build/src/examples/python/test-framework 127.0.0.1:5050
This produces the following log output (as expected I think):
I0417 21:28:36.912542 20 sched.cpp:232] Version: 1.2.0
I0417 21:28:36.920013 62 sched.cpp:336] New master detected at master@127.0.0.1:5050
I0417 21:28:36.920472 62 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0417 21:28:36.924165 62 sched.cpp:759] Framework registered with be89e739-be8d-430e-b1e9-3fe55fa18459-0000
Registered with framework ID be89e739-be8d-430e-b1e9-3fe55fa18459-0000
Received offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0 with cpus: 16.0 and mem: 119640.0
Launching task 0 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 1 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 2 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 3 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 4 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Task 0 is in state TASK_RUNNING
Task 1 is in state TASK_RUNNING
Task 2 is in state TASK_RUNNING
Task 3 is in state TASK_RUNNING
Task 4 is in state TASK_RUNNING
Task 0 is in state TASK_FINISHED
Task 1 is in state TASK_FINISHED
Task 2 is in state TASK_FINISHED
Task 3 is in state TASK_FINISHED
Task 4 is in state TASK_FINISHED
All tasks done, waiting for final framework message
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
All tasks done, and all messages received, exiting
Run PySpark test program from inside the Test container:
python spark_test.py
This produces the following log output:
17/04/17 21:29:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I0417 21:29:19.187747 205 sched.cpp:232] Version: 1.2.0
I0417 21:29:19.196535 188 sched.cpp:336] New master detected at master@127.0.0.1:5050
I0417 21:29:19.197453 188 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0417 21:29:19.201884 195 sched.cpp:759] Framework registered with be89e739-be8d-430e-b1e9-3fe55fa18459-0001
17/04/17 21:29:34 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I searched for this error on the internet but every page I found indicates that it is a common error caused by insufficient resources being allocated to the Mesos agent. As I mentioned, the Mesos UI indicates that there are sufficient resources. Please respond if you have any idea why my Spark job is not accepting resources from Mesos or if you have any suggestions of things I could try.
Thank you for your help.
This error is now resolved. In case anybody encounters a similar problem, I wanted to post that in my case it was caused by the HADOOP CLASSPATH not being set in the Mesos Master and Agent containers. Once set, everything works as expected.

How to know if app is in RUNNING state to kill spark-submit process?

I am creating a shell script which will be executed from Jenkins, because we have many streaming jobs and they seem easier to manage from Jenkins. So I have created the script below.
#!/bin/bash
spark-submit "spark parameters here" > /dev/null 2>&1 &
processId=$!
echo $processId
sleep 5m
kill $processId
If I don't have a sleep, the spark-submit process is killed immediately and no Spark application is submitted. If there is a sleep, the spark-submit process gets enough time to submit the Spark application.
My question is: is there a better way to know whether the Spark application is in the RUNNING state, so that the spark-submit process can be killed?
Spark 1.6.0 with YARN
You should spark-submit your Spark application and then use yarn application -status <ApplicationId>, as described in the application section of the YARN commands documentation:
Prints the status of the application.
You could get <ApplicationId> from the logs of spark-submit (in client deploy mode) or use yarn application -list -appType SPARK -appStates RUNNING.
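The fixed sleep can then be replaced by polling until YARN reports the application as RUNNING. The following is a sketch under stated assumptions: it expects the `yarn` CLI on the PATH, assumes the status report contains a line of the form `State : RUNNING`, and uses hypothetical function names throughout. The polling loop takes the state lookup as a parameter so it can be exercised without a cluster.

```python
import re
import subprocess
import time


def parse_state(report):
    """Extract the value of the 'State' line from a status report."""
    # Anchored at line start so 'Final-State : ...' is not matched.
    match = re.search(r"^\s*State\s*:\s*(\S+)", report, re.MULTILINE)
    return match.group(1) if match else None


def yarn_state(app_id):
    """Ask YARN for the current state of one application."""
    result = subprocess.run(
        ["yarn", "application", "-status", app_id],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=True,
    )
    return parse_state(result.stdout)


def wait_until_running(get_state, timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll get_state() until RUNNING; False on timeout or a terminal state."""
    waited = 0
    while waited <= timeout_s:
        state = get_state()
        if state == "RUNNING":
            return True
        if state in ("FINISHED", "FAILED", "KILLED"):
            return False
        sleep(poll_s)
        waited += poll_s
    return False
```

A Jenkins step could call `wait_until_running(lambda: yarn_state(app_id))` and only kill the spark-submit process once it returns True, where `app_id` is the ID taken from the spark-submit log or from the yarn application -list output.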
I don't know what Spark version you are using or if you are running in standalone mode, but anyway, you can use the REST API for submitting/killing your apps. The last time I checked it was pretty much undocumented, but it worked properly.
When you submit an application, you will get a submissionId which you can use later for either getting the current state or killing it. The possible states are documented here:
// SUBMITTED: Submitted but not yet scheduled on a worker
// RUNNING: Has been allocated to a worker to run
// FINISHED: Previously ran and exited cleanly
// RELAUNCHING: Exited non-zero or due to worker failure, but has not yet started running again
// UNKNOWN: The state of the driver is temporarily not known due to master failure recovery
// KILLED: A user manually killed this driver
// FAILED: The driver exited non-zero and was not supervised
// ERROR: Unable to run or restart due to an unrecoverable error (e.g. missing jar file)
This is especially useful for long-running apps (e.g. streaming), since you don't have to babysit the shell script.
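As a rough sketch of that approach, the helpers below query and kill a submission over HTTP. The endpoint paths (`/v1/submissions/status/<id>` and `/v1/submissions/kill/<id>`), the usual port 6066 and the `driverState` response field are written from memory of the standalone master's REST submission server, which is largely undocumented, so treat them as assumptions and verify them against your Spark version. All function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def status_url(master, submission_id):
    """Build the status URL; master is 'host:port' of the REST server."""
    return "http://{}/v1/submissions/status/{}".format(master, submission_id)


def kill_url(master, submission_id):
    """Build the kill URL for one submission."""
    return "http://{}/v1/submissions/kill/{}".format(master, submission_id)


def driver_state(response_body):
    """Pull driverState (e.g. RUNNING, FAILED) out of a JSON status response."""
    return json.loads(response_body).get("driverState")


def get_state(master, submission_id):
    """Fetch the current driver state from the master."""
    with urlopen(status_url(master, submission_id)) as resp:
        return driver_state(resp.read().decode("utf-8"))


def kill(master, submission_id):
    """Send a POST to the kill endpoint and report whether it succeeded."""
    req = Request(kill_url(master, submission_id), data=b"", method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8")).get("success", False)
```

For example, `get_state("spark-master:6066", submission_id)` would return one of the states listed above, with `spark-master` standing in for your own master host.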

Running Mesos via Monit

I am trying to run Mesos (without zookeeper) using monit to keep slaves running.
I use the following scripts to start and stop mesos slaves:
start-slave.sh:
#!/bin/bash
nohup /home/someuser/mesos/build/bin/mesos-slave.sh \
--master=192.168.0.241:5050 \
--strict=false \
--log_dir=/home/someuser/mesos/logs < /dev/null &
sleep 1
pidof lt-mesos-slave > /home/someuser/run/mesos-slave.pid
stop-slave.sh:
#!/bin/bash
cat /home/someuser/run/mesos-slave.pid | xargs kill -9
When I start the scripts via ssh they work great. However, when I use them via monit as follows, the slaves register (I can see them in the web interface), but when I try to run a computation using Spark, it fails in the sense that most of the tasks are lost.
Monit setup:
check process mesos-slave with pidfile /home/someuser/run/mesos-slave.pid
start program = "/home/someuser/run/start-mesos.sh"
as uid someuser
stop program = "/home/someuser/run/stop-mesos.sh"
as uid someuser
if failed port 5051 then restart
Log excerpt:
I0925 14:06:21.461169 10633 slave.cpp:2413] Executor '20140924-160043-4043352256-5050-7966-0' of framework 20140925-140255-4043352256-5050-11608-0000 has terminated with signal Killed
E0925 14:06:21.461323 10639 slave.cpp:2686] Failed to unmonitor container for executor 20140924-160043-4043352256-5050-7966-0 of framework 20140925-140255-4043352256-5050-11608-0000: Not monitored
I0925 14:06:21.462224 10633 slave.cpp:2018] Handling status update TASK_LOST (UUID: 8258a34e-7831-4e5d-ba55-6df2b905b5ba) for task 0 of framework 20140925-140255-4043352256-5050-11608-0000 from @0.0.0.0:0
Am I using Monit correctly?
