How to make SLURM use gres.conf

I distribute jobs using SLURM, and I have a generic resource called "cards". In slurm.conf there is a line:
GresTypes=cards
I do not include this resource in the node configuration lines. Instead, I try to configure it in gres.conf:
NodeName=mynode-01 Name=cards Count=2
Unfortunately, scontrol show node mynode-01 shows Gres=(null).
Both slurm.conf and gres.conf are accessible to all the nodes. I tried running scontrol reconfigure and restarting the SLURM daemons, but it doesn't help.

"I do not include this resource in the node configuration lines."
That is the problem: a GRES must be defined both in slurm.conf (in the node definition lines) and in gres.conf to work properly.
The Slurm controller (slurmctld) relies on the information in slurm.conf, while the Slurm daemon (slurmd) on each compute node reads gres.conf; the two must match.
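For example, a minimal sketch based on the node and resource names above (the CPU and memory figures are placeholders; keep whatever your node line already has and only add the Gres= field):
# slurm.conf -- seen by slurmctld
GresTypes=cards
NodeName=mynode-01 CPUs=16 RealMemory=64000 Gres=cards:2 State=UNKNOWN
# gres.conf -- read by slurmd on the node; must agree with slurm.conf
NodeName=mynode-01 Name=cards Count=2
After updating both files, run scontrol reconfigure (or restart slurmctld and slurmd); scontrol show node mynode-01 should then report the cards GRES instead of (null).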

Related

Data Locality in Spark on Kubernetes colocated with HDFS pods

Revisiting the data-locality question for Spark on Kubernetes: if the Spark pods are colocated on the same nodes as the HDFS datanode pods, does data locality work?
The Q&A session here: https://www.youtube.com/watch?v=5-4X3HylQQo seems to suggest it doesn't.
Locality is an issue for Spark on Kubernetes. Basic data locality does work if the Kubernetes provider supplies the network topology plugins required to resolve where the data is and where the Spark executors should run, and if you have built Kubernetes to include the code here.
There is a method to test this data locality. I have copied it here for completeness:
Here's how one can check whether data locality in the namenode works.
Launch an HDFS client pod and go inside the pod.
$ kubectl run -i --tty hadoop --image=uhopper/hadoop:2.7.2 \
    --generator="run-pod/v1" --command -- /bin/bash
Inside the pod, create a simple text file on HDFS.
$ hadoop fs \
    -fs hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local \
    -cp file:/etc/hosts /hosts
Set the number of replicas for the file to the number of your cluster nodes. This ensures that there will be a copy of the file on the cluster node that your client pod is running on. Wait some time until this happens.
$ hadoop fs -setrep NUM-REPLICAS /hosts
Run the following hdfs cat command. From the debug messages, see which datanode is being used and make sure it is your local datanode. (You can get your node's IP from $ kubectl get pods hadoop -o json | grep hostIP; run this outside the pod.)
$ hadoop --loglevel DEBUG fs \
    -fs hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local \
    -cat /hosts
...
17/04/24 20:51:28 DEBUG hdfs.DFSClient: Connecting to datanode 10.128.0.4:50010 ...
If not, check whether your local datanode even appears in the list from the debug messages above. If it does not, this is because step (3) has not finished yet; wait longer. (You can use a smaller cluster for this test if that is possible.)
17/04/24 20:51:28 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{ fileLength=199 underConstruction=false blocks=[LocatedBlock{BP-347555225-10.128.0.2-1493066928989:blk_1073741825_1001; getBlockSize()=199; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.128.0.4:50010,DS-d2de9d29-6962-4435-a4b4-aadf4ea67e46,DISK], DatanodeInfoWithStorage[10.128.0.3:50010,DS-0728ffcf-f400-4919-86bf-af0f9af36685,DISK], DatanodeInfoWithStorage[10.128.0.2:50010,DS-3a881114-af08-47de-89cf-37dec051c5c2,DISK]]}] lastLocatedBlock=LocatedBlock{BP-347555225-10.128.0.2-1493066928989:blk_1073741825_1001;
Repeat the hdfs cat command multiple times and check whether the same datanode is used consistently (a small loop like the one sketched below can help).
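A minimal sketch of such a loop, assuming the same namenode address and /hosts file as above (and that the Hadoop client writes its debug log lines to stderr, which is the usual default):
for i in $(seq 1 5); do
  # discard the file contents, keep only the log line naming the datanode
  hadoop --loglevel DEBUG fs \
    -fs hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local \
    -cat /hosts 2>&1 >/dev/null | grep 'Connecting to datanode'
done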

Starting multiple workers on a master node in Standalone mode

I have a machine with 80 cores. I'd like to start a Spark server in standalone mode on this machine with 8 executors, each with 10 cores. But, when I try to start my second worker on the master, I get an error.
$ ./sbin/start-master.sh
Starting org.apache.spark.deploy.master.Master, logging to ...
$ ./sbin/start-slave.sh spark://localhost:7077 -c 10
Starting org.apache.spark.deploy.worker.Worker, logging to ...
$ ./sbin/start-slave.sh spark://localhost:7077 -c 10
org.apache.spark.deploy.worker.Worker running as process 64606. Stop it first.
In the documentation, it clearly states "you can start one or more workers and connect them to the master via: ./sbin/start-slave.sh <master-spark-URL>". So why can't I do that?
A way to get the same parallelism is to start multiple worker instances.
You can do this by adding the following to the ./conf/spark-env.sh file:
SPARK_WORKER_INSTANCES=8
SPARK_WORKER_CORES=10
SPARK_EXECUTOR_CORES=10
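With those settings in spark-env.sh, a single invocation of the worker startup script should launch all eight instances; a sketch reusing the commands from the question:
$ ./sbin/start-master.sh
$ ./sbin/start-slave.sh spark://localhost:7077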
On a single machine it is quite complicated, but you can try Docker or Kubernetes:
create multiple Docker containers, one per Spark worker.
Alternatively, just specify a new identity for every additional worker/master and then launch start-worker.sh:
export SPARK_IDENT_STRING=worker2
./spark-node2/sbin/start-worker.sh spark://DESKTOP-HSK5ETQ.localdomain:7077
thanks to https://stackoverflow.com/a/46205968/1743724
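Building on that, to launch several workers on one machine you could loop over identities (a sketch; it assumes a single Spark install rather than the per-worker copies used above, the master URL is the one from the example, and the core count mirrors the original question):
for i in 2 3 4 5 6 7 8; do
  # each worker gets its own identity so the pid files do not clash
  SPARK_IDENT_STRING=worker$i ./sbin/start-worker.sh spark://DESKTOP-HSK5ETQ.localdomain:7077 -c 10
done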

spark worker not connecting to master

I want to create a Spark standalone cluster. I am able to run a master and a slave on the same node, but a slave on a different node neither shows the master URL nor connects to the master.
I am running command:
start-slave.sh spark://spark-server:7077
where spark-server is the hostname of my master.
I am able to ping the master from the worker, but the master's web UI shows no workers except the one running on the same machine. The client node is running a worker, but it is independent and not connected to the master.
Please check the configuration file spark-env.sh on your master node. Have you set the SPARK_MASTER_HOST variable to the IP address of the master node? If not, set it and restart the master and worker nodes. For example, if your master node's IP is 192.168.0.1, you should have SPARK_MASTER_HOST=192.168.0.1 in there. Note that you don't need to set this variable on your worker nodes.
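Concretely, that could look like this (a sketch; the IP address is a placeholder, and the stop/start scripts are the standard ones shipped in $SPARK_HOME/sbin):
# $SPARK_HOME/conf/spark-env.sh on the master
export SPARK_MASTER_HOST=192.168.0.1
# restart the master
$ $SPARK_HOME/sbin/stop-master.sh && $SPARK_HOME/sbin/start-master.sh
# re-register each worker against the new master address
$ $SPARK_HOME/sbin/stop-slave.sh
$ $SPARK_HOME/sbin/start-slave.sh spark://192.168.0.1:7077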
1) Make sure you have set up passwordless SSH between the nodes.
Refer to the link below to set up passwordless SSH:
http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
2) Specify the slave IP addresses in the slaves file in the $SPARK_HOME/conf directory on the master node (an example is sketched below).
3) Once the IP addresses are in the slaves file, start the Spark cluster by executing the start-all.sh script in the $SPARK_HOME/sbin directory on the master node.
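A sketch of what that might look like (the worker IPs are placeholders):
# $SPARK_HOME/conf/slaves -- one worker host or IP per line
192.168.0.2
192.168.0.3
# then, on the master node:
$ $SPARK_HOME/sbin/start-all.sh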
Hope this helps.
If you are able to ping the master node from the worker, it has network connectivity. To register the new worker node with the Spark master, you need to update a few things in spark-env.sh.
Please check the official documentation on launching a Spark standalone cluster and update the required fields.
Here is another blog post that can help you: Spark Cluster mode blog.
This solved my problem:
The idea is to use the loopback address when both client and server are on the same machine.
Steps:
Go to the conf folder in your spark-hadoop directory and check whether spark-env.sh is present; if not, make a copy of spark-env.sh.template, name it spark-env.sh, and then add SPARK_MASTER_HOST=127.0.0.1.
Then run the command to start the master from the spark-hadoop directory (not the conf folder):
./sbin/start-master.sh (this will start the master, view it in localhost:8080)
bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077 (this will start the worker and you can see it listed under the worker tab in the same web UI i.e, localhost:8080)
you can add multiple workers with the above command
This worked for me; hopefully it will work for you too.

Docker Container with Apache Spark in standalone cluster mode

I am trying to construct a Docker image containing Apache Spark. It is built upon the official openjdk-8-jre image.
The goal is to execute Spark in cluster mode, thus having at least one master (started via sbin/start-master.sh) and one or more slaves (sbin/start-slave.sh). See spark-standalone-docker for my Dockerfile and entrypoint script.
The build itself actually goes through; the problem is that when I want to run the container, it starts and stops shortly afterwards. The cause is that the Spark master launch script starts the master in daemon mode and exits. Thus the container terminates, as there is no process running in the foreground anymore.
The obvious solution is to run the Spark master process in the foreground, but I could not figure out how (Google did not turn up anything either). My "workaround solution" is to run tail -f on the Spark log directory.
Thus, my questions are:
How can you run Apache Spark Master in foreground?
If the first is not possible / feasible / whatever, what is the preferred (i.e. best practice) solution to keeping a container "alive" (I really don't want to use an infinite loop and a sleep command)?
UPDATED ANSWER (for Spark 2.4.0):
To start the Spark master in the foreground, just set the environment variable
SPARK_NO_DAEMONIZE=true before running ./start-master.sh,
and you are good to go.
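In a Dockerfile this could look roughly like the following (a sketch; the /opt/spark install location is an assumption, adjust it to wherever Spark lives in your image):
# Dockerfile fragment (sketch)
ENV SPARK_NO_DAEMONIZE=true
CMD ["/opt/spark/sbin/start-master.sh"]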
For more info, check $SPARK_HOME/sbin/spark-daemon.sh:
# Runs a Spark command as a daemon.
#
# Environment Variables
#
# SPARK_CONF_DIR Alternate conf dir. Default is ${SPARK_HOME}/conf.
# SPARK_LOG_DIR Where log files are stored. ${SPARK_HOME}/logs by default.
# SPARK_MASTER host:path where spark code should be rsync'd from
# SPARK_PID_DIR The pid files are stored. /tmp by default.
# SPARK_IDENT_STRING A string representing this instance of spark. $USER by default
# SPARK_NICENESS The scheduling priority for daemons. Defaults to 0.
# SPARK_NO_DAEMONIZE If set, will run the proposed command in the foreground. It will not output a PID file.
##
How can you run Apache Spark Master in foreground?
You can use spark-class with Master.
bin/spark-class org.apache.spark.deploy.master.Master
and the same thing for workers:
bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_URL
If you're looking for a production-ready solution, you should consider using a proper init process / supervisor such as dumb-init or tini.
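For example, a Dockerfile fragment along these lines might work (a sketch; the tini package name and path and the /opt/spark install location are assumptions, while the base image is the one mentioned in the question):
FROM openjdk:8-jre
# ... install Spark under /opt/spark ...
RUN apt-get update && apt-get install -y tini && rm -rf /var/lib/apt/lists/*
# tini runs as PID 1, reaps zombies, and forwards signals to the master process
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/opt/spark/bin/spark-class", "org.apache.spark.deploy.master.Master"]
Alternatively, docker run --init gives you tini as PID 1 without baking it into the image.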

what happens when I kill a yarn's node manager

Suppose there are 10 containers running on this machine (5 are MapReduce tasks and 5 are Spark-on-YARN executors).
If I kill the NodeManager, what happens to these 10 container processes?
Before I restart the NodeManager, what should I do first?
Killing the NodeManager will only affect the containers on that particular node. All running containers are lost on a restart/kill; they will be relaunched once the node comes back up or the NodeManager process is started again (if the application/job is still running).
NOTE: the job's ApplicationMaster should not be running on this slave.
What happens when the node with the ApplicationMaster dies?
In that case YARN launches a new ApplicationMaster on some other node, and all of the containers are relaunched.
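As a side note, the number of times YARN will retry a failed ApplicationMaster is capped by yarn.resourcemanager.am.max-attempts (default 2). If you expect node losses, you could raise it in yarn-site.xml, roughly like this (the value 4 is just an example):
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>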
Answering according to the Hadoop 2.7.x distribution; check this article: http://hortonworks.com/blog/resilience-of-yarn-applications-across-nodemanager-restarts/
If you don't have yarn.nodemanager.recovery.enabled set to true, then your containers will be KILLED (Spark, MapReduce, or anything else); however, your job will most likely continue to run.
You need to check this property in your environment, e.g. using hadoop conf | grep yarn.nodemanager.recovery.dir. If it is false (which it is for me by default), then in my opinion there is nothing you can do to prevent those containers from being killed upon restart. However, you can modify the flag and set the other required properties for future cases if you want containers to be recovered.
Look at this one too: http://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_ha_yarn_work_preserving_recovery.html
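To enable work-preserving NodeManager restart as those links describe, the relevant yarn-site.xml entries look roughly like this (a sketch; the recovery directory and the port are placeholders):
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
<!-- pin the NodeManager RPC port so recovered containers can reconnect -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>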
