I'm configuring a small cluster, controlled by SLURM.
This cluster has one master node and two partitions.
Users submit their jobs from worker nodes; I've restricted their access to the master node.
Each partition in the cluster is dedicated to a team in our company.
I'd like members of different teams to submit their jobs to different partitions without bothering with additional command-line switches.
That is, I'd like the default partition for srun or sbatch to differ depending on the node running these commands.
For example: all jobs submitted from the host worker1 should go to partition1,
and all jobs submitted from the hosts worker[2-4] should go to partition2.
And no invocation of sbatch or srun should need the -p (or --partition) switch.
I've tried setting default=YES on different lines in slurm.conf files on different computers, but this did not help.
This can be solved by setting the SLURM_PARTITION (read by srun) and SBATCH_PARTITION (read by sbatch) environment variables in the /etc/environment file on each submit host.
Details on these environment variables are in the manual pages for sbatch and srun.
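As a rough illustration of the per-host setup, here is a minimal Python sketch that derives the two /etc/environment lines from a hostname. The host-to-partition rules are taken from the example in the question; the function names and the idea of generating the lines with a script are my own assumptions, not something Slurm provides.

```python
import re

# Hypothetical mapping, copied from the example in the question:
# worker1 -> partition1, worker[2-4] -> partition2.
HOST_RULES = [
    (re.compile(r"^worker1$"), "partition1"),
    (re.compile(r"^worker[2-4]$"), "partition2"),
]


def default_partition(hostname):
    """Return the partition for a submit host, or None if no rule matches."""
    short = hostname.split(".")[0]  # tolerate fully qualified names
    for pattern, partition in HOST_RULES:
        if pattern.match(short):
            return partition
    return None


def environment_lines(hostname):
    """Render the lines to append to /etc/environment on that host."""
    partition = default_partition(hostname)
    if partition is None:
        return []
    # srun reads SLURM_PARTITION, sbatch reads SBATCH_PARTITION.
    return [
        f"SLURM_PARTITION={partition}",
        f"SBATCH_PARTITION={partition}",
    ]
```

Calling environment_lines("worker3") gives the two lines for partition2. An explicit -p on the command line still overrides these variables, so users can opt out per job.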
I'm trying to install slurm on VirtualBox running Ubuntu. We run long-running jobs via a web interface and use slurm to queue and run them. I'm using VirtualBox to create a sandbox for development.
I've set slurm up, but when I queue a job and run squeue I get:
$ squeue
JOBID PARTITION    NAME     USER ST  TIME NODES NODELIST(REASON)
    2     debug test.sh pchandle PD  0:00     1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
When I run it on my actual hardware, the jobs run successfully.
The output of sinfo is:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug*       up  infinite     0   n/a
Yes, it says nodes are 0, but the output is the same on my actual hardware, and jobs run fine. Any suggestions on why it's saying 0 nodes?
Is this an issue with my setup, or is it simply not possible to run slurm on VirtualBox due to the hardware limitations? I'm running 4 CPUs. The only obvious difference I can see is that threads per core is only 1 (there are 2 on my local hardware).
Is there any way to debug why the nodes aren't running jobs? Or why there are no nodes available?
It turned out to be a configuration error.
In the config file /etc/slurm-llnl/slurm.conf, I'd left the configuration NodeName as the default NodeName=localhost[0-31]. Since I am running on a single host it should have been set to NodeName=localhost for a single node on the same machine.
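To see why the default value matched nothing, here is a small Python sketch that expands a bracketed range the way the NodeName line describes it. This is only an illustration: expand_hostlist is a hypothetical helper that handles a single numeric range without zero padding, not the full Slurm hostlist syntax.

```python
import re


def expand_hostlist(expr):
    """Expand a simple 'prefix[lo-hi]' expression into individual names.

    Plain names without brackets are returned unchanged. Only one numeric
    range is supported; this is not Slurm's own parser.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", expr)
    if match is None:
        return [expr]
    prefix = match.group(1)
    low, high = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{i}" for i in range(low, high + 1)]


default_nodes = expand_hostlist("localhost[0-31]")  # localhost0 .. localhost31
fixed_nodes = expand_hostlist("localhost")          # just localhost
```

The default expression names 32 hosts, localhost0 through localhost31, and none of them is the machine's actual name localhost, which is consistent with sinfo reporting no usable nodes.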
Slurm Single Instance had a description of what the values should be set to, which helped me find the answer.
Install Slurm on a stand alone Ubuntu had the instructions I originally followed.
I'm an intern that's been tasked with installing slurm across three compute units running Ubuntu. How things work now is people ssh into one of the compute units and run a job on there, since all three units share memory through NFS mounting. Otherwise they are separate machines though.
My issue is that from what I've read in the documentation, it seems like when installing slurm I would specify each of these compute units as a completely separate node, and any jobs I'd like to run that use multiple cores would still be limited by how many cores are available on the individual node. My supervisor has told me however that the three units should be installed as a single node, and when a job comes in that needs more cores than are available on a single compute unit, slurm should just use all the cores. The intention is that we won't be changing how we execute jobs (like a parallelized R script), just "wrapping" them in an sbatch script before sending them to slurm for scheduling and execution.
So is my supervisor correct in that slurm can be used to run our parallelized scripts unchanged with more cores than available on a single machine?
Running a script on more cores than are available is nonsense. It does not provide any performance increase; rather the opposite, as more threads have to be managed while the computing power stays the same.
But he is right in the sense that you can wrap your current script and send it to SLURM for execution using the whole node. But the three machines will be three nodes. They cannot work as a single node because, well, they are not a single node/machine. They share neither memory, nor buses, nor peripherals; they just share some disk over the network.
You say that
any jobs I'd like to run that use multiple cores would still be limited by how many cores are available on the individual node
but that's the current situation with SSH. Nothing is lost by using SLURM to manage the resources. In fact, SLURM will take care of giving each job the proper resources and avoiding other users interfering with your computations.
Your best bet: create a cluster of three nodes as usual and let people send their jobs asking for as many resources as they need without exceeding the available resources.
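For the "wrapping" part, a minimal sketch of what such a wrapper could look like, written as a Python function that renders the sbatch script text. The command, job name and cores_per_node value are placeholders I made up; the #SBATCH options used (--job-name, --nodes, --ntasks, --cpus-per-task) are standard sbatch options.

```python
def sbatch_script(command, cpus, cores_per_node, job_name="wrapped-job"):
    """Render an sbatch script that runs `command` unchanged on one node.

    cores_per_node is whatever a single machine in the cluster offers;
    a single task cannot be given more cores than one node has.
    """
    if cpus < 1 or cpus > cores_per_node:
        raise ValueError(
            f"requested {cpus} cores, but one node only has {cores_per_node}"
        )
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",                 # stay on a single machine
        "#SBATCH --ntasks=1",                # one process, as with plain ssh
        f"#SBATCH --cpus-per-task={cpus}",   # cores for that process
        command,                             # the existing command, unchanged
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: an existing R script on a node assumed to have 16 cores.
script_text = sbatch_script("Rscript analysis.R", cpus=8, cores_per_node=16)
```

The resulting text can be saved to a file and submitted with sbatch. Asking for more cores than one node has is rejected up front here, which mirrors the point above that the per-node limit does not go away.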
Q1: What's the difference between
concurrent = 3
[[runners]]
...
executor = "shell"
and
concurrent = 3
[[runners]]
...
executor = "shell"
[[runners]]
...
executor = "shell"
[[runners]]
...
executor = "shell"
Q2: Does it make sense to...
have 3 executors (workers) of the same type on a single runner with global concurrent = 3? Or can a single executor with global concurrent = 3 do multiple jobs in parallel safely?
Q3: How are they related...
runners.limit, runners.request_concurrency and concurrent?
Thanks
GitLab's documentation on runners describes them as:
(...) isolated (virtual) machines that pick up jobs through the coordinator API of GitLab CI
Therefore, each runner is an isolated process responsible for picking up requests for job executions and for dealing with them according to pre-defined configurations. As an isolated process, each runner has the capability of creating 'sub-processes' (also called machines) in order to run jobs.
When you define a [[runners]] section in your config.toml, you're configuring a runner and setting how it should deal with job execution requests.
In your questions, you mentioned two of those "how to deal with job execution requests" settings:
limit: "Limit how many jobs can be handled concurrently". In other words, how many 'sub-processes' can be created by a runner in order to execute jobs simultaneously;
request_concurrency: "Limit number of concurrent requests for new jobs from GitLab". In other words, how many job execution requests can a runner take from GitLab CI job queue simultaneously.
Also, there are some settings that apply to a machine globally. In your question you mentioned one of them:
concurrent: "Limit how many jobs globally can be run concurrently. This is the most upper limit of number of jobs using all defined runners". In other words, it limits the maximum amount of 'sub-processes' that can run jobs simultaneously.
Thus, keeping in mind the difference between a runner and its sub-processes, and also the difference between specific runner settings and global machine settings:
Q1:
The difference is that in your 1st example you have one runner and in your 2nd example you have three runners. It's worth mentioning that in both examples your machine would only allow running 3 jobs simultaneously.
Q2:
Not only can a single runner safely run multiple jobs concurrently, but it is also possible to control how many jobs you want it to handle (using the aforementioned limit setting).
Also, there is no problem with having similar runners running on the same machine. How you define your runners' configurations is up to you and your infrastructure capabilities.
Also, please notice that an executor only defines how to run your job. It isn't the only thing that defines a runner and it isn't a synonym for "worker". The ones working are your runners and their sub-processes.
Q3:
To summarize: You can define one or many runners on the same machine. Each one is an isolated process. A runner's limit is how many sub-processes of a runner process can be created to run jobs concurrently. A runner's request_concurrency is how many requests a runner can handle from the GitLab CI job queue. Finally, setting a value for concurrent limits how many jobs can be executed on your machine at the same time across the one or more runners running on it.
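The interaction between concurrent and the per-runner limit can be sketched as simple arithmetic. This is my own simplified model, not GitLab code: it assumes a limit of 0 means "no per-runner limit" (the documented default) and ignores request_concurrency, which only affects how fast jobs are fetched.

```python
def max_parallel_jobs(concurrent, limits):
    """Upper bound on jobs running at once on one machine.

    concurrent: the global `concurrent` value from config.toml.
    limits: one `limit` value per [[runners]] section, 0 meaning unlimited.
    """
    if any(limit == 0 for limit in limits):
        # At least one runner is only bounded by the global setting.
        return concurrent
    return min(concurrent, sum(limits))


one_runner = max_parallel_jobs(3, [0])           # first example in Q1
three_runners = max_parallel_jobs(3, [0, 0, 0])  # second example in Q1
capped_runners = max_parallel_jobs(3, [1, 1])    # hypothetical: two runners, limit = 1 each
```

Both configurations from Q1 end up with the same ceiling of 3 jobs, while the hypothetical pair of runners with limit = 1 each can never run more than 2.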
References
For a better understanding, I really recommend you read about the Autoscaling algorithm and parameters.
Finally, I think you might find this question on how to run runners in parallel on the same server useful.
I've just finished the SLURM installation on our cluster. How do I run programs on node2 (slave) from node1 (head/controller node)? Is there a special configuration for this kind of setup?
Make sure that UIDs are uniform across the cluster nodes, and that a common filesystem (for instance NFS) is mounted on all nodes.
Then you run programs typically by submitting jobs with the sbatch command.
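If the goal is specifically to land on node2, a submission could look like the sketch below, which only builds the sbatch argument list. Pinning the job with --nodelist is my assumption about what the question wants; normally you would let Slurm pick the node. The helper name is made up, while --nodelist and --wrap are existing sbatch options.

```python
def sbatch_command(command, node=None):
    """Build the argv for submitting a one-line command through sbatch."""
    argv = ["sbatch"]
    if node is not None:
        argv.append(f"--nodelist={node}")  # request this specific node
    argv.append(f"--wrap={command}")       # let sbatch wrap the command in a script
    return argv


pinned = sbatch_command("hostname", node="node2")
anywhere = sbatch_command("hostname")
```

On the head node, the list can be passed to subprocess.run(pinned, check=True), which is equivalent to typing the same sbatch command in a shell.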
You can refer to this set of slides for a quick start.
In Spark 1.0.0 Standalone mode with multiple worker nodes, I'm trying to run a Spark shell from two different computers (same Linux user).
In the documentation, it says "By default, applications submitted to the standalone mode cluster will run in FIFO (first-in-first-out) order, and each application will try to use all available nodes."
The number of cores per worker is set to 4 with 8 being available (via SPARK_JAVA_OPTS="-Dspark.cores.max=4"). Memory is also limited such that enough should be available for both.
However, when looking at the Spark Master WebUI, the shell application that was started later always remains in state "WAITING" until the first one is exited. The number of cores assigned to it is 0, and the Memory per node is 10G (the same as for the one that is already running).
Is there a way to have both shells running at the same time without using Mesos?
Before a shell will start processing on a Spark standalone cluster, there have to be sufficient cores and memory. You must specify from each spark shell the number of cores you want, or it will use them all. If you start the first shell with 5 cores and executor memory of 10G (the amount of memory you allocated for the executors), and then start a second spark shell with 2 cores and 10G of memory, the second one will still not start, because the first shell is using both executors and all of the memory on both. If you specify 5G of executor memory for each spark shell, then they can run concurrently.
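The arithmetic behind this can be shown with a toy model in Python. It is a deliberately simplified sketch, not Spark's scheduler: it assumes two workers with 8 cores and 10G each, at most one executor per worker per application, and cores handed out round-robin across workers that still have enough free memory.

```python
def allocate(workers, cores_wanted, executor_mem_gb):
    """Grant cores to one application and return how many it received.

    A worker is usable only if it has a free core and enough free memory
    for one executor. Mutates `workers` to reflect what was taken.
    """
    usable = [
        w for w in workers
        if w["free_cores"] > 0 and w["free_mem_gb"] >= executor_mem_gb
    ]
    taken = [0] * len(usable)
    remaining = cores_wanted
    progress = True
    while remaining > 0 and progress:
        progress = False
        for i, worker in enumerate(usable):
            if remaining > 0 and worker["free_cores"] > 0:
                worker["free_cores"] -= 1
                taken[i] += 1
                remaining -= 1
                progress = True
    for i, worker in enumerate(usable):
        if taken[i] > 0:
            # One executor per worker: its memory is reserved once.
            worker["free_mem_gb"] -= executor_mem_gb
    return sum(taken)


def simulate(executor_mem_gb):
    """First shell asks for 5 cores, second for 2, same executor memory."""
    workers = [{"free_cores": 8, "free_mem_gb": 10} for _ in range(2)]
    first = allocate(workers, 5, executor_mem_gb)
    second = allocate(workers, 2, executor_mem_gb)
    return first, second


with_10g = simulate(10)  # second shell gets 0 cores and stays WAITING
with_5g = simulate(5)    # second shell gets its 2 cores
```

With 10G executors the first shell reserves all memory on both workers, so the second shell receives no cores even though plenty are free; with 5G executors both fit.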
Essentially you want to have multiple jobs running on a standalone cluster. Unfortunately, it is really not designed to handle this case well. If you want to do that you should use either Mesos or YARN.
One workaround is to restrict the number of cores per spark shell using --total-executor-cores. For example, to restrict it to 16 cores, launch it like this:
bin/spark-shell --total-executor-cores 16 --master spark://$MASTER:7077
In this case each shell will use only 16 cores, so you can have two shells running on your 32-core cluster. They can then run simultaneously but never use more than 16 cores each :(
This solution is far from ideal, I know. You depend on users to restrict themselves, to shut down their shells, and resources are wasted when a user is not running code. I have created a request to fix this on JIRA, which you can vote for.
The application ends when your shell dies. So, you cannot run two spark-shells concurrently on two laptops. What you can do is launch one spark-shell, launch the other, and have the second start when the first one dies.
Unlike spark-shell, spark-submit does terminate once the computation is over. So you can spark-submit one app, launch a spark-shell, and have the shell take over the moment the application is done.
Or you can run two apps sequentially (one after the other) with two spark-submit launches.