Prevent GPU usage in SLURM when --gpus is not set


We're using SLURM to manage a small on-premise cluster. A key resource we are managing is GPUs. When a user requests GPUs via --gpus=2, the CUDA_VISIBLE_DEVICES environment variable is set to the GPUs SLURM allocates to the user.
$ srun --gpus=2 bash -c 'echo $CUDA_VISIBLE_DEVICES'
0,1
We have a small team and can trust our users not to abuse the system (they could easily overwrite the environment variable), so this works great. However, it's a bit too easy to bypass this accidentally: when --gpus isn't specified, $CUDA_VISIBLE_DEVICES is left unset, so the user can use any GPU (we're typically using PyTorch).
In other words, the following command works fine (so long as it lands on a GPU node) but I would prefer that it fails (because no GPU was requested).
srun sudo docker run -e CUDA_VISIBLE_DEVICES --runtime=nvidia pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-runtime python -c 'import torch; print(torch.tensor([1., 2.], device=torch.device("cuda:0")))'
It would fail if $CUDA_VISIBLE_DEVICES were set to -1.
$ CUDA_VISIBLE_DEVICES=-1 sudo docker run -e CUDA_VISIBLE_DEVICES --runtime=nvidia pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-runtime python -c 'import torch; print(torch.tensor([1., 2.], device=torch.device("cuda:0")))'
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THC/THCGeneral.cpp line=51 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.6/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THC/THCGeneral.cpp:51
How can I configure SLURM to set CUDA_VISIBLE_DEVICES to -1 when --gpus is not specified?

You can use the TaskProlog script to set the $CUDA_VISIBLE_DEVICES variable to -1 if it was not set by Slurm.
In slurm.conf, set TaskProlog=/path/to/prolog.sh and give prolog.sh the following content:
#!/bin/bash
if [[ -z "$CUDA_VISIBLE_DEVICES" ]]; then
    echo export CUDA_VISIBLE_DEVICES=-1
fi
The echo export ... line will inject CUDA_VISIBLE_DEVICES=-1 into the job environment.
Make sure /path/to is visible from all compute nodes.
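Slurm reads the TaskProlog's standard output and applies each line of the form export NAME=value to the task environment, so the prolog logic can be checked outside Slurm before it is installed. A minimal sketch, assuming a POSIX shell; the function name taskprolog_gpu_guard is hypothetical and simply mirrors the body of prolog.sh:

```shell
#!/bin/sh
# Hypothetical helper mirroring the TaskProlog body: print an "export" line
# (which Slurm applies to the task environment) only when no GPU was allocated,
# i.e. when CUDA_VISIBLE_DEVICES is unset or empty.
taskprolog_gpu_guard() {
  if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "export CUDA_VISIBLE_DEVICES=-1"
  fi
}

# Usage: simulate a job without and with a GPU allocation.
( unset CUDA_VISIBLE_DEVICES; taskprolog_gpu_guard )   # prints the export line
( CUDA_VISIBLE_DEVICES=0,1; taskprolog_gpu_guard )     # prints nothing
```

Only the export line reaches Slurm, so jobs that did request GPUs keep the CUDA_VISIBLE_DEVICES value Slurm assigned.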
But this will not prevent a user from gaming the system and redefining the variable from within the Python script. Really preventing access would require configuring cgroups (for example ConstrainDevices=yes in cgroup.conf, with TaskPlugin=task/cgroup).

Related

Getting different behavior when running bash script using SSH connection and startup-script

I am trying to run a script as a GCP startup script. My startup script looks like this:
#!/bin/bash
ulimit -n 100000
source ~/miniconda3/etc/profile.d/conda.sh
conda activate base
set -e
/root/miniconda3/bin/python3.8 /root/spider/src/rotate_ip.py & /root/miniconda3/bin/python3.8 /root/spider/src/main.py
gcloud compute instances stop scheduled-spider --zone asia-northeast1-b
The script does not behave the same as when I run the program over an SSH connection, and it doesn't log any error. When I run the program over SSH, it works perfectly. This is how I run it over SSH:
/root/miniconda3/bin/python3.8 /root/spider/src/rotate_ip.py & /root/miniconda3/bin/python3.8 /root/spider/src/main.py
My assumption is that, when run from the startup script, the program does not get the same environment as in the SSH session. I am using the following commands in the startup script to reproduce that environment, but it still doesn't behave the same way:
source ~/miniconda3/etc/profile.d/conda.sh
conda activate base
In the SSH session, which bash gives me /usr/bin/bash. I have also tried that as the shebang, with no result. Does anyone have any clue what else to try?
I have also tried something like below:
source ~/miniconda3/etc/profile.d/conda.sh
conda activate base
set -e
conda activate base && /root/miniconda3/bin/python3.8 /root/spider/src/rotate_ip.py & /root/miniconda3/bin/python3.8 /root/spider/src/main.py
But the output was no different.

Were I doing this, I would use conda run and not muck around with manual activation.
#!/bin/bash
ulimit -n 100000
set -e
conda run -n base python /root/spider/src/rotate_ip.py &
conda run -n base python /root/spider/src/main.py
gcloud compute instances stop scheduled-spider --zone asia-northeast1-b
If it needs interaction, then you may need additional flags (see conda run --help).
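The question also notes that the startup script "doesn't show any log error". A complementary sketch (not part of the answer above), assuming a POSIX shell: redirect everything the script prints to a file of your choosing, so errors from conda run or the Python programs can be inspected afterwards. The function start_logging and any log path you pass it are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: from this point on, append both stdout and stderr of
# the current shell (i.e. the rest of the startup script) to the given file.
start_logging() {
  log_file="$1"
  exec >>"$log_file" 2>&1
  echo "startup script started: $(date)"
}
```

In the startup script this would be called near the top, for example start_logging /var/log/spider-startup.log, before set -e and the conda run lines. Because exec with only redirections changes the file descriptors of the calling shell, the redirection persists after the function returns.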

How could I run Open MPI under Slurm

I am unable to run Open MPI under Slurm through a Slurm-script.
In general, I am able to obtain the hostname and run Open MPI on my machine.
$ mpirun hostname
myHost
$ cd NPB3.3-SER/ && make ua CLASS=B && mpirun -n 1 bin/ua.B.x inputua.data # Works
But if I do the same operations through the Slurm script, mpirun hostname returns an empty string, and consequently I am unable to run mpirun -n 1 bin/ua.B.x inputua.data.
slurm-script.sh:
#!/bin/bash
#SBATCH -o slurm.out # STDOUT
#SBATCH -e slurm.err # STDERR
#SBATCH --mail-type=ALL
export LD_LIBRARY_PATH="/usr/lib/openmpi/lib"
mpirun hostname > output.txt # Returns empty
cd NPB3.3-SER/
make ua CLASS=B
mpirun --host myHost -n 1 bin/ua.B.x inputua.data
$ sbatch -N1 slurm-script.sh
Submitted batch job 1
The error I am receiving:
There are no allocated resources for the application
bin/ua.B.x
that match the requested mapping:
------------------------------------------------------------------
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
A daemon (pid unknown) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
------------------------------------------------------------------

If Slurm and OpenMPI are recent versions, make sure that OpenMPI is compiled with Slurm support (run ompi_info | grep slurm to find out) and just run srun bin/ua.B.x inputua.data in your submission script.
Alternatively, mpirun bin/ua.B.x inputua.data should work too.
If OpenMPI is compiled without Slurm support the following should work:
srun hostname > output.txt
cd NPB3.3-SER/
make ua CLASS=B
mpirun --hostfile output.txt -n 1 bin/ua.B.x inputua.data
Also make sure that running export LD_LIBRARY_PATH="/usr/lib/openmpi/lib" does not overwrite other library paths that are necessary. Better would probably be export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/lib/openmpi/lib" (or a more complex version if you want to avoid a leading : when it is initially empty).
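For the "more complex version", a minimal POSIX sh sketch; the function name append_path is hypothetical. It relies on ${var:+word}, which expands to word only when var is set and non-empty.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to a colon-separated path list
# without producing a leading ":" when the list is empty or unset.
# Prints the new value; the caller assigns it to whichever variable it wants.
append_path() {
  current="$1"
  dir="$2"
  printf '%s\n' "${current:+$current:}$dir"
}

# Usage in the submission script:
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/lib/openmpi/lib)
export LD_LIBRARY_PATH
```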

What you need is: 1) run mpirun, 2) from slurm, 3) with --host.
To determine what is causing this to fail (Problem 1), you could test a few things.
Whatever you test, you should test exactly the same via command line (CLI) and via slurm (S).
It is understood that some of these tests will produce different results in cases CLI and S.
A few notes are:
1) You are not testing exactly the same things in CLI and S.
2) You say that you are "unable to run mpirun -n 1 bin/ua.B.x inputua.data", while the problem is actually with mpirun --host myHost -n 1 bin/ua.B.x inputua.data.
3) The fact that mpirun hostname > output.txt returns an empty file (Problem 2) does not necessarily have the same origin as your main problem (see the paragraph above). You can overcome this problem by using scontrol show hostnames or the environment variable SLURM_NODELIST (on which scontrol show hostnames is based), but this will not solve Problem 1.
To work around Problem 2, which is not the most important, try a few things via both CLI and S.
The slurm script below may be helpful.
#!/bin/bash
#SBATCH -o slurm_hostname.out # STDOUT
#SBATCH -e slurm_hostname.err # STDERR
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/lib64/openmpi/lib"
mpirun hostname > hostname_mpirun.txt # 1. Returns values ok for me
hostname > hostname.txt # 2. Returns values ok for me
hostname -s > hostname_short.txt # 3. Returns values ok for me
scontrol show hostnames > hostname_scontrol.txt # 4. Returns values ok for me
echo "${SLURM_NODELIST}" > hostname_slurmcontrol.txt # 5. Returns values ok for me
(The ${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:} expansion appends to any existing value and avoids a leading : when LD_LIBRARY_PATH is initially empty.)
From what you say, I understand 2, 3, 4 and 5 work ok for you, and 1 does not.
So you could now use mpirun with suitable options --host or --hostfile.
Note the different format of the output of scontrol show hostnames (e.g., for me cnode17<newline>cnode18) and echo ${SLURM_NODELIST} (cnode[17-18]).
The host names could perhaps also be obtained in file names set dynamically with %h and %n in slurm.conf, look for e.g. SlurmdLogFile, SlurmdPidFile.
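Building on the scontrol show hostnames route, a minimal POSIX sh sketch that turns its one-name-per-line output into an Open MPI hostfile. The helper make_hostfile and the slot count are assumptions for illustration; Open MPI hostfiles accept lines of the form host slots=N.

```shell
#!/bin/sh
# Hypothetical helper: read host names (one per line) on stdin and print an
# Open MPI hostfile giving every host the same number of slots.
make_hostfile() {
  slots="$1"
  # "|| [ -n "$host" ]" keeps a last line that has no trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    if [ -n "$host" ]; then
      printf '%s slots=%s\n' "$host" "$slots"
    fi
  done
}
```

Inside the batch script this could be used as scontrol show hostnames | make_hostfile 1 > hostfile.txt, followed by mpirun --hostfile hostfile.txt -n 1 bin/ua.B.x inputua.data.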
To diagnose/work around/solve Problem 1, try mpirun with/without --host, in CLI and S.
From what you say, assuming you used the correct syntax in each case, this is the outcome:
1) mpirun, CLI (original post).
"Works".
2) mpirun, S (comment?).
Same error as item 4 below?
Note that mpirun hostname in S should have produced similar output in your slurm.err.
3) mpirun --host, CLI (comment).
Error:
There are no allocated resources for the application bin/ua.B.x that match the requested mapping:
...
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
4) mpirun --host, S (original post).
Error (same as item 3 above?):
There are no allocated resources for the application
bin/ua.B.x
that match the requested mapping:
------------------------------------------------------------------
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
...
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
As per the comments, you may have the wrong LD_LIBRARY_PATH set.
You may also need to use mpirun --prefix ...
Related?
https://github.com/easybuilders/easybuild-easyconfigs/issues/204

systemd-run does not set environment variables when using --setenv

According to the systemd-run documentation, the --setenv option can be used to "Run the service process with the specified environment variables set".
However, it seems like the environment variable is actually not available to the process:
# systemd-run -t --setenv=TEST=Success echo TEST:$TEST
Running as unit run-20705.service.
Press ^] three times within 1s to disconnect TTY.
TEST:
Am I misunderstanding the usage of the --setenv option? Running systemd version 219.

You need to prevent bash from expanding $TEST before the systemd-run command is run.
Also, echo does not expand environment variables itself; a shell is needed within the systemd process to expand $TEST.
So you need to run the following:
systemd-run -t --setenv=TEST=Success bash -c 'echo TEST:$TEST'
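The expansion-order issue can be reproduced without systemd. A minimal sketch, assuming a POSIX shell and using env(1) as a stand-in for --setenv: $TEST in the outer command line is expanded by the calling shell, while a single-quoted script is expanded by the child shell that actually has the variable.

```shell
#!/bin/sh
unset TEST
# 1. The calling shell expands $TEST (still unset) before env runs: prints "TEST:".
env TEST=Success echo "TEST:$TEST"
# 2. Single quotes defer expansion to the child shell: prints "TEST:Success".
env TEST=Success sh -c 'echo "TEST:$TEST"'
```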

How to run command during Docker build which requires a tty?

I have some script I need to run during a Docker build which requires a tty (which Docker does not provide during a build). Under the hood the script uses the read command. With a tty, I can do things like (echo yes; echo no) | myscript.sh.
Without it I get strange errors I don't completely understand. So is there any way to use this script during the build (given that it's not mine to modify)?
EDIT: Here's a more definite example of the error:
FROM ubuntu:14.04
RUN echo yes | read
which fails with:
Step 0 : FROM ubuntu:14.04
---> 826544226fdc
Step 1 : RUN echo yes | read
---> Running in 4d49fd03b38b
/bin/sh: 1: read: arg count
The command '/bin/sh -c echo yes | read' returned a non-zero code: 2

From the Dockerfile reference for RUN <command> (shell form):
shell form, the command is run in a shell, which by default is /bin/sh -c on Linux or cmd /S /C on Windows
Let's see what exactly /bin/sh is in ubuntu:14.04:
$ docker run -it --rm ubuntu:14.04 bash
root@7bdcaf403396:/# ls -n /bin/sh
lrwxrwxrwx 1 0 0 4 Feb 19 2014 /bin/sh -> dash
/bin/sh is a symbolic link to dash. Here is the read builtin in dash:
$ man dash
...
read [-p prompt] [-r] variable [...]
The prompt is printed if the -p option is specified and the standard input is a terminal. Then a line
is read from the standard input. The trailing newline is deleted from the line and the line is split as
described in the section on word splitting above, and the pieces are assigned to the variables in order.
At least one variable must be specified. If there are more pieces than variables, the remaining pieces
(along with the characters in IFS that separated them) are assigned to the last variable. If there are
more variables than pieces, the remaining variables are assigned the null string. The read builtin will
indicate success unless EOF is encountered on input, in which case failure is returned.
By default, unless the -r option is specified, the backslash ``\'' acts as an escape character, causing
the following character to be treated literally. If a backslash is followed by a newline, the backslash
and the newline will be deleted.
...
So, for read in dash:
At least one variable must be specified.
Now compare read in bash:
$ man bash
...
read [-ers] [-a aname] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name...]
If no names are supplied, the line read is assigned to the variable REPLY. The return code is zero,
unless end-of-file is encountered, read times out (in which case the return code is greater than
128), or an invalid file descriptor is supplied as the argument to -u.
...
So I guess your script myscript.sh starts with #!/bin/bash or some other shell, not /bin/sh.
Also, you can change your Dockerfile like this:
FROM ubuntu:14.04
RUN echo yes | read ENV_NAME
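A minimal sketch for /bin/sh (dash on ubuntu:14.04) of two related points: read needs a variable name, and in dash and bash the last stage of a pipeline runs in a subshell, so a value read there is gone once the pipeline ends. The here-document variant is an addition, not part of the answer above.

```shell
#!/bin/sh
# Naming a variable satisfies dash's "at least one variable" rule.
echo yes | read -r ENV_NAME && echo "read succeeded"

# ENV_NAME is not visible here, because read ran in the pipeline's subshell.
# A here-document keeps read in the current shell, so the value survives:
read -r ENV_NAME <<EOF
yes
EOF
echo "ENV_NAME=$ENV_NAME"
```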
Links:
https://docs.docker.com/engine/reference/builder/
http://linux.die.net/man/1/dash
http://linux.die.net/man/1/bash

Short answer: you can't do it directly, because neither docker build nor buildx implements [/dev/tty, /dev/console]. But there is a hacky solution that achieves what you need, though I strongly discourage using it, since it breaks the concept of CI. That's why Docker didn't implement it.
Hacky solution
FROM ubuntu:14.04
RUN echo yes | read #tty requirement command
As mentioned in the Docker reference documentation, RUN consists of two stages: first the command is executed, then the result is committed to the image as a new layer. So you can perform these stages manually yourself, providing a tty to the first stage (execution) and then committing the result.
Code:
cd
cat > tty_wrapper.sh << 'EOF'
#!/bin/bash
echo yes | read ## Your command which needs tty
rm /home/tty_wrapper.sh
EOF
docker run --interactive --tty --detach --privileged --name name1 ubuntu:14.04
docker cp tty_wrapper.sh name1:/home/
docker exec --tty name1 bash -c "cd /home && chmod +x tty_wrapper.sh && ./tty_wrapper.sh"
docker commit name1 your:tag
Your new image is ready.
Here is a description about the code.
First we create a bash script that wraps the command needing a tty and deletes itself after its first execution. Then we run a container with the tty option (you can drop --privileged if you don't need it). Next, we copy the wrapper script into the container and perform the execution and commit stages ourselves, running the wrapper through docker exec --tty so that it gets a tty.

You don't need a tty to feed data to your script. Just doing something like (echo yes; echo no) | myscript.sh, as you suggested, will do. Also, please make sure you copy your file into the image before trying to execute it, with something like COPY myscript.sh myscript.sh.
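A minimal sketch of that approach, assuming the script reads its answers with read, one per line; myscript here is a hypothetical shell function standing in for myscript.sh.

```shell
#!/bin/sh
# Hypothetical stand-in for myscript.sh: asks two questions with read.
myscript() {
  printf 'Continue? '
  read -r first
  printf 'Overwrite? '
  read -r second
  echo "first=$first second=$second"
}

# No tty needed: each read consumes the next line from the pipe.
(echo yes; echo no) | myscript
```

The prompts are still printed, but the answers come from the pipe rather than a terminal, which is what a non-interactive docker build can provide.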

Most likely you don't need a tty. As the comment on the question shows, even the example provided is a situation where the read command was not properly called. A tty would turn the build into an interactive terminal process, which doesn't translate well to automated builds that may be run from tools without terminals.
If you need a tty, there's the C library call openpty that you would use when forking a process that needs a pseudo-tty. You may be able to solve your problem with a tool like expect, but it's been so long that I don't remember whether it creates a pty or not. Alternatively, if your application can't be built automatically, you can manually perform the steps in a running container, and then docker commit the resulting container to make an image.
I'd recommend against any of those and to work out the procedure to build your application and install it in a non-interactive fashion. Depending on the application, it may be easier to modify the installer itself.

How to run a MPI task?

I am a newbie in Linux and recently started working with our university supercomputer, and I need to install my program (GAMESS Quantum Chemistry Software) in my own allocated space. I have installed and run it successfully under 'sockets', but I actually need to run it under 'mpi' (otherwise there is little advantage to using a supercomputer).
System Setting:
OS: Linux64, Red Hat, Intel
MPI: impi
compiler: ifort
modules: slurm , intel/intel-15.0.1 , intel/impi-15.0.1
The program is run via rungms, which takes arguments as:
rungms [fileName] [Version] [CPU count] (for example: ./rungms Opt 00 4)
Here is my bash file (my feeling is that this is the main culprit for my problem!):
#!/bin/bash
#Based off of Monte's Original Script for Torque:
#https://gist.github.com/mlunacek/6306340#file-matlab_example-pbs
#These are SBATCH directives specifying name of file, queue, the
#Quality of Service, wall time, Node Count, #of CPUS, and the
#destination output file (which appends node hostname and JobID)
#SBATCH -J OptMPI
#SBATCH --qos janus-debug
#SBATCH -t 00-00:10:00
#SBATCH -N2
#SBATCH --ntasks-per-node=1
#SBATCH -o output-OptMPI-%N-JobID-%j
#NOTE: This Module Will Be Replaced With Slurm Specific:
module load intel/impi-15.0.1
mpirun /projects/augenda/gamess/rungms Opt 00 2 > OptMPI.out
As I said before, the program is compiled for mpi (and not 'sockets').
My problem is that when I run sbatch Opt.sh, I receive this error:
srun: error: PMK_KVS_Barrier duplicate request from task 1
When I change the -N number, I sometimes receive an error saying (4 != 2).
With an odd -N, I receive an error saying it expects an even number of processes.
What am I missing ?
Here is the code from our super-computer website as a bash file example

The Slurm Workload Manager has a few ways of invoking an Intel MPI process. Likely, all you have to do is use srun rather than mpirun in your case. If errors are still present, refer here for alternative ways to invoke Intel MPI jobs; it's rather dependent on how the HPC admins configured the system.
