Handling SLURM .out output

I am using sbatch to run scripts, and I want the output to be written to a file only from a certain point onward: I want to echo some text so the user can see it, but after a certain command I want all output written to a file. Is there a way to do this?
If not, how can I disable output logging entirely?
EDIT: Example:
#!/bin/bash
#SBATCH --partition analysis
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --exclusive
#SBATCH --time 14-0
#SBATCH -c1
#SBATCH --mem=400M
#SBATCH --job-name jupyter
module load jupyter
## get tunneling info
XDG_RUNTIME_DIR=""
ipnip=$(hostname -i)
echo "
Copy/Paste this in your local terminal to ssh tunnel with remote
-----------------------------------------------------------------
ssh -N -L 7905:$ipnip:7905 USER@HOST
-----------------------------------------------------------------
"
##UP UNTIL HERE ECHO TO TERMINAL
##FROM NOW ON, ECHO TO A FILE
## start an ipcluster instance and launch jupyter server
jupyter-notebook --no-browser --port=7905 --ip=$ipnip

As per my comment above, it's not possible to write to the terminal from a job submitted with sbatch.
You can do that with srun in the following way:
#!/bin/bash
srun --partition analysis --nodes 1 --ntasks-per-node 1 --exclusive --time 14-0 -c1 --mem=400M --job-name jupyter wrapper.sh
wrapper.sh:
#!/bin/bash
module load jupyter
## get tunneling info
XDG_RUNTIME_DIR=""
ipnip=$(hostname -i)
echo "
Copy/Paste this in your local terminal to ssh tunnel with remote
-----------------------------------------------------------------
ssh -N -L 7905:$ipnip:7905 USER@HOST
-----------------------------------------------------------------
"
##UP UNTIL HERE ECHO TO TERMINAL
##FROM NOW ON, ECHO TO A FILE
exec > "$SLURM_JOBID.out" 2>&1
## start an ipcluster instance and launch jupyter server
jupyter-notebook --no-browser --port=7905 --ip=$ipnip
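To answer the second part of the question: if you just want to disable the output logging entirely rather than split it, point both streams at /dev/null with the standard sbatch options. A minimal sketch:
#!/bin/bash
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null
## nothing is written to the usual slurm-<jobid>.out file
jupyter-notebook --no-browser --port=7905 --ip=$(hostname -i)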

Related

linux slurm - separate .out files for tasks run in parallel on 1 node

I am running jobs in parallel on Linux using Slurm, requesting a node and running one task per CPU.
However, the output as specified joins both streams into a single .out file. I tried the %t flag in the expectation that it would separate the tasks, but it just logs everything in one output file with _0 appended (e.g. sample_output__XXX_XX_0.out).
Any advice on how best to generate a separate .out log per task would be much appreciated.
#!/bin/bash
#SBATCH --job-name=recon_all_06172021_1829
#SBATCH --output=/path/recon_all_06172021_1829_%A_%a_%t.out
#SBATCH --error=/path/recon_all_06172021_1829_%A_%a.err
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=23:59:00
#! Always keep the following echo commands to monitor CPU, memory usage
echo "SLURM_MEM_PER_CPU: $SLURM_MEM_PER_CPU"
echo "SLURM_MEM_PER_NODE: $SLURM_MEM_PER_NODE"
echo "SLURM_JOB_NUM_NODES: $SLURM_JOB_NUM_NODES"
echo "SLURM_NNODES: $SLURM_NNODES"
echo "SLURM_NTASKS: $SLURM_NTASKS"
echo "SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
command 1 &
command 2
wait
You can redirect the standard output (and standard error) of each command itself, for example:
command 1 > file1 2>&1
command 2 > file2 2>&1
Not as neat as using the sbatch filename patterns, but it will separate the output of each command.
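If you want to keep the filename patterns instead, note that %t only distinguishes tasks that Slurm itself launches. A sketch that launches each command as its own job step with srun, so each step gets its own log (command1/command2 stand in for the real programs; on recent Slurm versions you may also need --exact so the steps run concurrently):
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
## each srun call is a separate job step with its own log file;
## %j expands to the job id, %s to the step id
srun --ntasks=1 --output=recon_%j_%s.out command1 &
srun --ntasks=1 --output=recon_%j_%s.out command2 &
wait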

slurm/sbatch doesn't work when option `-o` is specified

I'm trying to run the following script with sbatch on our cluster.
#!/bin/bash
#SBATCH -o /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/work/chunkaa/work/a4/6d0605f453add1d97d609839cfd318/command.log
#SBATCH --no-requeue
#SBATCH --partition=Bird
set -e
echo "Hello" 1>&2
sbatch displays a job id on stdout, nothing is listed in squeue, and it looks like nothing was written or executed.
If the line #SBATCH -o /SCRATCH-BIRD/users/... is removed, then the script works.
The directory exists:
$ test -w /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/work/chunkaa/work/a4/6d0605f453add1d97d609839cfd318/ && echo OK
OK
Could it be a problem with the filesystem? How can I test this?
OK, got it: the partition is visible from the login node but not from the cluster nodes.
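For anyone hitting the same symptom, a quick way to test it is to run the same check on a compute node rather than on the login node, e.g. with srun (a sketch, reusing the partition from the script above):
## run the writability test on a compute node; if the filesystem
## is not mounted there, this fails instead of printing OK
srun --partition=Bird --time=1 bash -c 'test -w /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/work/chunkaa/work/a4/6d0605f453add1d97d609839cfd318/ && echo OK'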

I got no output using echo $SLURM_NTASKS

I created this batch file, myfirst_slurm_job.sh, containing the following code:
#!/bin/bash
#SBATCH --output="slurm1.txt"
cd $HOME/..
echo $PWD
echo $SLURMD_NODENAME
echo $SLURM_NTASKS
and then I run this command line:
sbatch myfirst_slurm_job.sh
note: it's my first post
You need to specify the --ntasks/-n flag:
#SBATCH -n 1
otherwise Slurm won't define this variable for you.
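For completeness, a sketch of the script with the missing directive added (assuming a single task is wanted):
#!/bin/bash
#SBATCH --output="slurm1.txt"
#SBATCH -n 1
cd $HOME/..
echo $PWD
echo $SLURMD_NODENAME
echo $SLURM_NTASKS   # now prints 1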

Wrapper: "This does not look like a batch script"

I want to pass arguments to a job. I wrote a wrapper in the file job.sh, but when I use it I get an error:
sbatch: error: This does not look like a batch script. The first
sbatch: error: line must start with #! followed by the path to an interpreter.
sbatch: error: For instance: #!/bin/sh
main.sh
DATE=`date '+%Y-%m-%d %H:%M:%S'`
name='test__'$DATE
./job.sh $name
job.sh
#!/bin/bash
sbatch << EOT
#!/bin/sh
#SBATCH --job-name=$1
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=1 # number of cores
#module load R
Rscript script.R $1
EOT
script.R
args <- commandArgs(trailingOnly=TRUE)
print((args[[1]]))
(the three files are in the same directory)
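One possible cause, judging only from the code shown: main.sh has no #! line, so submitting it directly with sbatch would produce exactly this error; and since DATE contains a space, $name should be quoted when passed to job.sh. A sketch of main.sh with both fixes:
#!/bin/bash
## main.sh: add the interpreter line and quote the expansion so
## the space in DATE does not split $name into two arguments
DATE=$(date '+%Y-%m-%d %H:%M:%S')
name="test__$DATE"
./job.sh "$name"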

set values for job variables from another file

I wanted to set the job name and some other variables of a job from another file, but I get an error:
sbatch: error: Unable to open file 10:12:35
file.sh
#!/bin/bash
DATE=`date '+%Y-%m-%d %H:%M:%S'`
name='test__'$DATE
sbatch -J $name -o $name'.out' -e $name'.err' job.sh
job.sh
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=2 # number of cores
#SBATCH --output=.out
#SBATCH --error=.err
#module load R
Rscript script.R
script.R
for(i in 1:1e6){print(i)}
You are quoting the variables incorrectly, and the space in the requested date format turns the name into two arguments to sbatch, hence the complaint about that wrong parameter.
If I were you, I would avoid the space altogether (as a general rule, since it is more error-prone and always requires quoting):
file.sh:
#!/bin/bash
DATE=$(date '+%Y-%m-%dT%H:%M:%S')
name="test__$DATE"
sbatch -J "$name" -o "${name}.out" -e "${name}.err" job.sh
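Note that options given on the sbatch command line take precedence over #SBATCH directives inside the script, so the -J/-o/-e flags above override the placeholder --job-name/--output/--error lines in job.sh. Assuming the job is always submitted through file.sh, job.sh can drop them:
#!/bin/bash
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=2 # number of cores
## job name and log files are supplied by file.sh via -J/-o/-e
#module load R
Rscript script.R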
