"rejoin" a bash SLURM job - slurm

Currently, I can use srun [variety of settings] bash to create a shell on a compute node. However, if my SSH connection drops for whatever reason and I want to re-access the shell, how can I do that?

Get your job ID:
squeue -u $USERNAME
Get your node ID:
scontrol show job $JOBID | grep NodeList
Attach (step 0 is the job's first step):
sattach $JOBID.0
This did not work for me, but in theory this should work:
srun --pty --jobid $JOBID -w $NODEID /bin/bash
Source
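Putting those steps together, a minimal sketch (assuming a single queued/running job, GNU grep, and that your job has an attachable step 0; rejoin.sh is a hypothetical helper name):
#!/bin/bash
# rejoin.sh - look up my first job and re-attach to it
JOBID=$(squeue -u "$USER" -h -o %i | head -n1)                     # first job ID, no header
NODEID=$(scontrol show job "$JOBID" | grep -oP ' NodeList=\K\S+')  # allocated node
sattach "$JOBID.0" || srun --pty --jobid "$JOBID" -w "$NODEID" /bin/bash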

Assuming the SSH connection from your laptop to the login node of the cluster is unstable, you can use a terminal multiplexer such as screen or tmux, depending on what is already installed on the login node.
Typically, a session would look like this:
[you@yourlaptop ~]$ ssh cluster-frontend
[you@cluster ~]$ tmux # to enter a persistent tmux session
[you@cluster ~]$ srun [...] bash # to get a shell on a compute node
[you@computenode ~]$ # some work, then...
some SSH error (e.g. Write failed: Broken pipe)
[you@yourlaptop ~]$ ssh cluster-frontend
[you@cluster ~]$ tmux a # to re-attach to the persistent tmux session
[you@computenode ~]$ # resume work
With screen, you would use screen -r rather than tmux a. Otherwise the process is the same.
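To make re-attaching easier, you can give the session a name (the name slurm below is an arbitrary choice):
[you@cluster ~]$ tmux new -s slurm # create a named session
[you@cluster ~]$ tmux attach -t slurm # later, re-attach to it by name
The screen equivalents are screen -S slurm to create and screen -r slurm to re-attach.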
If you want to join a job from another terminal instance, you can use Slurm's sattach command. In the first terminal:
[you@yourlaptop ~]$ ssh cluster-frontend
[you@cluster ~]$ srun [...] bash
srun: job ******* queued and waiting for resources
srun: job ******* has been allocated resources
[you@computenode ~]$
In a second terminal:
[you@yourlaptop ~]$ ssh cluster-frontend
[you@cluster ~]$ sattach --pty ********
[you@computenode ~]$
The original terminal and the one in which sattach was run are now entirely synchronised: typing echo OK in either of them shows the command and its output (OK) in both.
Note that the above does not protect from an accidental termination of srun; whenever srun terminates, the job is terminated as well.
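One hedged way around that, assuming your site permits it: create the allocation with salloc (ideally inside tmux), so it outlives any single interactive shell, and start shells within it using srun:
[you@cluster ~]$ salloc [...] # allocation lasts until salloc exits
[you@cluster ~]$ srun --pty /bin/bash # shell on the allocated node
If that shell dies, re-running the srun line gives you a new shell in the same allocation.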

Related

Change salloc behavior to run all commands on remote node

The default behaviour of salloc is to run any shell-related commands on the node where salloc was called, while any srun commands called from that salloc job shell run on the allocated node. Does anyone know of a way to get salloc to run all commands interactively on the remote node's job shell?
Below is an example of the current default behaviour I'm seeing. Ideally, the first hostname command would run on slurm-node02 and return that hostname. Thanks!
[testyboi@slurm-node01 ~]$ salloc --nodelist=slurm-node02
salloc: Granted job allocation 890
[testyboi@slurm-node01 ~]$ hostname
slurm-node01
[testyboi@slurm-node01 ~]$ cat salloctest.sh
#!/bin/bash
echo "I am running on "; hostname;
[testyboi@slurm-node01 ~]$ srun -N1 salloctest.sh
I am running on
slurm-node02
The former FAQ, relevant for versions prior to 20.11, suggested setting
SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --cpu-bind=no --mpi=none $SHELL"
in slurm.conf. For version 20.11 and later, there is a new option. You can set
LaunchParameters=use_interactive_step
in slurm.conf.
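With that option set, the salloc session from the question should, as a sketch, look like this (hostnames as in the example above):
[testyboi@slurm-node01 ~]$ salloc --nodelist=slurm-node02
salloc: Granted job allocation 890
[testyboi@slurm-node02 ~]$ hostname
slurm-node02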

Automate SNX restart with crontab

I am using a VPN connection via SSL Network Extender (SNX) to connect to a remote server. The connection is limited to 12 hours; after that it is disconnected and SNX has to be restarted. To overcome this hardship I am trying to automate the SNX restart using crontab.
I have created one shell script file called vpn.sh.
#!/bin/bash
snx -d
sleep 3
echo 'password' | snx
I have a config file called .snxrc in my home directory:
server server.com
username username
reauth yes
Inside my crontab (crontab -e) I have:
0 */12 * * * bash /home/username/vpn.sh > /home/username/cron.log
It runs every 12 hours. snx -d runs successfully, but on reaching echo 'password' | snx I get this error:
Failed to init terminal!
Has anybody encountered this issue? Please help me; I have been struggling for a week now. Thanks in advance.
I have followed this link to set up snx.
The snx client cannot start without a terminal, so I put these commands in my script to start snx in a byobu session:
byobu new-session -d -s vpn;
byobu new-window -t vpn:1 -n "snx" "echo your_password | snx -s your_ip -u your_user; sleep 10"
The accepted answer doesn't work for me: it creates an empty tmux session with no command executed inside. This is my way of doing it:
byobu-tmux new-session -d "echo <password> | nohup snx -s <host> -u <user>"
Only one command is needed to make it work. nohup is required because the snx process goes to the background and returns the prompt; tmux then exits, and snx is no longer attached to the terminal. Without nohup, the system would terminate the snx process when tmux exits.
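Putting the pieces together, a sketch of vpn.sh using the byobu-tmux approach (the password is a placeholder; adjust the snx options to your setup):
#!/bin/bash
# restart SNX inside a detached tmux session so it gets a pseudo-terminal
snx -d # disconnect any existing session
sleep 3
byobu-tmux new-session -d "echo 'password' | nohup snx"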

Get PID of the jobs submitted by nohup in Linux

I'm using nohup to submit jobs in the background on machines I get through BSUB and ssh.
My primary machine runs RHEL; from there I pick up an AIX machine via BSUB (which submits a job to LSF) and also do an SSH login to another server.
After getting these two machines, I execute a script (inner.sh) on them through nohup.
I capture the respective PIDs through echo $$ in the script I am executing (inner.sh).
After submitting the nohup execution in the background, I exit both machines and land back on the primary RHEL machine.
Now, from this RHEL machine, I'm trying to get the status of the nohup executions with ps -p PID using the two previously captured PIDs, but no process is listed.
Top level wrapper script wrapper.sh:
#!/bin/bash
#login to a remote server
ssh -k xyz@abc < env_setup.sh
#picking up an AIX machine from LSF
bsub -q night -Is ksh -i env_setup.sh
ps -p process-<AIX_machine>.pid
#got no output
ps -p process-<server_machine>.pid
#got no output
Script passed to the machines picked up by BSUB/SSH to run inner.sh via nohup (env_setup.sh):
#!/bin/bash
nohup sh /path/to/dir/inner.sh > /path/to/dir/log-<hostname>.out &
exit
The actual script I am trying to execute on the machines picked up by BSUB/SSH (inner.sh):
#!/bin/bash
echo $$ > /path/to/dir/process-<hostname>.pid
#this should capture the PID on the remote machine
#execute some other commands
Now, both process-<hostname>.pid files are updated, each with the PID from one of the two machines.
But ps -p in the wrapper script gives no output.
I am picking up the process IDs from the remote machines and running ps -p on my local RHEL machine.
Is that the reason I am not getting any status update for those two processes?
Can I do anything else to get the status?
ps gives the status of local processes only. bsub (or ssh) can be used to run the check on each remote machine instead.
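Since ps only sees local processes, one way (a sketch; host and file name are placeholders) is to run the check on the remote machine over ssh, using the PID file that inner.sh wrote there:
# kill -0 sends no signal; it only tests whether the PID still exists
ssh xyz@abc 'kill -0 "$(cat /path/to/dir/process-remote.pid)" 2>/dev/null && echo "inner.sh still running" || echo "inner.sh finished"'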

How to avoid extra shell when ssh -f?

If I do
ssh -f 10.10.47.47 "/opt/omni/bin/mbuffer -4 -v 0 -q -I 8024 | /usr/sbin/zfs receive tank/test"
then I have to press CTRL-D to exit an extra shell that has been spawned on the Linux host where I ran the ssh command.
If I do
ssh -f 10.10.47.47 "sleep 10s"
and then try to press CTRL-D, then the Linux host hangs until the sleep command exits on the remote host. Very weird behaviour, which I wouldn't expect since -f was used.
Question
Is it possible to avoid this extra shell on the host where the ssh command is executed?
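One possible workaround, borrowing the nohup-plus-redirection pattern discussed in the next question (a sketch, not a tested answer for this exact setup): detach the remote pipeline completely so ssh -f has nothing left to wait on.
ssh -f 10.10.47.47 'nohup sh -c "/opt/omni/bin/mbuffer -4 -v 0 -q -I 8024 | /usr/sbin/zfs receive tank/test" </dev/null >/dev/null 2>&1 &'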

ssh executes remote command in the background [duplicate]

This is a follow-on question to the How do you use ssh in a shell script? question. If I want to execute a command on the remote machine that runs in the background on that machine, how do I get the ssh command to return? When I try to just include the ampersand (&) at the end of the command it just hangs. The exact form of the command looks like this:
ssh user@target "cd /some/directory; program-to-execute &"
Any ideas? One thing to note is that logins to the target machine always produce a text banner and I have SSH keys set up so no password is required.
I had this problem in a program I wrote a year ago -- turns out the answer is rather complicated. You'll need to use nohup as well as output redirection, as explained in the Wikipedia article on nohup, copied here for your convenience:
Nohuping backgrounded jobs is for example useful when logged in via SSH, since backgrounded jobs can cause the shell to hang on logout due to a race condition [2]. This problem can also be overcome by redirecting all three I/O streams:
nohup myprogram > foo.out 2> foo.err < /dev/null &
This has been the cleanest way to do it for me:-
ssh -n -f user@host "sh -c 'cd /whereever; nohup ./whatever > /dev/null 2>&1 &'"
The only thing running after this is the actual command on the remote machine.
Redirect fd's
Output needs to be redirected with &>/dev/null, which redirects both stderr and stdout to /dev/null and is a synonym for >/dev/null 2>/dev/null or >/dev/null 2>&1.
Parentheses
The best way is to use sh -c '( ( command ) & )' where command is anything.
ssh askapache 'sh -c "( ( nohup chown -R ask:ask /www/askapache.com &>/dev/null ) & )"'
Nohup Shell
You can also use nohup directly to launch the shell:
ssh askapache 'nohup sh -c "( ( chown -R ask:ask /www/askapache.com &>/dev/null ) & )"'
Nice Launch
Another trick is to use nice to launch the command/shell:
ssh askapache 'nice -n 19 sh -c "( ( nohup chown -R ask:ask /www/askapache.com &>/dev/null ) & )"'
If you don't/can't keep the connection open you could use screen, if you have the rights to install it.
user@localhost $ screen -t remote-command
user@localhost $ ssh user@target # now inside of a screen session
user@remotehost $ cd /some/directory; program-to-execute &
To detach the screen session: ctrl-a d
To list screen sessions:
screen -ls
To reattach a session:
screen -d -r remote-command
Note that screen can also create multiple shells within each session. A similar effect can be achieved with tmux.
user@localhost $ tmux
user@localhost $ ssh user@target # now inside of a tmux session
user@remotehost $ cd /some/directory; program-to-execute &
To detach the tmux session: ctrl-b d
To list tmux sessions:
tmux list-sessions
To reattach a session:
tmux attach <session number>
The default tmux control key, ctrl-b, is somewhat difficult to use, but there are several example tmux configs that ship with tmux that you can try.
I just wanted to show a working example that you can cut and paste:
ssh REMOTE "sh -c \"(nohup sleep 30; touch nohup-exit) > /dev/null &\""
You can do this without nohup:
ssh user@host 'myprogram >out.log 2>err.log &'
The quickest and easiest way is to use the at command:
ssh user@target "at now -f /home/foo.sh"
I think you'll have to combine a couple of these answers to get what you want. If you use nohup in conjunction with the semicolon, and wrap the whole thing in quotes, then you get:
ssh user@target "cd /some/directory; nohup myprogram > foo.out 2> foo.err < /dev/null"
which seems to work for me. With nohup, you don't need to append the & to the command to be run. Also, if you don't need to read any of the output of the command, you can use
ssh user@target "cd /some/directory; nohup myprogram > /dev/null 2>&1"
to redirect all output to /dev/null.
This has worked for me many times:
ssh -x remoteServer "cd yourRemoteDir; ./yourRemoteScript.sh </dev/null >/dev/null 2>&1 & "
You can do it like this...
sudo /home/script.sh -opt1 > /tmp/script.out &
It appeared quite convenient for me to have a remote tmux session using the tmux new -d <shell cmd> syntax like this:
ssh someone@elsewhere 'tmux new -d sleep 600'
This will launch a new session on the elsewhere host, and the ssh command on the local machine will return to the shell almost instantly. You can then ssh to the remote host and tmux attach to that session. Note that nothing runs in a local tmux; only the remote one!
Also, if you want your session to persist after the job is done, simply add a shell launcher after your command, but don't forget to enclose it in quotes:
ssh someone@elsewhere 'tmux new -d "~/myscript.sh; bash"'
Actually, whenever I need to run a command on a remote machine that's complicated, I like to put the command in a script on the destination machine, and just run that script using ssh.
For example:
# simple_script.sh (located on remote server)
#!/bin/bash
cat /var/log/messages | grep <some value> | awk -F " " '{print $8}'
And then I just run this command on the source machine:
ssh user#ip "/path/to/simple_script.sh"
If you run a remote command without allocating a tty, redirecting stdout/stderr works and nohup is not necessary:
ssh user@host 'background command &>/dev/null &'
If you use -t to allocate a tty to run an interactive command along with a background command, and the background command is the last command, like this:
ssh -t user@host 'bash -c "interactive command; nohup background command &>/dev/null &"'
it is possible that the background command doesn't actually start. There's a race here:
1. bash exits after nohup starts. As the session leader, bash exiting results in a HUP signal being sent to the nohup process.
2. nohup ignores the HUP signal.
If 1 completes before 2, the nohup process exits and won't start the background command at all. We need to wait for nohup to start the background command. A simple workaround is to just add a sleep:
ssh -t user@host 'bash -c "interactive command; nohup background command &>/dev/null & sleep 1"'
The question was asked and answered years ago; I don't know if OpenSSH behaviour has changed since then. I was testing on:
OpenSSH_8.6p1, OpenSSL 1.1.1g FIPS 21 Apr 2020
I was trying to do the same thing, but with the added complexity that I was doing it from Java. So on one machine running Java, I was trying to run a script on another machine, in the background (with nohup).
From the command line, here is what worked (you may not need the -i keyFile if your ssh setup doesn't require it):
ssh -i keyFile user#host bash -c "\"nohup ./script arg1 arg2 > output.txt 2>&1 &\""
Note that on my command line there is one argument after the -c, which is all in quotes. For it to work on the other end, it still needs the quotes, so I had to put escaped quotes within it.
From java, here is what worked:
ProcessBuilder b = new ProcessBuilder("ssh", "-i", "keyFile", "user@host", "bash", "-c",
"\"nohup ./script arg1 arg2 > output.txt 2>&1 &\"");
Process process = b.start();
// then read from process.getInputStream() and close it.
It took a bit of trial & error to get this working, but it seems to work well now.
YOUR-COMMAND &> YOUR-LOG.log &
This should run the command and assign it a process ID. You can simply tail -f YOUR-LOG.log to see results written to it as they happen. You can log out at any time and the process will carry on.
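For example (the command and log name are placeholders):
ssh user@host 'long-task &> task.log &' # start it detached, with a log
ssh user@host 'tail -f task.log' # check on it later from a new connection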
If you are using zsh, then program-to-execute &! is a zsh-specific shortcut to both background and disown the process, such that exiting the shell will leave it running.
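For example (myprogram is a placeholder):
ssh user@host "zsh -c 'myprogram &> out.log &!'"
Here &! leaves myprogram running and disowned after zsh exits, so the ssh command returns immediately.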
A follow-on to @cmcginty's concise working example, which also shows how to alternatively wrap the outer command in double quotes. This is how the template would look if invoked from within a PowerShell script (which can only interpolate variables from within double quotes and ignores any variable expansion when wrapped in single quotes):
ssh user@server "sh -c `"($cmd) &>/dev/null </dev/null &`""
Inner double-quotes are escaped with back-tick instead of backslash. This allows $cmd to be composed by the PowerShell script, e.g. for deployment scripts and automation and the like. $cmd can even contain a multi-line heredoc if composed with unix LF.
First follow this procedure:
Log in on A as user a and generate a pair of authentication keys. Do not enter a passphrase:
a@A:~> ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/a/.ssh/id_rsa):
Created directory '/home/a/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/a/.ssh/id_rsa.
Your public key has been saved in /home/a/.ssh/id_rsa.pub.
The key fingerprint is:
3e:4f:05:79:3a:9f:96:7c:3b:ad:e9:58:37:bc:37:e4 a@A
Now use ssh to create a directory ~/.ssh as user b on B. (The directory may already exist, which is fine):
a@A:~> ssh b@B mkdir -p .ssh
b@B's password:
Finally append a's new public key to b@B:.ssh/authorized_keys and enter b's password one last time:
a@A:~> cat .ssh/id_rsa.pub | ssh b@B 'cat >> .ssh/authorized_keys'
b@B's password:
From now on you can log into B as b from A as a without a password:
a@A:~> ssh b@B
Then this will work without entering a password:
ssh b@B "cd /some/directory; program-to-execute &"
I think this is what you need:
First you need to install sshpass on your machine.
Then you can write your own script:
while read pass port user ip; do
sshpass -p$pass ssh -p $port $user#$ip <<ENDSSH1
COMMAND 1
.
.
.
COMMAND n
ENDSSH1
done <<____HERE
PASS PORT USER IP
. . . .
. . . .
. . . .
PASS PORT USER IP
____HERE
