How to know from a bash script if the user abruptly closes ssh session

I have a bash script that acts as the default shell for a user logging in through ssh.
It provides a menu with several options, one of which is sending a file using netcat.
The netcat of the embedded linux I'm using lacks the -w option, so if the user closes the ssh connection without ever sending the file, the netcat command waits forever.
I need to know if the user abruptly closes the connection so the script can kill the netcat command and exit gracefully.
Things I've tried so far:
Trapping the SIGHUP: it is not issued. The only signal I could find being issued is SIGCONT, but I don't think it's reliable and portable.
Playing with the -t option of the read command to detect a closed stdin: this would work if not for a silly bug in the embedded read command (only times out on the first invocation)
I'll try to answer the questions in the comments and explain the situation further.
The code I have is:
nc -l -p 7576 > /dev/null 2>> $LOGFILE < $TMP_DIR/$BACKUP_FILE &
I'm ignoring SIGINT and SIGTSTP, but I've tried to trap all the signals and the only one received is SIGCONT.
Reading the bash man page I've found out that the SIGHUP should be sent to both script and netcat and that the SIGCONT is sent to stopped jobs to ensure they receive the SIGHUP.
I guess the wait makes the script count as stopped and so it receives the SIGCONT but at the same time the wait somehow eats up the SIGHUP.
So I've tried changing the wait for a sleep and then both SIGHUP and SIGCONT are received.
The question is: why is the wait blocking the SIGHUP?
Edit 2: Solved
I solved it by polling for a closed stdin with the read builtin using the -t option. To work around the bug in the read builtin I spawn it in a new bash (bash -c "read -t 3 dummy").
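The polling workaround from Edit 2 might look roughly like the sketch below. The helper names stdin_closed and watch_job are made up for illustration. It assumes a bash whose read -t exits with a status above 128 on timeout and with 1 on end-of-file, as the bash manual describes for current releases; the embedded build may behave differently.

```shell
#!/bin/sh
# Sketch only: stdin_closed and watch_job are hypothetical helper names.

# Succeeds when stdin is at end-of-file (session gone), fails otherwise.
# The read runs in a fresh bash so that every call gets its own timeout.
stdin_closed() {
    bash -c "read -t ${1:-3} dummy"
    status=$?
    # 0 = a line was read, >128 = timed out, anything else = EOF or error
    [ "$status" -ne 0 ] && [ "$status" -le 128 ]
}

# Poll while the background job (e.g. the netcat) is alive; kill it and
# return failure if stdin closes first.
watch_job() {
    job_pid=$1
    while kill -0 "$job_pid" 2>/dev/null; do
        if stdin_closed 3; then
            kill "$job_pid" 2>/dev/null
            return 1
        fi
    done
    return 0
}
```

In the menu script this would be called right after starting netcat in the background, for example watch_job $! || exit 0. Note that anything the user types during the transfer is swallowed by the read.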

Linux script for probing ssh connection in a loop and start log command after connect

I have a host machine that gets rebooted or reconnected quite a few times.
I want to have a script running on my dev machine that continuously tries to log into that machine and if successful runs a specific command (tailing the log data).
Edit: To clarify, the connection needs to stay open. The log command keeps tailing until I stop it manually.
What I have so far
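#!/bin/bash
if (( $# >= 1 )); then
    IP=$1   # assumption: the host address is passed as the first argument
fi
LOOP=1
trap 'echo "stopping"; LOOP=0' INT
while (( LOOP == 1 )); do
    if ping -c1 "$IP"; then
        echo "Host $IP reached"
        sshpass -p 'password' ssh -o ConnectTimeout=10 -q "user@$IP" '<command would go here>'
    else
        echo "Host $IP unreachable"
    fi
    sleep 1
done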
The LOOP flag is not really used. The script is ended via CTRL-C.
Now this works if I do NOT add a command to be executed after the ssh and instead start the log output manually. On a disconnect the script keeps probing the connection and logs back in once the host is available again.
Also when I disconnect from the host (CTRL-D) the script will log right back into the host if CTRL-C is not pressed fast enough.
When I add a command to be executed after ssh the loop is broken. So pressing (CTRL-C) does not only stop the log but also disconnects and ends the script on the dev machine.
I guess I have to spawn another shell somewhere or something like that?
1) I want the script to keep probing, log in and run a command completely automatically and fall back to probing when the connection breaks.
2) I want to be able to stop the log on the host (CTRL-C) and thereby fall back to a logged in ssh connection to use it manually.
How do I fix this?
Maybe best approach on "fixing" would be fixing requirements.
The problematic part is number "2)".
The problem is from how SIGINT works.
When triggered, it is sent to the foreground process group of your terminal. Usually this is the shell and any process started from it. With more modern shells (you seem to use bash), the shell manages process groups so that programs started in the background are detached from it (by being assigned a different process group).
In your case the ssh is started in the foreground (from a script executed in the foreground), so it will receive the interrupt, forward it to the remote and terminate as soon as the remote end terminated. As by that time the script shell has processed its signal handler (specified by trap) it is going to exit the loop and terminate itself.
So, as you can see, you have overloaded CTRL-C to mean two things:
terminate the monitoring script
terminate the remote command and continue with whatever is specified for the remote side.
You might get closer to what you want if you drop the first effect (or at least make it more explicit). Then, calling a script on the remote side that does not terminate itself but only the tail command will be a step in the right direction. In that case you will likely need to use the -t switch on ssh to get a terminal allocated, so that normal shell operation is possible later.
This will not allow for terminating the remote side with just CTRL-C. You will always need to exit the remote shell that is going to be run.
The essence of such a remote script might look like:
tail command
Of course, you would need to add whatever else your shell or coding style requires.
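A slightly fuller sketch of such a remote script, under some assumptions: the function name run_until_interrupt and the log path in the usage comment are placeholders, and the whole thing is meant to be started through ssh -t so that CTRL-C reaches the remote terminal.

```shell
#!/bin/sh
# Sketch of the remote-side wrapper; run_until_interrupt is a made-up name.

# Run the given command while this shell has an INT handler that only
# prints a message. CTRL-C still terminates the command (it starts with
# the default disposition), but the wrapper survives and carries on.
run_until_interrupt() {
    trap 'echo "log stopped"' INT
    "$@"
    trap - INT
}

# Intended use on the remote host (hypothetical log path):
#   run_until_interrupt tail -f /home/user/log.txt
#   exec "${SHELL:-/bin/sh}" -i
```

The handler has to be a real command rather than an empty string: a signal that is ignored with trap '' INT stays ignored in child processes, so tail would no longer react to CTRL-C either.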
An alternate approach would be to keep the current remote command being terminated and, for the case of being interrupted, add another ssh call that spawns the shell for interactive use. But in that case, too, CTRL-C will not be available for terminating the monitoring altogether.
To achieve this you might try changing the active interrupt handler in your monitoring script so that it triggers termination as soon as the remote side returns. However, this will cause a race condition between the user recognizing that the remote command has terminated (and that control has returned to the local script) and the proper interrupt handler being in place. You might be able to sufficiently lower that risk by first activating the new trap handler, then echoing the fact, and maybe adding a sleep to allow the user to react.
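The ordering described above (handler first, message second, then a pause) could be sketched like this. grace_period is a hypothetical helper for the local monitoring script, and restoring a do-nothing handler afterwards is my assumption about what should happen outside the grace period.

```shell
#!/bin/sh
# Sketch only: grace_period is a hypothetical helper for the local script.

# Call this right after ssh returns. The terminating handler is installed
# first and announced afterwards, so a CTRL-C typed in reaction to the
# message cannot arrive before the handler exists.
grace_period() {
    seconds=${1:-3}
    trap 'echo "stopping monitor"; exit 0' INT
    echo "remote command ended; CTRL-C within ${seconds}s stops monitoring"
    sleep "$seconds"
    # Assumption: outside the grace period CTRL-C should not end the loop.
    trap : INT
}
```

In the probing loop it would be called as grace_period 3 directly after the ssh line.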


#!/bin/sh
while true; do
    RESPONSE=`ssh -i /home/user/.ssh/id_host user@$IP 'tail /home/user/log.txt'`
    sleep 10
done

Does linux kill background processes if we close the terminal from which it has started?

I have an embedded system, on which I do telnet and then I run an application in background:
./app_name &
Now if I close my terminal and do telnet from other terminal and if I check then I can see this process is still running.
To check this I have written a small program:
I ran this program in my local linux pc in background and I closed the terminal.
Now, when I checked for this process from other terminal then I found that this process was also killed.
My question is:
Why is the behavior different for the same type of process?
What does it depend on?
Is it dependent on version of Linux?
Who should kill jobs?
Normally, foreground and background jobs are killed by SIGHUP sent by kernel or shell in different circumstances.
When does kernel send SIGHUP?
Kernel sends SIGHUP to controlling process:
for real (hardware) terminal: when disconnect is detected in a terminal driver, e.g. on hang-up on modem line;
for pseudoterminal (pty): when last descriptor referencing master side of pty is closed, e.g. when you close terminal window.
Kernel sends SIGHUP to other process groups:
to foreground process group, when controlling process terminates;
to orphaned process group, when it becomes orphaned and it has stopped members.
Controlling process is the session leader that established the connection to the controlling terminal.
Typically, the controlling process is your shell. So, to sum up:
kernel sends SIGHUP to the shell when real or pseudoterminal is disconnected/closed;
kernel sends SIGHUP to foreground process group when the shell terminates;
kernel sends SIGHUP to orphaned process group if it contains stopped processes.
Note that kernel does not send SIGHUP to background process group if it contains no stopped processes.
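To see which of these groups a given process belongs to, the relevant IDs can be read from /proc on Linux. The sketch below is only an illustration; proc_ids is a made-up helper name and the field positions are taken from the proc(5) description of /proc/PID/stat.

```shell
#!/bin/sh
# Sketch for Linux only (reads /proc); proc_ids is a made-up helper name.

# Print the parent, process group, session and the terminal's foreground
# process group for a PID (default: this shell).
proc_ids() {
    pid=${1:-$$}
    stat=$(cat "/proc/$pid/stat") || return 1
    # Format: pid (comm) state ppid pgrp session tty_nr tpgid ...
    # comm may contain spaces, so drop everything up to the last ") ".
    rest=${stat##*) }
    set -- $rest
    echo "pid=$pid ppid=$2 pgrp=$3 session=$4 tpgid=$6"
}
```

A job whose pgrp differs from tpgid is in a background process group, which is the case where the kernel does not send SIGHUP by itself.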
When does bash send SIGHUP?
Bash sends SIGHUP to all jobs (foreground and background):
when it receives SIGHUP, and it is an interactive shell (and job control support is enabled at compile-time);
when it exits, it is an interactive login shell, and huponexit option is set (and job control support is enabled at compile-time).
See more details here.
bash does not send SIGHUP to jobs removed from job list using disown;
processes started using nohup ignore SIGHUP.
More details here.
What about other shells?
Usually, shells propagate SIGHUP. Generating SIGHUP at normal exit is less common.
Telnet or SSH
Under telnet or SSH, the following should happen when connection is closed (e.g. when you close telnet window on PC):
client is killed;
server detects that client connection is closed;
server closes master side of pty;
kernel detects that master pty is closed and sends SIGHUP to bash;
bash receives SIGHUP, sends SIGHUP to all jobs and terminates;
each job receives SIGHUP and terminates.
I can reproduce your issue using bash and telnetd from busybox or dropbear SSH server: sometimes, background job doesn't receive SIGHUP (and doesn't terminate) when client connection is closed.
It seems that a race condition occurs when server (telnetd or dropbear) closes master side of pty:
normally, bash receives SIGHUP and immediately kills background jobs (as expected) and terminates;
but sometimes, bash detects EOF on slave side of pty before handling SIGHUP.
When bash detects EOF, it by default terminates immediately without sending SIGHUP. And background job remains running!
It is possible to configure bash to send SIGHUP on normal exit (including EOF) too:
Ensure that bash is started as login shell. The huponexit works only for login shells, AFAIK.
Login shell is enabled by -l option or leading hyphen in argv[0]. You can configure telnetd to run /bin/bash -l or better /bin/login which invokes /bin/sh in login shell mode.
telnetd -l /bin/login
Enable huponexit option.
shopt -s huponexit
Type this in bash session every time or add it to .bashrc or /etc/profile.
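A quick way to check from inside a session whether those conditions hold is sketched below. will_hup_jobs_on_exit is a hypothetical helper name, and the three checks simply mirror the conditions listed earlier (interactive, login shell, huponexit set).

```shell
#!/bin/bash
# Sketch only: will_hup_jobs_on_exit is a hypothetical helper name.

# Succeeds if this bash is an interactive login shell with huponexit set.
will_hup_jobs_on_exit() {
    case $- in *i*) ;; *) return 1 ;; esac   # must be interactive
    shopt -q login_shell || return 1         # must be a login shell
    shopt -q huponexit                       # and huponexit must be set
}
```

For example: will_hup_jobs_on_exit && echo "jobs get SIGHUP on exit".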
Why does the race occur?
bash unblocks signals only when it's safe, and blocks them when some code section can't be safely interrupted by a signal handler.
Such critical sections invoke interruption points from time to time, and if a signal is received while a critical section is executing, its handler is delayed until the next interruption point or until the critical section is exited.
You can start digging from quit.h in the source code.
Thus, it seems that in our case bash sometimes receives SIGHUP when it's in a critical section. SIGHUP handler execution is delayed, and bash reads EOF and terminates before exiting critical section or calling next interruption point.
"Job Control" section in official Glibc manual.
Chapter 34 "Process Groups, Sessions, and Job Control" of "The Linux Programming Interface" book.
When you close the terminal, shell sends SIGHUP to all background processes – and that kills them. This can be suppressed in several ways, most notably:
When you run a program with nohup, the program ignores SIGHUP and its output is redirected.
$ nohup app &
disown tells shell not to send SIGHUP
$ app &
$ disown
Is it dependent on version of linux?
It is dependent on your shell. Above applies at least for bash.
AFAIK in both cases the process should be killed. In order to avoid this you have to issue a nohup like the following:
> nohup ./my_app &
This way your process will continue executing. Probably the telnet part is due to a BUG similar to this one:
In order to completely understand what's happening you need to get into unix internals a little bit.
When you are running a command like this
./app_name &
The app_name is sent to background process group. You can check about unix process groups here
When you close bash with normal exit it triggers SIGHUP hangup signal to all its jobs. Some information on unix job control is here.
In order to keep your app running when you exit bash you need to make your app immune to hangup signal with nohup utility.
nohup - run a command immune to hangups, with output to a non-tty
And finally this is how you need to do it.
nohup app_name 2> /dev/null &
In modern Linux--that is, Linux with systemd--there is an additional reason this might happen which you should be aware of: "linger".
systemd kills processes left running from a login shell, even if the process is properly daemonized and protected from HUP. This is the default behavior in modern configurations of systemd.
If you run
loginctl enable-linger $USER
you can disable this behavior, allowing background processes to keep running. The mechanisms covered by the other answers still apply, however, and you should also protect your process against them.
enable-linger is permanent until it is re-disabled. You can check it with
ls /var/lib/systemd/linger
This may have files, one per username, for users who have enable-linger. Any user listed in the directory has the ability to leave background processes running at logout.
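The same check can be scripted. This is just a sketch: linger_enabled is a made-up helper, and the optional second argument only exists so the check can be pointed at a different directory.

```shell
#!/bin/sh
# Sketch only: linger_enabled is a made-up helper name.

# Succeeds if a linger marker file exists for the user (default: the
# current user) in the given directory (default: the systemd location).
linger_enabled() {
    user=${1:-$(id -un)}
    dir=${2:-/var/lib/systemd/linger}
    [ -e "$dir/$user" ]
}
```

For example: linger_enabled "$USER" || echo "background processes may be killed at logout".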

is it possible to create a non-child process inside a shell script?

I'm using a shell process pool API at Github, for a script, as below
function foobar()
{
    :   # function body not shown
}
job_pool_init 100 0
tcpdump -i eth0 -w tempcap & #
for i in `seq 1 4`;do
    job_pool_run foobar $mesg
done
sleep 5
pkill tcpdump #
echo 'all finish'
if I comment the tcpdump line,
then it works fine, as expected,
but when the tcpdump line is there,
There is a wait command in job_pool_wait, which waits for the ending of all children process, if there is no such a tcpdump line, it is as expected.
But I want to capture something until all the child processes finish, so I have to use a tcpdump. In this script, tcpdump process is a child process,
job_pool_wait will also wait for the ending of tcpdump process, which is not expected.
so a solution is to make tcpdump not a child process,
how can I do it,
or any other solutions?
You should be able to run tcpdump in a sub-shell in the background:
(tcpdump -i eth0 -w tempcap &)
This should prevent it from appearing as a direct descendant of your script.
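A small way to convince yourself of this, with sleep standing in for tcpdump so it can run anywhere:

```shell
#!/bin/sh
# Stand-in for tcpdump so the sketch can run anywhere.
(sleep 3 &)

# The sleep is a child of the already-finished subshell, not of this
# script, so a bare `wait` has nothing to wait for and returns at once.
start=$(date +%s)
wait
elapsed=$(( $(date +%s) - start ))
echo "wait returned after ${elapsed}s"
```

The flip side is that the script no longer has the PID in $!, so stopping the capture still relies on something like the pkill in the original script.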
Answering your literal question, yes, run the command with exec. But I doubt that's what you really wanted.
I think what you really wanted is to be able to wait on a specific pid. The wait command takes an optional pid. Either that, or you need to check when wait returns whether the process that just terminated is one you're interested in, and wait again if it's not.
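Waiting on specific pids could look like the sketch below. It deliberately does not use the job pool functions (I don't know their internals), and sleep stands in for both tcpdump and the workers.

```shell
#!/bin/sh
# Sketch with stand-ins: `sleep 30` plays tcpdump, `sleep 1` plays a worker.
sleep 30 &
capture_pid=$!

worker_pids=""
for i in 1 2 3 4; do
    sleep 1 &
    worker_pids="$worker_pids $!"
done

# Wait for the workers only; the capture process keeps running meanwhile.
for pid in $worker_pids; do
    wait "$pid"
done

# All workers are done, so the capture can be stopped and reaped.
kill "$capture_pid"
wait "$capture_pid" 2>/dev/null || true
echo 'all finish'
```

Because the capture pid is kept in a variable, it can be stopped with kill instead of pkill, which avoids hitting unrelated tcpdump processes.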

How can I keep my Linux program running after I exit ssh of my non-root user?

I've searched, googled, sat in IRC for a week and even talked to a friend who is devoutly aligned with linux but I haven't yet received a solid answer.
I have written a shell script that runs as soon as I log into my non-root user and basically just does "./myprogram &" (without the quotation marks). When I exit ssh, my program times out and I am unable to connect to it until I log back in. How can I keep my program running after I exit SSH of my non-root user?
I am curious if this has to be done on the program level or what? My apologies if this does not belong here, I am not sure where it goes to be perfectly honest.
Besides using nohup, you can run your program in a terminal multiplexer like screen or tmux. With them, you can reattach to sessions, which is for example quite helpful if you need to run terminal-based interactive programs or long-running scripts over an unstable ssh connection.
byobu is a nice enhancement of screen.
Try nohup:
Likely your program receives a SIGHUP signal when you exit your ssh session.
There are two signals that can cause your program to die after your ssh session ends: SIGHUP and SIGPIPE.
SIGHUP will be sent to your program because the parent process (ssh) has died. You can get around this either by using the program nohup (i.e. nohup ./myprogram &) or by using the shell builtin disown (./myprogram& disown)
SIGPIPE will be sent to your program if it tries to write to stdout or stderr after the ssh session has been disconnected. To get around this, redirect them to a file or /dev/null, i.e. nohup ./myprogram >/dev/null 2>/dev/null &
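Both precautions can be combined in a small wrapper. This is only a sketch, and start_detached is a hypothetical name; redirecting stdin as well is an extra precaution on my part.

```shell
#!/bin/sh
# Sketch only: start_detached is a hypothetical wrapper name.

# Start a command so that it ignores SIGHUP (nohup) and never touches the
# dying session (all three standard streams are redirected). Prints the
# PID of the started process.
start_detached() {
    nohup "$@" </dev/null >/dev/null 2>&1 &
    echo "$!"
}
```

In the login script this would be used as pid=$(start_detached ./myprogram); the PID can be saved to a file if the program has to be stopped later.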
You might also want to use the batch (or at) command, in addition to the other answers (nohup, screen, ...). And ssh has a -f option which might interest you.

How to stop ffmpeg remotely?

I'm running ffmpeg on another machine for screen capture. I'd like to be able to stop it recording remotely. FFMPEG requires that q is pressed to stop encoding as it has to do some finalization to finish the file cleanly. I know I could kill it with kill/killall however this can lead to corrupt videos.
Press [q] to stop encoding
I can't find anything on Google specifically for this, but there is some suggestion that echoing into /proc//fd/0 will work.
I've tried this but it does not stop ffmpeg. The q is however shown in the terminal in which ffmpeg is running.
echo -n q > /proc/16837/fd/0
So how can I send a character to another existing process in such a way it is as if it were typed locally? Or is there another way of remotely stopping ffmpeg cleanly.
Here's a neat trick I discovered when I was faced with this problem: Make an empty file (it doesn't have to be a named pipe or anything), then write 'q' to it when it's time to stop recording.
$ touch stop
$ <./stop ffmpeg -i ... output.ext >/dev/null 2>>Capture.log &
$ # ... wait until it is time to stop ...
$ echo 'q' > stop
FFmpeg stops as though it got 'q' from the terminal STDIN.
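The mechanism can be tried without ffmpeg. In the sketch below, poll_for_q is a stand-in that keeps checking stdin for a 'q'; it only imitates the behaviour described here and says nothing about how ffmpeg reads its input internally.

```shell
#!/bin/sh
# Sketch with a stand-in: poll_for_q imitates a program that keeps
# checking stdin for a 'q'. It is not how ffmpeg is implemented.
poll_for_q() {
    while :; do
        # On an empty regular file, read hits EOF and fails; once 'q' has
        # been written, the next attempt picks it up.
        if read -r key && [ "$key" = q ]; then
            echo "stopping cleanly"
            return 0
        fi
        sleep 1
    done
}

stop_file=$(mktemp)
poll_for_q <"$stop_file" >"$stop_file.log" &
reader_pid=$!

sleep 2                   # ... recording would happen here ...
echo q >"$stop_file"      # ask the reader to stop
wait "$reader_pid"

result=$(cat "$stop_file.log")
rm -f "$stop_file" "$stop_file.log"
echo "$result"
```

The point is that end-of-file on a regular file is not permanent: a later read from the same descriptor sees data written in the meantime, which a closed pipe or /dev/null would not give you.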
Newer versions of ffmpeg don't use 'q' anymore, at least on Ubuntu Oneiric, instead they say to press Ctrl+C to stop them. So with a newer version you can simply use 'killall -INT' to send them SIGINT instead of SIGTERM, and they should exit cleanly.
Elaborating on the answer from sashoalm, I have tested both scenarios, and here are the results:
My experiments show that doing
killall --user $USER --ignore-case --signal INT ffmpeg
Produces the following on the console where ffmpeg was running
Exiting normally, received signal 2.
While doing
killall --user $USER --ignore-case --signal SIGTERM ffmpeg
produces
Exiting normally, received signal 15.
So it looks like ffmpeg is fine with both signals.
System: Debian GNU/Linux 9 (stretch), 2020-02-28
You can also try to use "expect" to automate the execution and stop of the program. You would have to start it using some virtual shell like screen, tmux or byobu and then start the ffmpeg inside of it. This way you would be able to get again the virtual shell screen and give the "q" option.
Locally or remotely start a virtual shell session, let's say with "screen". Name the session with the -S option, like screen -S recvideo. Then you can start ffmpeg as you like. You can, optionally, detach from this session with Ctrl+a followed by d.
Connect to the machine where the ffmpeg is running inside the screen (or tmux or whatever) and reconnect to it: screen -d -RR recvideo and then send the "q"
To do that from inside a script you can then use expect, like:
prompt="> "
expect << EOF
set timeout 20
spawn screen -S recvideo
expect "$prompt"
send -- "ffmpeg xxxxx\r"
set timeout 1
expect eof
EOF
Then, in another moment or script point or in another script you recover it:
expect << EOF
set timeout 30
spawn screen -d -RR recvideo
expect "$prompt"
send -- "q"
expect "$prompt"
send -- "exit\r"
expect eof
EOF
You can also automate the whole ssh session with expect, passing a sequence of commands and "expects" to do what you want.
The question has already been answered for Linux, but it came up when I was looking for the windows equivalent, so I'm gonna add that to the answers:
On powershell, you start the process like this:
$((Start-Process ffmpeg -passthru -argument "FFMPEG_ARGS").ID)
This sends back the PID of the FFMPEG process that you can store in a variable, or echo, and then you send the windows equivalent of sigint (Ctrl + C) using taskkill
taskkill /pid FFMPEG_PID
I tried with Stop-Process (which is what comes up when looking how to do this on Google) but it actually kills the process. (And yes, taskkill doesn't kill it, it gently asks the process to stop... good naming :D)
