Signal handling difference between "shutdown -r" and "reboot" in Linux

I am using Ubuntu 16.10 Linux Desktop. I found there is a difference in signal handling between shutdown -r and reboot.
I built an a.out that catches SIGTERM and prints a message. With shutdown -r, a.out receives the SIGTERM and prints the message. But with reboot (and reboot -f), a.out did NOT receive the SIGTERM or print the message.
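A minimal sketch of such a test program (the original source isn't shown, so this assumes a simple catch-and-print handler):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_term = 0;

static void on_term(int sig) {
    (void)sig;                      /* only record that the signal arrived */
    got_term = 1;
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_term;
    sigaction(SIGTERM, &sa, NULL);
    while (!got_term)
        pause();                    /* sleep until a signal is delivered */
    printf("got SIGTERM, exiting\n");
    return 0;
}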
What is the difference between shutdown -r and reboot in Linux? And who sends the SIGTERM to all the other processes, and when?
Thanks.

Related

Sending SIGINT to a remote process over SSH

I have a program running on a remote machine which expects to receive SIGINT from its parent. That program needs to receive that signal to function correctly. Unfortunately, if I run the process remotely over SSH and send SIGINT, the local ssh process itself traps the signal and is interrupted rather than forwarding it.
Here's an example of this behavior using GDB:
Running locally:
$ gdb
GNU gdb 6.3.50-20050815 (Apple version gdb-1344) (Fri Jul 3 01:19:56 UTC 2009)
...
This GDB was configured as "x86_64-apple-darwin".
^C
(gdb) Quit
^C
(gdb) Quit
^C
(gdb) Quit
Running remotely:
$ ssh foo.bar.com gdb
GNU gdb Red Hat Linux (6.3.0.0-1.159.el4rh)
...
This GDB was configured as "i386-redhat-linux-gnu".
(gdb) ^C
Killed by signal 2.
$
Can anybody suggest a way of working around this problem? The local ssh client is OpenSSH_5.2p1.
Allocate a pseudo-terminal with ssh -t; then ^C is interpreted on the remote side instead of interrupting the local ssh:
$ ssh -t foo.bar.com gdb
...
(gdb) ^C
Quit
Alternatively, try the signal SIGINT command at the gdb prompt.
It looks like you're pressing Ctrl+C. The problem is that your terminal window is sending SIGINT to the ssh process running locally, not to the process on the remote system.
You'll have to send the signal manually, using the kill command or system call on the remote system,
or more conveniently using killall:
$ killall -INT gdb
Can you run a terminal on the remote machine and use kill -INT to send it the signal?
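For completeness, the kill system call mentioned above is just as simple from C (a sketch; pass the pid that ps reports for the remote gdb):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s pid\n", argv[0]);
        return 1;
    }
    pid_t pid = (pid_t)atol(argv[1]);
    if (kill(pid, SIGINT) < 0) {    /* equivalent to kill -INT <pid> in the shell */
        perror("kill");
        return 1;
    }
    return 0;
}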

strace init process (PID 1) in Linux

The strace manpage says:
On Linux, exciting as it would be, tracing the init process is
forbidden.
I checked the same and it doesn't allow it:
$ strace -p 1
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Why isn't it possible? Even the ptrace manpage says the same about tracing the init process. Aren't these tools safe, or is the init process deemed so special that no other process is allowed to trace it?
sudo strace -p 1 works for me (you need root privileges to strace init).
There was work to allow debugging of init. In 2.4.37 you can't attach to init, but in some later kernel this restriction was removed; I've found it gone in the 3.8 kernel.
Edit: on my Kubuntu 15.10, the sentence "On Linux, exciting as it would be, tracing the init process is forbidden." no longer appears in the strace man page. An updated man page?
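You can observe the failure directly with a small sketch that performs the same attach strace -p 1 does under the hood:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

int main(void) {
    /* On kernels that forbid tracing init, or without sufficient
       privileges, this fails with EPERM. */
    if (ptrace(PTRACE_ATTACH, 1, NULL, NULL) < 0) {
        printf("ptrace(PTRACE_ATTACH, 1): %s\n", strerror(errno));
        return 1;
    }
    int status;
    waitpid(1, &status, 0);             /* wait until the tracee is stopped */
    puts("attached to PID 1, detaching again");
    ptrace(PTRACE_DETACH, 1, NULL, NULL);
    return 0;
}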

Does Linux kill background processes if we close the terminal from which they were started?

I have an embedded system, on which I do telnet and then I run an application in background:
./app_name &
Now if I close my terminal, telnet in again from another terminal, and check, I can see that this process is still running.
To check this I have written a small program:
#include <stdio.h>

int main(void)
{
    while (1)
        ;    /* spin forever */
}
I ran this program in the background on my local Linux PC and closed the terminal.
When I checked for this process from another terminal, I found that it had been killed.
My question is:
Why this inconsistent behavior for the same kind of process?
What does it depend on?
Is it dependent on the version of Linux?
Who should kill jobs?
Normally, foreground and background jobs are killed by SIGHUP, sent by the kernel or the shell in different circumstances.
When does kernel send SIGHUP?
The kernel sends SIGHUP to the controlling process:
for a real (hardware) terminal: when a disconnect is detected by the terminal driver, e.g. on hang-up of a modem line;
for a pseudoterminal (pty): when the last descriptor referencing the master side of the pty is closed, e.g. when you close the terminal window.
The kernel sends SIGHUP to other process groups:
to the foreground process group, when the controlling process terminates;
to an orphaned process group, when it becomes orphaned and has stopped members.
The controlling process is the session leader that established the connection to the controlling terminal.
Typically, the controlling process is your shell. So, to sum up:
the kernel sends SIGHUP to the shell when the real terminal or pseudoterminal is disconnected/closed;
the kernel sends SIGHUP to the foreground process group when the shell terminates;
the kernel sends SIGHUP to an orphaned process group if it contains stopped processes.
Note that the kernel does not send SIGHUP to a background process group if it contains no stopped processes.
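The pty case can be seen concretely with a minimal sketch (Linux with glibc assumed; compile with -lutil for openpty): the child makes the pty slave its controlling terminal, and when the parent closes the last descriptor for the master side, the kernel delivers SIGHUP to the child:

#include <pty.h>                /* openpty(); link with -lutil */
#include <signal.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_hup(int sig) {
    (void)sig;
    const char msg[] = "child: got SIGHUP\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1);
    _exit(0);
}

int main(void) {
    int master, slave;
    if (openpty(&master, &slave, NULL, NULL, NULL) < 0) {
        perror("openpty");
        return 1;
    }
    pid_t pid = fork();
    if (pid == 0) {                 /* child: becomes the controlling process */
        signal(SIGHUP, on_hup);
        setsid();                   /* new session, no controlling tty yet */
        ioctl(slave, TIOCSCTTY, 0); /* adopt the pty slave as controlling tty */
        close(master);
        for (;;)
            pause();                /* wait for the hangup */
    }
    sleep(1);                       /* give the child time to set up */
    close(master);                  /* last master descriptor closed ... */
    waitpid(pid, NULL, 0);          /* ... so the kernel sends the child SIGHUP */
    return 0;
}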
When does bash send SIGHUP?
Bash sends SIGHUP to all jobs (foreground and background):
when it receives SIGHUP, and it is an interactive shell (and job control support is enabled at compile-time);
when it exits, it is an interactive login shell, and huponexit option is set (and job control support is enabled at compile-time).
Notes:
bash does not send SIGHUP to jobs removed from the job list using disown;
processes started using nohup ignore SIGHUP.
What about other shells?
Usually, shells propagate SIGHUP. Generating SIGHUP at normal exit is less common.
Telnet or SSH
Under telnet or SSH, the following should happen when the connection is closed (e.g. when you close the telnet window on your PC):
client is killed;
server detects that client connection is closed;
server closes master side of pty;
kernel detects that master pty is closed and sends SIGHUP to bash;
bash receives SIGHUP, sends SIGHUP to all jobs and terminates;
each job receives SIGHUP and terminates.
Problem
I can reproduce your issue using bash and telnetd from busybox or the dropbear SSH server: sometimes, a background job doesn't receive SIGHUP (and doesn't terminate) when the client connection is closed.
It seems that a race condition occurs when the server (telnetd or dropbear) closes the master side of the pty:
normally, bash receives SIGHUP, immediately kills its background jobs (as expected), and terminates;
but sometimes, bash detects EOF on the slave side of the pty before handling the SIGHUP.
When bash detects EOF, by default it terminates immediately without sending SIGHUP. And the background job remains running!
Solution
It is possible to configure bash to send SIGHUP on normal exit (including EOF) too:
Ensure that bash is started as a login shell. huponexit works only for login shells, AFAIK.
A login shell is enabled by the -l option or a leading hyphen in argv[0]. You can configure telnetd to run /bin/bash -l, or better /bin/login, which invokes /bin/sh in login shell mode.
E.g.:
telnetd -l /bin/login
Enable huponexit option.
E.g.:
shopt -s huponexit
Type this in every bash session, or add it to .bashrc or /etc/profile.
Why does the race occur?
bash unblocks signals only when it's safe, and blocks them when some code section can't be safely interrupted by a signal handler.
Such critical sections invoke interruption points from time to time, and if a signal is received while a critical section is executing, its handler is delayed until the next interruption point happens or the critical section is exited.
You can start digging from quit.h in the source code.
Thus, it seems that in our case bash sometimes receives SIGHUP while it's in a critical section. The SIGHUP handler's execution is delayed, and bash reads the EOF and terminates before exiting the critical section or reaching the next interruption point.
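The general mechanism can be illustrated with a short sketch using POSIX signal blocking (not bash's actual code): while a signal is blocked inside a critical section, its handler is deferred until the signal is unblocked again:

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t got_hup = 0;

static void on_hup(int sig) {
    (void)sig;
    got_hup = 1;
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_hup;
    sigaction(SIGHUP, &sa, NULL);

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGHUP);

    sigprocmask(SIG_BLOCK, &block, &old);  /* enter the "critical section" */
    raise(SIGHUP);                         /* signal becomes pending, handler deferred */
    printf("in critical section: got_hup=%d\n", (int)got_hup);   /* prints 0 */
    sigprocmask(SIG_SETMASK, &old, NULL);  /* leave: the handler runs now */
    printf("after unblocking:    got_hup=%d\n", (int)got_hup);   /* prints 1 */
    return 0;
}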
Reference
"Job Control" section in official Glibc manual.
Chapter 34 "Process Groups, Sessions, and Job Control" of "The Linux Programming Interface" book.
When you close the terminal, the shell sends SIGHUP to all background processes, and that kills them. This can be suppressed in several ways, most notably:
nohup
When you run a program with nohup, the program is started with SIGHUP ignored and its output is redirected (to nohup.out by default).
$ nohup app &
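Conceptually, nohup does something like the following before exec'ing the program (a simplified sketch, not the actual coreutils implementation):

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 125;
    }
    signal(SIGHUP, SIG_IGN);       /* the exec'ed program inherits the ignored disposition */
    if (isatty(STDOUT_FILENO)) {   /* redirect output that would go to the terminal */
        int fd = open("nohup.out", O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (fd >= 0) {
            dup2(fd, STDOUT_FILENO);
            close(fd);
        }
    }
    execvp(argv[1], &argv[1]);
    perror("execvp");
    return 127;
}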
disown
disown removes the job from the shell's job list, so the shell will not send it SIGHUP:
$ app &
$ disown
Is it dependent on the version of Linux?
It depends on your shell. The above applies at least to bash.
AFAIK, in both cases the process should be killed. To avoid this you have to launch it with nohup, like the following:
> nohup ./my_app &
This way your process will continue executing. The telnet part is probably due to a bug similar to this one:
https://bugzilla.redhat.com/show_bug.cgi?id=89653
To completely understand what's happening, you need to get into Unix internals a little bit.
When you are running a command like this
./app_name &
app_name is run in a background process group.
When you exit bash normally, it can send the SIGHUP hangup signal to all of its jobs.
To keep your app running after you exit bash, you need to make it immune to the hangup signal with the nohup utility.
nohup - run a command immune to hangups, with output to a non-tty
And finally this is how you need to do it.
nohup app_name > /dev/null 2>&1 &
In modern Linux (that is, Linux with systemd) there is an additional reason this might happen which you should be aware of: "linger".
systemd kills processes left running from a login shell, even if the process is properly daemonized and protected from HUP. This is the default behavior in modern configurations of systemd.
If you run
loginctl enable-linger $USER
you can disable this behavior, allowing background processes to keep running. The mechanisms covered by the other answers still apply, however, and you should also protect your process against them.
enable-linger is permanent until it is reversed with disable-linger. You can check its state with
ls /var/lib/systemd/linger
This directory contains one file per username for each user that has linger enabled. Any user listed there can leave background processes running at logout.

Simulating a process stuck in a blocking system call

I'm trying to test a behaviour which is hard to reproduce in a controlled environment.
Use case:
Linux system; usually Red Hat EL 5 or 6 (we're just starting with RHEL 7 and systemd, so they're currently out of scope).
There are situations where I need to restart a service. The script we use for stopping the service usually works quite well; it sends a SIGTERM to the process, which is designed to handle it; if the process doesn't handle the SIGTERM within a timeout (usually a couple of minutes) the script sends a SIGKILL, then waits a couple of minutes more.
The problem is: in some (rare) situations, the process doesn't exit even after a SIGKILL; this usually happens when it's badly stuck in a system call, possibly because of a kernel-level issue (a corrupt filesystem, a non-working NFS filesystem, or something equally bad requiring manual intervention).
A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old one was still running; we're fixing this with a stronger locking system (so that at least the new process doesn't start if the old one is running), but I find it difficult to test the whole thing because I haven't found a way to simulate a hard-stuck process.
So, the question is:
How can I manually simulate a process that doesn't exit when sending a SIGKILL to it, even as a privileged user?
If your process is stuck doing I/O, you can simulate the situation this way:
lvcreate -n lvtest -L 2G vgtest
mkfs.ext3 -m0 /dev/vgtest/lvtest
mount /dev/vgtest/lvtest /mnt
dmsetup suspend /dev/vgtest/lvtest && dd if=/dev/zero of=/mnt/file.img bs=1M count=2048 &
This way the dd process will get stuck waiting for I/O and will ignore every signal. (I know that signals are no longer ignored in the latest kernels when processes are waiting for I/O on an NFS filesystem.)
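If you prefer a dedicated test program over dd, a minimal C sketch blocks the same way once the device mounted at /mnt has been suspended (same setup as above assumed):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* O_SYNC forces writes through to the device, so we block as soon as it is suspended. */
    int fd = open("/mnt/file.img", O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    static char buf[1 << 20];                  /* 1 MiB of zeroes */
    for (;;) {
        if (write(fd, buf, sizeof buf) < 0) {  /* hangs in D state while suspended */
            perror("write");
            return 1;
        }
    }
}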
Well... how about just not sending the SIGKILL? Your environment will then behave as if it had been sent but the process didn't quit.
Once a process is in the "D" state (TASK_UNINTERRUPTIBLE), it is executing a kernel code path that cannot be interrupted until the task completes, which means that sending signals to the process is not useful; they are ignored.
This can be caused by a device driver getting too many interrupts from the hardware, too many incoming network packets, data from NIC firmware, or being blocked on a HDD performing I/O. Normally this happens very quickly, and threads remain in this state for only a very short span of time.
Therefore, what you need to do is look at the syslog and sar reports from the time when the process was stuck in the D state. If you find stack traces in the log, try searching bugzilla.kernel.org for similar issues or seek support from your Linux vendor.
I would code it the opposite way. Have your server process write its pid into e.g. /var/run/yourserver.pid (this is common practice). Have the starting script read that file and test whether the process still exists, e.g. with kill of signal 0, or with
yourserver_pid=$(cat /var/run/yourserver.pid)
if [ -f /proc/$yourserver_pid/exe ]; then
    echo "yourserver is still running as pid $yourserver_pid"
fi
You could improve that by running readlink /proc/$yourserver_pid/exe and comparing the result to /usr/bin/yourserver.
BTW, having a process still alive a few seconds after a SIGKILL is a serious situation (the common case when it could happen is if the process is stuck in a D state, waiting for some NFS server), and you probably should detect and syslog it (e.g. with logger in your script).
I also would try to first send SIGTERM, wait a few seconds, send SIGQUIT, wait a few seconds, and at last send SIGKILL, and only a few seconds later test that the server process has gone.
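The "kill of signal 0" test mentioned above looks like this in C (a sketch; /var/run/yourserver.pid is the pid file assumed above):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

int main(void) {
    FILE *f = fopen("/var/run/yourserver.pid", "r");
    long pid;
    if (!f || fscanf(f, "%ld", &pid) != 1) {
        fprintf(stderr, "cannot read pid file\n");
        return 1;
    }
    fclose(f);
    /* Signal 0 performs the existence and permission checks only;
       no signal is actually delivered. */
    if (kill((pid_t)pid, 0) == 0 || errno == EPERM)
        printf("process %ld still exists\n", pid);
    else
        printf("process %ld is gone (%s)\n", pid, strerror(errno));
    return 0;
}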
A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old was still running;
This is a bug at the OS/kernel level, not in your service script. The situation is rare and hard to simulate because the OS is supposed to kill the process when SIGKILL is delivered. So I guess your goal is to make your script work well even under a buggy kernel. Is that correct?
You can attach gdb to the process; SIGKILL won't remove such a process from the process list, but it will flag it as a zombie, which might still be acceptable for your purpose.
void@tahr:~$ ping 8.8.8.8 > /tmp/ping.log &
[1] 3770
void@tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 S 0:00 ping 8.8.8.8
void@tahr:~$ sudo gdb -p 3770
...
(gdb)
In the other terminal:
void@tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 t 0:00 ping 8.8.8.8
void@tahr:~$ sudo kill -9 3770
...
void@tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 Z 0:00 [ping] <defunct>
Back in the first terminal:
(gdb) quit

Why am I getting "killall -q -USR1 udhcpd" error message?

I have a router with me. It always shows the following debug message:
killall -q -USR1 udhcpd
Can anyone explain to me what is happening here? Why am I getting such an error?
killall is used to kill processes by name; it sends a signal to all processes running under that name. If no signal is specified, SIGTERM is sent; however, that is not the case here: the command sends the USR1 signal to udhcpd, and -q just silences the warning if no matching process is found, so this is a debug trace rather than an error. For udhcpd, SIGUSR1 by default triggers a forced write of the lease file.
Reference: busybox udhcpd source code
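For illustration, a daemon typically wires up such a signal along these lines (a hedged sketch, not udhcpd's actual code; write_leases_file is a hypothetical stand-in for its lease writer):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t dump_requested = 0;

static void on_usr1(int sig) {
    (void)sig;
    dump_requested = 1;            /* just set a flag; do the work in the main loop */
}

/* Hypothetical helper standing in for udhcpd's lease-file writer. */
static void write_leases_file(void) {
    puts("leases written");
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_usr1;
    sigaction(SIGUSR1, &sa, NULL);

    for (;;) {                     /* simplified daemon main loop */
        pause();                   /* sleep until a signal arrives */
        if (dump_requested) {
            dump_requested = 0;
            write_leases_file();
        }
    }
}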
