Linux: Real-Time Signals - linux

I am testing different scenarios with real-time signals and unable to found the meaning of every signal such as SIGRTMIN+1 and SIGRTMIN+13.
I am able to send and receive the signals but trying to understand the meaning of all the SIGRTMIN+n signals. For example, I send number of Signals based on my program and one of them is killing a process:
/* The child process executes this function. */
void
child_function (void)
{
/* Perform initialization. */
printf ("I'm here!!! My pid is %d.\n", (int) getpid ());
/* Let parent know you’re done. */
kill (getppid (), SIGUSR1);
/* Continue with execution. */
puts ("Bye, now....");
exit (0);
}
I want to understand these SIGRTMIN+13, how does it work if I pass this to pass to kill a process forcefully.

real-time signals are designed to be used for application-defined purposes(it's up to you to give them a meaning),they have the advantage to be queued and accompanied with data which is not possible with standard signals, to use them effectively they are sent using sigqueue() "not kill()", and handled by sigaction() "not signal()".

I can really recommend reading this manual page. SIGRTMIN defines the lowest number you can choose for a user defined real time signal. By default, any program should behave undefined for this signal if you have not yet implemented the signal handler.

Related

How does SIGSTOP work in Linux kernel?

I am wondering how SIGSTOP works inside the Linux Kernel. How is it handled? And how the kernel stops running when it is handled?
I am familiar with the kernel code base. So, if you can reference kernel functions that will be fine, and in fact that is what I want. I am not looking for high level description from a user's perspective.
I have already bugged the get_signal_to_deliver() with printk() statements (it is compiling right now). But I would like someone to explain things in better details.
It's been a while since I touched the kernel, but I'll try to give as much detail as possible. I had to look up some of this stuff in various other places, so some details might be a little messy, but I think this gives a good idea of what happens under the hood.
When a signal is raised, the TIF_SIGPENDING flag is set in the process descriptor structure. Before returning to user mode, the kernel tests this flag with test_thread_flag(TIF_SIGPENDING), which will return true (because a signal is pending).
The exact details of where this happens seem to be architecture dependent, but you can see an example for um:
void interrupt_end(void)
{
struct pt_regs *regs = &current->thread.regs;
if (need_resched())
schedule();
if (test_thread_flag(TIF_SIGPENDING))
do_signal(regs);
if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
tracehook_notify_resume(regs);
}
Anyway, it ends up calling arch_do_signal(), which is also architecture dependent and is defined in the corresponding signal.c file (see the example for x86):
void arch_do_signal(struct pt_regs *regs)
{
struct ksignal ksig;
if (get_signal(&ksig)) {
/* Whee! Actually deliver the signal. */
handle_signal(&ksig, regs);
return;
}
/* Did we come from a system call? */
if (syscall_get_nr(current, regs) >= 0) {
/* Restart the system call - no handlers present */
switch (syscall_get_error(current, regs)) {
case -ERESTARTNOHAND:
case -ERESTARTSYS:
case -ERESTARTNOINTR:
regs->ax = regs->orig_ax;
regs->ip -= 2;
break;
case -ERESTART_RESTARTBLOCK:
regs->ax = get_nr_restart_syscall(regs);
regs->ip -= 2;
break;
}
}
/*
* If there's no signal to deliver, we just put the saved sigmask
* back.
*/
restore_saved_sigmask();
}
As you can see, arch_do_signal() calls get_signal(), which is also in signal.c.
The bulk of the work happens inside get_signal(), it's a huge function, but eventually it seems to process the special case of SIGSTOP here:
if (sig_kernel_stop(signr)) {
/*
* The default action is to stop all threads in
* the thread group. The job control signals
* do nothing in an orphaned pgrp, but SIGSTOP
* always works. Note that siglock needs to be
* dropped during the call to is_orphaned_pgrp()
* because of lock ordering with tasklist_lock.
* This allows an intervening SIGCONT to be posted.
* We need to check for that and bail out if necessary.
*/
if (signr != SIGSTOP) {
spin_unlock_irq(&sighand->siglock);
/* signals can be posted during this window */
if (is_current_pgrp_orphaned())
goto relock;
spin_lock_irq(&sighand->siglock);
}
if (likely(do_signal_stop(ksig->info.si_signo))) {
/* It released the siglock. */
goto relock;
}
/*
* We didn't actually stop, due to a race
* with SIGCONT or something like that.
*/
continue;
}
See the full function here.
do_signal_stop() does the necessary processing to handle SIGSTOP, you can also find it in signal.c. It sets the task state to TASK_STOPPED with set_special_state(TASK_STOPPED), a macro that is defined in include/sched.h that updates the current process descriptor status. (see the relevant line in signal.c). Further down, it calls freezable_schedule() which in turn calls schedule(). schedule() calls __schedule() (also in the same file) in a loop until an eligible task is found. __schedule() attempts to find the next task to schedule (next in the code), and the current task is prev. The state of prev is checked, and because it was changed to TASK_STOPPED, deactivate_task() is called, which moves the task from the run queue to the sleep queue:
} else {
...
deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
...
}
deactivate_task() (also in the same file) removes the process from the runqueue by decrementing the on_rq field of the task_struct to 0 and calling dequeue_task(), which moves the process to the new (waiting) queue.
Then, schedule() checks the number of runnable processes and selects the next task to enter the CPU according to the scheduling policies in effect (I think this is a little bit out of scope by now).
At the end of the day, SIGSTOP moves a process from the runnable queue to a waiting queue until that process receives SIGCONT.
Nearly every time there is an interrupt, the kernel suspends some process from running and switches to running the interrupt handler (the only exception being when there is no process running). Likewise, the kernel will suspend processes that run too long without giving up the CPU (and technically that's the same thing: it just originates from the timer interrupt or possibly an IPI). Ordinarily in these cases, the kernel then puts the suspended process back on the run queue and when the scheduling algorithm decides the time is right, it is resumed.
In the case of SIGSTOP, the same basic thing happens: the affected processes are suspended due to the reception of the stop signal. They just don't get put back on the run queue until SIGCONT is sent. Nothing extraordinary here: SIGSTOP is just instructing the kernel to make a process non-runnable until further notice.
[One note: you seemed to imply that the kernel stops running with SIGSTOP. That is of course not the case. Only the SIGSTOPped processes stop running.]

Interrupting open() with SIGALRM

We have a legacy embedded system which uses SDL to read images and fonts from an NFS share.
If there's a network problem, TTF_OpenFont() and IMG_Load() hang essentially forever. A test application reveals that open() behaves in the same way.
It occurred to us that a quick fix would be to call alarm() before the calls which open files on the NFS share. The man pages weren't entirely clear whether open() would fail with EINTR when interrupted by SIGALRM, so we put together a test app to verify this approach. We set up a signal handler with sigaction::sa_flags set to zero to ensure that SA_RESTART was not set.
The signal handler was called, but open() was not interrupted. (We observed the same behaviour with SIGINT and SIGTERM.)
I suppose the system treats open() as a "fast" operation even on "slow" infrastructure such as NFS.
Is there any way to change this behaviour and allow open() to be interrupted by a signal?
The man pages weren't entirely clear whether open() would fail with
EINTR when interrupted by SIGALRM, so we put together a test app to
verify this approach.
open(2) is a slow syscall (slow syscalls are those that can sleep forever, and can be awaken when, and if, a signal is caught in the meantime) only for some file types. In general, opens that block the caller until some condition occurs are usually interruptible. Known examples include opening a FIFO (named pipe), or (back in the old days) opening a physical terminal device (it sleeps until the modem is dialed).
NFS-mounted filesystems probably don't cause open(2) to sleep in an interruptible state. After all, you are most likely opening a regular file, and in that case open(2) will not be interruptable.
Is there any way to change this behaviour and allow open() to be
interrupted by a signal?
I don't think so, not without doing some (non-trivial) changes to the kernel.
I would explore the possibility of using setjmp(3) / longjmp(3) (see the manpage if you're not familiar; it's basically non-local gotos). You can initialize the environment buffer before calling open(2), and issue a longjmp(3) in the signal handler. Here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <unistd.h>
#include <signal.h>
static jmp_buf jmp_env;
void sighandler(int signo) {
longjmp(jmp_env, 1);
}
int main(void) {
struct sigaction sigact;
sigact.sa_handler = sighandler;
sigact.sa_flags = 0;
sigemptyset(&sigact.sa_mask);
if (sigaction(SIGALRM, &sigact, NULL) < 0) {
perror("sigaction(2) error");
exit(EXIT_FAILURE);
}
if (setjmp(jmp_env) == 0) {
/* First time through
* This is where we would open the file
*/
alarm(5);
/* Simulate a blocked open() */
while (1)
; /* Intentionally left blank */
/* If open(2) is successful here, don't forget to unset
* the alarm
*/
alarm(0);
} else {
/* SIGALRM caught, open(2) canceled */
printf("open(2) timed out\n");
}
return 0;
}
It works by saving the context environment with the help of setjmp(3) before calling open(2). setjmp(3) returns 0 the first time through, and returns whatever value was passed to longjmp(3) otherwise.
Please be aware that this solution is not perfect. Here are some points to keep in mind:
There is a window of time between the call to alarm(2) and the call to open(2) (simulated here with while (1) { ... }) where the process may be preempted for a long time, so there is a chance the alarm expires before we actually attempt to open the file. Sure, with a large timeout such as 2 or 3 seconds this will most likely not happen, but it's still a race condition.
Similarly, there is a window of time between successfully opening the file and canceling the alarm where, again, the process may be preempted for a long time and the alarm may expire before we get the chance to cancel it. This is slightly worse because we have already opened the file so we will "leak" the file descriptor. Again, in practice, with a large timeout this will likely never happen, but it's a race condition nevertheless.
If the code catches other signals, there may be another signal handler in the midst of execution when SIGALRM is caught. Using longjmp(3) inside the signal handler will destroy the execution context of these other signal handlers, and depending on what they were doing, very nasty things may happen (inconsistent state if the signal handlers were manipulating other data structures in the program, etc.). It's as if it started executing, and suddenly crashed somewhere in the middle. You can fix it by: a) carefully setting up all signal handlers such that SIGALRM is blocked before they are invoked (this ensures that the SIGALRM handler does not begin execution until other handlers are done) and b) blocking these other signals before catching SIGALRM. Both actions can be accomplished by setting the sa_mask field of struct sigaction with the necessary mask (the operating system atomically sets the process's signal mask to that value before beginning execution of the handler and unsets it before returning from the handler). OTOH, if the rest of the code doesn't catch signals, then this is not a problem.
sleep(3) may be implemented with alarm(2), and alarm(2) and setitimer(2) share the same timer; if other portions in the code make use of any of these functions, they will interfere and the result will be a huge mess.
Just make sure you weigh in these disadvantages before blindly using this approach. The use of setjmp(3) / longjmp(3) is usually discouraged and makes programs considerably harder to read, understand and maintain. It's not elegant, but right now I don't think you have a choice, unless you're willing to do some core refactoring in the project.
If you do end up using setjmp(3), then at the very least document these limitations.
Maybe there is a strategy of using a separate thread to do the open so the main thread is not held up longer than desired.

How to know the signal delivered to thread

I have a ARM based embedded system running 2.6.33.
A main process-A creates another process-B. Both are aplication process with Real time RR policy. This proc-B creates few threads with pthread_create(). I guess one of the thread is doing some wrong and the process is killed.
On using wait() in process-A i get status 1 returned (NORMAL) as shown below.
I want to know how to get which signal has been delivered to which thread inside
process-B.
waitpid(-1, &status, WUNTRACED | WCONTINUED)
and
if (WIFEXITED(status))
printf("Process %d terminated normally, status %d\n", pid,WEXITSTATUS(status));
Followed the link but got the same status as 1.
http://www.cs.cf.ac.uk/Dave/C/node32.html#SECTION003240000000000000000
Is there any other ways to find out the correct exit status of all threads and signal if any are sent to these threads ?
Ok, firstly, you should know that multithreading and signalling don't mix very well! This is in large reason due to the fact that signals are delivered to a PID; an MT app has 1 PID but multiple threads; which thread will 'get' / handle the signal?
Thus, the 'usual' strategy is to block all signal in all threads except one thread - a dedicated synchronous 'signal handler' thread (it typically issues the blocking sigwait(2); the return value is the signal that just arrived!).
Here's a (simplistic) app to demo mixing threads and signalling.
Second, to understand some detail about how/why a process died - technically, received a signal - use sigaction(2) with the SA_SIGINFO flag. The signal handler signature now is:
void func(int signo, siginfo_t *info, void *context)
The struct siginfo_t will give you all the detail you need about how/why this process received this signal! Ref: sigaction(2) man page.
Of course using this approach does mean that you use sigaction instead of sigwait.. async vs sync handling.
HTH.

how can i check if a process got a signal while he is in wait queue linux 2.4

i am implementing a module that acts as a fifo, in order to prevent two processes from accessing a buffer that is used for reading/writing i used a semaphore,when a semaphore blocks a process it moves it into the wait queue, my question is how can i check if while that process is waiting it received a signal because if it did then i would like to stop what ever that process was doing (reading or writing) and return an error.
the only function i am familiar with is sigpending(sigset_t *set) but i am not really sure how to use it, any help will be appreciated.
(when i say read/write i mean the function that were implemented for the module in fops)
To allow a sleeping task to be woken up when it receives a signal, set the task state to TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE.
Such a signal wakeup happens completely independently from any wait queues, so it must be checked for separately (with signal_pending()).
A typical wait loop looks like this:
DECLARE_WAITQUEUE(entry, current);
...
if (need_to_wait) {
add_wait_queue(&wq, &entry);
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
if (!need_to_wait)
break;
schedule();
if (signal_pending(current)) {
remove_wait_queue(&wq, &entry);
return -EINTR; /* or -ERESTARTSYS */
}
}
set_current_state(TASK_RUNNING);
remove_wait_queue(&wq, &entry);
}
....

How do I "disengage" from `accept` on a blocking socket when signalled from another thread?

I am in the same situation as this guy, but I don't quite understand the answer.
The problem:
Thread 1 calls accept on a socket, which is blocking.
Thread 2 calls close on this socket.
Thread 1 continues blocking. I want it to return from accept.
The solution:
what you should do is send a signal to the thread which is blocked in
accept. This will give it EINTR and it can cleanly disengage - and
then close the socket. Don't close it from a thread other than the one
using it.
I don't get what to do here -- when the signal is received in Thread 1, accept is already blocking, and will continue to block after the signal handler has finished.
What does the answer really mean I should do?
If the Thread 1 signal handler can do something which will cause accept to return immediately, why can't Thread 2 do the same without signals?
Is there another way to do this without signals? I don't want to increase the caveats on the library.
Instead of blocking in accept(), block in select(), poll(), or one of the similar calls that allows you to wait for activity on multiple file descriptors and use the "self-pipe trick". All of the file descriptors passed to select() should be in non-blocking mode. One of the file descriptors should be the server socket that you use with accept(); if that one becomes readable then you should go ahead and call accept() and it will not block. In addition to that one, create a pipe(), set it to non-blocking, and check for the read side becoming readable. Instead of calling close() on the server socket in the other thread, send a byte of data to the first thread on the write end of the pipe. The actual byte value doesn't matter; the purpose is simply to wake up the first thread. When select() indicates that the pipe is readable, read() and ignore the data from the pipe, close() the server socket, and stop waiting for new connections.
The accept() call will return with error code EINTR if a signal is caught before a connection is accepted. So check the return value and error code then close the socket accordingly.
If you wish to avoid the signal mechanism altogether, use select() to determine if there are any incoming connections ready to be accepted before calling accept(). The select() call can be made with a timeout so that you can recover and respond to abort conditions.
I usually call select() with a timeout of 1000 to 3000 milliseconds from a while loop that checks for an exit/abort condition. If select() returns with a ready descriptor I call accept() otherwise I either loop around and block again on select() or exit if requested.
Call shutdown() from Thread 2. accept will return with "invalid argument".
This seems to work but the documentation doesn't really explain its operation across threads -- it just seems to work -- so if someone can clarify this, I'll accept that as an answer.
Just close the listening socket, and handle the resulting error or exception from accept().
I believe signals can be used without increasing "the caveats on the library". Consider the following:
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
static pthread_t thread;
static volatile sig_atomic_t sigCount;
/**
* Executes a concurrent task. Called by `pthread_create()`..
*/
static void* startTask(void* arg)
{
for (;;) {
// calls to `select()`, `accept()`, `read()`, etc.
}
return NULL;
}
/**
* Starts concurrent task. Doesn't return until the task completes.
*/
void start()
{
(void)pthread_create(&thread, NULL, startTask, NULL);
(void)pthread_join(thread);
}
static void noop(const int sig)
{
sigCount++;
}
/**
* Stops concurrent task. Causes `start()` to return.
*/
void stop()
{
struct sigaction oldAction;
struct sigaction newAction;
(void)sigemptyset(&newAction.sa_mask);
newAction.sa_flags = 0;
newAction.sa_handler = noop;
(void)sigaction(SIGTERM, &newAction, &oldAction);
(void)pthread_kill(thread, SIGTERM); // system calls return with EINTR
(void)sigaction(SIGTERM, &oldAction, NULL); // restores previous handling
if (sigCount > 1) // externally-generated SIGTERM was received
oldAction.sa_handler(SIGTERM); // call previous handler
sigCount = 0;
}
This has the following advantages:
It doesn't require anything special in the task code other than normal EINTR handling; consequently, it makes reasoning about resource leakage easier than using pthread_cancel(), pthread_cleanup_push(), pthread_cleanup_pop(), and pthread_setcancelstate().
It doesn't require any additional resources (e.g. a pipe).
It can be enhanced to support multiple concurrent tasks.
It's fairly boilerplate.
It might even compile. :-)

Resources