How are system calls interrupted by signal? - linux

My understanding is as follows:
A blocking syscall normally places the process in the TASK_INTERRUPTIBLE state, so that when a signal is delivered, the kernel puts the process back into the TASK_RUNNING state. The process is then scheduled to run at the next timer tick, and so the syscall is interrupted.
But I ran a small test and it failed. I wrote a user-mode process that called sleep(), and I changed the process's state to TASK_RUNNING in the kernel, but sleep() was not interrupted at all and the process kept sleeping.
Then I tried wake_up_process(process); it failed.
Then I tried set_tsk_thread_flag(process, TIF_SIGPENDING); it failed.
Then I tried set_tsk_thread_flag(process, TIF_SIGPENDING) followed by wake_up_process(process), and it succeeded! sleep() was interrupted and the process started to run.
So it's not that simple. Does anyone know exactly how system calls are interrupted by signals?

Check out __send_signal from signal.c. It calls complete_signal near the end, which eventually calls this little function:
void signal_wake_up_state(struct task_struct *t, unsigned int state)
{
    set_tsk_thread_flag(t, TIF_SIGPENDING);
    /*
     * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
     * case. We don't check t->state here because there is a race with it
     * executing another processor and just now entering stopped state.
     * By using wake_up_state, we ensure the process will wake up and
     * handle its death signal.
     */
    if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
        kick_process(t);
}
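And that's how you do it. Note that it is not enough to set the thread flag: you have to use a wakeup function to ensure the process is scheduled. The reverse also holds, which matches your test: a task that is woken without TIF_SIGPENDING set finds no pending signal when it rechecks its wait condition, so it goes back to sleep.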
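Seen from user space, the effect of that flag-plus-wakeup pair is a blocking call that returns early with EINTR. The following is a minimal sketch of that, not kernel code: the helper name `errno_of_interrupted_read` and the 100 ms timer are assumptions made for illustration, and the handler is installed without SA_RESTART so the call is not restarted transparently.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int signo)
{
    (void)signo;  /* nothing to do: the arrival alone interrupts read() */
}

/* Block in read() on an empty pipe and return the errno it fails with,
 * or -1 if the setup itself failed. (Hypothetical helper for this sketch.) */
static int errno_of_interrupted_read(void)
{
    int fds[2];
    char byte;
    struct sigaction sa;
    struct itimerval timer;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;  /* sa_flags stays 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1 || pipe(fds) == -1)
        return -1;

    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = 100000;  /* one-shot SIGALRM after 100 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1)
        return -1;

    /* Nobody writes to the pipe, so the only way out is the signal. */
    ssize_t n = read(fds[0], &byte, 1);
    int err = errno;
    assert(n == -1);

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Calling `errno_of_interrupted_read()` from a single-threaded program should yield EINTR; with SA_RESTART set in `sa_flags`, the same read() would instead be restarted and keep blocking.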

Related

Condition of do_group_exit() in get_signal() in Linux kernel

Can anybody explain what do_signal() and get_signal() functions do? And what does this line of code in get_signal() actually mean, i.e. when exactly it would run:
/*
* Death signals, no core dump.
*/
do_group_exit(ksig->info.si_signo);
/* NOTREACHED */
This is for example in: https://elixir.bootlin.com/linux/v4.7/source/kernel/signal.c#L2307
get_signal is arch-independent, and returns the next signal in the queue for the current task. If that signal has no handler, it does the default action, which could exit the task (e.g. on SIGSEGV or SIGKILL) with the line of code you linked to, but could also suspend the task, or core dump the task and then exit.
For more on default actions, have a look at the "Standard signals" section of signal(7).
do_signal is arch-specific, and called by the arch-specific code when exiting the kernel. Its job is to call get_signal. If that returns a signal, meaning it's supposed to be handled by userspace, it pushes the signal handler stack frame.
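The default-action path can be observed from user space through the wait status of a child. This is a small sketch under assumptions (the helper name `signal_that_killed_child` is made up for illustration): the child raises a signal with its default disposition, and the parent reports whether that terminated it.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that sends itself `signo` with the default disposition.
 * Returns the signal that terminated the child, 0 if it exited normally,
 * or -1 on error. (Hypothetical helper for this sketch.) */
static int signal_that_killed_child(int signo)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        sigset_t set;

        /* Undo anything inherited: default disposition, signal unblocked. */
        signal(signo, SIG_DFL);
        sigemptyset(&set);
        sigaddset(&set, signo);
        sigprocmask(SIG_UNBLOCK, &set, NULL);

        raise(signo);
        _exit(0);  /* only reached if the default action did not kill us */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With SIGTERM the child never returns from raise() and the helper reports SIGTERM. With SIGCHLD, whose default action is to ignore the signal, the child exits normally and the helper returns 0.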

How does SIGSTOP work in Linux kernel?

I am wondering how SIGSTOP works inside the Linux Kernel. How is it handled? And how the kernel stops running when it is handled?
I am familiar with the kernel code base. So, if you can reference kernel functions that will be fine, and in fact that is what I want. I am not looking for high level description from a user's perspective.
I have already instrumented get_signal_to_deliver() with printk() statements (it is compiling right now), but I would like someone to explain things in more detail.
It's been a while since I touched the kernel, but I'll try to give as much detail as possible. I had to look up some of this stuff in various other places, so some details might be a little messy, but I think this gives a good idea of what happens under the hood.
When a signal is raised, the TIF_SIGPENDING flag is set in the process descriptor structure. Before returning to user mode, the kernel tests this flag with test_thread_flag(TIF_SIGPENDING), which will return true (because a signal is pending).
The exact details of where this happens seem to be architecture dependent, but you can see an example for um (User-Mode Linux):
void interrupt_end(void)
{
    struct pt_regs *regs = &current->thread.regs;
    if (need_resched())
        schedule();
    if (test_thread_flag(TIF_SIGPENDING))
        do_signal(regs);
    if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
        tracehook_notify_resume(regs);
}
Anyway, it ends up calling arch_do_signal(), which is also architecture dependent and is defined in the corresponding signal.c file (see the example for x86):
void arch_do_signal(struct pt_regs *regs)
{
    struct ksignal ksig;
    if (get_signal(&ksig)) {
        /* Whee! Actually deliver the signal. */
        handle_signal(&ksig, regs);
        return;
    }
    /* Did we come from a system call? */
    if (syscall_get_nr(current, regs) >= 0) {
        /* Restart the system call - no handlers present */
        switch (syscall_get_error(current, regs)) {
        case -ERESTARTNOHAND:
        case -ERESTARTSYS:
        case -ERESTARTNOINTR:
            regs->ax = regs->orig_ax;
            regs->ip -= 2;
            break;
        case -ERESTART_RESTARTBLOCK:
            regs->ax = get_nr_restart_syscall(regs);
            regs->ip -= 2;
            break;
        }
    }
    /*
     * If there's no signal to deliver, we just put the saved sigmask
     * back.
     */
    restore_saved_sigmask();
}
As you can see, arch_do_signal() calls get_signal(), which is defined in kernel/signal.c.
The bulk of the work happens inside get_signal(). It's a huge function, but eventually it processes the special case of SIGSTOP here:
if (sig_kernel_stop(signr)) {
    /*
     * The default action is to stop all threads in
     * the thread group. The job control signals
     * do nothing in an orphaned pgrp, but SIGSTOP
     * always works. Note that siglock needs to be
     * dropped during the call to is_orphaned_pgrp()
     * because of lock ordering with tasklist_lock.
     * This allows an intervening SIGCONT to be posted.
     * We need to check for that and bail out if necessary.
     */
    if (signr != SIGSTOP) {
        spin_unlock_irq(&sighand->siglock);
        /* signals can be posted during this window */
        if (is_current_pgrp_orphaned())
            goto relock;
        spin_lock_irq(&sighand->siglock);
    }
    if (likely(do_signal_stop(ksig->info.si_signo))) {
        /* It released the siglock. */
        goto relock;
    }
    /*
     * We didn't actually stop, due to a race
     * with SIGCONT or something like that.
     */
    continue;
}
See the full function in kernel/signal.c.
do_signal_stop(), also in kernel/signal.c, does the processing needed to handle SIGSTOP. It sets the task state to TASK_STOPPED with set_special_state(TASK_STOPPED), a macro defined in include/linux/sched.h that updates the state of the current task (see the relevant line in signal.c). Further down, it calls freezable_schedule(), which in turn calls schedule().
schedule() calls __schedule() (both in kernel/sched/core.c), repeating the call for as long as need_resched() is set. __schedule() picks the next task to run (next in the code); the current task is prev. The state of prev is checked, and because it was changed to TASK_STOPPED, deactivate_task() is called, which takes the task off the run queue:
} else {
    ...
    deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
    ...
}
deactivate_task() (also in the same file) removes the process from the run queue by setting the on_rq field of the task_struct to 0 and calling dequeue_task(), which hands the actual removal to the task's scheduling class.
Then, schedule() checks the number of runnable processes and selects the next task to enter the CPU according to the scheduling policies in effect (I think this is a little bit out of scope by now).
At the end of the day, SIGSTOP takes a process off the run queue and keeps it off until that process receives SIGCONT.
Nearly every time there is an interrupt, the kernel suspends some process from running and switches to running the interrupt handler (the only exception being when there is no process running). Likewise, the kernel will suspend processes that run too long without giving up the CPU (and technically that's the same thing: it just originates from the timer interrupt or possibly an IPI). Ordinarily in these cases, the kernel then puts the suspended process back on the run queue and when the scheduling algorithm decides the time is right, it is resumed.
In the case of SIGSTOP, the same basic thing happens: the affected processes are suspended due to the reception of the stop signal. They just don't get put back on the run queue until SIGCONT is sent. Nothing extraordinary here: SIGSTOP is just instructing the kernel to make a process non-runnable until further notice.
[One note: you seemed to imply that the kernel stops running with SIGSTOP. That is of course not the case. Only the SIGSTOPped processes stop running.]
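From user space, both transitions are visible through waitpid(). The sketch below is an illustration under assumptions (the helper name `stop_and_continue_child` is made up): a parent stops a child with SIGSTOP, observes the stop, resumes it with SIGCONT, observes that too, and finally cleans up.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stop and resume a child. Returns 0 if both state changes were observed,
 * -1 otherwise. (Hypothetical helper for this sketch.) */
static int stop_and_continue_child(void)
{
    int status;
    int observed = 0;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();  /* the child only sleeps until it is killed */
    }

    kill(pid, SIGSTOP);
    /* WUNTRACED asks waitpid() to report stopped children as well. */
    if (waitpid(pid, &status, WUNTRACED) == pid &&
        WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP)
        observed++;

    kill(pid, SIGCONT);
    /* WCONTINUED asks waitpid() to report children resumed by SIGCONT. */
    if (waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
        observed++;

    kill(pid, SIGKILL);  /* clean up the child */
    if (waitpid(pid, &status, 0) == pid)
        assert(WIFSIGNALED(status));
    return observed == 2 ? 0 : -1;
}
```

The child never installs a handler, which matches the point made above: SIGSTOP cannot be caught, and the stop is carried out entirely by the kernel on the child's behalf.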

How to know the signal delivered to thread

I have an ARM-based embedded system running 2.6.33.
A main process A creates another process B. Both are application processes with the real-time RR policy. Process B creates a few threads with pthread_create(). I guess one of the threads is doing something wrong and the process is killed.
Using wait() in process A, I get status 1 returned (NORMAL), as shown below.
I want to know how to find out which signal has been delivered to which thread inside process B.
waitpid(-1, &status, WUNTRACED | WCONTINUED)
and
if (WIFEXITED(status))
    printf("Process %d terminated normally, status %d\n", pid, WEXITSTATUS(status));
I followed this link but still got status 1:
http://www.cs.cf.ac.uk/Dave/C/node32.html#SECTION003240000000000000000
Is there any other way to find out the correct exit status of all threads, and which signals, if any, were sent to these threads?
OK, firstly, you should know that multithreading and signalling don't mix very well! This is in large part because process-directed signals are delivered to a PID; a multithreaded app has one PID but multiple threads, so which thread will 'get' (handle) the signal?
Thus, the 'usual' strategy is to block all signals in all threads except one: a dedicated synchronous 'signal handler' thread. It typically issues the blocking sigwait(3), which returns 0 once a signal arrives and stores the number of that signal in its second argument. (The signals passed to sigwait() have to stay blocked in that thread too; sigwait() accepts them synchronously.)
Here's a (simplistic) app to demo mixing threads and signalling.
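Independently of that app, the pattern itself fits in a few lines. The following is a minimal sketch under assumptions, not the linked code: the names `signal_thread` and `wait_for_signal_in_thread` are made up, and the signal is sent by the process to itself so the example is self-contained.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

/* Dedicated signal thread: waits synchronously for one signal from the set. */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int signo = 0;

    /* sigwait() returns 0 on success and stores the signal number in signo. */
    if (sigwait(set, &signo) != 0)
        signo = -1;
    return (void *)(intptr_t)signo;
}

/* Block `signo` everywhere, let a dedicated thread pick it up, and return
 * the signal number that thread saw (or -1 on error). */
static int wait_for_signal_in_thread(int signo)
{
    sigset_t set;
    pthread_t tid;
    void *result = NULL;

    sigemptyset(&set);
    sigaddset(&set, signo);
    /* Block before creating threads: new threads inherit the mask. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_thread, &set) != 0)
        return -1;

    /* Process-directed signal: it stays pending until sigwait() takes it. */
    kill(getpid(), signo);

    if (pthread_join(tid, &result) != 0)
        return -1;
    assert(result != NULL);
    return (int)(intptr_t)result;
}
```

Build with `-pthread`. Because the signal is blocked in every thread, it makes no difference whether kill() runs before or after the signal thread reaches sigwait(): the signal simply stays pending until it is accepted.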
Second, to understand some detail about how/why a process died - technically, received a signal - use sigaction(2) with the SA_SIGINFO flag. The signal handler signature now is:
void func(int signo, siginfo_t *info, void *context)
The struct siginfo_t will give you all the detail you need about how/why this process received this signal! Ref: sigaction(2) man page.
Of course using this approach does mean that you use sigaction instead of sigwait.. async vs sync handling.
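As a rough illustration of the SA_SIGINFO approach, the sketch below records who sent a signal. It rests on assumptions: the names `on_signal` and `sender_of_self_signal` are made up, the process is single-threaded with SIGUSR1 unblocked, and the signal is sent by the process to itself so that the result is predictable.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t seen_signo;
static volatile sig_atomic_t seen_sender;
static volatile sig_atomic_t seen_from_kill;

/* Three-argument handler, selected by SA_SIGINFO. */
static void on_signal(int signo, siginfo_t *info, void *context)
{
    (void)context;
    seen_signo = signo;
    seen_from_kill = (info->si_code == SI_USER);  /* sent with kill() */
    seen_sender = (sig_atomic_t)info->si_pid;     /* meaningful for SI_USER */
}

/* Send SIGUSR1 to ourselves and return the sender PID the handler saw,
 * or -1 on failure. (Hypothetical helper for this sketch.) */
static int sender_of_self_signal(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    seen_signo = 0;
    /* In a single-threaded process with SIGUSR1 unblocked, the handler
     * has run by the time kill() returns. */
    if (kill(getpid(), SIGUSR1) == -1 || seen_signo != SIGUSR1)
        return -1;
    assert(seen_from_kill);
    return (int)seen_sender;
}
```

For signals raised by a fault rather than by kill(), other siginfo_t fields become the interesting ones (for example si_addr for SIGSEGV); sigaction(2) lists which fields are valid for which si_code.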
HTH.

Linux kernel - wait queues

I'm reading "Linux Kernel Development, 3rd edition" by Robert Love to get a general idea of how the Linux kernel works (2.6.2.3).
I'm confused about how wait queues work, for example in this code:
/* ‘q’ is the wait queue we wish to sleep on */
DEFINE_WAIT(wait);
add_wait_queue(q, &wait);
while (!condition) { /* condition is the event that we are waiting for */
    prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
    if (signal_pending(current))
        /* handle signal */
    schedule();
}
finish_wait(&q, &wait);
I want to know which process is running this code. Is it a kernel thread? Whose process time is this?
Also, in the loop, while the condition is still not met we continue sleeping and call schedule() to run another process. The question is: when do we return to this loop?
The book says that when a process sleeps, it's removed from the run queue; otherwise it would be woken and have to enter a busy loop...
It also says: "sleeping should always be handled in a loop that ensures that the condition for which the task is waiting has indeed occurred."
I just want to know in what context this loop is running.
Sorry if this is a stupid question. I'm just having trouble seeing the big picture.
Which process is running the code? The process that called it. I don't mean to make fun of the question, but the gist is that kernel code can run in different contexts: because a system call led to this place, because it is in an interrupt handler, or because it is a callback function called from another context (such as workqueues or timer functions).
Since this example is sleeping, it must be in a context where sleeping is allowed, meaning it is executed in response to a system call or at least in a kernel thread. So the answer is that the process time is taken from the process (or kernel thread) that called into this kernel code that needs to sleep. That is the only place where sleeping is allowed in the first place.
Workqueues are a special case: they exist explicitly for functions that need to sleep. Typical use would be to queue a function that needs to sleep from a context where sleeping is forbidden. In that case, the process context is that of one of the kernel worker threads designated to process workqueue items.
You will return to this loop when the wait queue is woken up, which sets either one task waiting on the queue or all of them to runnable, depending on the wake_up function called.
The most important thing is: forget about this unless you are interested in the implementation details. Since many people got this wrong and it's basically the same thing everywhere it's needed, there have long been macros encapsulating the whole procedure. Look up wait_event() (and wait_event_interruptible(), which keeps the sleep interruptible by signals as in your example); that's what your example should really look like:
wait_event(q, condition);
Going through your example, with comments added:
NOTE: when a wait queue is created, it is in the sleep state by default.
DEFINE_WAIT(wait); /* creates a wait queue entry for the current task */
add_wait_queue(q, &wait); /* appends that entry to the wait queue q (like appending to a linked list) */
while (!condition) {
    /* condition is the event that we are waiting for */
    /* e.g. data arriving from user space in a write method (using __get_user()) */
    prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
    /* marks the task TASK_INTERRUPTIBLE, so that a wake-up on q or a signal can wake it */
    if (signal_pending(current))
        /* checks whether a signal (e.g. SIGINT or SIGKILL) is pending for the current task; if one is, the usual handling is to break out of the loop or return -ERESTARTSYS */
        /* handle signal */
    schedule(); /* gives up the CPU until the task is woken */
}
finish_wait(&q, &wait); /* sets the task back to TASK_RUNNING and removes the entry from the wait queue */

How can a process kill itself?

#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
int main(){
    pid_t pid = fork();
    if(pid==0){
        system("watch ls");
    }
    else{
        sleep(5);
        killpg(getpid(),SIGTERM); //to kill the complete process tree.
    }
    return 0;
}
Terminal:
anirudh#anirudh-Aspire-5920:~/Desktop/testing$ gcc test.c
anirudh#anirudh-Aspire-5920:~/Desktop/testing$ ./a.out
Terminated
For the first 5 seconds the output of "watch ls" is shown, and then it terminates because I send a SIGTERM.
Question: how can a process kill itself? I have done kill(getpid(), SIGTERM);
My hypothesis:
During the kill() call the process switches to kernel mode. The kill call sends the SIGTERM to the process and records it in the process's entry in the process table. When the process comes back to user mode, it sees the signal in its table and terminates itself (HOW? I REALLY DO NOT KNOW).
(I think I am going wrong, maybe badly, somewhere in my hypothesis, so please enlighten me.)
This code is actually a stub which I am using to test the other modules of my project.
It's doing the job for me and I am happy with it, but a question remains in my mind: how does a process actually kill itself? I want to know the step-by-step sequence.
Thanks in advance
Anirudh Tomer
Your process dies because you are using killpg(), which sends a signal to a process group, not to a single process.
When you fork(), the child inherits, among other things, the parent's process group. From man fork:
* The child's parent process ID is the same as the parent's process ID.
So you kill the parent along with the child.
If you do a simple kill(pid, SIGTERM) instead, with pid being the child's PID returned by fork(), then the parent will kill only the child (the one watching ls) and then exit peacefully.
so during the kill() call the process switches to kernel mode. The kill call sends the SIGTERM to the process and copies it in the process's process table. when the process comes back to user mode it sees the signal in its table and it terminates itself (HOW ? I REALLY DO NOT KNOW )
In Linux, when returning from the kernel mode to the user-space mode the kernel checks if there are any pending signals that can be delivered. If there are some it delivers the signals just before returning to the user-space mode. It can also deliver signals at other times, for example, if a process was blocked on select() and then killed, or when a thread accesses an unmapped memory location.
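The difference between a signal being pending and being delivered can be made visible by blocking it first. This is a minimal sketch under assumptions (single-threaded process; the names `count_signal` and `pending_then_delivered` are made up for illustration): the signal sent with kill() stays pending while blocked, and the handler runs as soon as it is unblocked.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t handler_runs;

static void count_signal(int signo)
{
    (void)signo;
    handler_runs++;
}

/* Returns 0 if SIGUSR2 stayed pending while blocked and was delivered
 * once unblocked, -1 otherwise. (Hypothetical helper for this sketch.) */
static int pending_then_delivered(void)
{
    struct sigaction sa;
    sigset_t block, old, pending;
    int was_pending, was_delivered;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    handler_runs = 0;
    kill(getpid(), SIGUSR2);  /* generated, but blocked: it stays pending */
    sigpending(&pending);
    was_pending = sigismember(&pending, SIGUSR2) == 1 && handler_runs == 0;

    /* Unblocking makes the pending signal deliverable; in a single-threaded
     * process the handler has run by the time sigprocmask() returns. */
    sigprocmask(SIG_UNBLOCK, &block, NULL);
    was_delivered = handler_runs == 1;
    assert(handler_runs <= 1);

    sigprocmask(SIG_SETMASK, &old, NULL);  /* restore the caller's mask */
    return (was_pending && was_delivered) ? 0 : -1;
}
```

Without the blocking step, kill(getpid(), ...) in a single-threaded process already has the handler run before kill() returns, which is the return-to-user-mode check described above in action.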
I think that when it sees the SIGTERM signal in its process table, it first kills its child processes (the complete tree, since I have called killpg()) and then it calls exit().
I am still looking for a better answer to this question.
kill(getpid(), SIGKILL); // itself I think
I tested it after a fork, in the case 0: branch, and it quit normally, separately from the parent process.
I don't know if this is a standard, approved method.
(I can see from my psensor tool that CPU usage returns to 34%, like a normal program with a stopped counter.)
This is super-easy in Perl:
{
    local $SIG{TERM} = "IGNORE";
    kill TERM => -$$;
}
Conversion into C is left as an exercise for the reader.
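One possible shape of that exercise, as a sketch rather than a definitive translation: `term_own_group_but_survive` ignores SIGTERM, signals the caller's process group, and restores the old disposition, much like leaving the Perl `local` scope. It uses kill(0, ...), which targets the caller's own group, whereas the Perl version targets the group whose ID equals its own PID. Both function names are made up, and `demo_in_private_group` exists only so the helper can be tried out without signalling the shell's process group.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGTERM to the caller's whole process group while the caller
 * itself ignores it. Returns kill()'s result, or -1 on setup failure. */
static int term_own_group_but_survive(void)
{
    struct sigaction ignore, saved;
    int rc;

    ignore.sa_handler = SIG_IGN;
    ignore.sa_flags = 0;
    sigemptyset(&ignore.sa_mask);
    if (sigaction(SIGTERM, &ignore, &saved) == -1)
        return -1;
    rc = kill(0, SIGTERM);             /* pid 0: every process in our group */
    sigaction(SIGTERM, &saved, NULL);  /* like leaving Perl's `local` scope */
    return rc;
}

/* Try the helper inside a fresh process group, so the caller's own group
 * (shell, test runner) is left alone. Returns 0 if a worker in the group
 * died from SIGTERM while the group leader survived. */
static int demo_in_private_group(void)
{
    int status;
    pid_t leader = fork();

    if (leader == -1)
        return -1;
    if (leader == 0) {
        pid_t worker;

        signal(SIGTERM, SIG_DFL);      /* do not inherit an ignored SIGTERM */
        if (setpgid(0, 0) == -1)       /* become leader of a new group */
            _exit(2);
        worker = fork();               /* the worker inherits the new group */
        if (worker == -1)
            _exit(2);
        if (worker == 0) {
            for (;;)
                pause();
        }
        if (term_own_group_but_survive() == -1)
            _exit(2);
        if (waitpid(worker, &status, 0) != worker)
            _exit(2);
        _exit(WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM ? 0 : 1);
    }
    if (waitpid(leader, &status, 0) != leader)
        return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A SIGTERM that arrives while the disposition is SIG_IGN is discarded rather than queued, so restoring the previous disposition afterwards does not terminate the caller.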
