I'm reading "Linux Kernel Development", 3rd edition, by Robert Love, to get a general idea of how the Linux kernel works (2.6.2.3).
I'm confused about how wait queues work, for example in this code:
/* ‘q’ is the wait queue we wish to sleep on */
DEFINE_WAIT(wait);
add_wait_queue(q, &wait);
while (!condition) { /* condition is the event that we are waiting for */
    prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
    if (signal_pending(current))
        /* handle signal */
    schedule();
}
finish_wait(&q, &wait);
I want to know which process is running this code. Is it a kernel thread? Whose process time is this?
Also, in the loop, while the condition is still not met we keep sleeping and call schedule() to run another process. The question is: when do we return to this loop?
The book says that when a process sleeps, it is removed from the run queue; otherwise it would be woken up and have to enter a busy loop...
It also says: "sleeping should always be handled in a loop that ensures that the condition for which the task is waiting has indeed occurred."
I just want to know in what context this loop is running.
Sorry if this is a stupid question. I'm just having trouble seeing the big picture.
Which process is running the code? The process that called it. I don't mean to make fun of the question, but the gist is that kernel code can run in different contexts: because a system call led to this place, because it is in an interrupt handler, or because it is a callback function called from another context (such as workqueues or timer functions).
Since this example is sleeping, it must be in a context where sleeping is allowed, meaning it is executed in response to a system call or at least in a kernel thread. So the answer is the process time is taken from the process (or kernel thread) that called into this kernel code that needs to sleep. That is the only place where sleeping is allowed in the first place.
Workqueues are a special case: they exist explicitly for functions that need to sleep. A typical use is to queue a function that needs to sleep from a context where sleeping is forbidden. In that case, the process context is that of one of the kernel worker threads designated to process workqueue items.
You will return to this loop when the wait queue is woken up, which sets either one task waiting on the queue or all of them to runnable, depending on the wake_up function called. Because the task sleeps in TASK_INTERRUPTIBLE state, a signal wakes it up as well, which is why the loop checks signal_pending().
The most important thing is: forget about this unless you are interested in the implementation details. Since many people got this wrong and it's basically the same thing everywhere it's needed, there have long been macros encapsulating the whole procedure. Look up wait_event(); that's what your example should really look like:
wait_event(q, condition);
As per your example, with comments added:
NOTE: a task that waits on a wait queue is in a sleeping state until something wakes it up.
DEFINE_WAIT(wait); /* create a wait queue entry named "wait" for the current task */
add_wait_queue(q, &wait); /* append that entry to the wait queue q (like appending a node to a linked list) */
while (!condition) {
    /* condition is the event that we are waiting for */
    /* condition --> let's say you are getting data from user space in a write method (using __get_user()) */
    prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
    /* the task is now marked TASK_INTERRUPTIBLE: it can be woken by a wake-up on the queue or by a signal */
    if (signal_pending(current))
        /* a signal is pending for the current task (e.g. SIGINT or SIGKILL): generally you return -ERESTARTSYS or "break" out of the loop, so that finish_wait() runs */
        /* handle signal */
    schedule(); /* give up the CPU until the task is woken up */
}
finish_wait(&q, &wait); /* set the task back to TASK_RUNNING and remove the entry from the wait queue */
I am wondering how SIGSTOP works inside the Linux kernel. How is it handled? And how does the kernel stop running when it is handled?
I am familiar with the kernel code base, so if you can reference kernel functions that will be fine; in fact, that is what I want. I am not looking for a high-level description from a user's perspective.
I have already instrumented get_signal_to_deliver() with printk() statements (it is compiling right now), but I would like someone to explain things in more detail.
It's been a while since I touched the kernel, but I'll try to give as much detail as possible. I had to look up some of this stuff in various other places, so some details might be a little messy, but I think this gives a good idea of what happens under the hood.
When a signal is raised, the TIF_SIGPENDING flag is set in the process descriptor structure. Before returning to user mode, the kernel tests this flag with test_thread_flag(TIF_SIGPENDING), which will return true (because a signal is pending).
The exact details of where this happens seem to be architecture dependent, but you can see an example for um:
void interrupt_end(void)
{
    struct pt_regs *regs = &current->thread.regs;

    if (need_resched())
        schedule();
    if (test_thread_flag(TIF_SIGPENDING))
        do_signal(regs);
    if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
        tracehook_notify_resume(regs);
}
Anyway, it ends up calling arch_do_signal(), which is also architecture dependent and is defined in the corresponding signal.c file (see the example for x86):
void arch_do_signal(struct pt_regs *regs)
{
    struct ksignal ksig;

    if (get_signal(&ksig)) {
        /* Whee! Actually deliver the signal. */
        handle_signal(&ksig, regs);
        return;
    }

    /* Did we come from a system call? */
    if (syscall_get_nr(current, regs) >= 0) {
        /* Restart the system call - no handlers present */
        switch (syscall_get_error(current, regs)) {
        case -ERESTARTNOHAND:
        case -ERESTARTSYS:
        case -ERESTARTNOINTR:
            regs->ax = regs->orig_ax;
            regs->ip -= 2;
            break;

        case -ERESTART_RESTARTBLOCK:
            regs->ax = get_nr_restart_syscall(regs);
            regs->ip -= 2;
            break;
        }
    }

    /*
     * If there's no signal to deliver, we just put the saved sigmask
     * back.
     */
    restore_saved_sigmask();
}
As you can see, arch_do_signal() calls get_signal(), which is defined in kernel/signal.c.
The bulk of the work happens inside get_signal(). It's a huge function, but eventually it seems to process the special case of SIGSTOP here:
if (sig_kernel_stop(signr)) {
    /*
     * The default action is to stop all threads in
     * the thread group. The job control signals
     * do nothing in an orphaned pgrp, but SIGSTOP
     * always works. Note that siglock needs to be
     * dropped during the call to is_orphaned_pgrp()
     * because of lock ordering with tasklist_lock.
     * This allows an intervening SIGCONT to be posted.
     * We need to check for that and bail out if necessary.
     */
    if (signr != SIGSTOP) {
        spin_unlock_irq(&sighand->siglock);

        /* signals can be posted during this window */

        if (is_current_pgrp_orphaned())
            goto relock;

        spin_lock_irq(&sighand->siglock);
    }

    if (likely(do_signal_stop(ksig->info.si_signo))) {
        /* It released the siglock. */
        goto relock;
    }

    /*
     * We didn't actually stop, due to a race
     * with SIGCONT or something like that.
     */
    continue;
}
See the full function here.
do_signal_stop() does the necessary processing to handle SIGSTOP; you can also find it in signal.c. It sets the task state to TASK_STOPPED with set_special_state(TASK_STOPPED), a macro defined in include/linux/sched.h that updates the state in the current process descriptor (see the relevant line in signal.c). Further down, it calls freezable_schedule(), which in turn calls schedule().
schedule() calls __schedule() (both in kernel/sched/core.c) in a loop for as long as need_resched() is set. __schedule() attempts to find the next task to schedule (next in the code), and the current task is prev. The state of prev is checked, and because it was changed to TASK_STOPPED, deactivate_task() is called, which takes the task off the run queue:
} else {
    ...
    deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
    ...
}
deactivate_task() (also in the same file) removes the process from the run queue: with DEQUEUE_SLEEP the on_rq field of the task_struct ends up as 0, and dequeue_task() takes the task out of the queue kept by its scheduling class.
Then __schedule() selects the next task to enter the CPU according to the scheduling policies in effect (I think this is a little bit out of scope by now).
At the end of the day, SIGSTOP takes a process off the run queue and keeps it off until that process receives SIGCONT.
Nearly every time there is an interrupt, the kernel suspends some process from running and switches to running the interrupt handler (the only exception being when there is no process running). Likewise, the kernel will suspend processes that run too long without giving up the CPU (and technically that's the same thing: it just originates from the timer interrupt or possibly an IPI). Ordinarily in these cases, the kernel then puts the suspended process back on the run queue and when the scheduling algorithm decides the time is right, it is resumed.
In the case of SIGSTOP, the same basic thing happens: the affected processes are suspended due to the reception of the stop signal. They just don't get put back on the run queue until SIGCONT is sent. Nothing extraordinary here: SIGSTOP is just instructing the kernel to make a process non-runnable until further notice.
[One note: you seemed to imply that the kernel stops running with SIGSTOP. That is of course not the case. Only the SIGSTOPped processes stop running.]
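The effect described in these answers can also be observed from user space. The following is a minimal sketch, not kernel code: it forks a child, sends it SIGSTOP and SIGCONT, and uses waitpid() with WUNTRACED and WCONTINUED to see the state changes. The function name stop_and_continue_child() is made up for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns how many of the two state changes
 * (stopped, continued) the parent observed, or -1 if fork() failed. */
int stop_and_continue_child(void)
{
    int status;
    int seen = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child: sleep until a signal arrives */
    }

    /* SIGSTOP cannot be caught or ignored; the child becomes non-runnable. */
    kill(pid, SIGSTOP);
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        seen++;

    /* SIGCONT makes the child runnable again. */
    kill(pid, SIGCONT);
    if (waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
        seen++;

    /* Clean up: terminate and reap the child. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    assert(WIFSIGNALED(status));
    return seen;
}
```

Calling stop_and_continue_child() from main() should report both transitions; only the child is stopped, while the parent (and the kernel) keep running throughout.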
I am implementing a module that acts as a FIFO. To prevent two processes from accessing the buffer used for reading/writing at the same time, I used a semaphore. When a semaphore blocks a process, it moves it onto a wait queue. My question is: how can I check whether that process received a signal while it was waiting? If it did, I would like to stop whatever the process was doing (reading or writing) and return an error.
The only function I am familiar with is sigpending(sigset_t *set), but I am not really sure how to use it. Any help will be appreciated.
(When I say read/write, I mean the functions that were implemented for the module in fops.)
To allow a sleeping task to be woken up when it receives a signal, set the task state to TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE.
Such a signal wakeup happens completely independently from any wait queues, so it must be checked for separately (with signal_pending()).
A typical wait loop looks like this:
DECLARE_WAITQUEUE(entry, current);
...
if (need_to_wait) {
    add_wait_queue(&wq, &entry);
    for (;;) {
        set_current_state(TASK_INTERRUPTIBLE);
        if (!need_to_wait)
            break;
        schedule();
        if (signal_pending(current)) {
            remove_wait_queue(&wq, &entry);
            return -EINTR; /* or -ERESTARTSYS */
        }
    }
    set_current_state(TASK_RUNNING);
    remove_wait_queue(&wq, &entry);
}
....
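In a module, the semaphore call that follows this contract is down_interruptible(), which returns -EINTR when a signal interrupts the wait. The same idea can be tried out in user space; the sketch below is only an analogy using POSIX sem_wait(), not kernel code. The function name wait_interrupted_by_signal() and the 50 ms timer are assumptions made for the illustration, and it assumes a single-threaded program.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static void on_alarm(int signo)
{
    (void)signo;                    /* the handler only has to exist */
}

/* Hypothetical helper: returns 1 if the blocking wait was cut short by a
 * signal (errno == EINTR), 0 if it returned for another reason, -1 on
 * setup failure. */
int wait_interrupted_by_signal(void)
{
    sem_t sem;
    struct sigaction sa, old_sa;
    struct itimerval timer, off;
    int rc, err;

    if (sem_init(&sem, 0, 0) != 0)  /* value 0: sem_wait() will block */
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;       /* no SA_RESTART, so the wait is not restarted */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    /* Fire SIGALRM every 50 ms until disarmed, so a signal that arrives
     * before sem_wait() is entered cannot leave us blocked forever. */
    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = 50000;
    timer.it_interval.tv_usec = 50000;
    setitimer(ITIMER_REAL, &timer, NULL);

    rc = sem_wait(&sem);            /* blocks until the signal arrives */
    err = errno;

    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);     /* disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);      /* restore the old handler */
    sem_destroy(&sem);

    return (rc == -1 && err == EINTR) ? 1 : 0;
}
```

A caller would treat the interrupted case the way the loop above does: stop the read or write and return an error.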
Wait(semaphore sem) {
    DISABLE_INTS
    sem.val--
    if (sem.val < 0) {
        add thread to sem.L
        block(thread)
    }
    ENABLE_INTS
}
Signal(semaphore sem) {
    DISABLE_INTS
    sem.val++
    if (sem.val <= 0) {
        th = remove next thread from sem.L
        wakeup(th)
    }
    ENABLE_INTS
}
If block(thread) stops a thread from executing, how, where, and when does it return?
Which thread enables interrupts following the Wait()?
The thread that called block() shouldn't return until another thread has called wakeup(thread)!
But how does that other thread get to run?
Where exactly does the thread switch occur?
block(thread) works like this:
Enables interrupts.
Uses some kind of waiting mechanism (provided by the operating system, or busy waiting in the simplest case) to wait until wakeup(thread) is called on this thread. At this point the thread yields its time to the scheduler, and this is where the thread switch occurs.
Disables interrupts and returns.
Yes, UP and DOWN are mostly useful when called from different threads, but it is not impossible to call both from one thread: if you start the semaphore with a value > 0, the same thread can enter the critical section and execute both DOWN (before) and UP (after). The value that initializes the semaphore tells how many threads can enter the critical section at once, which might be 1 (mutex) or any other positive number.
How are the threads created? That is not shown on the lecture slide, because the slide only shows the principle of how a semaphore works, in pseudocode. How you use those semaphores in your application is a completely different story.
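To make the block()/wakeup() behaviour concrete, here is a minimal user-space sketch built on a pthread mutex and condition variable. It is an assumption-laden illustration, not the slide's implementation: the mutex stands in for DISABLE_INTS/ENABLE_INTS, pthread_cond_wait() stands in for block() (it releases the lock while sleeping and reacquires it before returning), and the names ksem, ksem_wait(), ksem_signal() and ksem_demo() are made up. Unlike the slide, the counter never goes negative here.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical counting semaphore for illustration only. */
struct ksem {
    pthread_mutex_t lock;   /* stands in for DISABLE_INTS / ENABLE_INTS */
    pthread_cond_t  wake;   /* stands in for sem.L plus block()/wakeup() */
    int             val;    /* how many threads may still enter */
};

void ksem_init(struct ksem *s, int initial)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&s->wake, NULL);
    assert(rc == 0);
    (void)rc;
    s->val = initial;
}

void ksem_wait(struct ksem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->val == 0)
        /* "block(thread)": drops the lock while asleep so that another
         * thread can run ksem_signal(), then retakes it before returning. */
        pthread_cond_wait(&s->wake, &s->lock);
    s->val--;
    pthread_mutex_unlock(&s->lock);
}

void ksem_signal(struct ksem *s)
{
    pthread_mutex_lock(&s->lock);
    s->val++;
    pthread_cond_signal(&s->wake);      /* "wakeup(th)" */
    pthread_mutex_unlock(&s->lock);
}

static void *poster(void *arg)
{
    ksem_signal(arg);
    return NULL;
}

/* One thread does two waits; the second can only complete once the
 * helper thread has signalled. Returns the final counter value. */
int ksem_demo(void)
{
    struct ksem s;
    pthread_t t;

    ksem_init(&s, 1);
    ksem_wait(&s);                      /* 1 -> 0, does not block */
    pthread_create(&t, NULL, poster, &s);
    ksem_wait(&s);                      /* completes only after poster() has signalled */
    pthread_join(t, NULL);
    return s.val;
}
```

The point to notice is that the waiting thread gives up the lock inside the blocking call, which is what lets the other thread get in and call the wake-up.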
I'm synchronizing reader and writer processes on Linux.
I have 0 or more processes (the readers) that need to sleep until they are woken up, read a resource, go back to sleep, and so on. Please note I don't know how many reader processes are up at any moment.
I have one process (the writer) that writes to a resource, wakes up the readers and does its business until another resource is ready (in detail, I developed a no-starve readers-writers solution, but that's not important).
To implement the sleep / wake-up mechanism I use a POSIX condition variable, pthread_cond_t. The clients call pthread_cond_wait() on the variable to sleep, while the server does a pthread_cond_broadcast() to wake them all up. As the manual says, I surround these two calls with a lock/unlock of the associated pthread mutex.
The condition variable and the mutex are initialized in the server and shared between processes through a shared memory area (because I'm not working with threads, but with separate processes), and I'm sure my kernel / syscalls support it (because I checked _POSIX_THREAD_PROCESS_SHARED).
What happens is that the first client process sleeps and wakes up perfectly. When I start the second process, it blocks on its pthread_cond_wait() and never wakes up, even though I'm sure (from the logs) that pthread_cond_broadcast() is called.
If I kill the first process and launch another one, it works perfectly. In other words, pthread_cond_broadcast() seems to wake up only one process at a time. If more than one process waits on the very same shared condition variable, only the first one manages to wake up correctly, while the others just seem to ignore the broadcast.
Why this behaviour? If I send a pthread_cond_broadcast(), every waiting process should wake up, not just one (and, however, not always the same one).
Have you set the PTHREAD_PROCESS_SHARED attribute on both your condvar and mutex?
For Linux consult the following man pages:
pthread_mutexattr_init (with sample)
pthread_mutexattr_setpshared
pthread_condattr_init
pthread_condattr_setpshared
Methods, types, constants etc. are normally defined in /usr/include/pthread.h, /usr/include/nptl/pthread.h.
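As a rough sketch of what that setup looks like, the following places a mutex, a condition variable and a flag in anonymous shared memory, marks both objects PTHREAD_PROCESS_SHARED, forks a few reader processes and wakes them all with one broadcast. The struct shared, the ready flag and the function name broadcast_to_processes() are assumptions for this illustration; a real program would more likely use a named shared memory object, and error handling is kept minimal.

```c
#define _DEFAULT_SOURCE         /* glibc: exposes MAP_ANONYMOUS */
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the shared memory area. */
struct shared {
    pthread_mutex_t mut;
    pthread_cond_t  cond;
    int             ready;      /* the predicate the readers wait for */
};

/* Returns how many reader processes woke up and exited cleanly,
 * or -1 if the shared mapping could not be created. */
int broadcast_to_processes(int nreaders)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    int started = 0, woken = 0;
    struct shared *sh;

    assert(nreaders >= 0);
    sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sh == MAP_FAILED)
        return -1;

    /* Both the mutex and the condvar need the process-shared attribute. */
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&sh->mut, &ma);
    pthread_mutexattr_destroy(&ma);

    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&sh->cond, &ca);
    pthread_condattr_destroy(&ca);

    sh->ready = 0;

    for (int i = 0; i < nreaders; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {                 /* reader process */
            pthread_mutex_lock(&sh->mut);
            while (!sh->ready)
                pthread_cond_wait(&sh->cond, &sh->mut);
            pthread_mutex_unlock(&sh->mut);
            _exit(0);
        }
        started++;
    }

    /* Writer: change the predicate under the lock, then wake everyone. */
    pthread_mutex_lock(&sh->mut);
    sh->ready = 1;
    pthread_cond_broadcast(&sh->cond);
    pthread_mutex_unlock(&sh->mut);

    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            woken++;
    }
    munmap(sh, sizeof *sh);
    return woken;
}
```

Because the readers test the ready flag before waiting, it does not matter whether a reader reaches pthread_cond_wait() before or after the writer broadcasts.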
Do you test for some condition before calling pthread_cond_wait()? I am asking because it's a very common mistake: your process must not call wait() unless you know some other process will call signal() (or broadcast()) later.
Consider this code (from the pthread_cond_wait man page):
pthread_mutex_lock(&mut);
while (x <= y) {
    pthread_cond_wait(&cond, &mut);
}
/* operate on x and y */
pthread_mutex_unlock(&mut);
If you omit the while test and just signal from another process whenever the state of your (x <= y) condition changes, it won't work, since the signal only wakes up the processes that are already waiting. If signal() was called before the other process calls wait(), the signal is lost and the waiting process will wait forever.
EDIT: About the while loop.
When you signal one process from another, the signaled process is put on the "ready list" but is not necessarily scheduled right away, and your condition (x <= y) may change again in the meantime since no one holds the lock. That's why you need to check your condition each time you are about to wait. It should always be: wake up -> check whether the condition still holds -> do work.
Hope that's clear.
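A small sketch of why the predicate check matters, reusing the names mut, cond, x and y from the man page snippet (the functions waiter() and signal_before_wait() are made up for this illustration). The signal is deliberately sent before any thread is waiting, so the signal itself is lost; the waiter still makes progress because it tests (x <= y) before it ever calls pthread_cond_wait().

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int x = 0, y = 0;

static void *waiter(void *arg)
{
    int *seen = arg;

    pthread_mutex_lock(&mut);
    while (x <= y)                      /* checked BEFORE waiting */
        pthread_cond_wait(&cond, &mut);
    *seen = x;                          /* operate on x and y */
    pthread_mutex_unlock(&mut);
    return NULL;
}

/* Hypothetical demo: returns the value of x the waiter observed. */
int signal_before_wait(void)
{
    pthread_t t;
    int seen = -1;
    int rc;

    /* Change the state and signal while nobody is waiting yet:
     * the signal wakes no one and is simply dropped. */
    pthread_mutex_lock(&mut);
    x = y + 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mut);

    /* The waiter starts late but does not hang, because the predicate
     * already tells it that there is nothing to wait for. */
    rc = pthread_create(&t, NULL, waiter, &seen);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return seen;
}
```

If waiter() called pthread_cond_wait() unconditionally instead of inside the while loop, this program would block forever in pthread_join().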
The documentation says that it should work... are you sure it's the same condition variable that the rest of the threads are looking at?
This is the example code from opengroup.org:
pthread_cond_wait(mutex, cond):
    value = cond->value; /* 1 */
    pthread_mutex_unlock(mutex); /* 2 */
    pthread_mutex_lock(cond->mutex); /* 10 */
    if (value == cond->value) { /* 11 */
        me->next_cond = cond->waiter;
        cond->waiter = me;
        pthread_mutex_unlock(cond->mutex);
        unable_to_run(me);
    } else
        pthread_mutex_unlock(cond->mutex); /* 12 */
    pthread_mutex_lock(mutex); /* 13 */
pthread_cond_signal(cond):
    pthread_mutex_lock(cond->mutex); /* 3 */
    cond->value++; /* 4 */
    if (cond->waiter) { /* 5 */
        sleeper = cond->waiter; /* 6 */
        cond->waiter = sleeper->next_cond; /* 7 */
        able_to_run(sleeper); /* 8 */
    }
    pthread_mutex_unlock(cond->mutex); /* 9 */
What the last poster said is correct. The KEY to the whole condition-variable situation working correctly is that the cond-var is NOT signalled prior to it being waited on. It's strictly a signal to be used when others (single or multiple) are waiting. When no one is waiting, it's effectively a NOP. Which, by the way, is NOT how I believe it SHOULD work, but how it DOES work.
larry
I have designed an application which runs 20 instances of a thread.
for (int i = 0; i < 20; i++)
{
    threadObj[i].start();
}
How can I wait in the main thread until those 20 threads finish?
You need to use QThread::wait().
bool QThread::wait ( unsigned long time = ULONG_MAX )
Blocks the thread until either of these conditions is met:
The thread associated with this QThread object has finished execution (i.e. when it returns from run()). This function will return true if the thread has finished. It also returns true if the thread has not been started yet.
time milliseconds has elapsed. If time is ULONG_MAX (the default), then the wait will never timeout (the thread must return from run()). This function will return false if the wait timed out.
This provides similar functionality to the POSIX pthread_join() function.
Just loop over the threads and call wait() for each one.
for (int i = 0; i < 20; i++)
{
    threadObj[i].wait();
}
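Since the quoted documentation compares QThread::wait() to pthread_join(), here is the same start-then-wait pattern as a plain C sketch with POSIX threads. It is only an analogy to the Qt loop above, not Qt code; the names worker(), run_and_join_all() and NUM_THREADS are made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 20

/* Each worker marks its own slot as done; a stand-in for real work. */
static void *worker(void *arg)
{
    int *slot = arg;

    *slot = 1;
    return NULL;
}

/* Starts NUM_THREADS workers, waits for every one of them, and returns
 * how many completed. */
int run_and_join_all(void)
{
    pthread_t threads[NUM_THREADS];
    int done[NUM_THREADS] = {0};
    int started = 0, finished = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, &done[i]) != 0)
            break;                      /* join only what was started */
        started++;
    }

    /* Equivalent of calling wait() on each thread object: each call
     * blocks until that particular thread has returned. */
    for (int i = 0; i < started; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
        finished += done[i];
    }
    return finished;
}
```

The order of the joins does not matter for correctness: the loop finishes once the slowest thread has returned.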
If you want to let the main loop run while you're waiting (e.g. to process events and keep the application responsive), you can use the signals and slots of the threads. QThread has a finished() signal which you can connect to a slot that keeps track of which threads have finished.
You can also use QWaitCondition.
What Georg has said is correct. Also remember that you can use signals and slots across threads, so you can have your threads emit a signal to you upon completion. That way you can keep track of the number of threads that have completed their tasks or exited. This could be useful if you don't want your main thread to block in a call to wait().