How to implement a blocking call without wasting CPU time? - linux

I have some HW resource which triggers an interrupt when it finishes.
I want to implement a function activateHw() which returns only when the action is finished (meaning, when the interrupt has been triggered), but I don't want it to waste CPU time (no interrupt polling). Basically, what I want is:
bool activateHw() {
    trigger_hw();
    sleep_until_interrupt_arrived();
    return true;
}
How can I achieve that in Linux?

If you are writing a kernel module, you can use request_irq() to set up a callback / interrupt handler. Your interrupt handler can write the data to a character device. Your blocking function then just needs to call the poll syscall on the character device; it will block and sleep until data is available.
Have a look at gpio.txt, specifically at how GPIO pins can be set up to report through /sys/.../gpioxx/value by configuring which edge should trigger the interrupt. This can be used from userspace code as well, if the program is not too time-critical.
Here is a SO question that seems related.
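A minimal userspace sketch of that sysfs GPIO approach, assuming the pin has already been exported and its edge configured (gpio42 is a placeholder pin number):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[8];
    int fd = open("/sys/class/gpio/gpio42/value", O_RDONLY);  /* placeholder pin */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    read(fd, buf, sizeof(buf));            /* consume the initial state */

    struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
    if (poll(&pfd, 1, -1) > 0) {           /* sleeps here, no CPU wasted */
        lseek(fd, 0, SEEK_SET);            /* rewind before re-reading */
        read(fd, buf, sizeof(buf));
        printf("interrupt: value=%c\n", buf[0]);
    }
    close(fd);
    return 0;
}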

I suggest you use a wait queue. In the interrupt handler you wake up the waiting thread; that way you will not waste resources (as you would by, e.g., spinning on a lock).
Take a look at this tutorial.
Take a look at the Linux source; example usages are drivers/char/hpet.c and drivers/char/rtc.c.
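A minimal kernel-side sketch of the wait-queue approach; trigger_hw() and MY_IRQ are placeholders for your hardware specifics:

#include <linux/interrupt.h>
#include <linux/types.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(hw_wait_q);
static int hw_done;

static irqreturn_t hw_isr(int irq, void *dev_id)
{
    hw_done = 1;
    wake_up_interruptible(&hw_wait_q);     /* wake the sleeping caller */
    return IRQ_HANDLED;
}

bool activateHw(void)
{
    hw_done = 0;
    trigger_hw();                          /* placeholder: start the hardware action */
    /* Sleeps without consuming CPU until the ISR sets hw_done. */
    if (wait_event_interruptible(hw_wait_q, hw_done != 0))
        return false;                      /* interrupted by a signal */
    return true;
}

/* At module init: request_irq(MY_IRQ, hw_isr, 0, "my_hw", NULL); */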

Related

Can the same timer interrupt occur in parallel?

I implemented a timer interrupt handler in a kernel module.
This timer interrupt handler requires about 1000us to run.
And I want this timer to fire every 10us.
(In doing so, I hope the same handler will be performed in parallel.)
(I know that this can create a tremendous amount of interrupt overhead, but I want to implement it for some testing.)
But this handler does not seem to run in parallel.
The timer interrupt seems to wait until the handler in progress is finished.
Can the same timer interrupt occur in parallel?
If not, is there a kernel mechanism that can run the same handler in parallel?
If the timer fires every 10us and the handler requires 1000us (1ms) to complete, you would need 100 dedicated CPUs just to barely keep up with the timer. The short answer is no, the interrupt system isn't going to support this. If an interrupt recursed, it would inevitably overflow the interrupt handler stack.
Interrupts typically work by having a short bit of code be directly invoked when the interrupt asserts. If more work is to be done, this short bit would schedule a slower bit to follow on, and inhibit this source of interrupt. This is to minimize the latency caused by disparate devices seeking cpu attention. The slower bit, when it determines it has satiated the device request, can re-enable interrupts from this source.
[ In Linux, the short bit is called the top half; the slower bit the bottom half. It is a bit confusing, because decades of kernel implementation pre-Linux named it exactly the other way around. Best to avoid these terms. ]
One of many ways to get the effect you desire is to have this slow handler release a semaphore and then re-enable the interrupt. You could then have an appropriate number of threads sit in a loop, acquiring the semaphore and then performing your task; a sketch follows.
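A rough kernel-side sketch of that fan-out, assuming the deferred (slow) half calls slow_handler(); do_task() and NUM_WORKERS are placeholders:

#include <linux/jiffies.h>
#include <linux/kthread.h>
#include <linux/semaphore.h>

#define NUM_WORKERS 4

static struct semaphore work_sem;          /* sema_init(&work_sem, 0) at init */

static void slow_handler(void)             /* called from the deferred half */
{
    up(&work_sem);                         /* hand one unit of work to a thread */
    /* ... re-enable the interrupt source here ... */
}

static int worker_fn(void *data)
{
    while (!kthread_should_stop()) {
        /* Sleeps until work arrives; times out so we can check for stop. */
        if (down_timeout(&work_sem, HZ))
            continue;
        do_task();                         /* placeholder: the per-interrupt work */
    }
    return 0;
}

/* At module init:
 *   sema_init(&work_sem, 0);
 *   for (i = 0; i < NUM_WORKERS; i++)
 *       kthread_run(worker_fn, NULL, "myworker/%d", i);
 */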

Clarification about the behaviour of request_threaded_irq

I have scoured the web, but haven't found a convincing answer to a couple of related questions I have, with regard to the "request_threaded_irq" feature.
Question1:
Firstly, I was reading this article, regarding threaded IRQ's:
http://lwn.net/Articles/302043/
and there is this one line that isn't clear to me:
"Converting an interrupt to threaded makes only sense when the handler
code takes advantage of it by integrating tasklet/softirq
functionality and simplifying the locking."
I understand that, had we gone ahead with a "traditional" top-half/bottom-half approach, we would have needed either spinlocks or disabling local IRQs to protect shared data. But what I don't understand is how threaded interrupts simplify the locking by integrating tasklet/softirq functionality.
Question2:
Secondly, what advantage (if any) does a request_threaded_irq() approach have over a workqueue-based bottom-half approach? In both cases it seems as though the "work" is deferred to a dedicated thread. So, what is the difference?
Question3:
Lastly, in the following prototype:
int request_threaded_irq(unsigned int irq, irq_handler_t handler, irq_handler_t thread_fn, unsigned long irqflags, const char *devname, void *dev_id)
Is it possible that the "handler" part of the IRQ is continuously triggered by the relevant IRQ (say, a UART receiving characters at a high rate), even while the "thread_fn" (writing rx'd bytes to a circular buffer) part of the interrupt handler is busy processing IRQs from previous wakeups? So, wouldn't the handler be trying to "wake up" an already running "thread_fn"? How would the running IRQ thread_fn behave in that case?
I would really appreciate if someone can help me understand this.
Thanks,
vj
For Question 2,
An IRQ thread is set up with a higher priority on creation, unlike workqueues.
In kernel/irq/manage.c, you'll see some code like the following for creation of kernel threads for threaded IRQs:
static const struct sched_param param = {
    .sched_priority = MAX_USER_RT_PRIO/2,
};

t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
                   new->name);
if (IS_ERR(t)) {
    ret = PTR_ERR(t);
    goto out_mput;
}
sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
Here you can see that the scheduling policy of the kernel thread is set to an RT one (SCHED_FIFO) and that the priority of the thread is set to MAX_USER_RT_PRIO/2, which is higher than that of regular processes.
For Question 3,
The situation you described can also occur with normal interrupts. Typically in the kernel, interrupts are disabled while an ISR executes. During the execution of the ISR, characters can keep filling the device's buffer and the device can and must continue to assert an interrupt even while interrupts are disabled.
It is the job of the device to make sure the IRQ line is kept asserted until all the characters are read and any processing is complete in the ISR. It is also important that the interrupt is level-triggered, or, depending on the design, latched by the interrupt controller.
Lastly, the device/peripheral should have an adequately sized FIFO so that characters delivered at a high rate are not lost by a slow ISR. The ISR should also be designed to read as many characters as possible when it executes.
Generally speaking, what I've seen is: a controller has a FIFO of a certain size X, and when the FIFO fills to X/2, it fires an interrupt that causes the ISR to grab as much data as possible. The ISR reads as much as possible and then clears the interrupt. Meanwhile, if the FIFO is still at least X/2 full, the device keeps the interrupt line asserted, causing the ISR to execute again.
Previously, the bottom-half was not a task and still could not block. The only difference was that interrupts were disabled. The tasklet or softirq allow different inter-locks between the driver's ISR thread and the user API (ioctl(), read(), and write()).
I think the work queue is near equivalent. However, the tasklet/ksoftirq has a high priority and is used by all ISR based functionality on that processor. This may give better scheduling opportunities. Also, there is less for the driver to manage; everything is already built-in to the kernel's ISR handler code.
You must handle this case. Typically ping-pong buffers can be used, or a kfifo like you suggest. The handler should be greedy and drain all data from the UART before returning IRQ_WAKE_THREAD; see the sketch below.
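A sketch of that greedy-handler pattern using the modern kfifo API; uart_rx_ready(), uart_read_char() and process_char() are hypothetical hardware/consumer hooks:

#include <linux/interrupt.h>
#include <linux/kfifo.h>

static DEFINE_KFIFO(rx_fifo, unsigned char, 256);

static irqreturn_t uart_hard_isr(int irq, void *dev_id)
{
    /* Drain the UART completely before waking the thread. */
    while (uart_rx_ready())                /* hypothetical status test */
        kfifo_put(&rx_fifo, uart_read_char());
    return IRQ_WAKE_THREAD;                /* schedule uart_thread_fn() */
}

static irqreturn_t uart_thread_fn(int irq, void *dev_id)
{
    unsigned char c;

    while (kfifo_get(&rx_fifo, &c))
        process_char(c);                   /* hypothetical slow work, may sleep */
    return IRQ_HANDLED;
}

/* Registration:
 *   request_threaded_irq(irq, uart_hard_isr, uart_thread_fn,
 *                        0, "my_uart", dev);
 */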
For Question 3:
When a threaded IRQ fires, the corresponding interrupt line is masked/disabled; it is re-enabled only towards the end of the threaded handler, once it completes. Hence there won't be any interrupt firing while the respective threaded handler is running.
The original work of converting "hard"/"soft" handlers to threaded handlers was done by Thomas Gleixner & team when building the PREEMPT_RT Linux (aka Linux-as-an-RTOS) project (it's not part of mainline).
To truly have Linux run as an RTOS, we cannot tolerate a situation where an interrupt handler interrupts the most critical RT (app) thread; but how can we ensure that the app thread even overrides an interrupt? By making the interrupt threaded, schedulable (SCHED_FIFO) and of lower priority than the app thread (an interrupt thread's rtprio defaults to 50). So an "rt" SCHED_FIFO app thread with an rtprio of 60 would be able to "preempt" (closely enough that it works) even an interrupt thread. That should answer your Question 2.
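For illustration, a small userspace sketch that raises the calling thread above the default interrupt-thread priority of 50 (requires CAP_SYS_NICE, typically root):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Make the calling thread SCHED_FIFO at rtprio 60, outranking the
 * default threaded-IRQ priority of 50 on a PREEMPT_RT kernel. */
static int make_self_rt(void)
{
    struct sched_param sp = { .sched_priority = 60 };
    int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);

    if (err)
        fprintf(stderr, "pthread_setschedparam: error %d\n", err);
    return err;
}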
Regarding Question 3:
As others have said, your code must handle this situation.
Having said that, please note that a key point of using a threaded handler is that you can do work that (possibly) blocks (sleeps). If your "bottom half" work is guaranteed to be non-blocking and must be fast, please use the traditional-style top-half/bottom-half handlers.
How can we do that? Simple: don't use request_threaded_irq(); just call request_irq() - the comment in the code clearly says (wrt the 3rd parameter):
* @thread_fn: Function called from the irq handler thread.
* If NULL, no irq thread is created.
Alternatively, you can pass the IRQF_NO_THREAD flag to request_irq.
(BTW, a quick check with cscope on the 3.14.23 kernel source tree shows that request_irq() is called 1502 times [giving us non-threaded interrupt handling], and request_threaded_irq() [threaded interrupts] is explicitly called 204 times).
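A sketch of the registration choices side by side; my_isr and my_thread_fn are placeholder handlers:

#include <linux/interrupt.h>

static irqreturn_t my_isr(int irq, void *dev);         /* placeholder */
static irqreturn_t my_thread_fn(int irq, void *dev);   /* placeholder */

static int my_setup(unsigned int irq, void *dev)
{
    /* 1. Traditional: handler only, no irq thread is created. */
    return request_irq(irq, my_isr, 0, "mydev", dev);

    /* 2. Or threaded: my_isr runs in hard-irq context and returns
     *    IRQ_WAKE_THREAD; my_thread_fn runs in a SCHED_FIFO kthread:
     *      request_threaded_irq(irq, my_isr, my_thread_fn, 0,
     *                           "mydev", dev);
     */

    /* 3. Or explicitly forbid (forced) threading:
     *      request_irq(irq, my_isr, IRQF_NO_THREAD, "mydev", dev);
     */
}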

does kernel's panic() function completely freezes every other process?

I would like confirmation that the kernel's panic() function and the others like kernel_halt() and machine_halt(), once triggered, guarantee complete freezing of the machine.
So, are all kernel and user processes frozen? Is panic() interruptible by the scheduler? Could interrupt handlers still be executed?
Use case: in case of a serious error, I need to be sure that the hardware watchdog resets the machine. To this end, I need to make sure that no other thread/process is keeping the watchdog alive. I need to trigger a complete halt of the system. Currently, inside my kernel module, I simply call panic() to freeze everything.
Also, is the user-space halt command guaranteed to freeze the system?
Thanks.
edit: According to: http://linux.die.net/man/2/reboot, I think the best way is to use reboot(LINUX_REBOOT_CMD_HALT): "Control is given to the ROM monitor, if there is one"
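For reference, a minimal userspace sketch of that call via the glibc wrapper (RB_HALT_SYSTEM is glibc's name for LINUX_REBOOT_CMD_HALT; needs CAP_SYS_BOOT, i.e. root):

#include <stdio.h>
#include <unistd.h>
#include <sys/reboot.h>

int main(void)
{
    sync();                        /* flush pending filesystem writes first */
    if (reboot(RB_HALT_SYSTEM) < 0)
        perror("reboot");
    return 1;                      /* only reached if the halt failed */
}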
Thank you for the comments above. After some research, I am ready to give myself a more complete answer, below:
At least for the x86 architecture, reboot(LINUX_REBOOT_CMD_HALT) is the way to go. This, in turn, invokes the reboot() syscall (see: http://lxr.linux.no/linux+v3.6.6/kernel/sys.c#L433). Then, for the LINUX_REBOOT_CMD_HALT flag (see: http://lxr.linux.no/linux+v3.6.6/kernel/sys.c#L480), the syscall calls kernel_halt() (defined here: http://lxr.linux.no/linux+v3.6.6/kernel/sys.c#L394). That function calls syscore_shutdown() to execute all the registered system core shutdown callbacks, displays the "System halted" message, then performs a kernel message dump, and finally calls machine_halt(), which is a wrapper for native_machine_halt() (see: http://lxr.linux.no/linux+v3.6.6/arch/x86/kernel/reboot.c#L680). It is this function that stops the other CPUs (through machine_shutdown()), then calls stop_this_cpu() to disable the last remaining working processor. The first thing that this function does is disable interrupts on the current processor, so the scheduler is no longer able to take control.
I am not sure why the reboot() syscall still calls do_exit(0) after calling kernel_halt(). I interpret it like this: now, with all processors marked as disabled, the reboot() syscall calls do_exit(0) and ends itself. Even if the scheduler were awoken, there are no more enabled processors on which it could schedule some task, nor interrupts: the system is halted. I am not sure about this explanation, as stop_this_cpu() seems to not return (it enters an infinite loop). Maybe it is just a safeguard for the case when stop_this_cpu() fails (and returns): in this case, do_exit() will cleanly end the current task, and then the panic() function is called.
As for the panic() code (defined here: http://lxr.linux.no/linux+v3.6.6/kernel/panic.c#L69), the function first disables local interrupts, then disables all the other processors except the current one by calling smp_send_stop(). Finally, as the sole task executing on the current processor (which is the only processor still alive), with all local interrupts disabled (that is, the preemptible scheduler -- a timer interrupt, after all -- has no chance...), the panic() function either loops forever or calls emergency_restart(), which is supposed to restart the processor.
If you have better insight, please contribute.

How do system calls like select() or poll() work under the hood?

I understand that async I/O ops via select() and poll() do not use processor time, i.e., it's not a busy loop, but then how are these really implemented under the hood? Are they supported in hardware somehow, and is that why there is not much apparent processor cost to using them?
It depends on what the select/poll is waiting for. Let's consider a few cases; I'm going to assume a single-core machine for simplification.
First, consider the case where the select is waiting on another process (for example, the other process might be carrying out some computation and then outputs the result through a pipeline). In this case the kernel will mark your process as waiting for input, and so it will not provide any CPU time to your process. When the other process outputs data, the kernel will wake up your process (give it time on the CPU) so that it can deal with the input. This will happen even if the other process is still running, because modern OSes use preemptive multitasking, which means that the kernel will periodically interrupt processes to give other processes a chance to use the CPU ("time-slicing").
The picture changes when the select is waiting on I/O; network data, for example, or keyboard input. In this case, while archaic hardware would have to spin the CPU waiting for input, all modern hardware can put the CPU itself into a low-power "wait" state until the hardware provides an interrupt - a specially handled event that the kernel handles. In the interrupt handler the CPU will record the incoming data and after returning from the interrupt will wake up your process to allow it to handle the data.
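To make that concrete, a minimal userspace example: the process below consumes no CPU while blocked in select(); the kernel wakes it only when stdin becomes readable:

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
    fd_set rfds;

    FD_ZERO(&rfds);
    FD_SET(STDIN_FILENO, &rfds);

    /* NULL timeout: sleep indefinitely until stdin has data. */
    if (select(STDIN_FILENO + 1, &rfds, NULL, NULL, NULL) > 0)
        printf("stdin is now readable\n");
    return 0;
}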
There is no hardware support. Well, there is... but it is nothing special, and it depends on what kind of file descriptor you are watching. If there is a device driver involved, the implementation depends on the driver and/or the device. For example, with sockets, if you wait for some data to read, there is a sequence of events:
1. Some process calls the poll()/select()/epoll() system call to wait for data on a socket. There is a context switch from user mode to the kernel.
2. The NIC interrupts the processor when a packet arrives. The interrupt routine in the driver pushes the packet onto the back of a queue.
3. A kernel thread takes data from that queue and wakes up the network code inside the kernel to process that packet.
4. When the packet is processed, the kernel determines the socket that was waiting for it, saves the data in the socket buffer, and returns the system call back to user space.
This is just a very brief description; there are a lot of details missing, but I think that is enough to get the point.
Another example where no drivers are involved is a Unix socket. If you wait for data on one of them, the waiting process is added to a list. When a process on the other side of the socket writes data, the kernel checks that list, and step 4 applies again.
I hope it helps. I think examples are the best way to understand it.
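On the driver side, the hook that makes all of this work is the file_operations .poll method. A hedged sketch (struct my_dev, its read_q wait queue and data_available() are placeholders):

#include <linux/fs.h>
#include <linux/poll.h>
#include <linux/wait.h>

struct my_dev {
    wait_queue_head_t read_q;              /* woken by the ISR */
    /* ... */
};

static __poll_t my_poll(struct file *filp, poll_table *wait)
{
    struct my_dev *dev = filp->private_data;
    __poll_t mask = 0;

    /* Does not sleep here; it registers the caller on the wait queue
     * so the ISR's wake_up() can later unblock select()/poll(). */
    poll_wait(filp, &dev->read_q, wait);

    if (data_available(dev))               /* placeholder readiness test */
        mask |= EPOLLIN | EPOLLRDNORM;     /* readable right now */
    return mask;
}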

How does Linux blocking I/O actually work?

In Linux, when you make a blocking i/o call like read or accept, what actually happens?
My thoughts: the process gets taken out of the run queue and put into a waiting or blocked state on some wait queue. Then when a TCP connection is made (for accept) or the hard drive is ready or something for a file read, a hardware interrupt is raised which lets the waiting processes wake up and run (in the case of a file read, how does Linux know which processes to awaken, as there could be lots of processes waiting on different files?). Or perhaps, instead of hardware interrupts, the individual process itself polls to check availability. Not sure, help?
Each Linux device seems to be implemented slightly differently, and the preferred way seems to vary every few Linux releases as safer/faster kernel features are added, but generally:
1. The device driver creates read and write wait queues for a device.
2. Any process thread wanting to wait for I/O is put on the appropriate wait queue. When an interrupt occurs, the handler wakes up one or more waiting threads. (Obviously the threads don't run immediately, as we are in interrupt context; they are added to the kernel's scheduling queue.)
3. When scheduled by the kernel, the thread checks to see if conditions are right for it to proceed; if not, it goes back on the wait queue.
A typical example (slightly simplified):
In the driver at initialisation:
init_waitqueue_head(&readers_wait_q);
In the read function of a driver:
if (filp->f_flags & O_NONBLOCK)
    return -EAGAIN;

/* Note: wait_event_interruptible() takes the queue itself, not a pointer. */
if (wait_event_interruptible(readers_wait_q, read_avail != 0)) {
    /* a signal interrupted the wait, return */
    return -ERESTARTSYS;
}
to_copy = min(user_max_read, read_avail);
copy_to_user(user_buf, read_ptr, to_copy);
Then the interrupt handler just issues:
wake_up_interruptible(&readers_wait_q);
Note that wait_event_interruptible() is a macro that hides a loop checking for a condition - read_avail != 0 in this case - and puts the thread back on the wait queue if it is woken while the condition is not true.
As mentioned there are a number of variations - the main one is that if there is potentially a lot of work for the interrupt handler to do then it does the bare minimum itself and defers the rest to a work queue or tasklet (generally known as the "bottom half") and it is this that would wake the waiting threads.
See Linux Device Driver book for more details - pdf available here:
http://lwn.net/Kernel/LDD3
Effectively, the method only returns when the file is ready to read: when data is on a socket, when a connection has arrived...
To make sure it can return immediately, you probably want to use the select() system call to find a ready file descriptor.
Read this: http://www.minix3.org/doc/
It's a very clear, easy-to-understand explanation. It generally applies to Linux as well.
