How does the kernel track which processes receive data from an interrupt? - io

In a preemptive kernel (say Linux), say process A makes a call to getc on stdin, so it's blocked waiting for a character. I feel like I have a fundamental misunderstanding of how the kernel knows when to wake process A and deliver the data after it's received.
My understanding is then this process can be put into a suspended state while the scheduler schedules other processes/threads to run, or it gets preempted. When the keypress happens, through polling/interrupts depending on the implementation, the OS runs a device driver that decodes the key that was pressed. However it's possible (and likely) that my process A isn't currently running. At this point, I'm confused on how my process that was blocked waiting on I/O is now queued to run again, especially how it knows which process is waiting for what. It seems like the device drivers hold some form of a wait queue.
Similarly, and I'm not sure if this is exactly related to the above, but if my browser window, for example, is in focus, it seems to receive key presses but not other windows. Does every window/process have the ability to "listen" for keyboard events even if they're not in focus, but just don't for user experience sake?
So I'm curious how kernels (or how some) keep track of what processes are waiting on which events, and when those events come in, how it determines which processes to schedule to run?

The events that processes wait on are abstract software events, such as a particular queue becoming non-empty, rather than concrete hardware events, such as interrupt 4635 occurring.
Some configuration (perhaps guided by a hardware description like a device tree) identifies interrupt 4635 as being a signal from a given serial device with a given address. The serial device driver configures itself so it can access the device registers of this serial port, and attaches its interrupt handler to the given interrupt identifier (4635).
Once configured, when an interrupt from the serial device is raised, the lowest level of the kernel invokes this serial device's interrupt handler. In turn, when the handler sees a new character arriving, it places it in the input queue of that device. As it enqueues the character, it may notice that some process(es) are waiting for that queue to be non-empty, and cause them to be run.
That approximately describes the situation using condition variables as the signalling mechanism between interrupts and processes, as was established in UNIX-y kernels 44 years ago. Other approaches involve releasing a semaphore on each character in the queue; or replying with messages for each character. There are many forms of synchronization that can be used.
Common to all such mechanisms is that the caller chooses to suspend itself to wait for I/O to complete, and does so by associating its suspension with the instance of the object from which it is expecting input.
What happens next can vary; typically the waiting process, which is now running, reattempts to remove a character from the input queue. It is possible some other process got to it first, in which case it merely goes back to waiting for the queue to become non-empty.
So, the OS doesn't explicitly route the character from the device to the application; a series of implicit and indirect steps does.
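As a rough user-space analogy of the condition-variable scheme described above, here is a minimal sketch using POSIX threads. The names (input_queue, queue_put, queue_get) are made up for illustration, and a real kernel interrupt handler would use kernel primitives rather than a pthread mutex; the point is only that the reader waits on "queue not empty" and rechecks the condition each time it wakes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16

/* Hypothetical stand-in for one device's input queue. */
struct input_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;  /* the abstract event readers wait on */
    char            data[QUEUE_CAPACITY];
    size_t          head, count;
};

void queue_init(struct input_queue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->head = 0;
    q->count = 0;
}

/* Plays the role of the interrupt handler: enqueue the character,
   then wake whoever is waiting for the queue to be non-empty.
   Returns 1 on success, 0 if the queue is full. */
int queue_put(struct input_queue *q, char c)
{
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        q->data[(q->head + q->count) % QUEUE_CAPACITY] = c;
        q->count++;
        ok = 1;
        pthread_cond_broadcast(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Plays the role of the blocked reader. The while loop is the
   "reattempt": if another reader took the character first, the
   caller simply goes back to waiting. */
char queue_get(struct input_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    char c = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return c;
}
```

Note that nothing in queue_put names a particular reader. The waiter attached itself to this queue's condition variable, which is the "implicit and indirect" routing mentioned above.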

Related

How the OS knows when an I/O operation has finished execution?

Consider the situation, where you issue a read from the disc (I/O operation). Then what is the exact mechanism that the OS uses to get to know whether the operation has been executed?
The exact mechanism depends on the specific hardware (and OS and scenario); but typically when a device finishes doing something the device triggers an IRQ that causes the CPU to interrupt whatever it was doing and switch to a device driver's interrupt handler.
Sometimes/often a device driver ends up maintaining a queue or buffer of pending commands; so that when its interrupt handler is executed (telling it that a previous command has completed) it takes the next pending command and tells the device to start it. Sometimes/often this also includes some kind of IO priority scheme, where the driver can ask the device to do more important work sooner (while less important work is postponed/remains pending).
A device driver is typically also tied to the scheduler in some way - a normal thread in user-space might (directly or indirectly - e.g. via the file system) request that data be transferred and the scheduler will be told to not give that thread CPU time because it's blocked/waiting for something; and then later when the transfer is completed the device driver's interrupt handler tells the scheduler that the requesting thread can continue, causing it to be unblocked/able to be given CPU time by the scheduler again.
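To make the pending-command idea concrete, here is a small single-threaded simulation. Everything in it (struct driver, driver_submit, driver_irq, the integer command ids) is a hypothetical illustration rather than any real driver API, and it leaves out priorities and the scheduler hand-off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical driver state: one command in flight, the rest pending. */
struct driver {
    int    pending[MAX_PENDING];  /* FIFO of command ids */
    size_t head, count;
    bool   device_busy;
    int    in_flight;             /* meaningful only while device_busy */
    int    last_completed;        /* -1 until something completes */
};

void driver_init(struct driver *d)
{
    d->head = 0;
    d->count = 0;
    d->device_busy = false;
    d->in_flight = -1;
    d->last_completed = -1;
}

/* Stand-in for programming the device registers. */
static void start_on_device(struct driver *d, int cmd)
{
    d->device_busy = true;
    d->in_flight = cmd;
}

/* Called on behalf of a thread requesting I/O.
   Returns false only if the pending queue is full. */
bool driver_submit(struct driver *d, int cmd)
{
    if (!d->device_busy) {
        start_on_device(d, cmd);
        return true;
    }
    if (d->count == MAX_PENDING)
        return false;
    d->pending[(d->head + d->count) % MAX_PENDING] = cmd;
    d->count++;
    return true;
}

/* Called when the device raises its completion IRQ. */
void driver_irq(struct driver *d)
{
    assert(d->device_busy);
    /* A real driver would tell the scheduler to unblock the requester here. */
    d->last_completed = d->in_flight;
    d->device_busy = false;
    if (d->count > 0) {
        int next = d->pending[d->head];
        d->head = (d->head + 1) % MAX_PENDING;
        d->count--;
        start_on_device(d, next);
    }
}
```

Submitting two commands starts the first immediately and queues the second; the first call to driver_irq records the completion and starts the second.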

If I "get back to the main thread" then what exactly happens, and how do interrupts work with threads?

Background: I was using Beej's guide and he mentioned forking and ensuring you "get the zombies". An Operating Systems book I grabbed explained how the OS creates "threads" (I always thought it was a more fundamental piece), and by quoting it, I mean the OS decides nearly everything. Basically they share all external resources, but they split the register and stack spaces (and I think a 3rd thing).
So I get to the waitpid function which http://www.qnx.com's developer docs explain very well. In fact, I read the entire section on threads, minus all the types of conditions after a Processes and Threads google.
The fact that I can split code up and put it back together doesn't confuse me. HOW I can do this is confusing.
In C and C++, your program is a Main() function, which goes forward, calls other functions, maybe loops forever (waiting for input or rendering), and then eventually quits or returns. In this model I see NO reason for it to stop beyond a "I'm waiting for something", in which case it just loops.
Well, it seems it can loop by setting certain things, like "I'm waiting for a semaphore" or "a response" or "an interrupt". Or maybe it gets interrupted without waiting for one. This is what confuses me.
The processor time-slices processes and threads. That's all fine and dandy, but how does it decide when to stop one? I understand that you get to the Polling function and say "Hey I'm waiting for input, clock tick or user do something". Somehow it tells this to the os? I'm not sure. But moreso:
It seems to be able to completely randomly interrupt or interject, even on a single-threaded application. So you're running one thread and suddenly waitpid() says "Hey, I finished a process, let me interrupt this, we both hate zombies, I gotta do this." and you're still looping on some calculation. So, what just happens??? I have no idea, somehow they both run and your computation isn't messed with, 'cause it's single threaded, but that somehow doesn't mean that it won't stop what it's doing to run waitpid() inside the same thread WHILE you're still doing your other app things.
Also confusing, is how you can be notified, like iOSes notifications, and say "Hey, I got some UI changes, get me off of 16 and put me back on 1 so I can change this thing". But same question as last paragraph, how does it interrupt a thread that's running?
I think I understand the splitting, but this joining is utterly confusing. It's like the textbooks have this "rabbit from hat" step I'm supposed to accept. Other SO posts told me they don't share the same stack, but that didn't help, now I'm imagining a slinky (stack) leaning over to another slinky, but unsure how it recombines to change the data.
Thanks for any help, I apologize that this is long, but I know someone's going to misinterpret this and give me the "they are different stacks" answer if I'm too concise here.
Thanks,
OK, I'll have a go, though it's gonna be 'economical with the truth':)
It's sorta like this:
The OS kernel scheduler/dispatcher is a state-machine for managing threads. A thread comprises a stack, (allocated at the time of thread creation), and a Thread Control Block, (TCB), struct in the kernel that holds thread state and can store thread context, (including user registers, especially the stack-pointer). A thread must have code to run, but the code is not dedicated to the thread - many threads can run the same code. Threads have states, eg. blocked on I/O, blocked on an inter-thread signal, sleeping for a timer period, ready, running on a core.
Threads belong to processes - a process must have at least one thread to run its code and has one created for it by the OS loader when the process starts up. The 'main thread' may then create others that will also belong to that process.
The state-machine inputs are software interrupts - system calls from those threads that are already running on cores, and hardware interrupts from peripheral devices/controllers, (disk, network, mouse, KB etc), that use processor hardware features to stop the processor/s running instructions from the threads and 'immediately' run driver code instead.
The output of the state-machine is a set of threads running on cores. If there are fewer ready threads than cores, the OS will halt the unusable cores. If there are more ready threads than cores, (ie. the machine is overloaded), the 'scheduling algorithm' that decides which threads to run takes into account several factors - thread and process priority, priority boosts for threads that have just become ready on I/O completion or inter-thread signal, foreground-process boosts and others.
The OS has the ability to stop any running thread on any core. It has an interprocessor hardware-interrupt channel and drivers that can force any thread to enter the OS and be blocked/stopped, (maybe because another thread has just become ready and the OS scheduling algorithm has decided that a running thread must be immediately preempted).
The software interrupts from running threads can change the set of running threads by requesting I/O, or by signaling other threads, (the events, mutexes, condition-variables and semaphores). The hardware interrupts from peripheral devices can change the set of running threads by signaling I/O completion.
When the OS gets these inputs, it uses that input, and internal state in containers of Thread Control Block and Process Control Block structs, to decide which set of ready threads to run next. It can block a thread from running by saving its context, (including registers, especially stack pointer), in its TCB and not returning from the interrupt. It can run a thread that was blocked by restoring its context from its TCB to a core and performing an interrupt-return, so allowing the thread to resume from where it left off.
The gain is that no thread that is waiting for I/O gets to run at all and so does not use any CPU and, when I/O becomes available, a waiting thread is made ready 'immediately' and, if there is a core available, running.
This combination of OS state data, and hardware/software interrupts, efficiently matches up threads that can make forward progress with cores available to run them, and no CPU is wasted on polling I/O or inter-thread comms flags.
All this complexity, both in the OS and for the developer who has to design multithreaded apps and so put up with locks, synchronization, mutexes etc, has just one vital goal - high performance I/O. Without it, you can forget video streaming, BitTorrent and browsers - they would all be too piss-slow to be useable.
Statements and phrases like 'CPU quantum', 'give up the remainder of their time-slice' and 'round-robin' make me want to throw up.
It's a state-machine. Hardware and software interrupts go in, a set of running threads comes out. The hardware timer interrupt, (the one that can time-out system calls, allow threads to sleep and share out CPU on a box that is overloaded), though valuable, is just one of many.
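A toy version of that state machine, heavily simplified and with all names (thread_state, handle_event, schedule) invented for illustration, might look like the sketch below. It models only two inputs and a plain "highest priority first" choice, with none of the boosts or per-core details mentioned above.

```c
#include <assert.h>
#include <stddef.h>

enum thread_state { BLOCKED, READY, RUNNING };

enum event {
    EV_SYSCALL_WAIT_IO,  /* software interrupt: a running thread requests I/O */
    EV_IRQ_IO_COMPLETE   /* hardware interrupt: the device finished that I/O  */
};

struct thread {
    enum thread_state state;
    int priority;        /* larger value runs first */
};

/* Input side: apply one event to the thread it concerns. */
void handle_event(struct thread *t, enum event ev)
{
    assert(t != NULL);
    if (ev == EV_SYSCALL_WAIT_IO && t->state == RUNNING)
        t->state = BLOCKED;
    else if (ev == EV_IRQ_IO_COMPLETE && t->state == BLOCKED)
        t->state = READY;
}

/* Output side: pick which threads occupy the cores.
   Returns the number of cores in use. */
size_t schedule(struct thread *threads, size_t n, size_t cores)
{
    /* Every runnable thread competes again, so a newly ready
       high-priority thread can preempt a running one. */
    for (size_t i = 0; i < n; i++)
        if (threads[i].state == RUNNING)
            threads[i].state = READY;

    size_t used = 0;
    while (used < cores) {
        struct thread *best = NULL;
        for (size_t i = 0; i < n; i++)
            if (threads[i].state == READY &&
                (best == NULL || threads[i].priority > best->priority))
                best = &threads[i];
        if (best == NULL)
            break;       /* fewer ready threads than cores: the rest idle */
        best->state = RUNNING;
        used++;
    }
    return used;
}
```

Blocked threads never appear in the selection loop, which is the whole point: a thread waiting on I/O costs no CPU until an interrupt moves it back to READY.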
So I'm on thread 16, and I need to get to thread 1 to modify UI. I randomly stop it anywhere, "move the stack over to thread 1" then "take its context and modify it"?
No, time for 'economical with truth' #2...
Thread 1 is running the GUI. To do this, it needs inputs from mouse, keyboard. The classic way for this to happen is that thread 1 waits, blocked, on a GUI input queue - a thread-safe producer-consumer queue, for KB/mouse messages. It's using no CPU - the cores are off running services and BitTorrent downloads. You hit a key on the keyboard, and the keyboard-controller hardware raises an interrupt line on the interrupt controller, causing a core to jump to the keyboard driver code as soon as it has finished its current instruction. The driver reads the KB controller, assembles a KeyPressed message and pushes it onto the input queue of the GUI thread with focus - your thread 1. The driver exits by calling the scheduler interrupt entry point so that a scheduling run can be performed and your GUI thread is assigned a core and run on it. To thread 1, all it has done is make a blocking 'pop' call on a queue and, eventually, it returns with a message to process.
So, thread 1 is performing:
void* HandleGui(){
  while(true){
    GUImessage message=thread1InputQueue.pop(); // blocks until a message arrives
    switch(message.type){
      .. // lots of case statements to handle all the possible GUI messages
      ..
      ..
    }
  }
}
If thread 16 wants to interact with the GUI, it cannot do it directly. All it can do is to queue a message to thread 1, in a similar way to the KB/mouse drivers, to instruct it to do stuff.
This may seem a bit restrictive, but the message from thread 16 can contain more than POD. It could have a 'RunMyCode' message type and contain a function pointer to code that thread 16 wants to be run in the context of thread 1. When thread 1 gets around to handling the message, its 'RunMyCode' case statement calls the function pointer in the message. Note that this 'simple' mechanism is asynchronous - thread 16 has issued the message and runs on - it has no idea when thread 1 will get around to running the function it passed. This can be a problem if the function accesses any data in thread 16 - thread 16 may also be accessing it. If this is an issue, (and it may not be - all the data required by the function may be in the message, which can be passed into the function as a parameter when thread 1 calls it), it is possible to make the function call synchronous by making thread 16 wait until thread 1 has run the function. One way would be for the function to signal an OS synchronization object as its last line - an object upon which thread 16 will wait immediately after queueing its 'RunMyCode' message:
void runOnGUI(GUImessage message){
  // do stuff with GUI controls
  message.notifyCompletion->signal(); // tell thread 16 to run again
}
void* thread16run(){
  ..
  ..
  GUImessage message;
  OSkernelWaitObject waitEvent; // OS synchronization object that thread 16 will block on
  message.type=RunMyCode;
  message.function=runOnGUI;
  message.notifyCompletion=&waitEvent;
  thread1InputQueue.push(message); // ask thread 1 to run my function.
  waitEvent.wait(); // wait, blocked, until the function is done
  ..
  ..
}
So, getting a function to run in the context of another thread requires cooperation. Threads cannot call other threads - only signal them, usually via the OS. Any thread that is expected to run such 'externally signaled' code must have an accessible entry point where the function can be placed and must execute code to retrieve the function address and call it.

Threads: When one thread is running can you interact with the other?

So I'm learning about threads at the moment and I'm wondering how some things are handled. For example, say I have a program where one thread listens for input and another performs some calculation on a single processor. When the calculation thread is running, what happens if the user should press a button intended for the input thread? Won't the input get ignored by the input thread until it is switched to that specific thread?
It depends a good deal on how the input mechanism is implemented. One easy-but-very-inelegant way to implement I/O is continuous polling... in that scenario, the input thread might sit in a loop, reading a hardware register over and over again, and when the value in the register changes from 0 to 1, the input thread would know that the button is pressed:
void inputThread()
{
while(1)
{
if (some_register_indicates_the_button_is_pressed()) react();
}
}
The problem with this method is that it's horribly inefficient -- the input thread is using billions of CPU cycles just checking the register over and over again. In a multithreaded system running this code, the thread scheduler would switch the CPU between the busy-waiting input thread and the calculation thread every quantum (e.g. once every 10 milliseconds) so the input thread would use half of the CPU cycles and the calculation thread would use the other half. In this system, if the input thread was running at the instant the user pressed the button, the input would be detected almost instantaneously, but if the calculation thread was running, the input wouldn't be detected until the next time the input thread got to run, so there might be as much as 10 ms delay. (Worse, if the user released the button too soon, the input thread might never notice it was pressed at all.)
An improvement over continuous polling is scheduled polling. It works the same as above, except that instead of the input thread just polling in a loop, it polls once, then sleeps for a little while, then polls again:
void inputThread()
{
while(1)
{
if (some_register_indicates_the_button_is_pressed()) react();
usleep(30000); // sleep for 30 milliseconds (usleep takes microseconds)
}
}
This is much less inefficient than the first case, since every time usleep() is called, the thread scheduler puts the input thread to sleep and the CPU is made immediately available for any other threads to use. usleep() also sets a hardware timer, and when that hardware timer goes off (30 milliseconds later) it raises an interrupt. The interrupt causes the CPU to leave off whatever it was doing and run the thread-scheduling code again, and the thread-scheduling code will (in most cases) realize that it's time for usleep() to return, and wake up the input thread so it can do another iteration of its loop. This still isn't perfect: the inputThread is still using a small amount of CPU on an ongoing basis -- not much, but if you do many instances of this it starts to add up. Also, the problem of the thread being asleep the whole time the button is held down is still there, and potentially even more likely.
Which leads us to interrupt-driven I/O. In this model, the input thread doesn't poll at all; instead it tells the OS to notify it when the button is pressed:
void inputThread()
{
while(1)
{
sleep_until_button_is_pressed();
react();
}
}
The OS's notification facility, in turn, has to set things up so that the OS is notified when the button is pressed, so that the OS can wake up and notify the input thread. The OS does this by telling the button's control hardware to generate an interrupt when the button is pressed; once that interrupt goes off, it works much like the timer interrupt in the previous example; the CPU runs the thread scheduler code, which sees that it's time to wake up the input thread, and lets the input thread run. This mechanism has very nice properties: (1) the input thread gets woken up ASAP when the button is pressed (there's no waiting around for the calculation thread to finish its quantum first), and (2) the input thread doesn't eat up any CPU cycles at all, except when the button is pushed. Because of these advantages, it's this sort of mechanism that is used in modern computers for any non-trivial I/O.
Note that on a modern PC or Mac, there's much more going on than just two threads and a hardware button; e.g. there are dozens of hardware devices (keyboard, mouse, video card, hard drive, network card, sound card, etc) and dozens of programs running at once, and it's the operating system's job to mediate between them all as necessary. Despite all that, the general principles are still the same; let's say that in your example the button the user clicked wasn't a physical button but an on-screen GUI button. In that case, something like the following sequence of events would occur:
1. User's finger presses the left mouse button down
2. Mouse's internal hardware sends a mouse-button-pressed message over the USB cable to the computer's USB controller
3. Computer's USB controller generates an interrupt
4. Interrupt causes the CPU to break out of the calculation thread's code and run the OS's scheduler routine
5. The thread scheduler sees that the USB interrupt line indicates a USB event is ready, and responds by running the USB driver's interrupt handler code
6. USB driver's interrupt handler code reads in the event, sees that it is a mouse-button-pressed event, and passes it along to the window manager
7. Window manager knows which window has the focus, so it knows which program to forward the mouse-button-pressed event to
8. Window manager tells the OS to wake up the input thread associated with that window
9. Your input thread wakes up and calls react()
If you're running on a single processor system, then yes.
Short answer: yes, threads always interact. The problems start to appear when they interact in a non-predictable way. Every thread in a process has access to the entire process memory space, so changing memory in one thread may spoil the data for another thread.
Well, there are multiple ways the threads can communicate with each other. One of them is having a global variable and using it as a buffer for communication between threads.
When you asked about button there must be a thread containing event loader loop. Within this thread, input won't be ignored according to my experience.
You can see some of my threads about this topic:
Here, I was interested how to make 3 thread application that do communicate through events.
The thread waiting for user input will be made ready 'immediately'. On most OS, threads that were waiting on I/O and have become ready are given a temporary priority boost and, even on a single-core CPU, will 'immediately' preempt another thread that was running at the same priority.
So, if a single-core CPU is running a calculation and another, waiting, thread of the same priority gets input, it will probably run straightaway.

How do system calls like select() or poll() work under the hood?

I understand that async I/O ops via select() and poll() do not use processor time, i.e. it's not a busy loop, but then how are these really implemented under the hood? Is it supported in hardware somehow, and is that why there is not much apparent processor cost for using these?
It depends on what the select/poll is waiting for. Let's consider a few cases; I'm going to assume a single-core machine for simplification.
First, consider the case where the select is waiting on another process (for example, the other process might be carrying out some computation and then outputs the result through a pipeline). In this case the kernel will mark your process as waiting for input, and so it will not provide any CPU time to your process. When the other process outputs data, the kernel will wake up your process (give it time on the CPU) so that it can deal with the input. This will happen even if the other process is still running, because modern OSes use preemptive multitasking, which means that the kernel will periodically interrupt processes to give other processes a chance to use the CPU ("time-slicing").
The picture changes when the select is waiting on I/O; network data, for example, or keyboard input. In this case, while archaic hardware would have to spin the CPU waiting for input, all modern hardware can put the CPU itself into a low-power "wait" state until the hardware raises an interrupt, a special event that the kernel handles. The interrupt handler records the incoming data, and after returning from the interrupt the kernel wakes up your process to allow it to handle the data.
There is no hardware support. Well, there is... but it is nothing special and it depends on what kind of file descriptor you are watching. If there is a device driver involved, the implementation depends on the driver and/or the device. For example, sockets. If you wait for some data to read, there is a sequence of events:
1. Some process calls the poll()/select()/epoll() system call to wait for data on a socket. There is a context switch from user mode to the kernel.
2. The NIC interrupts the processor when a packet arrives. The interrupt routine in the driver pushes the packet onto the back of a queue.
3. There is a kernel thread that takes data from that queue and wakes up the network code inside the kernel to process that packet.
4. When the packet is processed, the kernel determines which socket was expecting it, saves the data in the socket buffer, and returns from the system call back to user space.
This is just a very brief description, there are a lot of details missing but I think that is enough to get the point.
Another example where no drivers are involved is a Unix socket. If you wait for data from one of them, the process that waits is added to a list. When another process on the other side of the socket writes data, the kernel checks that list and point 4 applies again.
I hope it helps. I think that examples are the best way to understand it.
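From the caller's side, the waiting looks something like the sketch below. It uses the standard POSIX poll() and read() calls; the helper name read_when_ready is made up for illustration. While poll() is waiting, the process sits on the kernel's wait list for that descriptor and uses no CPU.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read up to
   len bytes. Returns the number of bytes read, 0 on timeout (or end
   of file), and -1 on error. */
ssize_t read_when_ready(int fd, char *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    int ready = poll(&pfd, 1, timeout_ms);  /* the process blocks here */
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                           /* timed out, nothing arrived */
    return read(fd, buf, len);
}
```

With a pipe, for example, calling this on the read end with a timeout of 0 returns 0 while the pipe is empty; after the other end writes two bytes, the same call returns 2.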

How does Linux blocking I/O actually work?

In Linux, when you make a blocking i/o call like read or accept, what actually happens?
My thoughts: the process gets taken out of the run queue and put into a waiting or blocking state on some wait queue. Then when a TCP connection is made (for accept) or the hard drive is ready or something for a file read, a hardware interrupt is raised which lets those waiting processes wake up and run (in the case of a file read, how does Linux know which processes to awaken, as there could be lots of processes waiting on different files?). Or perhaps instead of hardware interrupts, the individual process itself polls to check availability. Not sure, help?
Each Linux device seems to be implemented slightly differently, and the preferred way seems to vary every few Linux releases as safer/faster kernel features are added, but generally:
The device driver creates read and write wait queues for a device.
Any process thread wanting to wait for I/O is put on the appropriate wait queue. When an interrupt occurs the handler wakes up one or more waiting threads. (Obviously the threads don't run immediately as we are in interrupt context, but are added to the kernel's scheduling queue.)
When scheduled by the kernel the thread checks to see if conditions are right for it to proceed - if not it goes back on the wait queue.
A typical example (slightly simplified):
In the driver at initialisation:
init_waitqueue_head(&readers_wait_q);
In the read function of a driver:
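if (read_avail == 0 && (filp->f_flags & O_NONBLOCK))
{
    /* non-blocking caller and nothing to read yet */
    return -EAGAIN;
}
/* the macro takes the wait queue head itself, not a pointer to it */
if (wait_event_interruptible(readers_wait_q, read_avail != 0))
{
    /* signal interrupted the wait, return */
    return -ERESTARTSYS;
}
to_copy = min(user_max_read, read_avail);
if (copy_to_user(user_buf, read_ptr, to_copy))
{
    /* user buffer was not writable */
    return -EFAULT;
}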
Then the interrupt handler just issues:
wake_up_interruptible(&readers_wait_q);
Note that wait_event_interruptible() is a macro that hides a loop that checks for a condition - read_avail != 0 in this case - and repeatedly adds to the wait queue again if woken when the condition is not true.
As mentioned there are a number of variations - the main one is that if there is potentially a lot of work for the interrupt handler to do then it does the bare minimum itself and defers the rest to a work queue or tasklet (generally known as the "bottom half") and it is this that would wake the waiting threads.
See Linux Device Driver book for more details - pdf available here:
http://lwn.net/Kernel/LDD3
Effectively the method will only return when the file is ready to read, when data is on a socket, when a connection has arrived...
To make sure it can return immediately you probably want to use the select system call to find a ready file descriptor.
Read this: http://www.minix3.org/doc/
It's a very clear, very easy to understand explanation. It generally applies to Linux, also.
