How does an OS handle input operations? - io

I'm learning how an OS works, and I know that peripheral devices can send interrupts that the OS then handles. But I don't have a clear picture of how it actually handles them.
What happens when I move the mouse around? Does it send interrupts every millisecond? How can the OS handle both the execution of a process and mouse positioning, especially if there is only one CPU? How can the OS perform context switches effectively in this case?
Or, for example, there are 3 launched processes. Process 1 is active; processes 2 and 3 are ready to go but in the pending state. The user types something on the keyboard in process 1. As I understand it, the OS scheduler can run process 2 or process 3 while awaiting input. I assume that the trick is in the timings: the processor is so fast that it's able to run processes 2 and 3 between the user's key presses.
Also, I would appreciate any literature references where I could get familiar with how I/O works, especially in terms of timings and scheduling.

Let's assume it's some kind of USB device. For USB you have 2 layers of device drivers - the USB controller driver and the USB peripheral (keyboard, mouse, joystick, touchpad, ...) driver. The USB peripheral driver asks the USB controller driver to poll the device regularly (e.g. maybe every 8 milliseconds) and the USB controller driver sets that up and the USB controller hardware does this polling (not software/driver), and if it receives something from the USB peripheral it'll send an IRQ back to the USB controller driver.
When the USB controller sends an IRQ it causes the CPU to interrupt whatever it was doing and execute the USB controller driver's IRQ handler. The USB controller driver's IRQ handler examines the state of the USB controller and figures out why it sent an IRQ; and notices that the USB controller received data from a USB peripheral; so it determines which USB peripheral driver is responsible and forwards the received data to that USB peripheral's device driver.
Note: Because it's bad to spend too much time handling an IRQ (because it can cause the handling of other more important IRQs to be postponed) often there will be some kind of separation between the IRQ handler and higher level logic at some point; which is almost always some variation of a queue where the IRQ handler puts a notification on a queue and then returns from the IRQ handler, and the notification on the queue causes something else to be executed later. This might happen in the middle of the USB controller driver (e.g. USB controller driver's IRQ handler does a little bit of work, then creates a notification that causes the rest of the USB controller driver to do the rest of the work). There are multiple ways to implement this "queue of notifications" (deferred procedure calls, message passing, some other form of communication, etc) and different operating systems use different approaches.
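To make the "queue of notifications" idea concrete, here is a minimal user-space sketch in Python. It is only a model: a thread stands in for the deferred work, and every name in it (irq_handler, deferred_worker, run_demo) is made up for illustration rather than taken from any real kernel API.

```python
import queue
import threading

# Hypothetical names throughout; this models the idea, not a real kernel interface.
work_queue = queue.Queue()
processed = []

def irq_handler(raw_data):
    """'IRQ handler': do the bare minimum, enqueue a notification, return at once."""
    work_queue.put(raw_data)

def deferred_worker():
    """Runs later, outside the IRQ handler, so it is allowed to take its time."""
    while True:
        item = work_queue.get()          # sleeps while the queue is empty
        if item is None:                 # sentinel used by this demo to stop the worker
            break
        processed.append(item.upper())   # stands in for the slower processing

def run_demo(events):
    """Feed some fake 'interrupts' through the queue and return the processed results."""
    processed.clear()
    worker = threading.Thread(target=deferred_worker)
    worker.start()
    for event in events:
        irq_handler(event)
    work_queue.put(None)
    worker.join()
    return list(processed)
```

The point of the split is that irq_handler returns almost immediately no matter how slow the later processing is.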
The USB peripheral's device driver (e.g. keyboard driver, mouse driver, ...) receives the data sent by the USB controller's driver (that came from the USB controller that got it from polling the USB peripheral); and examines that data. Depending on what the data contains the USB peripheral's device driver will probably construct some kind of event describing what happened in a "standard for that OS" way. This can be complicated (e.g. involve tracking past state of the device and lookup tables for keyboard layout, etc). In any case the resulting event will be forwarded to something else (often a user-space process) using some form of "queue of notifications". This might be the same kind of "queue of notifications" that was used before; but might be something very different (designed to suit user-space instead of being designed for kernel/device drivers only).
Note: In general every OS that supports multi-tasking provides one or more ways that normal processes can use to communicate with each other; called "inter-process communication". There are multiple possibilities - pipes, sockets, message passing, etc. All of them interact with scheduling. E.g. a process might need to wait until it receives data and call a function (e.g. to read from a pipe, or read from a socket, or wait for a message, or ..) that (if there's no data in the queue to receive) will cause the scheduler to be told to put the task into a "blocked" state (where the task won't be given any CPU time); and when data arrives the scheduler is told to bump the task out of the "blocked" state (so it can/will be given CPU time again). Often (for good operating systems), whenever a task is bumped out of the "blocked" state the scheduler will decide if the task should preempt the currently running task immediately, or not; based on some kind of task/thread priorities. In other words; if a lower priority task is currently running and a higher priority task is waiting to receive data, then when the higher priority task receives the data it was waiting for the scheduler may immediately do a task switch (from lower priority task to higher priority task) so that the higher priority task can examine the data it received extremely quickly (without waiting for ages while the CPU is doing less important work).
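As a small illustration of the "blocked until data arrives" behaviour, the sketch below reads from a pipe that has nothing in it yet. It is a simplification under stated assumptions: a second thread stands in for the other process, and the function name and the 50 ms delay are arbitrary choices for the demo.

```python
import os
import threading
import time

def blocked_reader_demo(delay=0.05):
    """Block in os.read() on an empty pipe until a writer supplies a byte."""
    read_fd, write_fd = os.pipe()

    def writer():
        time.sleep(delay)            # the "other process" is busy for a while
        os.write(write_fd, b"k")     # data arrives; the blocked reader becomes runnable

    thread = threading.Thread(target=writer)
    start = time.monotonic()
    thread.start()
    data = os.read(read_fd, 1)       # the caller is blocked here, using no CPU time
    waited = time.monotonic() - start
    thread.join()
    os.close(read_fd)
    os.close(write_fd)
    return data, waited
```

Calling blocked_reader_demo() returns the byte that was written together with roughly how long the reader sat in the blocked state.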
In any case; the event (from the USB peripheral's device driver) is received by something (likely a process in user-space, likely causing that process to be unblocked and given CPU time immediately by the scheduler). This is the top of a "hierarchy/tree of stuff" in user-space; where each thing in the tree might look at the data it receives and may forward it to something else in the tree (using the same inter-process communication to forward the data to something else). For example; that "hierarchy/tree of stuff" might have a "session manager" at the top of the tree, then "GUI" under that, then several application windows under that. Sometimes an event will be consumed and not forwarded to something else (e.g. if you press "alt+tab" then the GUI might handle that itself, and the GUI won't forward it to the application window that currently has keyboard focus).
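The forwarding through that "hierarchy/tree of stuff" could be sketched roughly like this. The class and the node names are hypothetical, and a real system would forward events via inter-process communication rather than direct method calls.

```python
class Node:
    """One layer in the user-space tree (session manager, GUI, application, ...)."""

    def __init__(self, name, consumes=(), child=None):
        self.name = name
        self.consumes = set(consumes)   # events this layer handles itself
        self.child = child              # where everything else is forwarded

    def handle(self, event):
        """Return the name of the layer that ends up handling the event."""
        if event in self.consumes or self.child is None:
            return self.name
        return self.child.handle(event)

# Hypothetical tree: session manager -> GUI -> focused application window.
app = Node("application")
gui = Node("gui", consumes={"alt+tab"}, child=app)
session = Node("session manager", child=gui)
```

With this tree, session.handle("alt+tab") stops at the GUI, while an ordinary key such as "a" travels all the way down to the application.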
Eventually most events will end up at a normal application. Normal applications often have a language run-time that will abstract the operating system's details to make the application more portable (so that the programmer doesn't have to care which OS their application is running on). For example, for Java, the Java virtual machine might convert the operating system's event (that arrived in an "OS specific" format via an "OS specific" communication mechanism) into a generic "KeyEvent" (and notify any "KeyListener").
The entire path (from drivers to a function/method inside an application) could involve many thousands of lines of code written by hundreds of people spread across many separate layers; where the programmers responsible for one piece (e.g. GUI) don't have to worry much about what the programmers working on other pieces (e.g. drivers) do. For this reason; you probably won't find a single source of information that covers everything (at all layers). Instead, you'll find information for device driver developers only, or information for C++ application developers only, or ...
This is also why nobody will be able to provide more than a generic overview (without any OS specific or "layer specific" details) - they'd have to write 12 entire books to provide an extremely detailed answer.

Related

How does the kernel track which processes receive data from an interrupt?

In a preemptive kernel (say Linux), say process A makes a call to getc on stdin, so it's blocked waiting for a character. I feel like I have a fundamental misunderstanding of how the kernel knows then to wake process A at this point and deliver the data after it's received.
My understanding is then this process can be put into a suspended state while the scheduler schedules other processes/threads to run, or it gets preempted. When the keypress happens, through polling/interrupts depending on the implementation, the OS runs a device driver that decodes the key that was pressed. However it's possible (and likely) that my process A isn't currently running. At this point, I'm confused on how my process that was blocked waiting on I/O is now queued to run again, especially how it knows which process is waiting for what. It seems like the device drivers hold some form of a wait queue.
Similarly, and I'm not sure if this is exactly related to the above, but if my browser window, for example, is in focus, it seems to receive key presses but not other windows. Does every window/process have the ability to "listen" for keyboard events even if they're not in focus, but just don't for user experience sake?
So I'm curious how kernels (or how some) keep track of what processes are waiting on which events, and when those events come in, how it determines which processes to schedule to run?
The events that processes wait on are abstract software events, such as a particular queue being non-empty, rather than concrete hardware events, such as an interrupt 4635 occurring.
Some configuration (perhaps guided by a hardware description like a device tree) identifies interrupt 4635 as being a signal from a given serial device with a given address. The serial device driver configures itself so it can access the device registers of this serial port, and attaches its interrupt handler to the given interrupt identifier (4635).
Once configured, when an interrupt from the serial device is raised, the lowest level of the kernel invokes this serial device's interrupt handler. In turn, when the handler sees a new character arriving, it places it in the input queue of that device. As it enqueues the character, it may notice that some process(es) are waiting for that queue to be non-empty, and cause them to be run.
That approximately describes the situation using condition variables as the signalling mechanism between interrupts and processes, as was established in UNIX-y kernels 44 years ago. Other approaches involve releasing a semaphore on each character in the queue; or replying with messages for each character. There are many forms of synchronization that can be used.
Common to all such mechanisms is that the caller chooses to suspend itself to wait for I/O to complete, and does so by associating its suspension with the instance of the object from which it is expecting input.
What happens next can vary; typically the waiting process, which is now running, reattempts to remove a character from the input queue. It is possible that some other process got to it first, in which case it merely goes back to waiting for the queue to become non-empty.
So, the OS doesn't explicitly route the character from the device to the application; a series of implicit and indirect steps does.
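The wait-then-recheck pattern described above can be modelled in user space with a condition variable. This is a sketch, not kernel code: threads stand in for processes, and the class and method names are invented for the example.

```python
import collections
import threading

class InputQueue:
    """Per-device input queue that waiting readers associate themselves with."""

    def __init__(self):
        self._chars = collections.deque()
        self._not_empty = threading.Condition()

    def interrupt_handler(self, ch):
        """Called when a character arrives: enqueue it and wake any waiters."""
        with self._not_empty:
            self._chars.append(ch)
            self._not_empty.notify_all()

    def read_char(self):
        """Sleep until the queue is non-empty, then take one character."""
        with self._not_empty:
            while not self._chars:        # re-check: another reader may have got there first
                self._not_empty.wait()    # suspended, tied to this particular queue
            return self._chars.popleft()
```

Note that the handler never names a reader. It only signals "this queue is non-empty", and whoever is waiting on that queue competes for the character.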

How does D3D9's Presentation Interval work?

If I set the presentation interval in Direct3D9 to D3DPRESENT_INTERVAL_ONE, when I call Present it waits until the monitor updates. It always waits the correct amount and (presumably) doesn't use a spinlock.
I'd like to be able to do the same "waiting" that Present does in Direct3D9, however I don't want to use Direct3D. How exactly does it wait for vsync perfectly without using a spinlock? Can just the waiting be programmed without Direct3D?
Synchronization with the vertical retrace is handled by the driver in a device-dependent manner. It's not inconceivable that there exists some implementation that just busy-waits, polling some device register until it detects the beginning of the retrace interval. The alternative would be to sleep waiting on a device interrupt, which frees up the CPU for other tasks, but increases the latency because of the necessary kernel-mode/user-mode transitions. It's also possible for a driver to implement a hybrid approach by estimating the time to the retrace, sleeping for a bit less than that and then busy-waiting.
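The hybrid strategy can be sketched against an ordinary software clock. To be clear about the assumptions: this does not observe any real retrace signal, the deadline is simply supplied by the caller, and the function name and the 2 ms spin margin are arbitrary.

```python
import time

def hybrid_wait_until(deadline, spin_margin=0.002):
    """Sleep most of the way to `deadline`, then busy-wait the final stretch.

    `deadline` is a time.perf_counter() value. Returns the time at which the
    wait actually finished.
    """
    remaining = deadline - time.perf_counter()
    if remaining > spin_margin:
        time.sleep(remaining - spin_margin)   # frees the CPU for other tasks
    while time.perf_counter() < deadline:     # short spin for precise wake-up
        pass
    return time.perf_counter()
```

For an assumed 60 Hz display, a caller that somehow knew the time of the last retrace could call hybrid_wait_until(last_retrace + 1 / 60). Obtaining that timestamp is exactly the part that needs the driver.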
I don't know which of these three possible implementations is typical, but it doesn't really matter. Windows doesn't provide any device-independent means for a Windows application to synchronize with the vertical retrace outside of DirectX (and I guess OpenGL). Unlike a video card driver, applications don't have direct access to the hardware, so they can't read the device registers or request or handle interrupts.

Is there really no way to control priority of workqueue processing as compared to user processes/threads?

I've been reading a variety of references that discuss the use of bottom-half work queues for deferred processing in linux drivers. From what I glean, it seems like any work done by kernel work queues gets scheduled just like ordinary user processes/threads and that the only real difference between a kernel work queue-related process and a user process is that the work queue can move data between user-side buffers and kernel buffers. I would appreciate knowing if my interpretation of these references is correct, or whether there are mechanisms by which I can maintain some degree of control over the priority of work queue processing. More specifically, I'd like to know if I can guarantee that a work queue process has higher priority than any user process, at least when the work queue process is not sleeping. I'm asking this question in the context of handling reads/writes from/to chips hanging off a 400 kHz (i.e. slow) I2C bus. We're running linux 2.6.10 on an ARM9 processor. - Thanks!

Need help handling multiple shared I2C MAX3107 chips on shared ARM9 GPIO interrupt (linux)

Our group is working with an embedded processor (Phytec LPC3180, ARM9). We have designed a board that includes four MAX3107 UART chips on one of the LPC3180's I2C busses. In case it matters, we are running kernel 2.6.10, the latest version available for this processor. Support of this product has not been very good; we've had to develop or fix a number of the drivers provided by Phytec, and Phytec seems to have no interest in upgrading the Linux code (especially the kernel version) for this product. This is too bad, in that the LPC3180 is a nice device, especially in the context of low-power embedded products that DO NOT require Ethernet and in fact don't want Ethernet (owing to the associated power consumption of Ethernet controller chips). The handler that is installed now (developed by someone else) is based on a top-half handler and bottom-half work queue approach.
When one of the four devices (MAX3107 UART chips) on the I2C bus receives a character, it generates an interrupt. The interrupt lines of all four MAX3107 chips are shared (open drain pull-down) and the line is connected to a GPIO pin of the 3180 which is configured for level interrupt. When one of the 3107's generates an interrupt, a handler is run which does the following processing (roughly):
    spin_lock_irqsave();
    disable_irq_nosync(irqno);
    irq_enabled = 0;
    irq_received = 1;
    spin_unlock_irqrestore();
    set_queued_work(); // Queue up work for all four devices for every interrupt
                       // because at this point we don't know which of the four
                       // 3107's generated the interrupt
    return IRQ_HANDLED;
Note, and this is what I find somewhat troubling, that the interrupt is not re-enabled before leaving the above code. Rather, the driver is written such that the interrupt is re-enabled by a bottom-half work queue task (using the "enable_irq(LPC_IRQ_LINE)" function call). Since the work queue tasks do not run in interrupt context I believe they may sleep, something that I believe to be a bad idea for an interrupt handler.
The rationale for the above approach follows:
1. If one of the four MAX3107 uart chips receives a character and generates an interrupt (for example), the interrupt handler needs to figure out which of the four I2C devices actually caused the interrupt. However, and apparently, one cannot read the I2C devices from within the context of the upper half interrupt handler since the I2C reads can sleep, something considered inappropriate for an interrupt handler upper-half.
2. The approach taken to address the above problem (i.e. which device caused the interrupt) is to leave the interrupt disabled and exit the top-half handler after which non-interrupt context code can query each of the four devices on the I2C bus to figure out which received the character (and hence generated the interrupt).
3. Once the bottom-half handler figures out which device generated the interrupt, the bottom-half code disables the interrupt on that chip so that it doesn't re-trigger the interrupt line to the LPC3180. After doing so it reads the serial data and exits.
The primary problem here seems to be that there is no way to query the four MAX3107 UART chips from within the interrupt handler top-half. If the top-half simply re-enabled interrupts before returning, this would cause the same chip to generate the interrupt again, leading, I think, to the situation where the top-half disables the interrupt and schedules the bottom-half work queues, only to find itself back in the same place, because another interrupt has occurred before the lower-half code could get to the chip causing the interrupt, and so forth.
Any advice for dealing with this driver will be much appreciated. I really don't like the idea of allowing the interrupt to be disabled in the top-half of the driver yet not be re-enabled prior to exiting the top-half driver code. This does not seem safe.
Thanks,
Jim
PS: In my reading I've discovered threaded interrupts as a means to deal with the above-described requirements (at least that's my interpretation of web site articles such as http://lwn.net/Articles/302043/). I'm not sure if the 2.6.10 kernel as provided by Phytec includes threaded interrupt functions. I intend to look into this over the next few days.
If your code is written properly, it shouldn't matter if a device issues interrupts before handling of prior interrupts is complete. You are correct that you don't want to do blocking operations in the top half, but blocking operations are acceptable in a bottom half; in fact, that is part of the reason they exist!
In this case I would suggest an approach where the top half just schedules the bottom half, and then the bottom half loops over all 4 devices and handles any pending requests. It could be that multiple devices need processing, or none.
Update:
It is true that you may overload the system with a load test, and the software may need to be optimized to handle heavy loads. Additionally I don't have a 3180, and four 3107s (or similar) of my own to test this out on, so I am speaking theoretically, but I am not clear why you need to disable interrupts at all.
Generally speaking when a hardware device asserts an interrupt it will not assert another one until the current one is cleared. So you have 4 devices sharing one int line:
1. Your top half fires and adds something to the work queue (i.e. triggers the bottom half).
2. Your bottom half scans all devices on that int line (i.e. all four 3107s).
3. If one of them caused the interrupt you will then read all data necessary to fully process the data (possibly putting it in a queue for higher level processing?).
4. You "clear" the interrupt on the current device.
5. When you clear the interrupt then the device is allowed to trigger another interrupt, but not before.
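As a rough model of that flow, the sketch below has a bottom half that scans all four devices and drains whichever ones have data. Everything here is a stand-in: FakeUart is a hypothetical class, and the real MAX3107 would be queried over I2C, which is not modelled.

```python
class FakeUart:
    """Stand-in for one MAX3107; the real chip is read over I2C, not modelled here."""

    def __init__(self):
        self.fifo = []

    def irq_pending(self):
        return bool(self.fifo)

    def drain(self):
        """Read everything out; emptying the FIFO 'clears' this device's interrupt."""
        data, self.fifo = self.fifo, []
        return data

def bottom_half(devices):
    """Scan every device on the shared line and service any that need it."""
    received = {}
    for index, dev in enumerate(devices):
        if dev.irq_pending():
            received[index] = dev.drain()
    return received   # may cover several devices, or none
```

Because the scan covers all devices on every run, it does not matter which chip actually pulled the line, and a run that finds nothing pending is harmless.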
More details about this particular device:
It seems that this device (MAX3107) has a buffer of 128 words, and by default you are getting interrupted after every single word. But it seems that you should be able to take better advantage of the buffer by setting the FIFO level registers. Then you will get interrupted only after that number of words has been received (or if you fill your tx FIFO up beyond the threshold, in which case you should slow down the transmit speed, i.e. buffer more in software).
It seems the idea is to basically pull data off the devices periodically (maybe every 100ms or 10ms or whatever seems to work for you) and then only have the interrupt act as a warning that you have crossed a threshold, which might schedule the periodic function for immediate execution, or increase the rate at which it is called.
Interrupts are enabled & disabled because we use level-based interrupts, not edge-based. The ramifications of that are explicitly explained in the driver code header, which you have, Jim.
Level-based interrupts were required to avoid losing an edge interrupt from a character that arrives on one UART immediately after one arriving on another: servicing the first effectively eliminates the second, so that second character would be lost. In fact, this is exactly what happened in the initial, edge-interrupt version of this driver once >1 UART was exercised.
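The lost-edge scenario can be reproduced in a toy simulation. This is a sketch under a loudly stated assumption: each handler run services exactly one device and then returns (if the handler scanned every device on each run, the outcome would differ). The function name and the two-UART setup are made up for the example.

```python
def serviced_characters(trigger):
    """Count how many of two near-simultaneous characters get serviced.

    `trigger` is "edge" or "level". Assumption: each handler run services
    exactly one device, then returns.
    """
    # UART 0 pulled the shared line active; UART 1 followed before any servicing.
    pending = [True, True]
    line_was_asserted = False    # line state the interrupt controller last saw
    serviced = 0
    while True:
        line_is_asserted = any(pending)   # open drain: any device can hold the line
        if trigger == "level":
            fire = line_is_asserted
        else:                             # edge: only an inactive-to-active transition fires
            fire = line_is_asserted and not line_was_asserted
        line_was_asserted = line_is_asserted
        if not fire:
            return serviced
        pending[pending.index(True)] = False   # handler services one device
        serviced += 1
```

In edge mode the second UART keeps the line active, so no new transition ever occurs and its character is stranded. In level mode the handler keeps firing until the line is released.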
Has there been an observed failure with the current scheme?
Regards,
The Driver Author (someone else)

How do system calls like select() or poll() work under the hood?

I understand that async I/O ops via select() and poll() do not use processor time, i.e. it's not a busy loop, but then how are these really implemented under the hood? Is it supported in hardware somehow, and is that why there is not much apparent processor cost for using these?
It depends on what the select/poll is waiting for. Let's consider a few cases; I'm going to assume a single-core machine for simplification.
First, consider the case where the select is waiting on another process (for example, the other process might be carrying out some computation and then outputs the result through a pipeline). In this case the kernel will mark your process as waiting for input, and so it will not provide any CPU time to your process. When the other process outputs data, the kernel will wake up your process (give it time on the CPU) so that it can deal with the input. This will happen even if the other process is still running, because modern OSes use preemptive multitasking, which means that the kernel will periodically interrupt processes to give other processes a chance to use the CPU ("time-slicing").
The picture changes when the select is waiting on I/O; network data, for example, or keyboard input. In this case, while archaic hardware would have to spin the CPU waiting for input, all modern hardware can put the CPU itself into a low-power "wait" state until the hardware provides an interrupt - a specially handled event that the kernel handles. In the interrupt handler the CPU will record the incoming data and after returning from the interrupt will wake up your process to allow it to handle the data.
There is no hardware support. Well, there is... but it's nothing special, and it depends on what kind of file descriptor you are watching. If there is a device driver involved, the implementation depends on the driver and/or the device. Take sockets, for example. If you wait for some data to read, there is a sequence of events:
1. Some process calls the poll()/select()/epoll_wait() system call to wait for data on a socket. There is a context switch from user mode to the kernel.
2. The NIC interrupts the processor when a packet arrives. The interrupt routine in the driver pushes the packet onto the back of a queue.
3. A kernel thread takes data from that queue and wakes up the network code inside the kernel to process that packet.
4. When the packet is processed, the kernel determines which socket was expecting it, saves the data in the socket buffer, and returns from the system call back to user space.
This is just a very brief description, there are a lot of details missing but I think that is enough to get the point.
Another example where no drivers are involved is a Unix socket. If you wait for data from one of them, the waiting process is added to a list. When the process on the other side of the socket writes data, the kernel checks that list and point 4 applies again.
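From user space, the driver-less case can be observed with a connected socket pair. A minimal sketch, assuming Python's standard select and socket modules; the function name is made up, and both ends live in one process purely to keep the demo short.

```python
import select
import socket

def wait_for_data_demo():
    """Show select() reporting a socket as readable only once data has been written."""
    reader, writer = socket.socketpair()   # two connected local sockets
    try:
        # Timeout 0 means "just check": nothing has been written yet.
        ready_before, _, _ = select.select([reader], [], [], 0)
        writer.sendall(b"hello")           # the "other side" writes
        # Now the kernel marks the reader as readable and select() returns it.
        ready_after, _, _ = select.select([reader], [], [], 1.0)
        data = reader.recv(16) if ready_after else b""
        return bool(ready_before), bool(ready_after), data
    finally:
        reader.close()
        writer.close()
```

With a non-zero timeout and no data, the calling process would simply sleep inside select() until the write happens, without consuming CPU time.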
I hope this helps. I think examples are the best way to understand it.

Resources