Linux thread sleep vs read

In my application there is a Linux thread that needs to be active every 10 ms,
so I use usleep(10*1000). Result: the thread never wakes up after 10 ms, but always after 20 ms. OK, that is related to the scheduler timeslice, CONFIG_HZ, etc.
I also tried usleep(1*1000) (that is, 1 ms), but the result was the same: the thread always wakes up after 20 ms.
But in the same application another thread handles network events (UDP packets) that come in every 10 ms. That thread blocks in recvfrom() (or select()) and wakes up every 10 ms when an incoming packet arrives.
Why is that so? Does select() put the thread to 'sleep' when there are no packets? Why does it behave differently, and how can I make my thread active every 10 ms (well, more or less) without external network events?
Thanks,
Rafi

You seem to be under the common impression that these modern preemptive multitaskers are all about timeslices and quantums.
They are not.
They are all about software and hardware interrupts, and the timer hardware interrupt is only one of many that can set a thread ready and change the set of running threads. The hardware interrupt from a NIC that causes a network driver to run is an example of another one.
If a thread is blocked waiting for UDP datagrams, and a datagram becomes available because a NIC interrupt ran the driver, the blocked thread will be made ready as soon as the NIC driver has run, because the driver signals the thread and requests an immediate reschedule on exit. If your box is not overloaded with higher-priority ready threads, it will be set running 'immediately' to handle the datagram that is now available. This mechanism provides high-performance I/O and has nothing to do with any timers.
The timer interrupt runs periodically to support sleep() and other system-call timeouts. It runs at a fairly low frequency, i.e. a long period (e.g. one tick every 10 ms), because it is another overhead that should be minimised. Running the interrupt at a higher frequency would give finer timer granularity at the expense of increased interrupt-state and rescheduling overhead that is not justified in most desktop installations.
Summary: your timer operations are subject to 10ms granularity but your datagram I/O responds quickly.
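You can see that granularity directly with a quick measurement; here is a minimal sketch (the numbers will depend on CONFIG_HZ and on whether high-resolution timers are enabled):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Measure how long usleep(10 ms) actually takes. */
int main(void)
{
    for (int i = 0; i < 5; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        usleep(10 * 1000);                    /* request 10 ms */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("slept %.3f ms\n", ms);        /* often > 10 ms on low-HZ kernels */
    }
    return 0;
}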
Also, why does your thread need to be active every 10 ms? What are you polling for?
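As for the 'how' part: sleeping until an absolute deadline avoids accumulating drift, and on kernels built with CONFIG_HIGH_RES_TIMERS it typically wakes close to the requested time; a timerfd would work equally well. A minimal sketch:

#include <time.h>

/* Wake every 10 ms against an absolute deadline, so the period
   does not drift by the time spent working in each iteration. */
int main(void)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        next.tv_nsec += 10 * 1000 * 1000;     /* advance deadline by 10 ms */
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        /* ... do the periodic work here ... */
    }
}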

Related

How one thread wakes up others inside Linux Kernel

My question is: how and when does one thread wake up other thread(s)?
I tried looking at the Linux kernel code, but didn't find what I was looking for.
For example, one thread is waiting on a mutex, a condition variable, or a file descriptor (an eventfd, for example).
What work is performed by the thread that releases the mutex, and what work is performed by the other CPU core that is about to run the sleeping thread?
I have searched existing answers, but did not find the details.
I have read that the scheduler can usually be called:
after a system call before returning to userspace
after interrupt processing
after timer interrupt processing - for example, every 1 ms (HZ = 1000) or 4 ms (HZ = 250)
I believe that a thread that releases some resource ends up, through some system call, in the kernel function try_to_wake_up. This function picks some task(s) and sets their state to RUNNABLE. This work is performed by the signaling thread and takes some time. But how is the task actually started? If the system is busy, there may be no free CPUs to run it. Some time in the future - for example, on a timer tick, or when some other thread goes to sleep or exhausts its quantum - the scheduler is called on some CPU and picks the runnable task to run. Maybe the task will preferably be run on the CPU where it ran previously.
But there must be some other scenario. When there are idle CPUs, I believe the task is woken immediately, without waiting up to 1 ms or even 4 ms (wake-up latency is usually several microseconds, not milliseconds).
Also, for example, imagine a situation where some thread runs exclusively on some CPU core.
That core may be isolated from kernel threads and interrupt handlers, with only one user thread whose affinity is set to run on this and only this core. I believe that if there are enough free CPU cores, no other threads will normally be scheduled to run on that core (am I wrong?).
That core may also have the nohz_full option enabled. So when the user thread goes to sleep, the core goes to sleep too: no IRQs from devices, no timer IRQs are processed.
So there must be some way for one CPU to tell another CPU (through an interrupt) to start running, call the scheduler, and run the user thread that is ready to wake.
The scheduler must run not on the CPU that releases the resource, but on the other CPU, which has to be woken. Maybe this is done via an IPI? Can you help me find the corresponding code in the kernel, or describe how it works?
Thank you.
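For what it's worth, on the kernel side a pthread_cond_signal() ends in a FUTEX_WAKE, which calls try_to_wake_up(); if the woken task is placed on a different (possibly idle) CPU, that CPU is kicked with a rescheduling IPI. Here is a minimal userspace sketch for measuring that wake-up latency (my own illustration; the measured delta also includes mutex reacquisition, so treat it as an upper bound):

#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int ready = 0;
static struct timespec t_signal;

/* Waits until main() signals, then reports the elapsed time. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    while (!ready)
        pthread_cond_wait(&c, &m);    /* blocks; woken via futex -> try_to_wake_up() */
    pthread_mutex_unlock(&m);

    struct timespec t_wake;
    clock_gettime(CLOCK_MONOTONIC, &t_wake);
    long ns = (t_wake.tv_sec - t_signal.tv_sec) * 1000000000L
            + (t_wake.tv_nsec - t_signal.tv_nsec);
    printf("wake-up latency: %ld ns\n", ns);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    sleep(1);                          /* let the waiter block first */

    pthread_mutex_lock(&m);
    ready = 1;
    clock_gettime(CLOCK_MONOTONIC, &t_signal);
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);

    pthread_join(t, NULL);
    return 0;                          /* build with: gcc -O2 -pthread */
}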

What's the longest time that a thread, which is blocking on receiving from an RS232 serial port, can take to wake up?

Assuming I set the process to the highest possible priority and there is no swap...
What's the longest time that a thread, which is blocking on receiving from an RS232 serial port, can take to wake up?
I want to know whether the thread will be woken within microseconds of the UART interrupt hitting the kernel, or whether it will have to wait for the next 100ms timeslice on a CPU.
What's the longest time that a thread, which is blocking on receiving from an RS232 serial port, can take to wake up?
Depending on the mode (e.g. canonical) a process could wait forever (e.g. for the EOL character).
I want to know whether the thread will be woken within microseconds of the UART interrupt hitting the kernel, or
The end of frame (i.e. the stop bit) on the wire is a better (i.e. consistent) reference point.
"UART interrupt hitting the kernel" is a poor reference point considering interrupt generation and processing can be deferred.
A UART FIFO may not generate an interrupt for every character/byte.
The interrupt controller prioritizes pending interrupts, and UARTs are rarely assigned high priorities.
Software can disable interrupts for critical regions.
whether it will have to wait for the next 100ms timeslice on a CPU.
The highest-priority runnable process gets control after a syscall completes.
Reference: Linux Kernel Development: Preemption and Context Switching:
Consequently, whenever the kernel is preparing to return to user-space, either
on return from an interrupt or after a system call, the value of need_resched
is checked. If it is set, the scheduler is invoked to select a new (more fit)
process to execute.
I'm looking to minimise Linux serial latency between the received stop bit and the start bit of the reply from a high-priority userspace thread.
I suspected that is what you are really seeking.
Configuration of the serial terminal is crucial for minimizing such latency, e.g. research the ASYNC_LOW_LATENCY serial flag.
However, configuration of the Linux kernel can further reduce such latency; e.g. one developer reports an order-of-magnitude reduction, from milliseconds to only ~100 microseconds.
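For illustration, here is a minimal sketch of setting that flag, assuming a device node such as /dev/ttyS0; note that some modern drivers ignore ASYNC_LOW_LATENCY, so treat it as a hint rather than a guarantee:

#include <fcntl.h>
#include <linux/serial.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the driver to push received characters to the line discipline
   immediately instead of batching them. Returns the open fd, or -1. */
int open_low_latency(const char *dev)
{
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct serial_struct ss;
    if (ioctl(fd, TIOCGSERIAL, &ss) == 0) {
        ss.flags |= ASYNC_LOW_LATENCY;
        ioctl(fd, TIOCSSERIAL, &ss);   /* may fail on drivers that ignore it */
    }
    return fd;
}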
I'm only familiar with serial interfaces on ATMEGA and STM32 microcontrollers ...
Then be sure to review Linux serial drivers.

Why would this thread in my program be starving?

I've got a program that has about 80 threads. It's running on a ~50ish core machine on linux 3.36. At most there are 2 of these programs running at once, and they are identical. Nothing else is running on the machine.
The threads themselves are real-time linux pthreads with SCHED_RR (round robin) policy.
10 are highest priority (yes, I set ulimit to 99) and have cpu affinity set to 10 of the cores. In other words, they are each pinned to their own core.
about 60 are medium priority.
about 10 are low priority.
The 10 highest priority threads are constantly using cpu.
The rest are doing network IO as well as some work on the CPU.
Here's the problem: I'm seeing one of the low-priority threads being starved, sometimes for over 15 seconds at a time. This specific thread is waiting on a TCP socket for some data. I know the data has been fully sent, because I can see that the server on the other end of the connection has sent it (i.e., it logs a timestamp after sending). Usually the thread takes milliseconds to receive and process it, but sporadically it takes 15 seconds after the other server has successfully sent the data.
Note that increasing the priority of the thread and pinning it to a CPU has eradicated the issue, but that is not a long-term solution. I would not expect this behavior in the first place - 15 seconds is a very long time.
Does anyone know why this would be happening? We have ruled out that it is any of the logic in the program/threads. Also note that the program is written in C.
I would not expect this behavior in the first place - 15 seconds is a very long time.
If your 60 medium-priority threads were all runnable, then that's exactly what you'd expect: with realtime scheduling, lower-priority threads won't run at all while higher-priority threads are still runnable.
You might be able to use perf timechart to analyse exactly what's going on.
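For reference, the stop-gap the question describes, raising the thread to a real-time priority and pinning it to a core, looks roughly like this sketch; the priority and core numbers are arbitrary example values, and the calls need CAP_SYS_NICE or a suitable RLIMIT_RTPRIO:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Raise a thread to a real-time priority and pin it to one core. */
static int boost_and_pin(pthread_t t, int prio, int core)
{
    struct sched_param sp = { .sched_priority = prio };
    if (pthread_setschedparam(t, SCHED_RR, &sp) != 0)
        return -1;                    /* insufficient privilege, bad prio, ... */

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(t, sizeof(set), &set);
}

A thread could call, e.g., boost_and_pin(pthread_self(), 50, 12) on itself at startup.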

At what points in a program the system switch threads

I know that threads cannot actually run in parallel on the same core, but on a regular desktop system there are normally hundreds or even thousands of threads, which is of course far more than the typical four cores of today's CPUs. So the system actually runs some thread for X amount of time, then switches to another thread for Y amount of time, and so on.
My question is, how does the system decide how much time to execute each thread?
I know that when a program is calling sleep() on a thread for an amount of time, the operation system can use this time to execute other threads, but what happens when a program does not call sleep at all?
E.g:
#include <stdbool.h>  /* for 'true' */
#include <stdio.h>

int main(void)
{
    while (true)
        printf("busy");
    return 0;
}
When does the operating system decide to suspend this thread and execute another?
The OS keeps a container of all the threads that can use CPU execution (usually such threads are described as being 'ready'). On most desktop systems this is a very small fraction of the total number of threads. Most threads in such systems are waiting on either I/O (this includes sleeping - waiting on timer I/O) or inter-thread signaling; such threads cannot use CPU execution, so the OS does not dispatch them onto cores.
A software syscall, (eg. a request to open a file, a request to sleep or wait for a signal from another thread), or a hardware interrupt from a peripheral device, (eg. a disk controller, NIC, KB, mouse), may cause the set of ready threads to change and so initiate a scheduling run.
When run, the scheduler decides which set of ready threads to assign to the available cores. The algorithm it uses is a compromise that tries to optimize overall performance by balancing the cost of expensive context switches against the need for responsive I/O. The kernel CAN stop any thread on any core and preempt it, but it would surely prefer not to :)
So:
My question is, how does the system decide how much time to execute
each thread?
Essentially, it does not. If the set of ready threads is not larger than the number of cores, there is no need to stop/control/influence a CPU-intensive loop - it can be allowed to run forever, taking up a whole core.
Note that your example is a poor one - the printf() call requests output from the OS and, if the output cannot be accepted immediately, the OS will block your seemingly 'CPU only' thread until it can be.
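A loop that genuinely never enters the kernel, assuming the compiler is kept from optimizing it away, would look more like:

int main(void)
{
    volatile unsigned long spins = 0;   /* volatile keeps the loop alive */
    for (;;)
        spins++;                        /* no syscalls: nothing to block on */
}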
but what happens when a program does not call sleep at all?
It's just one more thread. If it is purely CPU-intensive, then whether it runs continually depends upon the loading on the box and the number of cores available, as already described. It can, of course, get blocked by requesting I/O or electing to wait for a signal from another thread, so removing itself from the set of ready threads.
Note that one I/O device is a hardware timer. It is very useful for timing out system calls and providing sleep() functionality. It does have a side-effect on boxes where the number of ready threads is larger than the number of cores available to run them (i.e. the box is overloaded, or the tasks it runs have no limits on CPU use): it can result in the available cores being shared out among the ready threads, giving the illusion of running more threads than the hardware is physically capable of. (Try not to get hung up on sleep() and the timer interrupt - it is only one of many interrupts that can change thread state.)
It is this behaviour of the timer hardware, interrupt and driver that gives rise to the appalling 'quantum', 'time-sharing', 'round-robin' etc. confusion and FUD that surrounds the operation of modern preemptive kernels.
A preemptive kernel, with its drivers etc., is a state machine. Syscalls from running threads and hardware interrupts from peripheral devices go in; a set of running threads comes out.
It depends on which type of scheduling your OS uses. For example, let's take:
Round Robin:
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. If the process terminates or changes its state to waiting during its attributed time quantum, the scheduler selects the first process in the ready queue to execute.
There are other scheduling algorithms as well; you will find this link useful: https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/5_CPU_Scheduling.html
The operating system has a component called the scheduler that decides which thread should run and for how long. There are essentially two basic kinds of schedulers: cooperative and preemptive. Cooperative scheduling requires that the threads cooperate and regularly hand control back to the operating system, for example by doing some kind of IO. Most modern operating systems use preemptive scheduling.
In preemptive scheduling the operating system gives a time slice for the thread to run. The OS does this by setting a handler for a CPU timer: the CPU regularly runs a piece of code (the scheduler) that checks if the current thread's time slice is over, and possibly decides to give the next time slice to a thread that is waiting to run. The size of the time slice and how to choose the next thread depends on the operating system and the scheduling algorithm you use. When the OS switches to a new thread it saves the state of the CPU (register contents, program counter etc) for the current thread into main memory, and restores the state of the new thread - this is called a context switch.
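On Linux you can ask the kernel directly for the round-robin time slice with sched_rr_get_interval(); here is a minimal sketch (the reported value depends on the kernel and the thread's scheduling policy):

#include <sched.h>
#include <stdio.h>
#include <time.h>

/* Query the calling process's RR time slice (pid 0 = self). */
int main(void)
{
    struct timespec ts;
    if (sched_rr_get_interval(0, &ts) == 0)
        printf("time slice: %ld.%09ld s\n", (long)ts.tv_sec, ts.tv_nsec);
    return 0;
}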
If you want to know more, the Wikipedia article on Scheduling has lots of information and pointers to related topics.

Meaning of "Sleeping" process [duplicate]

What causes these sleeping processes that I see in top? If I were to call PHP's sleep() function, would that add to the sleeping count I see in top? Are there any disadvantages to having a high number in sleeping?
A process is sleeping when it is blocked, waiting for something. For example, it might have called read() and is waiting on data to arrive from a network stream.
sleep() is indeed one way to have your process sleep for a while. Sleeping is, however, the normal state of all but heavily compute-bound processes - sleeping is essentially what a process does when it isn't doing anything else. It's the normal state of affairs for most of your processes to be sleeping - if that's not the case, it tends to indicate that you need more CPU horsepower.
A sleeping process is like a suspended process.
A process sleeps when:
It is doing an I/O operation (blocking on I/O)
You tell it to sleep by calling sleep()
The status of any process can be:
Ready: it is ready for execution and is in the queue, waiting for the processor, with a specific priority
Sleeping: it was running and then blocked on an I/O operation or a call to sleep()
Running: the processor is currently executing it
Status  Meaning
R       Runnable
T       Stopped
P       Waiting on Pagein
D       Waiting on I/O
S       Sleeping < 20 seconds
I       Idle - sleeping > 20 seconds
Z       Zombie or defunct
They are processes which aren't running on the CPU right now. This is not necessarily a bad thing.
If you have huge numbers (10,000 on a server system, for example) of processes sleeping, the amount of memory etc used to keep track of them may make the system less efficient for non-sleeping processes.
Otherwise, it's fine.
Most normal server systems have 100 to 1000 much of the time; this is not a big deal.
Just because they're not doing anything just now doesn't mean they won't, very soon. Keeping them in memory, ready, reduces latency when they are required.
To go into a bit more detail here, the S state means the process is waiting on a timer or a slow device, while the D state means it is waiting on a fast device.
What constitutes a fast device vs a slow device is not terribly well defined, but generally, all serial, network, and terminal devices are slow devices, while disks are fast devices.
