POSIX message queue - mq_send thread wake order - linux

Can someone explain to me how message queues handle waking multiple
threads blocked on a single message queue?
My situation is I have multiple writers blocking on a full message
queue, each posting messages with priority equal to the thread
priority. I want to make sure they wake and post in priority order,
however my application is behaving as if they are waking in FIFO order
(i.e. the order in which they blocked). Each blocking thread is
scheduled with the SCHED_FIFO policy with a different priority with
system level scope.
I've searched the Internet high and low for something describing how
this should work and all I can find is POSIX man pages describing that
multiple blockers wake in priority order if Priority Scheduling is
supported. Since the kernel scheduler is a priority scheduler I
would think that the threads would wake in priority order and post to
the queue, however that doesn't appear to be the case. I'm sure I'm
just missing some subtle detail and was hoping the experts here on
this list can help shine some light on what I'm seeing, since its at
the kernel level that these threads are made ready to run.
I have a small test application that I can post here if necessary. It simply fills a queue, then has a few threads all try and write to it, all with different thread priorities and posting with a message priority equal to the thread priority. I then remove a message from the queue and would expect the highest priority thread wake up and post its message. However, the first thread to wait posts its message first.
Any help or documentation anyone can point me to in order to get to the bottom of this?
Thanks in advance!

Turns out the Linux kernel looks at the tasks' priority value if the queue is full and adds them to a wait queue in task niceness order (which is priority order for non-RT tasks). The wait queue does not honor real-time priorities which my application uses. Non-RT priorities (nice values) are being handled properly and wake up in niceness order.
The root cause of my issue down in how the kernel was handling priorities when adding tasks to the internal kernel wait queues. I submitted a patch to the linux-kernel list which was accepted and will be rolled into future releases which changes the priority check when adding tasks to the wait queue - it now honors both non-RT priority as well as RT priority. It does NOT handle priorities of deadline scheduled tasks.

Related

vxworks message queue "lost" a task blocked on it. What can be a reason?

We have a quite large multitasking communication system implemented on Vxworks 5.5 and PPC8260. The system should handle a lot of Ethernet traffic and also handle some cyclic peripheral control activities via RS-232, memory mapped I/O etc. What happens is that in some moment few message queues we are using for inter task communication become overflowed (I see it by log inspection). When I check the status of tasks responsible for serving this Message queues (that is doing receive on them) they appear to be READY.When I inspect msgQShow for the queues themselves they are full but no tasks appear to be blocked on them. But looking at task stack trace shows that a task actually pending inside msgQReceive call.Specifically in the qJobGet kernel call or something alike.
It is unlikely in the extreme that a message queue "lost" a task that was blocked on it.
From your description we can assume:
A message Queue has overflowed. Presumably you have detected this by checking the return value from msgQSend, which has been invoked either with a timeout value, or NO_WAIT.
msgQShow confirms the Q is full
Tasks that should be reading the queue are in the READY state.
The READY state is the state that tasks are in when they are available to run. Tasks are held in a queue (strictly, one queue per priority level), and when the reach the head of the queue they will be scheduled.
If tasks are persistently showing as READY, that suggests that they are not getting CPU time. The fact that the msgQ does not appear to empty supports that.
You should use tools such as system viewer to diagnose. You may need to raise the priority of the reader tasks. If your msgQSend is using NO_WAIT, you may need to use a timeout value

How the epoll(), mutex and semaphore alike system calls are implemented behind the scene?

This is really a question confusing me for a long time. I tried googling a lot but still don't quite understand. My question is like this:
for system calls such as epoll(), mutex and semaphore, they have one thing in common: as soon as something happens(taking mutex for example, a thread release the lock), then a thread get woken up(the thread who are waiting for the lock can be woken up).
I'm wondering how is this mechanism(an event in one thread happens, then another thread is notified about this) implemented on earth behind the scene? I can only come up with 2 ways:
Hardware level interrupt: For example, as soon as another thread releases the lock, an edge trigger will happen.
Busy waiting: busy waiting in very low level. for example, as soon as another thread releases the lock, it will change a bit from 0 to 1 so that threads who are waiting for the lock can check this bit.
I'm not sure which of my guess, if any, is correct. I guess reading linux source code can help here. But it's sort of hard to a noob like me. It will be great to have a general idea here plus some pseudo code.
Linux kernel has a built-in object class called "wait queue" (other OSes have similar mechanisms). Wait queues are created for all types of "waitable" resources, so there are quite a few of them around the kernel. When thread detects that it must wait for a resource, it joins the relevant wait queue. The process goes roughly as following:
Thread adds its control structure to the linked list associated with the desired wait queue.
Thread calls scheduler, which marks the calling thread as sleeping, removes it from "ready to run" list and stashes its context away from the CPU. The scheduler is then free to select any other thread context to load onto the CPU instead.
When the resource becomes available, another thread (be it a user/kernel thread or a task scheduled by an interrupt handler - those usually piggy back on special "work queue" threads) invokes a "wake up" call on the relevant wait queue. "Wake up" means, that scheduler shall remove one or more thread control structures from the wait queue linked list and add all those threads to the "ready to run" list, which will enable them to be scheduled in due course.
A bit more technical overview is here:
http://www.makelinux.net/ldd3/chp-6-sect-2

Thread scheduling in unix

In RR scheduling policy what will happen if a low priority thread locks a mutex and is removed by the scheduler because another high priority thread is waiting?
Will it also releases the lock held by low priority thread?
For example consider 3 threads running in a process with priorities 10,20 and 30 in RR scheduling policy.
Now at a given point of time low priority thread 1 locks mutex and is still doing execution mean while the high priority thread pops in and also waits on mutex held by thread 1. Now thread 2 comes in to picture which also needs the same mutex locked by thread 1.
As far as I know as per scheduling algorithm the threads sleeping or waiting for mutex,semaphore etc are removed and the other ones, even having low priority are allowed to execute. Is this correct? If so, in the above example ultimately high priority threads wait for the completion of low priority thread which doesn't make any sense.
Is this how the system works if at all threads are designed like I said above?
or
the thread priority should be set in such a way that high priority one's will not depend on low priority one's mutexe's ?
Also can anyone please explain me how scheduling works at process level? How do we set priority for a process?
Typically, scheduling and locks are unrelated in any other aspect than a "waiting thread is not scheduled until it's finished waiting". It would be rather daft to have a MUTEX that "stops other thread from accessing my data" but it only works if the other thread has the same or lower priority than the current thread.
The phenomena of "a low priority holds a lock that a high priority thread 'needs'" is called priority inversion, and it's a well known scenario in computer theory.
There are some schemes that "temporarily increase priority of a lock-holding thread until it releases the lock to the highest priority of the waiting threads" (or the priority of the first waiting thread if it's higher than the current thread, or some other variation on that theme). This is done to combat priority inversion - but it has other drawbacks too, so it's not implemented in all OS's/schedulers (after all, it affects other threads than the one that is waiting too).
Edit:
The point of a mutex (or other similar locks) is that it prevents two threads from accessing the same resources at once. For example, say we want to update five different variables with some pretty lengthy processing (complicated math, fetching data from a serial port or a network drive, or some such), but if we only do two of the variables, some other process using these would get an invalid result, then we clearly can't "let go" of the lock.
The high priority thread simply has to wait for all five variables to be updated and the low priority lock.
There is no simple workaround that the application can do to "fix" this problem - don't hold the locks more than necessary of course [and it may be that we can actually fix the problem described above, by performing the lengthy processing OUTSIDE of the lock, and only do the final "store it in the 5 variables" with the lock on. This would reduce the potential time that the high priority thread has to wait, but if the system is really busy, it won't really fix the problem.
There are whole PhD thesis written on this subject, and I'm not a specialist on "how to write a scheduler" - I have a fair idea how one works, just like I know how the engine in my car works - but if someone gave me a bunch of suitable basic shapes of steel and aluminium, and the required tools/workspace and told me to build an engine, I doubt it would work great...Same with a scheduler - I know what the parts are called, but not how to build one.
If a high priority thread needs to wait on a low priority thread (mutex semaphore etc), the low priority thread is temporarily elevated to the same priority as the high priority thread.
The high priority thread is not going to have the lock for which it is requesting until the low priority thread will unlock it.
To avoid this we can use semaphore where any other thread can initiate to unlock but in mutex it is not possible.

prevent linux thread from being interrupted by scheduler

How do you tell the thread scheduler in linux to not interrupt your thread for any reason? I am programming in user mode. Does simply locking a mutex acomplish this? I want to prevent other threads in my process from being scheduled when a certain function is executing. They would block and I would be wasting cpu cycles with context switches. I want any thread executing the function to be able to finish executing without interruption even if the threads' timeslice is exceeded.
How do you tell the thread scheduler in linux to not interrupt your thread for any reason?
Can't really be done, you need a real time system for that. The closes thing you'll get with linux is to
set the scheduling policy to a realtime scheduler, e.g. SCHED_FIFO, and also set the PTHREAD_EXPLICIT_SCHED attribute. See e.g. here , even now though, e.g. irq handlers and other other stuff will interrupt your thread and run.
However, if you only care about the threads in your own process not being able to do anything, then yes, having them block on a mutex your running thread holds is sufficient.
The hard part is to coordinate all the other threads to grab that mutex whenever your thread needs to do its thing.
You should architect your sw so you're not dependent on the scheduler doing the "right" thing from your app's point of view. The scheduler is complicated. It will do what it thinks is best.
Context switches are cheap. You say
I would be wasting cpu cycles with context switches.
but you should not look at it that way. Use the multi-threaded machinery of mutexes and blocked / waiting processes. The machinery is there for you to use...
You can't. If you could what would prevent your thread from never releasing the request and starving other threads.
The best you can do is set your threads priority so that the scheduler will prefer it over lower priority threads.
Why not simply let the competing threads block, then the scheduler will have nothing left to schedule but your living thread? Why complicate the design second guessing the scheduler?
Look into real time scheduling under Linux. I've never done it, but if you indeed do NEED this this is as close as you can get in user application code.
What you seem to be scared of isn't really that big of a deal though. You can't stop the kernel from interrupting your programs for real interrupts or of a higher priority task wants to run, but with regular scheduling the kernel does uses it's own computed priority value which pretty much handles most of what you are worried about. If thread A is holding resource X exclusively (X could be a lock) and thread B is waiting on resource X to become available then A's effective priority will be at least as high as B's priority. It also takes into account if a process is using up lots of cpu or if it is spending lots of time sleeping to compute the priority. Of course, the nice value goes in there too.

what is kernel thread dispatching?

Can someone give me an easy to understand definition of kernel thread dispatching or just thread dispatching if there's no difference between the two?
From what I understand it's just doing a context switch while the currently active thread waits on a lock from another thread, so the CPU goes and does something else while this thread is in blocking mode.
I might however have misunderstood.
It's basically the process by which the operating system determines which of the many active threads is sent (dispatched) to the CPU for processing at any given point.
Each operating system has its own implementation, but the basic concept is to keep a sorted list of threads by priority, and dispatch them as needed to the CPU. Time slicing is added to allow multiple programs to run concurrently, etc.

Resources