Kthread and Schedule() slowing down the code. OR how to sleep in kernel - multithreading

Using module_init I have created and woken up a kthread. In order to keep it alive and also do my function task, I used the following approach. That was the only approach I could make it running since I am changing the flag in an interrupt. Now I am facing an unbelievably drop in the performance of the code. I narrowed down a problem to the following piece of code:
while(1){
//Do my tasks here after changing flag
while(get_flag() ){ //Waiting for a flag, to basically do my Func in the previous line.
schedule();
}
}//to keep a kthread alive after initial create.
Details about dropping the performance: without using the second while(1) which includes schedule, the rate of data transmission in my code is 35MB/s but with this little line, it drops to 5MB/s.
Is there any other way that I can make a kthread sleep and wait for a flag change?

Ideally, This is not the way you should do this in Kernel. But if you have to do it this way.
See if you are doing a blocking check for the flag? If that is the case, change it to non-blocking wait, just check for the flag and schedule that should be enough in most of the cases. The scheduling algorithm will make sure to get the fair share of CPU for all the processes. Also, if you are doing a blocking check for flag you are unnecessarily wasting CPU cycles since you are doing the processing only on the next scheduler slice. with the same logic, if you want to get better performance, you should wake up your waiting process from your producer thread with wakeup_task()
-or-
if you just want to achieve the functionality, I feel the right way to do it is the following method. using a wait queue, wait_even_interruptible() and wake_up_interruptible()
From your above said kernel thread you just need to call the wait_event_interruptible
see the pseudo code below
while (1){
wait_event_interruptible(wq, your_flag)
{
<do your task>
}
}
and from the place you are setting the flag
{
<some event>
<set flag>
wake_up_interruptible (wq)
}
You don't have to call the schedule explicitly.

Related

Will Go's scheduler yield control from one goroutine to another for CPU-intensive work?

The accepted answer at golang methods that will yield goroutines explains that Go's scheduler will yield control from one goroutine to another when a syscall is encountered. I understand that this means if you have multiple goroutines running, and one begins to wait for something like an HTTP response, the scheduler can use this as a hint to yield control from that goroutine to another.
But what about situations where there are no syscalls involved? What if, for example, you had as many goroutines running as logical CPU cores/threads available, and each were in the middle of a CPU-intensive calculation that involved no syscalls. In theory, this would saturate the CPU's ability to do work. Would the Go scheduler still be able to detect an opportunity to yield control from one of these goroutines to another, that perhaps wouldn't take as long to run, and then return control back to one of these goroutines performing the long CPU-intensive calculation?
There are few if any promises here.
The Go 1.14 release notes says this in the Runtime section:
Goroutines are now asynchronously preemptible. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. This is supported on all platforms except windows/arm, darwin/arm, js/wasm, and plan9/*.
A consequence of the implementation of preemption is that on Unix systems, including Linux and macOS systems, programs built with Go 1.14 will receive more signals than programs built with earlier releases. This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors. ...
I quoted part of the third paragraph here because this gives us a big clue as to how this asynchronous preemption works: the runtime system has the OS deliver some OS signal (SIGALRM, SIGVTALRM, etc.) on some sort of schedule (real or virtual time). This allows the Go runtime to implement the same kind of schedulers that real OSes implement with real (hardware) or virtual (virtualized hardware) timers. As with OS schedulers, it's up to the runtime to decide what to do with the clock ticks: perhaps just run the GC code, for instance.
We also see a list of platforms that don't do it. So we probably should not assume it will happen at all.
Fortunately, the runtime source is actually available: we can go look to see what does happen, should any given platform implement it. This shows that in runtime/signal_unix.go:
// We use SIGURG because it meets all of these criteria, is extremely
// unlikely to be used by an application for its "real" meaning (both
// because out-of-band data is basically unused and because SIGURG
// doesn't report which socket has the condition, making it pretty
// useless), and even if it is, the application has to be ready for
// spurious SIGURG. SIGIO wouldn't be a bad choice either, but is more
// likely to be used for real.
const sigPreempt = _SIGURG
and:
// doSigPreempt handles a preemption signal on gp.
func doSigPreempt(gp *g, ctxt *sigctxt) {
// Check if this G wants to be preempted and is safe to
// preempt.
if wantAsyncPreempt(gp) && isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()) {
// Inject a call to asyncPreempt.
ctxt.pushCall(funcPC(asyncPreempt))
}
// Acknowledge the preemption.
atomic.Xadd(&gp.m.preemptGen, 1)
atomic.Store(&gp.m.signalPending, 0)
}
The actual asyncPreempt function is in assembly, but it just does some assembly-only trickery to save user registers, and then calls asyncPreempt2 which is in runtime/preempt.go:
//go:nosplit
func asyncPreempt2() {
gp := getg()
gp.asyncSafePoint = true
if gp.preemptStop {
mcall(preemptPark)
} else {
mcall(gopreempt_m)
}
gp.asyncSafePoint = false
}
Compare this to runtime/proc.go's Gosched function (documented as the way to voluntarily yield):
//go:nosplit
// Gosched yields the processor, allowing other goroutines to run. It does not
// suspend the current goroutine, so execution resumes automatically.
func Gosched() {
checkTimeouts()
mcall(gosched_m)
}
We see the main differences include some "async safe point" stuff and that we arrange for an M-stack-call to gopreempt_m instead of gosched_m. So, apart from the safety check stuff and a different trace call (not shown here) the involuntary preemption is almost exactly the same as voluntary preemption.
To find this, we had to dig rather deep into the (Go 1.14, in this case) implementation. One might not want to depend too much on this.
A little bit more on this to complete #torek's answer.
Goroutines are interruptible when there is a syscall, but also when a routine is waiting on a lock, a chan or sleeping.
As #torek's said, since 1.14 routines can also be preempted even when they do none of the above. The scheduler can mark any routine as preemptible after it ran for more than 10ms.
More reading there: https://medium.com/a-journey-with-go/go-goroutine-and-preemption-d6bc2aa2f4b7

interrupt switch (PIC)

#define SW1 RB5
int IOFlag = 2; //while in out
void SW(){
if(!RB5)
__delay_ms(50);
while(!RB5);
__delay_ms(50);
IOFlag++;
}
void main(){
SW();
while(IOFlag % 2 != 0){
SW();
//some routines..
}
}
I used pic16f73, RB5 input use for switch.
When some of the routine is running, switch is not operating properly.
It is possible if you use the interrupt. However I don't know how to use it properly.
You need to understand the difference between polling and interrupts.
With polling (what you appear to be doing), you periodically check the state of some "thing" and act on it.
With interrupts, the "thing" happening causes your main thread of execution to be suspended, and an interrupt service routine (ISR) run.
Polling has the disadvantage of potentially long latency, the time between the thing happening and you finding out about it. In fact, you can even lose events if the thing is a momentary switch for example - you switch it on then off then, when the code checks for it, it's off.
Now you can still use polling if you wish, provided you understand these implications. Sometimes the easiest solution is to poll more often.
For example, if one of your //some routines.. jobs is a long running loop, you can poll from within there:
for (int i = 0; i < numThings; i++) {
doSomethingQuickWitn (thing[i]);
SW(); // Poll here as well
}
// Rather than here.
However, for _minimal latency, using interrupts is usually better and is reasonably simple once you wrap your head around the concept.
Your ISR (which will run on the given event, interrupting the main thread of execution) simply has to store the fact that the event has happened and communicate that to your main thread somehow.
For situations where you don't care how many times the event has happened, a flag will do the job. Your ISR simply sets the flag and your main thread of execution checks it periodically to see if it's been set, then clears it (with interrupts disabled so as to avoid race conditions). That would be something like (pseudo-code):
global val switchHit = false;
main:
interrupt (7, intFn) // call intfn() on interrupt 7
while true:
disableInts() // disallow interrupts for a short while
if switchHit:
handleSwitch() // switch was hit, do something (quickly)
switchHit = false // mark as not hit
enableInts() // and re-allow interrupts
doLotsOfOtherStuff()
intfn:
switchHit = true // notify main
Note that I'm not worry about race conditions within the ISR, interrupts are generally disabled automatically there.
More complicated information transfer may involve a count rather than a flag, or even a message queue of some sort, flowing from the ISR to the main thread of execution.

Do we need a sleep() while running a forever process in Linux?

I have read that a forever process like daemon should run with a sleep() in their while(1) or for(;;) loop. They say, it is required because otherwise this process will always be in a run queue and the kernel will always run it. This will block the other process. I don't agree that it will block the other process completely. If there is a time slicing, then it will execute other process. But, certainly it will steal a time from others. Making a delay for other process since this process is always in the run state. By default, the Linux runs as a round-robin. The first task is swapd, then other tasks . This is a circular link list with first task as swapd(process-id is 0) and then other tasks. I believe this is still based as time sliced. A particular time for each process. These tasks are nothing but the process-descriptor. I believe this link list is maintained by the init process. Please do correct me here If I am wrong. Other question is if we need to give a sleep() then what should be its value? How can we determine the sleep value to get the best results?
If your program has useful things to do, don't throttle it. A program can move out of the run queue by doing blocking stuff like IO and waiting.
If you are writing a polling loop that can spin an arbitrary number of times you probably want to throttle it a bit with sleep because spinning too often has little value.
That said, polling loops are a means of last resort. Normally, programs perform useful work with every instruction, so they don't sleep at all.
Sleep is almost certainly the wrong solution.
Usually what you do it call a blocking function which wakes you up when there's something for you to do.
For example, if you're a network service you'd want to remain inactive until a request arrives.
In other words, the core of your daemon should not look like this:
while(1)
{
if (checkIfSomethingToDo())
doSomething();
else
sleep(1);
}
but rather a little like this:
while(1)
{
int ret = poll(fds, nfds, -1);
if (ret > 0)
doSomething();
}
Have the kernel put you to sleep until there's actual work to do. It's not hard to implement, you'd be a lot more efficient (not stealing CPU time from others, only to waste it doing no actual work) and your response latency will go down too.
A sleep forces the os to pass execution to another thread and therefore is helpfull, or at least fair. Start with sleep one. Should be ok.

Can I prevent a Linux user space pthread yielding in critical code?

I am working on an user space app for an embedded Linux project using the 2.6.24.3 kernel.
My app passes data between two file nodes by creating 2 pthreads that each sleep until a asynchronous IO operation completes at which point it wakes and runs a completion handler.
The completion handlers need to keep track of how many transfers are pending and maintain a handful of linked lists that one thread will add to and the other will remove.
// sleep here until events arrive or time out expires
for(;;) {
no_of_events = io_getevents(ctx, 1, num_events, events, &timeout);
// Process each aio event that has completed or thrown an error
for (i=0; i<no_of_events; i++) {
// Get pointer to completion handler
io_complete = (io_callback_t) events[i].data;
// Get pointer to data object
iocb = (struct iocb *) events[i].obj;
// Call completion handler and pass it the data object
io_complete(ctx, iocb, events[i].res, events[i].res2);
}
}
My question is this...
Is there a simple way I can prevent the currently active thread from yielding whilst it runs the completion handler rather than going down the mutex/spin lock route?
Or failing that can Linux be configured to prevent yielding a pthread when a mutex/spin lock is held?
You can use the sched_setscheduler() system call to temporarily set the thread's scheduling policy to SCHED_FIFO, then set it back again. From the sched_setscheduler() man page:
A SCHED_FIFO process runs until either
it is blocked by an I/O request, it is
preempted by a higher priority
process, or it calls sched_yield(2).
(In this context, "process" actually means "thread").
However, this is quite a suspicious requirement. What is the problem you are hoping to solve? If you are just trying to protect your linked list of completion handlers from concurrent access, then an ordinary mutex is the way to go. Have the completion thread lock the mutex, remove the list item, unlock the mutex, then call the completion handler.
I think you'll want to use mutexes/locks to prevent race conditions here. Mutexes are by no way voodoo magic and can even make your code simpler than using arbitrary system-specific features, which you'd need to potentially port across systems. Don't know if the latter is an issue for you, though.
I believe you are trying to outsmart the Linux scheduler here, for the wrong reasons.
The correct solution is to use a mutex to prevent completion handlers from running in parallel. Let the scheduler do its job.

How to implement a thread safe timer on linux?

As we know, doing things in signal handlers is really bad, because they run in an interrupt-like context. It's quite possible that various locks (including the malloc() heap lock!) are held when the signal handler is called.
So I want to implement a thread safe timer without using signal mechanism.
How can I do?
Sorry, actually, I'm not expecting answers about thread-safe, but answers about implementing a timer on Unix or Linux which is thread-safe.
Use usleep(3) or sleep(3) in your thread. This will block the thread until the timeout expires.
If you need to wait on I/O and have a timer expire before any I/O is ready, use select(2), poll(2) or epoll(7) with a timeout.
If you still need to use a signal handler, create a pipe with pipe(2), do a blocking read on the read side in your thread, or use select/poll/epoll to wait for it to be ready, and write a byte to the write end of your pipe in the signal handler with write(2). It doesn't matter what you write to the pipe - the idea is to just get your thread to wake up. If you want to multiplex signals on the one pipe, write the signal number or some other ID to the pipe.
You should probably use something like pthreads, the POSIX threads library. It provides not only threads themselves but also basic synchronization primitives like mutexes (locks), conditions, semaphores. Here's a tutorial I found that seems to be decent:
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
For what it's worth, if you're totally unfamiliar with multithreaded programming, it might be a little easier to learn it in Java or Python, if you know either of those, than in C.
I think the usual way around the problems you describe is to make the signal handlers do only a minimal amount of work. E.g. setting some timer_expired flag. Then you have some thread that regularly checks whether the flag has been set, and does the actual work.
If you don't want to use signals I suppose you'd have to make a thread sleep or busy-wait for the specified time.
Use a Posix interval timer, and have it notify via a signal. Inside the signal handler function almost none of C's functions, like printf() can be used, as they aren't re-entrant.
Use a single global flag, declared static volatile for your signal handler to manipulate. The handler should literally have this one line of code, and NOTHING else; This flag should impact the flow control elsewhere in the 1 & Only thread in the program.
static volatile bool g_zig_instead_of_zag_flg = false;
...
void signal_handler_fnc()
g_zig_instead_of_zag_flg = true;
return
int main() {
if(false == g_zig_instead_of_zag) {
do_zag();
} else {
do_zig();
g_zig_instead_of_zag = false;
return 0;
}
Michael Kerrisk's The Linux Programming Interface has examples of both methods, and a few more, but the examples come with a lot of his own private functions you have to get working, and the examples carefully avoid many of the gotchas they should explore, so not great.
Using the Poxix interval timer that notifies via a thread makes everything a lot worse, and AFAICT, that notification method is pretty much useless. I only say pretty much because I am allowing that there may be SOME case where doing nothing in the main() thread, and everything in the handler thread is useful, but I sure can't think of any such case.

Resources