yield between different processes - linux

I have two C++ programs, one called A and one called B. I am running on 64-bit Linux, using the Boost threading library.
Program A creates 5 threads that sit in an endless loop doing some operation.
Program B creates 5 threads that sit in an endless loop calling yield().
I am on a quad-core machine... When I run program A alone, it gets almost 400% of the CPU. When I run program B alone, it also gets almost 400% of the CPU. That much I expected.
But when running both together, I expected program B to use almost no CPU and program A to get the full 400%. Instead, both get an equal share of the CPU, almost 200% each.
My question is: doesn't yield() work across different processes? Is there a way to make it work the way I expected?

So you have 4 cores running 4 threads that belong to A. There are 6 threads in the queue: 1 from A and 5 from B. One of the running A threads exhausts its timeslice and returns to the queue. The scheduler chooses the next runnable thread from the queue. What is the probability that this thread belongs to B? 5/6. OK, that thread starts, calls sched_yield(), and returns to the queue. What is the probability that the next thread is a B thread again? 5/6 again!
Process B gets CPU time again and again, and it also forces the kernel to perform expensive context switches.
sched_yield is intended for one particular case: when one thread makes another thread runnable (for example, by unlocking a mutex). If you want to make B wait while A is working on something important, use a synchronization mechanism that can put B to sleep until A wakes it up.
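For illustration, here is a minimal sketch of that idea using a named POSIX semaphore (the name "/a_done" and the fork layout are invented for this example): B sleeps on the semaphore, consuming no CPU, until A posts it.

    /* Sketch: B blocks on a named semaphore instead of spinning on
     * sched_yield; A posts the semaphore when its work is done.
     * Compile with: cc sem_demo.c -pthread */
    #include <fcntl.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Both sides open the same named semaphore, initial value 0. */
        sem_t *sem = sem_open("/a_done", O_CREAT, 0600, 0);
        if (sem == SEM_FAILED) { perror("sem_open"); return 1; }

        if (fork() == 0) {      /* stand-in for process B */
            sem_wait(sem);      /* sleeps here, using no CPU, until A posts */
            puts("B: A finished, continuing");
            return 0;
        }

        /* stand-in for process A */
        sleep(1);               /* placeholder for the important work */
        puts("A: done, waking B");
        sem_post(sem);

        wait(NULL);
        sem_unlink("/a_done");
        return 0;
    }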

Linux uses dynamic thread priority. The static priority you set with nice only bounds the dynamic priority.
When a thread uses its whole timeslice, the kernel lowers its priority, and when a thread does not use its whole timeslice (by doing I/O, calling wait/yield, etc.), the kernel raises its priority.
So my guess is that process B's threads have a higher priority, so they execute more often.
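For what it's worth, a tiny sketch of setting the static priority this answer mentions, via nice(2) (the +10 increment is arbitrary):

    /* Sketch: raise this process's nice value (lowering its static
     * priority) before entering a CPU-bound loop. */
    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        errno = 0;
        int nv = nice(10);              /* +10: lower static priority */
        if (nv == -1 && errno != 0) {   /* -1 can be a legal nice value */
            perror("nice");
            return 1;
        }
        printf("spinning at nice %d\n", nv);

        for (;;) {
            /* CPU-bound work; the kernel still adjusts the dynamic
             * priority based on how much of each timeslice we use */
        }
    }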


Do we count the main thread when we compute the recommended number of threads that we can create in C using Pthreads?

I have a computer with 1 CPU, 4 cores, and 2 hardware threads per core. So I can efficiently run a maximum of 8 threads.
When I write a program in C and create threads using the pthread_create function, how many threads is it recommended to create: 7 or 8? Do I have to subtract the main thread, and thus create 7, or should the main thread not be counted, so I can efficiently create 8? I know that in theory you can create many more, like thousands, but I want to plan efficiently, in line with my computer's architecture.
Which thread started which is not especially relevant. A program's initial thread is a thread: while it is scheduled on an execution unit, no other thread can use that execution unit. You cannot have more threads executing concurrently than you have execution units, and if you have more than that eligible to run at any given time then you will pay the cost of extra context switches without receiving any offsetting gain from additional concurrency.
To a first approximation, then, yes, you must count the initial thread. But read the above carefully. The relevant metric is not how many threads exist at any given time, but rather how many are contending for execution resources. Threads that are currently blocked (on I/O, on acquiring a mutex, on pthread_join(), etc.) do not contend for execution resources.
More precisely, then, it depends on your threads' behavior. For example, if the initial thread follows the pattern of launching a bunch of other threads and then joining them all without itself performing any other work, then no, you do not count that thread, because it does not contend for CPU to any significant degree while the other threads are doing so.
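A minimal sketch of that launch-and-join pattern (assuming POSIX threads and glibc's sysconf; compile with -pthread): the initial thread launches one worker per online CPU and then only joins them, so it does not itself contend for a core.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        long id = (long)arg;
        /* ... CPU-bound work goes here ... */
        printf("worker %ld done\n", id);
        return NULL;
    }

    int main(void)
    {
        long n = sysconf(_SC_NPROCESSORS_ONLN);   /* e.g. 8 on the asker's machine */
        pthread_t *tids = malloc(n * sizeof *tids);

        for (long i = 0; i < n; i++)
            pthread_create(&tids[i], NULL, worker, (void *)i);

        /* The main thread blocks in pthread_join, so it is not an
         * extra runnable thread competing for the execution units. */
        for (long i = 0; i < n; i++)
            pthread_join(tids[i], NULL);

        free(tids);
        return 0;
    }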

At which point in the thread can the scheduler leave and start another thread?

Let's say I have two threads running, and each one has a for loop. Is it possible for the scheduler to switch from thread1 to thread2 while thread1 is in the middle of an iteration of the for loop? So let's say the for loop is running its nth iteration, and in the middle of it the scheduler schedules the other thread. Is that possible?
Of course it is. The scheduler can switch threads any time it wants to.
is it possible for the scheduler to shift from thread1 to thread2 while thread1 is in the middle of an iteration in the for loop?
Assuming you are running your program on an operating system that supports pre-emptive multitasking(1), yes. In particular, the operating system's scheduler will have a timer set up that triggers an interrupt once per quantum (where a quantum is typically in the range of 5-10 milliseconds), and when the interrupt routine is called, it decides whether to keep running the current thread or to switch to a different thread. This is what allows the computer to appear to be doing multiple operations at once, even on a single-core CPU.
Note that there were (and probably still are) OS's (such as the classic Macintosh OS) that did not have pre-emptive multitasking; when writing a program for that OS, you had to be careful to explicitly call WaitNextEvent() every so often, otherwise no other programs would ever be allowed to execute. This made writing timing-sensitive programs like mp3 players difficult, and also meant that a single poorly-written program could easily freeze up the computer (requiring a power cycle to recover) simply by entering into an infinite loop that didn't include a call to WaitNextEvent().
(1) which is to say, pretty much any modern consumer-facing OS
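As a quick illustration (a sketch, not from the answer above), two threads running for loops will produce interleaved output in some unpredictable order, because the scheduler can preempt either one mid-iteration (on a multi-core machine they may also genuinely run in parallel):

    /* Sketch: two threads counting in for loops; the output ordering is
     * not deterministic. Compile with: cc demo.c -pthread */
    #include <pthread.h>
    #include <stdio.h>

    static void *count(void *arg)
    {
        const char *name = arg;
        for (int i = 0; i < 5; i++)
            printf("%s: iteration %d\n", name, i);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, count, "thread1");
        pthread_create(&t2, NULL, count, "thread2");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }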

golang thread count misleading

I have written a small application in Go which starts 4 threads for doing various things, plus one main thread. So in total there are 5 threads. But if I start Activity Monitor and watch the process, this is what I see
First of all, why 7 threads? And it is not constant; sometimes it is 5 and other times it is 7. Also, all 4 threads started by the main thread end after doing what they are supposed to. I verified that the threads end by putting a defer statement at the top of each one. Still, the thread count in Activity Monitor stays at 7.
Does anyone know what is going on here? Are these extra threads started by the Go runtime? Is there a way to find out how many active threads in my program were started by my code and not by the Go runtime?
Yes, they are started by the runtime. For example, http://play.golang.org/p/c0cIngo_sO will print that 4 goroutines are running.
Goroutines aren't threads: one OS thread can handle hundreds of goroutines. However, if you're doing something heavy or using a blocking system call, the runtime will start a new thread to handle the other goroutines.
I suppose you mean Goroutines when you say threads.
The Go runtime transparently multiplexes lightweight goroutines onto OS threads. That's also why you don't need to call functions like select(): that's the runtime's job.
If you spawn 7 goroutines and some of them block, the runtime might decide to terminate the idle OS threads. This is why you see fewer threads than goroutines.
I think you are mistaking goroutines for threads.
In your Go program, what you call a thread is actually a goroutine, which is a coroutine rather than a real thread; goroutines are implemented by Go's runtime (every Go program runs on the runtime, and the runtime uses real threads to implement goroutines). Different goroutines may run on the same thread, or may not; you never know which. You can use runtime.GOMAXPROCS to take advantage of a multi-core CPU.
And the threads you see in the monitor are real threads.

In Linux scheduler, how do different processes containing multiple threads get fair time quota?

I know the Linux scheduler schedules task_structs, each of which represents a thread. Then if we have two processes, e.g., A containing 100 threads while B is single-threaded, how can the two processes be scheduled fairly, considering that each thread is scheduled fairly?
In addition, in Linux a context switch between threads from the same process should be faster than one between threads from different processes, right? Since the latter has to deal with the process control block while the former doesn't.
The point you are missing here is how the scheduler looks at threads or tasks. The Linux kernel scheduler treats each of them as an individual scheduling entity, so they are counted and scheduled separately.
Now let's see what the CFS documentation says. It takes the simple approach of giving an even slice of CPU time to each runnable process; therefore, if there are 4 runnable processes/threads, they each get 25% of the CPU time. On real hardware this ideal isn't achievable, and to address that, vruntime was introduced (the CFS documentation covers vruntime in more detail).
Now come back to your example: if process A creates 100 threads and B creates 1 thread, then counting each process's initial thread there are 103 scheduling entities, 101 for A and 2 for B (assuming all are in the runnable state), and CFS will share the CPU evenly using the formula 1/103 (CPU / number of running tasks). And context switching is the same for all scheduling entities: threads merely share their task's mm_struct, and when they run they have their own set of registers and task state to load. Hope this helps you understand better.
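To see that even sharing in action, here is a rough sketch (illustrative only): spin up more busy threads than you have cores and compare how much work each gets done; under CFS the per-thread counts come out roughly equal.

    /* Sketch: each thread busy-increments its own counter for ~2 seconds;
     * with NTHREADS above your core count, CFS time-slices them and the
     * printed counts are roughly equal. Compile with: cc cfs_demo.c -pthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NTHREADS 8   /* pick something larger than your core count */

    static volatile long counts[NTHREADS];
    static volatile int stop;

    static void *spin(void *arg)
    {
        long id = (long)arg;
        while (!stop)
            counts[id]++;
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&tids[i], NULL, spin, (void *)i);

        sleep(2);
        stop = 1;

        for (long i = 0; i < NTHREADS; i++) {
            pthread_join(tids[i], NULL);
            printf("thread %ld: %ld iterations\n", i, counts[i]);
        }
        return 0;
    }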

Behavior of sched_yield

I have a few questions about the sched_yield function, because I'm seeing that it does not behave as intended in my code. Many times I see that the same thread runs again and again, even in the presence of other threads, when I try to yield by calling sched_yield.
Also, if I have multiple cores, will sched_yield yield to threads running on all cores, or only on one core? Say, for example, threads 1, 2 and 3 run on core 1 and threads 4, 5 and 6 on core 2, and sched_yield is called from thread 2: will it be replaced only by threads 1 and 3, or are 1, 3, 4, 5 and 6 all possible? I am asking this because in .NET, Thread.Yield only yields to threads running on the same core/processor.
http://www.kernel.org/doc/man-pages/online/pages/man2/sched_yield.2.html
sched_yield() causes the calling thread to relinquish the CPU. The thread is moved to the end of the queue for its static priority and a new thread gets to run.
If the calling thread is the only thread in the highest priority list at that time, it will continue to run after a call to sched_yield().
sched_yield is not a .NET call, and the threading/process model is different too. The Windows/.NET scheduler is not the same as the Linux scheduler; Linux even has several possible schedulers.
So your expectations about sched_yield are wrong.
If you want to control how threads are run, you can bind each thread to a CPU; then each thread will run only on the CPU it is bound to. If you have several threads bound to the same CPU, calling sched_yield MAY switch to another thread that is bound to the current CPU and ready to run.
Also, it can be a bad idea to use several threads per CPU if each thread wants to do a lot of CPU-intensive work.
If you want full control over how threads are run, you can use real-time threads. http://www.linuxjournal.com/magazine/real-time-linux-kernel-scheduler describes the SCHED_FIFO and SCHED_RR RT policies.
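A minimal sketch of the binding idea (assuming glibc's non-portable pthread_setaffinity_np, which requires _GNU_SOURCE): pin two threads to the same CPU, so that when one calls sched_yield the only other runnable thread on that CPU is its sibling.

    /* Sketch: both threads are restricted to CPU 0, so sched_yield() can
     * only hand the core to the other one. Compile with: cc yield.c -pthread */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *yielder(void *arg)
    {
        const char *name = arg;
        for (int i = 0; i < 5; i++) {
            printf("%s running\n", name);
            sched_yield();  /* offer the CPU to the other bound thread */
        }
        return NULL;
    }

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);   /* CPU 0 only */

        pthread_t t1, t2;
        pthread_create(&t1, NULL, yielder, "thread1");
        pthread_create(&t2, NULL, yielder, "thread2");

        /* Simplified: in a real program, set the affinity before the
         * threads start running (e.g. via pthread_attr_setaffinity_np). */
        pthread_setaffinity_np(t1, sizeof set, &set);
        pthread_setaffinity_np(t2, sizeof set, &set);

        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }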
Why would you want to give up the CPU? Well...
I am fixing a bug in client code. Basically, they have a shared struct that holds information:
how many coins in escrow - return them
how many bills in escrow - return them
The above are in process A; the rest of the code is in process B. Process B sends messages to process A to compute and return the money in escrow (this is a vending machine). Without getting into the history of why the code is written this way, the original code sequence is:
(Process B)
send message RETURN_ESCROWED_BILLS to Process A
send message RETURN_ESCROWED_COINS to Process A
Zero out the common structure
This gets executed as:
(Process B):
send the messages;
zero out the struct;
(later .. Process A):
get the messages;
all fields in common struct are 0;
nothing to do;
oops ... the money is still in escrow, but the process A code has lost that knowledge. What is really needed (other than a massive restructuring of the code) is:
(Process B):
send the messages;
yield the CPU;
(Process A):
determine the escrowed money and return;
yield the CPU; (could just go to the end of the timeslice, no special method needed)
(Process B):
zero out the common struct;
Any time you have IPC messages and the processes that send/receive them are tightly coupled, the best way is to have a two-way handshake. However, there are cases out there (usually the result of a poor or non-existent design) where you must tightly couple processes that are really supposed to be loosely coupled. Usually yielding the CPU is a hack used because you do not have a choice. The addition of multicore CPUs is a source of pain, especially when porting from a single core to multi-core.
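As a sketch of such a two-way handshake (the pipe layout and message strings are invented for illustration): B sends its request, then blocks until A acknowledges, and only then zeroes the shared struct, with no yield needed.

    /* Sketch: request/acknowledge over a pair of pipes between B and A. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int to_a[2], to_b[2];
        pipe(to_a);
        pipe(to_b);

        if (fork() == 0) {                       /* process A */
            char req[32];
            read(to_a[0], req, sizeof req);      /* wait for the request */
            /* ... compute and return the escrowed money here ... */
            write(to_b[1], "ACK", 4);            /* acknowledge when done */
            return 0;
        }

        /* process B */
        char ack[8];
        write(to_a[1], "RETURN_ESCROWED", 16);   /* send the request */
        read(to_b[0], ack, sizeof ack);          /* block until A is done */
        /* only now is it safe to zero out the common struct */
        puts("B: got ACK; zeroing struct");
        return 0;
    }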
