Let's suppose that I create a project with 2 threads.
Their time complexities are O(n!) and O(n) respectively, and they run at the same time.
When one of them returns what I want, both of them stop.
With that said, it would make sense that the complexity of the algorithm is O(n), even though one of the threads has a Big-O of n!, am I right?
P.S. I did my research, but none of the answers serve my need, since all of them talk about a problem that is cut in half (O(n/2) per thread instead of O(n) with one thread), while I want to start solving 2 problems at once and have both stop when the first one is done.
The analysis of this needs to be more careful.
The thread scheduler may not guarantee that all threads will get a "fair" amount of execution time. Imagine two threads that are both counting up from 1, but the thread scheduler wakes thread A up for 1 step, then B for 1 step, then A for 2 steps, then B for 1 step, then A for 4 steps, and so on.
Thread A will do exponentially more work than thread B in this case, because it is given exponentially more time by the scheduler to do its work. So if thread B signals for thread A to stop after B counts up to n, then thread A would stop after counting up to 2^n - 1. The scheduler could be even more unfair, so A's running time cannot be bounded by any function of n.
Given that, if thread A chooses to terminate itself after n! operations, then its running time can only be bounded by O(n!), because we can't guarantee that thread B will have completed its n operations and sent the termination signal within that time.
Now suppose the thread scheduler does guarantee that one thread is never favoured over another by more than some constant factor. In this case, the algorithm in thread B will send a signal to thread A after thread B completes O(n) steps. Since thread A can only complete O(n) steps in the same time (otherwise it would be favoured over thread B by more than a constant factor), then thread A will terminate in O(n) time.
That said, the fact that the algorithm in thread A checks for a signal and terminates when it receives one means that O(n!) cannot be derived as a tight upper bound just by looking at what thread A does in isolation; it has instructions to terminate early when signalled from outside. So at least there is no contradiction.
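Here is a minimal sketch of the setup in C with pthreads (my own illustration, not from the question): both workers poll a shared atomic flag, and whichever finishes first sets it, so the other stops on its next check. The names fast_worker and slow_worker are made up, and the factorial-time algorithm is modelled as an unbounded loop.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool done;                /* set by whichever thread finishes first */
static const long n = 1000000;          /* hypothetical problem size */

static void *fast_worker(void *arg)     /* the O(n) algorithm */
{
    (void)arg;
    for (long i = 0; i < n && !atomic_load(&done); i++)
        ;                               /* one unit of work per step */
    atomic_store(&done, true);          /* signal the other thread to stop */
    return NULL;
}

static void *slow_worker(void *arg)     /* the O(n!) algorithm, modelled as a huge loop */
{
    (void)arg;
    while (!atomic_load(&done))
        ;                               /* checks the stop flag on every step */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, slow_worker, NULL);
    pthread_create(&b, NULL, fast_worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    puts("the first finisher stopped both threads");
    return 0;
}

Note that slow_worker only stops promptly if the scheduler actually runs fast_worker, which is exactly the fairness assumption discussed above.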
I have a computer with 1 CPU, 4 cores, and 2 threads per core. So I can run a maximum of 8 threads efficiently.
When I write a program in C and create threads using the pthread_create function, how many threads is it recommended to create: 7 or 8? Do I have to subtract the main thread, and thus create 7, or should the main thread not be counted, so I can efficiently create 8? I know that in theory you can create many more, like thousands, but I want this to be planned efficiently, according to my computer's architecture.
Which thread started which is not especially relevant. A program's initial thread is a thread: while it is scheduled on an execution unit, no other thread can use that execution unit. You cannot have more threads executing concurrently than you have execution units, and if more than that are eligible to run at any given time then you will pay the cost of extra context switches without receiving any offsetting gain from additional concurrency.
To a first approximation, then, yes, you must count the initial thread. But read the above carefully. The relevant metric is not how many threads exist at any given time, but rather how many are contending for execution resources. Threads that are currently blocked (on I/O, on acquiring a mutex, on pthread_join(), etc.) do not contend for execution resources.
More precisely, then, it depends on your threads' behavior. For example, if the initial thread follows the pattern of launching a bunch of other threads and then joining them all without itself performing any other work, then no, you do not count that thread, because it does not contend for CPU to any significant degree while the other threads are doing so.
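A sketch of that last pattern (my own, assuming Linux/glibc, where sysconf(_SC_NPROCESSORS_ONLN) reports the number of logical processors): the initial thread creates one worker per logical processor and then blocks in pthread_join(), so it does not meaningfully contend for a core.

#include <pthread.h>
#include <unistd.h>

static void *worker(void *arg)
{
    /* CPU-bound work would go here */
    return arg;
}

int main(void)
{
    /* Logical processors available (8 on the machine described above). */
    long nproc = sysconf(_SC_NPROCESSORS_ONLN);
    if (nproc < 1 || nproc > 64)
        nproc = 1;
    pthread_t tids[64];

    /* The main thread only creates and joins: it spends almost all of its
     * time blocked in pthread_join(), so it does not contend for a core,
     * and we can create one worker per logical processor. */
    for (long i = 0; i < nproc; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (long i = 0; i < nproc; i++)
        pthread_join(tids[i], NULL);
    return 0;
}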
This question is not a duplicate of any question about why multithreading is not faster on a single core; read the rest to figure out what I actually want to know.
As far as I know, multithreading is only faster on a CPU with multiple cores, since each thread can run in parallel. However, based on my understanding of how preemption and multithreading work on a single core, it seems like it should be faster there too. The image below describes what I mean better. Consider that our app is a simple loop that takes exactly 4 seconds to execute. In this example the time slice is constant, but I don't think that makes any difference, because in the end all threads with the same priority will get equal time from the scheduler. The first timeline is single-threaded, but the second one has 4 threads. The cycle marks the point where preemption ends and the scheduler goes back to the start of the thread queue. I have also left out I/O, since it just adds complexity; even if it would change the results, assume I'm talking about code that does not require any sort of I/O.
The red threads are threads related to my process, and others (black) are the ones for other processes and apps
There are a couple of questions here:
Why isn't it faster? What's wrong with my timeline?
What's that cycle point called?
Since the time slice is not fixed, does that mean the cycle time is fixed, or is the time slice calculated so that the cycle takes however long is needed to spend the calculated slice on each thread?
Is the slice based on time or on instructions? I mean, is it like 0.1 s for each thread, or like 10 instructions for each thread?
CPU utilization is based on CPU time, so why isn't it always at 100%? When a thread's time slice expires, the scheduler moves on to the next thread, and if a thread blocks on I/O the scheduler does not wait but runs the next one, so the CPU always tries to find a thread to execute and minimize idle time. Is the time spent on I/O really so significant that more than 50% of CPU time is spent doing nothing, because all threads are waiting for something (mostly I/O) and the CPU time elapses waiting for a thread to become ready?
Note: This timeline is simplified; the time spent on I/O, thread creation, etc. is not counted, and it's assumed that the other threads do not finish before the end of the timeline and have the same priority/nice value as our process.
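One way to see why the 4-thread timeline cannot beat the single-threaded one is to measure it. A rough experiment of my own (assuming Linux, where sched_setaffinity() can confine the process to core 0): run the same total amount of CPU-bound work with 1 thread and with 4 threads on a single core. The wall-clock times come out essentially the same, because on one core the threads can only interleave, never overlap.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>

#define TOTAL 400000000UL              /* total "work" in loop iterations */

static void *spin(void *arg)
{
    unsigned long limit = (unsigned long)arg;
    for (volatile unsigned long i = 0; i < limit; i++)
        ;                              /* pure CPU work, no I/O */
    return NULL;
}

static double run(int nthreads)
{
    pthread_t t[4];
    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, spin, (void *)(TOTAL / nthreads));
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &b);
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(0, &one);
    sched_setaffinity(0, sizeof one, &one);   /* confine everything to core 0 */

    printf("1 thread:  %.2f s\n", run(1));
    printf("4 threads: %.2f s\n", run(4));    /* roughly the same wall time */
    return 0;
}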
Background:
The Little Book of Semaphores by Allen B. Downey talks about assumptions needed to prevent thread starvation.
He states that the scheduler needs to guarantee the following:
Property 2: if a thread is ready to run, then the time it waits until it runs is bounded.
And a weak semaphore guarantees:
Property 3: if there are threads waiting on a semaphore when a thread executes signal, then one of the waiting threads has to be woken.
However, he states that even with these properties, the following code can cause starvation when run with 3 or more threads (Threads A, B, C):
while True:
    mutex.wait()
    # critical section
    mutex.signal()
The argument is that if A executes first, its signal wakes B, but A can loop around and wait on the mutex again before B releases it. When B signals, the weak semaphore is free to wake A rather than C; A then reacquires the mutex and repeats this cycle with B, so C is starved.
Question:
Wouldn't Property 2 guarantee that C has to be woken up by the scheduler within some finite amount of time? If so, Thread C couldn't be starved. Even if the weak semaphore does not guarantee that Thread C will be woken, shouldn't the scheduler run it anyway?
I thought about it a little more and realized that Property 2 only guarantees that threads in a RUNNABLE state will be scheduled within a finite amount of time.
The argument in the book is that Thread C never reaches a RUNNABLE state (it stays blocked on the semaphore), so Properties 2 and 3 together still do not rule out starvation.
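The practical flavour of this is easy to reproduce with an unfair lock. A hypothetical demo in C (pthread mutexes make no fairness guarantee, much like the book's weak semaphore): three threads hammer the same mutex and we count critical-section entries. Depending on the platform's lock implementation, the counts can come out wildly uneven, because a thread that unlocks and immediately re-locks often reacquires the mutex ahead of threads already waiting.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static long entries[3];                  /* critical-section entries per thread */
static atomic_bool stop;

static void *loop(void *arg)
{
    long id = (long)arg;
    while (!atomic_load(&stop)) {
        pthread_mutex_lock(&mutex);      /* mutex.wait()     */
        entries[id]++;                   /* critical section */
        pthread_mutex_unlock(&mutex);    /* mutex.signal()   */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    for (long i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, loop, (void *)i);
    sleep(2);
    atomic_store(&stop, true);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    /* On an unfair lock these counts may differ by orders of magnitude. */
    printf("A=%ld  B=%ld  C=%ld\n", entries[0], entries[1], entries[2]);
    return 0;
}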
I have two C++ programs, one called a and one called b. I am running them on 64-bit Linux, using the Boost threading library.
The a program creates 5 threads which sit in an endless loop doing some operation.
The b program creates 5 threads which sit in an endless loop invoking yield().
I am on a quad-core machine... When I run the a program alone, it gets almost 400% of the CPU. When I run the b program alone, it also gets almost 400% of the CPU. I expected that.
But when running both together, I expected the b program to use almost no CPU and a to use the full 400%. Instead, both are using equal slices of the CPU, almost 200% each.
My question is: doesn't yield() work across different processes? Is there a way to make it work the way I expected?
So you have 4 cores running 4 threads that belong to A. There are 6 threads in the queue: 1 from A and 5 from B. One of the running A threads exhausts its timeslice and returns to the queue. The scheduler chooses the next runnable thread from the queue. What is the probability that this thread belongs to B? 5/6. OK, this thread is started; it calls sched_yield() and returns to the queue. What is the probability that the next thread will be a B thread again? 5/6 again!
Process B gets CPU time again and again, and it also forces the kernel to do expensive context switches.
sched_yield is intended for one particular case: when one thread makes another thread runnable (for example, by unlocking a mutex). If you want to make B wait while A is working on something important, use a synchronization mechanism that can put B to sleep until A wakes it up.
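A minimal sketch of that suggestion (my own illustration): B sleeps on a condition variable instead of spinning on sched_yield(), and A signals it when the important work is done.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static bool a_done = false;

static void *b_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    while (!a_done)                    /* B sleeps here, consuming no CPU */
        pthread_cond_wait(&cv, &m);
    pthread_mutex_unlock(&m);
    puts("B: A is done, continuing");  /* ... B's real work ... */
    return NULL;
}

int main(void)                         /* plays the role of thread A */
{
    pthread_t b;
    pthread_create(&b, NULL, b_thread, NULL);

    /* ... A's important work happens here ... */

    pthread_mutex_lock(&m);
    a_done = true;
    pthread_cond_signal(&cv);          /* makes B runnable again */
    pthread_mutex_unlock(&m);

    pthread_join(b, NULL);
    return 0;
}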
Linux uses dynamic thread priority. The static priority you set with nice only bounds the dynamic priority.
When a thread uses its whole timeslice, the kernel lowers its priority; when a thread does not use its whole timeslice (by doing I/O, calling wait/yield, etc.), the kernel raises its priority.
So my guess is that process b's threads have a higher dynamic priority, which is why they execute so often.
I have a few questions about the sched_yield function, because I see that it is not functioning as intended in my code. Many times the same thread runs again and again, even in the presence of other threads, when I try to yield it by calling sched_yield.
Also, on a multicore machine, will sched_yield yield to threads running on all cores, or only on one core? Say, for example, I have Threads 1, 2, and 3 running on core 1 and Threads 4, 5, and 6 on core 2. If sched_yield is called from Thread 2, can it be replaced only by Threads 1 and 3, or are 1, 3, 4, 5, and 6 all possible? I am asking because in .NET, Thread.Yield only yields to threads running on the same core/processor.
http://www.kernel.org/doc/man-pages/online/pages/man2/sched_yield.2.html
sched_yield() causes the calling thread to relinquish the CPU. The thread is moved to the end of the queue for its static priority and a new thread gets to run.
If the calling thread is the only thread in the highest priority list at that time, it will continue to run after a call to sched_yield().
sched_yield is not a .NET call, and the threading/process model is different too. The Windows/.NET scheduler is not the same as the Linux scheduler; Linux even has several possible schedulers.
So your expectations about sched_yield are wrong.
If you want to control how threads are run, you can bind each thread to a CPU. Then each thread will run only on the CPU it is bound to. If several threads are bound to the same CPU, calling sched_yield MAY switch to another thread that is bound to the current CPU and is ready to run.
It can also be a bad idea to use several threads per CPU if each thread wants to do a lot of CPU-intensive work.
If you want full control over how threads are run, you can use real-time threads with the SCHED_FIFO and SCHED_RR RT policies: http://www.linuxjournal.com/magazine/real-time-linux-kernel-scheduler
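A short sketch of the binding idea (assuming Linux/glibc, whose pthread_setaffinity_np() sets a thread's CPU mask): both workers bind themselves to CPU 0, so a sched_yield() from one can hand the core to the other.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                          /* both workers on CPU 0 */
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);

    for (int i = 0; i < 5; i++) {
        printf("thread %ld, iteration %d\n", (long)arg, i);
        sched_yield();                         /* MAY switch to the other
                                                  thread bound to CPU 0 */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}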
Why would you want to give up the CPU? Well...
I am fixing a bug in client code. Basically, they have a shared struct that holds information:
how many coins in escrow - return them
how many bills in escrow - return them
The above are in process A; the rest of the code is in process B. Process B sends messages to Process A to compute and return the money in escrow (this is a vending machine). Without getting into the history of why the code is written this way, the original code sequence is:
(Process B)
send message RETURN_ESCROWED_BILLS to Process A
send message RETURN_ESCROWED_COINS to Process A
Zero out the common structure
This gets executed as:
(Process B):
send the messages;
zero out the struct;
(later .. Process A):
get the messages;
all fields in common struct are 0;
nothing to do;
oops ... the money is still in escrow, but the process A code has lost that knowledge. What is really needed (other than a massive restructuring of the code) is:
(Process B):
send the messages;
yield the CPU;
(Process A):
determine the escrowed money and return;
yield the CPU; (could just go to the end of the timeslice, no special method needed)
(Process B):
zero out the common struct;
Any time you have IPC messages and the processes that send/receive them are tightly coupled, the best approach is a two-way handshake. However, there are cases out there (usually the result of a poor or non-existent design) where you must tightly couple processes that are really supposed to be loosely coupled. Usually yielding the CPU is a hack you use because you have no choice. The addition of multicore CPUs is a source of pain, especially when porting from a single core to a multi-core.
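A bare-bones sketch of that two-way handshake (my own illustration, with pipes standing in for whatever IPC the real code uses): B does not zero the shared struct until A acknowledges that it has processed the escrow messages.

#include <unistd.h>

int main(void)
{
    int req[2], ack[2];
    pipe(req);                          /* Process B -> Process A: requests */
    pipe(ack);                          /* Process A -> Process B: acks     */

    if (fork() == 0) {                  /* Process A */
        char msg[32];
        read(req[0], msg, sizeof msg);
        /* ... determine the escrowed money and return it ... */
        write(ack[1], "DONE", 5);       /* handshake: tell B we finished */
        _exit(0);
    }

    /* Process B */
    write(req[1], "RETURN_ESCROWED", 16);
    char reply[8];
    read(ack[0], reply, sizeof reply);  /* blocks until A has acted */
    /* ... only now is it safe to zero out the common struct ... */
    return 0;
}

The read() on the ack pipe replaces the fragile "yield the CPU" step: B cannot race ahead of A no matter how the scheduler interleaves them or how many cores are available.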