Can we have race conditions in a single-thread program?

Can we have race conditions in a single-thread program? - multithreading

You can find on here a very good explanation about what is a race condition.
I have seen recently many people making confusing statements about race conditions and threads.
I have learned that race conditions could only occur between threads. But I saw code that looked like race conditions, in event and asynchronous based languages, even if the program was single thread, like in Node.js, in GTK+, etc.
Can we have a race condition in a single thread program?

All examples are in a fictional language very close to Javascript.
Short:
A race condition can only occur between two or more threads / external state (one of them can be the OS). We cannot have race conditions inside a single thread process, non I/O doing program.
But a single thread program can in many cases :
give situations which looks similar to race conditions, like in event based program with an event loop, but are not real race conditions
trigger a race condition between or with other thread(s), for example, or because the execution of some parts of the program depends on external state :
other programs, like clients
library threads or servers
system clock
I) Race conditions can only occur with two or more threads
A race condition can only occur when two or more threads try to access a shared resource without knowing it is modified at the same time by unknown instructions from the other thread(s). This gives an undetermined result. (This is really important.)
A single thread process is nothing more than a sequence of known instructions which therefore results in a determined result, even if the execution order of instructions is not easy to read in the code.
II) But we are not safe
II.1) Situations similar to race conditions
Many programming languages implements asynchronous programming features through events or signals, handled by a main loop or event loop which check for the event queue and trigger the listeners. Example of this are Javascript, libuevent, reactPHP, GNOME GLib... Sometimes, we can find situations which seems to be race conditions, but they are not.
The way the event loop is called is always known, so the result is determined, even if the execution order of instructions is not easy to read (or even cannot be read if we do not know the library).
Example:
setTimeout(
function() { console.log("EVENT LOOP CALLED"); },
1
); // We want to print EVENT LOOP CALLED after 1 milliseconds
var now = new Date();
while(new Date() - now < 10) //We do something during 10 milliseconds
console.log("EVENT LOOP NOT CALLED");
in Javascript output is always (you can test in node.js) :
EVENT LOOP NOT CALLED
EVENT LOOP CALLED
because, the event loop is called when the stack is empty (all functions have returned).
Be aware that this is just an example and that in languages that implements events in a different way, the result might be different, but it would still be determined by the implementation.
II.2) Race condition between other threads, for example :
II.2.i) With other programs like clients
If other processes are requesting our process, that our program do not treat requests in an atomic way, and that our process share some resources between the requests, there might be a race condition between clients.
Example:
var step;
on('requestOpen')(
function() {
step = 0;
}
);
on('requestData')(
function() {
step = step + 1;
}
);
on('requestEnd')(
function() {
step = step +1; //step should be 2 after that
sendResponse(step);
}
);
Here, we have a classical race condition setup. If a request is opened just before another ends, step will be reset to 0. If two requestData events are triggered before the requestEnd because of two concurrent requests, step will reach 3. But this is because we take the sequence of events as undetermined. We expect that the result of a program is most of the time undetermined with an undetermined input.
In fact, if our program is single thread, given a sequence of events the result is still always determined. The race condition is between clients.
There is two ways to understand the thing :
We can consider clients as part of our program (why not ?) and in this case, our program is multi thread. End of the story.
More commonly we can consider that clients are not part of our program. In this case they are just input. And when we consider if a program has a determined result or not, we do that with input given. Otherwise even the simplest program return input; would have a undetermined result.
Note that :
if our process treat request in an atomic way, it is the same as if there was a mutex between client, and there is no race condition.
if we can identify request and attach the variable to a request object which is the same at every step of the request, there is no shared resource between clients and no race condition
II.2.ii) With library thread(s)
In our programs, we often use libraries which spawn other processes or threads, or that just do I/O with other processes (and I/O is always undetermined).
Example :
databaseClient.sendRequest('add Me to the database');
databaseClient.sendRequest('remove Me from the database');
This can trigger a race condition in an asynchronous library. This is the case if sendRequest() returns after having sent the request to the database, but before the request is really executed. We immediately send another request and we cannot know if the first will be executed before the second is evaluated, because database works on another thread. There is a race condition between the program and the database process.
But, if the database was on the same thread as the program (which in real life does not happen often) is would be impossible that sendRequest returns before the request is processed. (Unless the request is queued, but in this case, the result is still determined as we know exactly how and when the queue is read.)
II.2.i) With system clock
#mingwei-samuel answer gives an example of a race condition with a single thread JS program, between to setTimeout callback. Actually, once both setTimeout are called, the execution order is already determined. This order depends on the system clock state (so, an external thread) at the time of setTimeout call.
Conclusion
In short, single-thread programs are not free from trigerring race conditions. But they can only occur with or between other threads of external programs. The result of our program might be undetermined, because the input our program receive from those other programs is undetermined.

Race conditions can occur with any system that has concurrently executing processes that create state changes in external processes, examples of which include :
multithreading,
event loops,
multiprocessing,
instruction level parallelism where out-of-order execution of instructions has to take care to avoid race conditions,
circuit design,
dating (romance),
real races in e.g. the olympic games.

Yes.
A "race condition" is a situation when the result of a program can change depending on the order operations are run (threads, async tasks, individual instructions, etc).
For example, in Javascript:
setTimeout(() => console.log("Hello"), 10);
setTimeout(() => setTimeout(() => console.log("World"), 4), 4);
// VM812:1 Hello
// VM812:2 World
setTimeout(() => console.log("Hello"), 10);
setTimeout(() => setTimeout(() => console.log("World"), 4), 4);
// VM815:2 World
// VM815:1 Hello
So clearly this code depends on how the JS event loop works, how tasks are ordered/chosen, what other events occurred during execution, and even how your operating system chose to schedule the JS runtime process.
This is contrived, but a real program could have a situation where "Hello" needs to be run before "World", which could result in some nasty non-deterministic bugs. How people could consider this not a "real" race condition, I'm not sure.
Data Races
It is not possible to have data races in single threaded code.
A "data race" is multiple threads accessing a shared resource at the same time in an inconstant way, or specifically for memory: multiple threads accessing the same memory, where one (or more) is writing. Of course, with a single thread this is not possible.
This seems to be what #jillro's answer is talking about.
Note: the exact definitions of "race condition" and "data race" are not agreed upon. But if it looks like a race condition, acts like a race condition, and causes nasty non-deterministic bugs like a race condition, then I think it should be called a race condition.

Related

Synchronize multiple pthreads in a

I'm discovering the pthread library (in C) and I'm having some trouble understanding well a few things.
First of all, I understand what a mutex is, I understand how it works, ok, I also understand the concept of the cond, but I can't manage to use it properly (I don't really get how to combine the mutex and the cond)
This is, in pseudo-code, what I want to do :
thread :
loop :
// do something
end loop
end thread
So there is n threads, but each thread uses the same function. I want the inside of the loop to be executed in parallel by all the threads BUT each thread must be in the same iteration of the loop, meaning I don't care in what order the instructions inside the loop are executed between threads, but to start iteration 2 of a thread, all the other threads must have finished iteration 1 (etc).
So my question is : how do you do that ? Not particularly in a specific example, but theoretically.
EDIT
I manage to do it, I don't know if it's the proper way, but it's working :
global nbOfThreads
global nbOfIterations
thread :
lock(mutex0)
unlock(mutex0)
loop :
// Do something
lock(mutex1)
nbOfIterations++
if (nbOfIterations == nbOfThread) :
nbOfIterations = 0
broadcast(cond)
unlock(mutex1)
continue
end if
wait(cond, mutex1)
unlock(mutex1)
end loop
end thread
main (n) :
nbOfThreads = n
nbOfIterations = 0
lock(mutex0)
do nbOfThreads times : create(thread)
unlock(mutex0)
end main
I obviously tried to understand myself, but there are some things I don't understand :
The main one : WHY does a cond need to be pair with a mutex
In some examples I saw something like this :
// thread A :
while (!condition)
wait(&cond)
// thread B :
if (condition)
signal(&cond)
well I really don't get the point of this while loop, I thought wait put the thread in pause until the condition is true (until the other thread send the signal). I mean I would get it if it was an if instead of a while.
Thank you

WHY does a cond need.... because the (!condition) you reference almost certainly depends upon some bits of the object not changing while you reference them. Correspondingly, modifying the state of the object should be done in such a way as to appear atomic to any observer; thus a mutex. While you could rely on too-clever-by-half hackery like atomic types, there is also the problem of ‘what if it was modified just after you checked it’ -- a race condition. Thus the idiomatic lock(); while (!cond) { wait(); }.
The point of the while... The signal+wait is not a handoff of control; after the signal, any number of things could happen to the object before a particular thread returns from wait. Even though the condition might have been in the correct state, by the time thread A examines it, it may no longer be. At the point of exiting the while loop, thread A knows: The condition is in the state I desire, and I have exclusive access to the object.

Condition variables can have spurious wake-ups. The condition might not actually be true when the wait function returns.
Depending on your task, a different synchronization primitive, such as a barrier (see pthread_barrier_init) or a semaphore (sem_init) might be easier to use.

Can two callbacks execute the code at the same time(Parrallely) or not?

I am doing an IO wait operation inside a for loop now the thing is when all of the operations terminates I want to send the response to the server. Now I was just wondering that suppose two IO operation terminates exactly at the same time now can they execute code at the same time(parallel) or will they execute serially?
As far as I know, as Node is Concurrent but not the Parallel language so I don't think they will execute at the same time.

node.js runs Javascript with a single thread. That means that two pieces of Javascript can never be running at the exact same moment.
node.js processes I/O completion using an event queue. That means when an I/O operation completes, it places an event in the event queue and when that event gets to the front of the event queue and the JS interpreter has finished whatever else it was doing, then it will pull that event from the event queue and call the callback associated with it.
Because of this process, even if two I/O operations finish at basically the same moment, one of them will put its completion event into the event queue before the other (access to the event queue internally is likely controlled by a mutex so one will get the mutex before the other) and that one's completion callback will get into the event queue first and then called before the other. The two completion callbacks will not run at the exact same time.
Keep in mind that more than one piece of Javascript can be "in flight" or "in process" at the same time if it contains non-blocking I/O operations or other asynchronous operations. This is because when you "wait" for an asynchronous operation to complete in Javscript, you return control back to the system and you then resume processing only when your completion callback is called. While the JS interpreter is waiting for an asynchronous I/O operation to complete and the associated callback to be called, then other Javascript can run. But, there's still only one piece of Javascript actually ever running at a time.
As far as I know, as Node is Concurrent but not the Parallel language so I don't think they will execute at the same time.
Yes, that's correct. That's not exactly how I'd describe it since "concurrent" and "parallel" don't have strict technical definitions, but based on what I think you mean by them, that is correct.

you can use Promise.all :
let promises = [];
for(...)
{
promises.push(somePromise); // somePromise represents your IO operation
}
Promise.all(promises).then((results) => { // here you send the response }
You don't have to worry about the execution order.

Node.js is designed to be single thread. So basically there is no way that 'two IO operation terminates exactly at the same time' could happen. They will just finish one by one.

How to control multi-threads synchronization in Perl

I got array with [a-z,A-Z] ASCII numbers like so: my #alphabet = (65..90,97..122);
So main thread functionality is checking each character from alphabet and return string if condition is true.
Simple example :
my #output = ();
for my $ascii(#alphabet){
thread->new(\sub{ return chr($ascii); });
}
I want to run thread on every ASCII number, then put letter from thread function into array in the correct order.
So in out case array #output should be dynamic and contain [a..z,A-Z] after all threads finish their job.
How to check, is all threads is done and keep the order?

You're looking for $thread->join, which waits for a thread to finish. It's documented here, and this SO question may also help.
Since in your case it looks like the work being done in the threads is roughly equal in cost (no thread is going to take a long time more than any other), you can just join each thread in order, like so, to wait for them all to finish:
# Store all the threads for each letter in an array.
my #threads = map { thread->new(\sub{ return chr($_); }) } #alphabet;
my #results = map { $_->join } #threads;
Since, when the first thread returns from join, the others are likely already done and just waiting for "join" to grab their return code, or about to be done, this gets you pretty close to "as fast as possible" parallelism-wise, and, since the threads were created in order, #results is ordered already for free.
Now, if your threads can take variable amounts of time to finish, or if you need to do some time-consuming processing in the "main"/spawning thread before plugging child threads' results into the output data structure, joining them in order might not be so good. In that case, you'll need to somehow either: a) detect thread "exit" events as they happen, or b) poll to see which threads have exited.
You can detect thread "exit" events using signals/notifications sent from the child threads to the main/spawning thread. The easiest/most common way to do that is to use the cond_wait and cond_signal functions from threads::shared. Your main thread would wait for signals from child threads, process their output, and store it into the result array. If you take this approach, you should preallocate your result array to the right size, and provide the output index to your threads (e.g. use a C-style for loop when you create your threads and have them return ($result, $index_to_store) or similar) so you can store results in the right place even if they are out of order.
You can poll which threads are done using the is_joinable thread instance method, or using the threads->list(threads::joinable) and threads->list(threads::running) methods in a loop (hopefully not a busy-waiting one; adding a sleep call--even a subsecond one from Time::HiRes--will save a lot of performance/battery in this case) to detect when things are done and grab their results.
Important Caveat: spawning a huge number of threads to perform a lot of work in parallel, especially if that work is small/quick to complete, can cause performance problems, and it might be better to use a smaller number of threads that each do more than one "piece" of work (e.g. spawn a small number of threads, and each thread uses the threads::shared functions to lock and pop the first item off of a shared array of "work to do" and do it rather than map work to threads as 1:1). There are two main performance problems that arise from a 1:1 mapping:
the overhead (in memory and time) of spawning and joining each thread is much higher than you'd think (benchmark it on threads that don't do anything, just return, to see). If the work you need to do is fast, the overhead of thread management for tons of threads can make it much slower than just managing a few re-usable threads.
If you end up with a lot more threads than there are logical CPU cores and each thread is doing CPU-intensive work, or if each thread is accessing the same resource (e.g. reading from the same disks or the same rows in a database), you hit a performance cliff pretty quickly. Tuning the number of threads to the "resources" underneath (whether those are CPUs or hard drives or whatnot) tends to yield much better throughput than trusting the thread scheduler to switch between many more threads than there are available resources to run them on. The reasons this is slow are, very broadly:
Because the thread scheduler (part of the OS, not the language) can't know enough about what each thread is trying to do, so preemptive scheduling cannot optimize for performance past a certain point, given that limited knowledge.
The OS usually tries to give most threads a reasonably fair shot, so it can't reliably say "let one run to completion and then run the next one" unless you explicitly bake that into the code (since the alternative would be unpredictably starving certain threads for opportunities to run). Basically, switching between "run a slice of thread 1 on resource X" and "run a slice of thread 2 on resource X" doesn't get you anything once you have more threads than resources, and adds some overhead as well.
TL;DR threads don't give you performance increases past a certain point, and after that point they can make performance worse. When you can, reuse a number of threads corresponding to available resources; don't create/destroy individual threads corresponding to tasks that need to be done.

Building on Zac B's answer, you can use the following if you want to reuse threads:
use strict;
use warnings;
use Thread::Pool::Simple qw( );
$| = 1;
my $pool = Thread::Pool::Simple->new(
do => [ sub {
select(undef, undef, undef, (200+int(rand(8))*100)/1000);
return chr($_[0]);
} ],
);
my #alphabet = ( 65..90, 97..122 );
print $pool->remove($_) for map { $pool->add($_) } #alphabet;
print "\n";
The results are returned in order, as soon as they become available.

I'm the author of Parallel::WorkUnit so I'm partial to it. And I thought adding ordered responses was actually a great idea. It does it with forks, not threads, because forks are more widely supported and they often perform better in Perl.
my $wu = Parallel::WorkUnit->new();
for my $ascii(#alphabet){
$wu->async(sub{ return chr($ascii); });
}
#output = $wu->waitall();
If you want to limit the number of simultaneous processes:
my $wu = Parallel::WorkUnit->new(max_children => 5);
for my $ascii(#alphabet){
$wu->queue(sub{ return chr($ascii); });
}
#output = $wu->waitall();

Serial Dispatch Queue with Asynchronous Blocks

Is there ever any reason to add blocks to a serial dispatch queue asynchronously as opposed to synchronously?
As I understand it a serial dispatch queue only starts executing the next task in the queue once the preceding task has completed executing. If this is the case, I can't see what you would you gain by submitting some blocks asynchronously - the act of submission may not block the thread (since it returns straight-away), but the task won't be executed until the last task finishes, so it seems to me that you don't really gain anything.
This question has been prompted by the following code - taken from a book chapter on design patterns. To prevent the underlying data array from being modified simultaneously by two separate threads, all modification tasks are added to a serial dispatch queue. But note that returnToPool adds tasks to this queue asynchronously, whereas getFromPool adds its tasks synchronously.
class Pool<T> {
private var data = [T]();
// Create a serial dispath queue
private let arrayQ = dispatch_queue_create("arrayQ", DISPATCH_QUEUE_SERIAL);
private let semaphore:dispatch_semaphore_t;
init(items:[T]) {
data.reserveCapacity(data.count);
for item in items {
data.append(item);
}
semaphore = dispatch_semaphore_create(items.count);
}
func getFromPool() -> T? {
var result:T?;
if (dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER) == 0) {
dispatch_sync(arrayQ, {() in
result = self.data.removeAtIndex(0);
})
}
return result;
}
func returnToPool(item:T) {
dispatch_async(arrayQ, {() in
self.data.append(item);
dispatch_semaphore_signal(self.semaphore);
});
}
}

Because there's no need to make the caller of returnToPool() block. It could perhaps continue on doing other useful work.
The thread which called returnToPool() is presumably not just working with this pool. It presumably has other stuff it could be doing. That stuff could be done simultaneously with the work in the asynchronously-submitted task.
Typical modern computers have multiple CPU cores, so a design like this improves the chances that CPU cores are utilized efficiently and useful work is completed sooner. The question isn't whether tasks submitted to the serial queue operate simultaneously — they can't because of the nature of serial queues — it's whether other work can be done simultaneously.

Yes, there are reasons why you'd add tasks to serial queue asynchronously. It's actually extremely common.
The most common example would be when you're doing something in the background and want to update the UI. You'll often dispatch that UI update asynchronously back to the main queue (which is a serial queue). That way the background thread doesn't have to wait for the main thread to perform its UI update, but rather it can carry on processing in the background.
Another common example is as you've demonstrated, when using a GCD queue to synchronize interaction with some object. If you're dealing with immutable objects, you can dispatch these updates asynchronously to this synchronization queue (i.e. why have the current thread wait, but rather instead let it carry on). You'll do reads synchronously (because you're obviously going to wait until you get the synchronized value back), but writes can be done asynchronously.
(You actually see this latter example frequently implemented with the "reader-writer" pattern and a custom concurrent queue, where reads are performed synchronously on concurrent queue with dispatch_sync, but writes are performed asynchronously with barrier with dispatch_barrier_async. But the idea is equally applicable to serial queues, too.)
The choice of synchronous v asynchronous dispatch has nothing to do with whether the destination queue is serial or concurrent. It's simply a question of whether you have to block the current queue until that other one finishes its task or not.
Regarding your code sample code, that is correct. The getFromPool should dispatch synchronously (because you have to wait for the synchronization queue to actually return the value), but returnToPool can safely dispatch asynchronously. Obviously, I'm wary of seeing code waiting for semaphores if that might be called from the main thread (so make sure you don't call getFromPool from the main thread!), but with that one caveat, this code should achieve the desired purpose, offering reasonably efficient synchronization of this pool object, but with a getFromPool that will block if the pool is empty until something is added to the pool.

multithreading: how to process data in a vector, while the vector is being populated?

I have a single-threaded linux app which I would like to make parallel. It reads a data file, creates objects, and places them in a vector. Then it calls a compute-intensive method (.5 second+) on each object. I want to call the method in parallel with object creation. While I've looked at qt and tbb, I am open to other options.
I planned to start the thread(s) while the vector was empty. Each one would call makeSolids (below), which has a while loop that would run until interpDone==true and all objects in the vector have been processed. However, I'm a n00b when it comes to threading, and I've been looking for a ready-made solution.
QtConcurrent::map(Iter begin,Iter end,function()) looks very easy, but I can't use it on a vector that's changing in size, can I? And how would I tell it to wait for more data?
I also looked at intel's tbb, but it looked like my main thread would halt if I used parallel_for or parallel_while. That stinks, since their memory manager was recommended (open cascade's mmgt has poor performance when multithreaded).
/**intended to be called by a thread
\param start the first item to get from the vector
\param skip how many to skip over (4 for 4 threads)
*/
void g2m::makeSolids(uint start, uint incr) {
uint curr = start;
while ((!interpDone) || (lineVector.size() > curr)) {
if (lineVector.size() > curr) {
if (lineVector[curr]->isMotion()) {
((canonMotion*)lineVector[curr])->setSolidMode(SWEPT);
((canonMotion*)lineVector[curr])->computeSolid();
}
lineVector[curr]->setDispMode(BEST);
lineVector[curr]->display();
curr += incr;
} else {
uio::sleep(); //wait a little bit for interp
}
}
}
EDIT: To summarize, what's the simplest way to process a vector at the same time that the main thread is populating the vector?

Firstly, to benefit from threading you need to find similarly slow tasks for each thread to do. You said your per-object processing takes .5s+, how long does your file reading / object creation take? It could easily be a tenth or a thousandth of that time, in which case your multithreading approach is going to produce neglegible benefit. If that's the case, (yes, I'll answer your original question soon incase it's not) then think about simultaneously processing multiple objects. Given your processing takes quite a while, the thread creation overhead isn't terribly significant, so you could simply have your main file reading/object creation thread spawn a new thread and direct it at the newly created object. The main thread then continues reading/creating subsequent objects. Once all objects are read/created, and all the processing threads launched, the main thread "joins" (waits for) the worker threads. If this will create too many threads (thousands), then put a limit on how far ahead the main thread is allowed to get: it might read/create 10 objects then join 5, then read/create 10, join 10, read/create 10, join 10 etc. until finished.
Now, if you really want the read/create to be in parallel with the processing, but the processing to be serialised, then you can still use the above approach but join after each object. That's kind of weird if you're designing this with only this approach in mind, but good because you can easily experiment with the object processing parallelism above as well.
Alternatively, you can use a more complex approach that just involves the main thread (that the OS creates when your program starts), and a single worker thread that the main thread must start. They should be coordinated using a mutex (a variable ensuring mutually-exclusive, which means not-concurrent, access to data), and a condition variable which allows the worker thread to efficiently block until the main thread has provided more work. The terms - mutex and condition variable - are the standard terms in the POSIX threading that Linux uses, so should be used in the explanation of the particular libraries you're interested in. Summarily, the worker thread waits until the main read/create thread broadcasts it a wake-up signal indicating another object is ready for processing. You may want to have a counter with index of the last fully created, ready-for-processing object, so the worker thread can maintain it's count of processed objects and move along the ready ones before once again checking the condition variable.

It's hard to tell if you have been thinking about this problem deeply and there is more than you are letting on, or if you are just over thinking it, or if you are just wary of threading.
Reading the file and creating the objects is fast; the one method is slow. The dependency is each consecutive ctor depends on the outcome of the previous ctor - a little odd - but otherwise there are no data integrity issues so there doesn't seem to be anything that needs to be protected by mutexes and such.
Why is this more complicated than something like this (in crude pseudo-code):
while (! eof)
{
readfile;
object O(data);
push_back(O);
pthread_create(...., O, makeSolid);
}
while(x < vector.size())
{
pthread_join();
x++;
}
If you don't want to loop on the joins in your main then spawn off a thread to wait on them by passing a vector of TIDs.
If the number of created objects/threads is insane, use a thread pool. Or put a counter is the creation loop to limit the number of threads that can be created before running ones are joined.

#Caleb: quite -- perhaps I should have emphasized active threads. The GUI thread should always be considered one.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string