Can static arrays be safely accessed from multiple threads? - multithreading

If each thread is guaranteed to only read/write to a specific subset of the array can multiple threads work on the same (static) array without resorting to critical sections, etc?
EDIT - This is for the specific case of arrays of non-reference-counted types and record/packed-records thereof.
If yes, any caveats?
My gut feeling is yes but my gut can sometimes be an unreliable source of information.

Suppose that:
You have a single instance of an array (static or dynamic), and
The elements of the array are pure value types (i.e. contain no references), and
Each thread operates on disjoint sub-arrays, and
Nothing else in the system writes to the array whilst the threads are operating on it.
With these conditions, which I believe are met by your data structure and threading pattern, the access is thread-safe and no critical sections are needed.
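As an illustration, here is a minimal sketch of that pattern in C++ rather than Delphi (the array size, thread count and function names are invented for this example): each thread writes only to its own disjoint range of a single shared array of plain value types, and no locking is needed as long as the array is only read after the threads have finished.

#include <array>
#include <thread>
#include <vector>

// Hypothetical example: 4 threads each fill a disjoint quarter of one shared array.
std::array<double, 1000> data; // plain value type, no reference counting

void fill_range(std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i)
        data[i] = static_cast<double>(i) * 0.5; // writes stay inside [first, last)
}

int main() {
    const std::size_t n = data.size(), chunk = n / 4;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < 4; ++t)
        workers.emplace_back(fill_range, t * chunk, (t + 1) * chunk);
    for (auto& w : workers) w.join(); // only read the array after all threads finish
}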

No, this is not thread-safe in some situations.
I see at least two reasons.
1. It depends on the content of the static array.
If you use non-reference-counted types (like double, integer, byte, shortstring), there won't be any issue in most cases (at least if the data is read-only).
But if you use reference-counted types (like string, interface, or a nested dynamic array), you'll have to take care of thread safety.
That is:
TMyType1 = array[0..1] of integer; // thread-safe on reading
TMyType2 = array[0..1] of string;  // not thread-safe without care
Additional note: if your string is in fact shared among several sub-parts of the static array, the reference count could get corrupted, unless you explicitly call UniqueString() for each one (inside a critical section, I suspect). For an array of double or integer, you won't have this issue.
2. It depends on the access concurrency
Read access should be thread-safe, even for reference-counted types, but concurrent writes are not. For a string, you may get random access violations (GPFs), especially on a multi-core CPU.
Some safe implementations may be:
Use critical sections (as small as possible, to reduce overhead) or other protection structures;
Use copy-on-write or a private per-thread copy of the content, to be sure.
Last note (not about safety, but performance): sharing an array among multiple CPUs may lead to performance penalties due to cache synchronization between CPUs (false sharing). Performance is sometimes much better when you use separate arrays, ensuring their cache lines won't be shared among CPUs.
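To illustrate that performance note (a minimal C++ sketch; the 64-byte alignment assumes a typical cache line size), per-thread slots can be padded so that two threads never write to the same cache line:

// Hypothetical per-thread slot padded to a full cache line to avoid false sharing.
struct alignas(64) PaddedCounter {
    long value = 0;
    // alignas(64) makes each slot start on its own 64-byte cache line,
    // so concurrent writes by different threads do not ping-pong the same line.
};

PaddedCounter counters[4]; // one slot per thread; each thread touches only its own slot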
Be aware that such issues may be a nightmare to debug on the client side: multi-thread concurrency issues may occur randomly and are very difficult to track. The safer, the better, unless you have explicit and proven performance issues.
Additional note: for your specific case of a static array of double, with each sub-part of the array accessed by one thread only, it is thread-safe. But there is no absolute rule of thread safety in all situations, even for a static array. As soon as you use reference-counted types, or pointers, you may have random issues.

Related

Why use an AtomicU32 in Rust, given that U32 already implements Sync?

The std::sync::atomic module contains a number of atomic variants of primitive types, with the stated purpose that these types are now thread-safe. However, all the primitives that correspond to the atomic types already implement Send and Sync, and should therefore already be thread-safe. What's the reasoning behind the Atomic types?
Generally, non-atomic integers are safe to share across threads because they're immutable. If you attempt to modify the value, you implicitly create a new one in most cases because they're Copy. However, it isn't safe to share a mutable reference to a u32 across threads (or have both mutable and immutable references to the same value), which practically means that you won't be able to modify the variable and have another thread see the results. An atomic type has some additional behavior which makes it safe.
In the more general case, using non-atomic operations doesn't guarantee that a change made in one thread will be visible in another. Many architectures, especially RISC architectures, do not guarantee that behavior without additional instructions.
In addition, compilers often reorder accesses to memory in functions and in some cases, across functions, and an atomic type with an appropriate barrier is required to indicate to the compiler that such behavior is not wanted.
Finally, atomic operations are often required to logically update the contents of a variable. For example, I may want to atomically add 1 to a variable. On a load-store architecture such as ARM, I cannot modify the contents of memory with an add instruction; I can only perform arithmetic on registers. Consequently, an atomic add is multiple instructions, usually consisting of a load-linked, which loads a memory location, the add operation on the register, and then a store-conditional, which stores the value if the memory location has not changed. There's also a loop to retry if it has.
These reasons are why atomic operations are needed and generally useful across languages. So while one can use non-atomic operations in non-Rust languages, they don't generally produce useful results, and since one typically wants one's code to function correctly, atomic operations are desirable for correctness. Rust's atomic types guarantee this behavior by generating suitable instructions and can therefore be safely shared across threads.
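To illustrate the same point outside Rust (a minimal C++ sketch with invented counter names; std::atomic plays the role of Rust's atomic types here): the plain increment is a data race with no useful guarantees, while the atomic fetch_add is the read-modify-write sequence described above and always counts correctly.

#include <atomic>
#include <thread>

int plain_counter = 0;              // concurrent ++ on this would be a data race (undefined behavior)
std::atomic<int> atomic_counter{0}; // safe to increment from many threads

void work() {
    for (int i = 0; i < 100000; ++i) {
        // plain_counter++;          // NOT safe: lost updates, no visibility guarantee
        atomic_counter.fetch_add(1, std::memory_order_relaxed); // atomic read-modify-write
    }
}

int main() {
    std::thread a(work), b(work);
    a.join(); b.join();
    // atomic_counter is exactly 200000; a plain int would usually end up smaller.
}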

Are atomic objects protected against race conditions?

As far as I know they aren't.
Atomic objects are free of data races, yet they can still suffer from race conditions: two threads might start in an unpredictable order, making the program outcome non-deterministic.
Shared data would be "safe" (protected by atomics) but the sequence or timing could still be wrong.
Can you confirm this?
Yes, you are correct: operations that are not themselves atomic may still have race conditions. If you have non-atomic sequences of operations that depend on the state of the atomic object not changing underneath them, you need to use another synchronization technique to maintain consistency.
Individual atomic operations on the atomic object will be consistent, but the program built from them is not automatically race-free; non-atomic sequences of operations using the atomic object are not race-free.
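For example (a hedged C++ sketch; the names tickets, take_ticket_racy and take_ticket_safe are invented for illustration), a check-then-act sequence built from two individually atomic operations still has a race condition, because another thread can run between the load and the store; a compare-exchange loop makes the whole step atomic.

#include <atomic>

std::atomic<int> tickets{1};

// Each individual load and store is atomic (no data race),
// but the sequence as a whole races: two threads can both see 1
// and both "take" the last ticket.
bool take_ticket_racy() {
    if (tickets.load() > 0) {                  // check
        tickets.store(tickets.load() - 1);     // act - another thread may have run in between
        return true;
    }
    return false;
}

// One way to make the whole step atomic is a compare-exchange loop.
bool take_ticket_safe() {
    int current = tickets.load();
    while (current > 0) {
        if (tickets.compare_exchange_weak(current, current - 1))
            return true; // we decremented the value we actually observed
    }
    return false;
}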
Not just atomic objects, any primitive that can be used with operations performed by threads running concurrently:
mutex
condition variable
semaphore
barrier
atomic objects...
by definition are only useful if there is a race, an unpredictability in the access pattern. If the accesses were well ordered in a predictable way, you would have used a regular mutable object in the programming language.
But even if the order is a priori unknown, the end result can be deterministic: consider concurrently running threads serving pages for a static web server, with a count of pages and bytes served as the only mutable data structure. The statistics can be kept in a data structure protected by a mutex (a mutex isn't strictly needed, it just keeps the example simple): the order of mutex locking is unpredictable, but the end result is that the data structure contains the sum of pages and bytes served; it doesn't matter in which order each thread adds its counts to the shared data.
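A minimal sketch of that web-server statistics example in C++ (the field and method names are invented for illustration): the order in which threads acquire the lock is unpredictable, but the totals are deterministic because the additions commute.

#include <mutex>

struct ServerStats {
    std::mutex m;
    long pages = 0;
    long bytes = 0;

    // Called by any worker thread after serving a page; the order in which
    // threads acquire the lock is unpredictable, but the final sums are not.
    void record(long page_bytes) {
        std::lock_guard<std::mutex> lock(m);
        pages += 1;
        bytes += page_bytes;
    }
};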

Handling concurrent reads?

I'm new to concurrent programming and I have a specific situation in mind that I'd like some input on. If I have a variable that I will be accessing from multiple threads but only to read the value (the only reason it wouldn't be a constant is that I'd need to set it at runtime), do I need a mutex for it? Or do you only need to worry about race conditions when there are also writes going out to a shared resource?
If you set the value before you start up the threads, you do not need a mutex.
If you set the value after you start up the threads, you will need a mutex to ensure that all the threads read the same value.
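For example (a C++ sketch with made-up names): the value is written once before the reader threads are created, and thread creation itself provides the necessary synchronization, so the readers need no mutex.

#include <thread>
#include <vector>

int config_value = 0; // written exactly once, before any reader thread exists

void reader() {
    // Safe: the write below happened before the threads were started,
    // and std::thread construction synchronizes-with the new thread.
    int local = config_value;
    (void)local;
}

int main() {
    config_value = 42;          // set at runtime, but before the threads start
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; ++i) ts.emplace_back(reader);
    for (auto& t : ts) t.join();
}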
Logically, if you are only reading shared data then you may not need to use a mutex. But in large programs you should consider using one anyway, to avoid confusion.
That depends on what language and machine architecture you're talking about and what "reading the variable" means in that language. When "reading the variable" translates into only reading from memory at the machine-level, concurrent reads are in themselves generally safe. You need to be sure, of course, that nothing else in your program translates into writing to those same memory areas.
Many mainstream languages (Java, C#, C, C++) give only very weak guarantees about how your program translates into memory accesses. At the same time, the guarantees you get tend to take the form of very complex rules, say, about which sequences of statements may be reordered when. To avoid introducing really difficult-to-find bugs, it's very often better to require the synchronisation properties you need in as un-subtle and concrete a form as possible, that is, use mutexes.

lookup tables in C++ 11 with multithreading

I have 2 similar situations in multithreaded C++11 software:
an array that I'm using as a lookup table inside a method declaration
an array that I'm using as a lookup table, declared outside a method and used by several different methods, by reference or with pointers.
Now if we forget for a minute about these LUTs and just consider C++11 and a multithreaded approach for a generic method, the most appropriate qualifier for such methods in terms of storage duration is probably thread_local.
This way, if I feed a method foo() that is thread_local to 3 threads, I basically end up with 3 instances of foo(), one for each thread. This move "solves" the problem of foo() being shared and accessed by 3 different threads, avoiding cache misses, but I basically get 3 possibly different behaviours for my foo(): for example, if I have the same PRNG implemented in foo() and I provide a seed that is time-dependent with a really high resolution, I will probably get 3 different results from each thread and a real mess in terms of consistency.
But let's say that I'm fine with how thread_local works; how can I express the fact that I need to keep a LUT always ready and cached for my methods?
I read something about relaxed and less relaxed memory models, but in C++11 I have never seen a keyword or a practical mechanism that can force the caching of an array/LUT.
I'm on x86 or ARM.
Basically, I probably need something that is the opposite of volatile.
If the LUTs are read-only, so that you can share them without locks, you should just use a single shared instance (i.e. declare them static).
Threads do not have their own caches. But even if they did (cores typically have their own L1 cache, and you might be able to lock a thread to a core), there would be no problem for two different threads to cache different parts of the same memory structure.
"Thread-local storage" does not mean that the memory is somehow physically tied to the thread. Rather, it's a way to let the same name refer to a different object in each thread. In no way does it restrict the ability of any thread to access the object, if given its address.
The CPU cache is not programmable. It uses its own internal logic to determine which memory regions to cache. Typically it will cache the memory that either has just been accessed by the CPU, or its prediction logic determines will shortly be accessed by the CPU. In a multiprocessor system, each CPU may have its own cache, or different CPUs may share a cache. If there are multiple caches, a memory region may be cached in more than one simultaneously.
If all threads must see the same values in the look-up tables, then a single table would be best. This could be achieved with a variable with static storage duration. If the data can be modified then you would probably also need a std::mutex to protect accesses to the table and avoid data races. Read-only data can be shared without additional synchronization; in this case it is best to declare it const to make the read-only nature explicit and avoid accidental modifications.
void foo(){
    static const int lut[]={...};
}
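If the table does have to be modified after start-up, a hedged sketch of the mutex-protected variant mentioned above might look like this (the names lut, lookup and update are illustrative, not from the question):

#include <mutex>
#include <vector>

std::vector<int> lut(256);   // shared, mutable look-up table
std::mutex lut_mutex;        // protects every read and write of lut

int lookup(std::size_t i) {
    std::lock_guard<std::mutex> lock(lut_mutex);
    return lut[i];
}

void update(std::size_t i, int value) {
    std::lock_guard<std::mutex> lock(lut_mutex);
    lut[i] = value;
}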
You use thread_local where each thread must have its own copy of the data, usually because each copy will be modified independently. For example, you may choose to use thread_local for your random-number generator, so that each thread has its own RNG which is independent of the other threads, and does not require synchronization.
void bar(){
    thread_local RandomNumberGenerator rng; // one per thread
    auto val=rng.nextRandomNumber(); // use the instance for the current thread
}

Are "data races" and "race condition" actually the same thing in context of concurrent programming

I often find these terms being used in the context of concurrent programming. Are they the same thing or different?
No, they are not the same thing. Neither is a subset of the other, and neither is a necessary or sufficient condition for the other.
The definition of a data race is pretty clear, and therefore its discovery can be automated. A data race occurs when two instructions from different threads access the same memory location, at least one of these accesses is a write, and there is no synchronization mandating any particular order among these accesses.
A race condition is a semantic error. It is a flaw that occurs in the timing or the ordering of events that leads to erroneous program behavior. Many race conditions can be caused by data races, but this is not necessary.
Consider the following simple example where x is a shared variable:
Thread 1      Thread 2
lock(l)       lock(l)
x=1           x=2
unlock(l)     unlock(l)
In this example, the writes to x from threads 1 and 2 are protected by locks, therefore they always happen in some order enforced by the order in which the locks are acquired at runtime. That is, the writes' atomicity cannot be broken; there is always a happens-before relationship between the two writes in any execution. We just cannot know a priori which write happens before the other.
There is no fixed ordering between the writes, because locks cannot provide this. If the program's correctness is compromised, say when the write to x by thread 2 is followed by the write to x by thread 1, we say there is a race condition, although technically there is no data race.
It is far more useful to detect race conditions than data races; however this is also very difficult to achieve.
Constructing the reverse example is also trivial. This blog post also explains the difference very well, with a simple bank transaction example.
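As a hedged sketch of the reverse case (a data race that is not a race condition): two threads redundantly store the same value into a shared flag with no synchronization. Formally this is still a data race (and in C++ therefore undefined behavior), but the outcome the programmer cares about does not depend on which write happens first, so semantically there is no race condition.

bool done = false; // shared, not atomic

// Both threads run this; the writes conflict (a data race),
// but every interleaving leaves done == true, so the result
// the programmer cares about does not depend on the order.
void finish() {
    done = true;
}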
According to Wikipedia, the term "race condition" has been in use since the days of the first electronic logic gates. In the context of Java, a race condition can pertain to any resource, such as a file, network connection, a thread from a thread pool, etc.
The term "data race" is best reserved for its specific meaning defined by the JLS.
The most interesting case is a race condition that is very similar to a data race, but still isn't one, like in this simple example:
class Race {
    static volatile int i;
    static int uniqueInt() { return i++; }
}
Since i is volatile, there is no data race; however, from the program correctness standpoint there is a race condition due to the non-atomicity of the two operations: read i, write i+1. Multiple threads may receive the same value from uniqueInt.
TL;DR: The distinction between data race and race condition depends on the nature of problem formulation, and where to draw the boundary between undefined behavior and well-defined but indeterminate behavior. The current distinction is conventional and best reflects the interface between processor architect and programming language.
1. Semantics
Data race specifically refers to non-synchronized conflicting "memory accesses" (or actions, or operations) to the same memory location. If there is no conflict in the memory accesses, but there is still indeterminate behavior caused by operation ordering, that is a race condition.
Note that "memory accesses" here have a specific meaning. They refer to "pure" memory load or store actions, without any additional semantics applied. For example, a memory store from one thread does not (necessarily) know how long it takes for the data to be written into memory and finally propagated to another thread. For another example, a memory store to one location before another store to a different location by the same thread does not (necessarily) guarantee that the first datum is written to memory ahead of the second. As a result, the order of those pure memory accesses cannot (necessarily) be reasoned about, and anything could happen, unless otherwise well defined.
When the "memory accesses" are well defined in terms of ordering through synchronization, additional semantics can ensure that, even if the timing of the memory accesses is indeterminate, their order can be reasoned about through the synchronization. Note that although the ordering between the memory accesses can be reasoned about, it is not necessarily determinate; hence the race condition.
2. Why the difference?
But if the order is still indeterminate in a race condition, why bother to distinguish it from a data race? The reason is practical rather than theoretical: the distinction does exist in the interface between the programming language and the processor architecture.
A memory load/store instruction in a modern architecture is usually implemented as a "pure" memory access, due to the nature of out-of-order pipelines, speculation, multiple levels of cache, the CPU-RAM interconnect, and especially multi-core designs. There are lots of factors leading to indeterminate timing and ordering. Enforcing ordering for every memory instruction incurs a huge penalty, especially in a processor design that supports multi-core. So the ordering semantics are provided with additional instructions, like various barriers (or fences).
Data race is the situation where processor instructions execute without the additional fences needed to reason about the ordering of conflicting memory accesses. The result is not only indeterminate, but also possibly very weird; e.g., two writes to the same word location by different threads may each end up writing half of the word, or may operate only upon their locally cached values. These are undefined behavior from the programmer's point of view, but they are (usually) well defined from the processor architect's point of view.
Programmers have to have a way to reason about their code's execution. A data race is something they cannot make sense of, and therefore it should (normally) always be avoided. That is why language specifications that are low-level enough usually define data races as undefined behavior, distinct from the well-defined memory behavior of race conditions.
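As a rough C++ illustration of such ordering semantics (a sketch only; the variable names are invented), a release store paired with an acquire load gives the programmer a way to reason that the payload written before the store is visible after the load, which plain non-synchronized accesses would not guarantee.

#include <atomic>

int payload = 0;                 // ordinary data
std::atomic<bool> ready{false};  // synchronization flag

void producer() {
    payload = 42;                                   // plain store
    ready.store(true, std::memory_order_release);   // "publish": orders the store above before it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))  // "subscribe": pairs with the release store
        ;                                           // spin until published
    // Here payload is guaranteed to be 42: the acquire load synchronizes-with the release store.
    int value = payload;
    (void)value;
}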
3. Language memory models
Different processors may have different memory access behavior, i.e., different processor memory models. It is awkward for programmers to study the memory model of every modern processor and then develop programs that can benefit from them. It is desirable for the language to define a memory model so that programs in that language always behave as that memory model defines. That is why Java and C++ have their memory models defined. It is the burden of the compiler/runtime developers to ensure the language memory models are enforced across different processor architectures.
That said, if a language does not want to expose the low-level behavior of the processor (and is willing to sacrifice certain performance benefits of modern architectures), it can choose to define a memory model that completely hides the details of "pure" memory accesses and applies ordering semantics to all its memory operations. Then the compiler/runtime developers may choose to treat every memory variable as volatile on all processor architectures. For such languages (that support shared memory across threads), there are no data races, but there may still be race conditions, even in a language with complete sequential consistency.
On the other hand, the processor memory model can be stricter (or less relaxed, or at a higher level), e.g., implementing sequential consistency as early processors did. Then all memory operations are ordered, and no data race exists for any language running on the processor.
4. Conclusion
Back to the original question: IMHO it is fine to define data race as a special case of race condition, and a race condition at one level may become a data race at a higher level. It depends on the nature of the problem formulation, and where to draw the boundary between undefined behavior and well-defined but indeterminate behavior. The fact that the current convention defines the boundary at the language-processor interface does not mean that this always must be the case; but the current convention probably best reflects the state-of-the-art interface (and wisdom) between processor architect and programming language.
No, they are different, and neither of them is a subset of the other.
The term race condition is often confused with the related term data race, which arises when synchronization is not used to coordinate all access to a shared nonfinal field. You risk a data race whenever a thread writes a variable that might next be read by another thread or reads a variable that might have last been written by another thread if both threads do not use synchronization; code with data races has no useful defined semantics under the Java Memory Model. Not all race conditions are data races, and not all data races are race conditions, but they both can cause concurrent programs to fail in unpredictable ways.
Taken from the excellent book - Java Concurrency in Practice by Brian Goetz & Co.
Data races and Race condition
[Atomicity, Visibility, Ordering]
In my opinion they are definitely two different things.
A data race is a situation where the same memory is shared between several threads (at least one of which writes to it) without synchronization.
A race condition is a situation where unsynchronized blocks of code (possibly the same block) which use the same shared resource run simultaneously on different threads, and the result is unpredictable.
Race condition examples:
//increment variable
1. read variable
2. change variable
3. write variable
//cache mechanism
1. check if exists in cache and if not
2. load
3. cache
Solution:
Data races and race conditions are problems of atomicity, and they can be solved by synchronization mechanisms.
Data race - solved when write access to the shared variable is synchronized.
Race condition - solved when the block of code runs as an atomic operation.
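A brief C++ sketch of those two solutions (names are illustrative; a mutex is only one possible synchronization mechanism): the increment and the check-then-load-then-cache sequence are each wrapped so they execute as one atomic step.

#include <map>
#include <mutex>
#include <string>

std::mutex m;
int counter = 0;
std::map<std::string, std::string> cache;

// Increment: read, change and write happen under one lock, as a single step.
void increment() {
    std::lock_guard<std::mutex> lock(m);
    ++counter;
}

// Cache: check-if-exists, load and store also happen as one atomic block.
std::string get(const std::string& key) {
    std::lock_guard<std::mutex> lock(m);
    auto it = cache.find(key);
    if (it != cache.end())
        return it->second;               // 1. already cached
    std::string value = "loaded:" + key; // 2. load (stand-in for the real work)
    cache[key] = value;                  // 3. cache
    return value;
}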
