What about race condition in multithreaded reading? - multithreading

According to an article on IBM.com, "a race condition is a situation in which two or more threads or processes are reading or writing some shared data, and the final result depends on the timing of how the threads are scheduled. Race conditions can lead to unpredictable results and subtle program bugs." . Although the article concerns Java, I have in general been taught the same definition.
As far as I know, simple operation of reading from RAM is composed of setting the states of specific input lines (address, read etc.) and reading the states of output lines. This is an operation that obviously cannot be executed simultaneously by two devices and has to be serialized.
Now let's suppose we have a situation when a couple of threads access an object in memory. In theory, this access should be serialized in order to prevent race conditions. But e.g. the readers/writers algorithm assumes that an arbitrary number of readers can use the shared memory at the same time.
So, the question is: does one have to implement an exclusive lock for read when using multithreading (in WinAPI e.g.)? If not, why? Where is this control implemented - OS, hardware?
Best regards,
Kuba

Reading memory at hardware level is done sequentially - you don't need to worry about concurrency at this level. Two threads issue read instructions and all the necessary stuff - setting addresses on the address bus and actual reads are implemented by the memory access hardware in such way that reads will always work right.
In fact the same is true for read/write scenarios except that when read and write requests are interleaved you will get different results depending on timing and this is why you need synchronization.

As long as there's nothing changing the data, it's perfectly safe to be reading it from several threads. Even if two CPUs (or cores) race to access the memory for reading at the exact same clock cycle, their accesses will be serialized by the memory controller and they won't interfere with each other. This feature is essential for HW working correctly.

There is not a simple answer to this question. Different API's (and different environments) will have different levels of multhreaded-awareness and multithreaded safety.

Related

Is it safe to update an object in a thread without locks if other threads will not access it?

I have a vector of entities. At update cycle I iterate through vector and update each entity: read it's position, calculate current speed, write updated position. Also, during updating process I can change some other objects in other part of program, but each that object related only to current entity and other entities will not touch that object.
So, I want to run this code in threads. I separate vector into few chunks and update each chunk in different threads. As I see, threads are fully independent. Each thread on each iteration works with independent memory regions and doesn't affect other threads work.
Do I need any locks here? I assume, that everything should work without any mutexes, etc. Am I right?
Short answer
No, you do not need any lock or synchronization mechanism as your problem appear to be a embarrassingly parallel task.
Longer answer
A race conditions that can only appear if two threads might access the same memory at the same time and at least one of the access is a write operation. If your program exposes this characteristic, then you need to make sure that threads access the memory in an ordered fashion. One way to do it is by using locks (it is not the only one though). Otherwise the result is UB.
It seems that you found a way to split the work among your threads s.t. each thread can work independently from the others. This is the best case scenario for concurrent programming as it does not require any synchronization. The complexity of the code is dramatically decreased and usually speedup will jump up.
Please note that as #acelent pointed out in the comment section, if you need changes made by one thread to be visible in another thread, then you might need some sort of synchronization due to the fact that depending on the memory model and on the HW changes made in one thread might not be immediately visible in the other.
This means that you might write from Thread 1 to a variable and after some time read the same memory from Thread 2 and still not being able to see the write made by Thread 1.
"I separate vector into few chunks and update each chunk in different threads" - in this case you do not need any lock or synchronization mechanism, however, the system performance might degrade considerably due to false sharing depending on how the chunks are allocated to threads. Note that the compiler may eliminate false sharing using thread-private temporal variables.
You can find plenty of information in books and wiki. Here is some info https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads
Also there is a stackoverflow post here does false sharing occur when data is read in openmp?

Why do we need semaphores on single cpu?

I have read that we use semaphores inside the linux kerenl,and i have read that semaphores has advantages even in one single cpu (we can run only one process\thread). Can anyone please give me an example of a problem that semaphore solves(inside the kernel)?
In my view, there can be a problem only if we have more than one cpu, because two process may call system calls that use the same data structure, and probablly cause problems.
Thank you for your help!
You don't really need more than one CPU for concurrency. The multiple CPUs are really "an implementation detail," a piece of hardware quirkiness that you can abstract away from. Concurrency is a logical property of programs. You can have concurrency without multiple CPUs, and use multiple CPUs without "real concurrency".
Consider a web server. It has to be "concurrent," in the sense that it must serve multiple clients at once, hold information about multiple connections and once, and process multiple requests at once. You can have it literally do this, by having multiple CPUs all working at the same time. Yet, the program only has to appear to do multiple things at once. It could just as well be running on one CPU and context switching to fairly service all the work put to it. The fact that a web-server does multiple things at once is part of its interface: the I/O for the connections are interleaved, if a request has exclusively locked a resource, another request won't start trying to manipulate that same resource, etc. Writing a web server without concurrency produces a program that is wrong.
Semaphores help you with concurrency, by letting you control the way processes access resources. You asked, if you had one process running, how another could run at the same time with only a single core. Well, as I said, concurrency doesn't need multiple cores. The first process can be paused, and the second one started while the first one is still unfinished. This is just an implementation detail; logically, to the program writer, the two processes are running simultaneously, whether there are multiple cores or not. If the program was written without semaphores (or had broken concurrency in some other way), it would be wrong, even on a single core. Physically, this will be because context switching can abruptly pause one computation and start another at any time, and, without semaphores, the newly live thread won't know what resources it can and cannot access. Logically, this will be because the processes are running simultaneously, once you abstract yourself away from the implementation, and, in general, processes running simultaneously can walk over each other if not properly synchronized.
For an example applicable to an OS kernel, consider that every process is logically running concurrently with every other process. A kernel provides the implementation that makes this concurrency work. A resource that two processes may want simultaneously is a hard drive. A semaphore might be used in the kernel to track whether a given drive is currently busy with a read or write. A process trying to read or write to the same disk will ask the kernel to do so, and the kernel can check the semaphore to see that the disk is still busy and force the offending process to wait. Now, an operating system does count as low level code, so in some places, yes, you might want to omit some otherwise vital concurrency safeguards when running on a single CPU, because your job is to handle such implementation details, but higher level parts may still use them.
In contrast, consider a number-crunching program. Let's say it's processing each element of a huge array of data into an equal-sized array of modified data (a functional map operation). It can use multiple CPUs to do this more quickly, but it can also work one CPU. The observable behavior of the program is the same, and you never get any idea that it's doing multiple things at once from its behavior. Numbers go in, numbers come out, who cares what happens in the middle? Writing such a program without the ability to do multiple things at once does not produce a logically incorrect program, just a slow one. Such a program probably does not need semaphores when running on a single CPU, because it didn't need concurrency in the first place.

Do concurrent data structures and thread safe data structures mean the same?

Concurrent data structures:
[1] http://www.cs.tau.ac.il/~shanir/concurrent-data-structures.pdf
[2] http://en.wikipedia.org/wiki/Concurrent_data_structure
Are concurrent data structures and thread safe data structures interchangeable terms?
My understanding differs from #xxa (the answer is: No). Though I have no a regious definition either. Concurrent implies thread-safety. But nowadays, it implies simultaneous access as well. While thread-safety gives no such assumptions. See this quote form the mentioned Wiki-article:
Today, as multiprocessor computer architectures that provide parallelism become the dominant computing platform (through the proliferation of multi-core processors), the term has come to stand mainly for data structures that can be accessed by multiple threads which may actually access the data simultaneously because they run on different processors that communicate with one another.
For example, STL containers are claimed to be thread-safe for the given conditions (read-only), moreover, it allows simultaneous reads by a number of threads (STL says them are "as safe as int"), but only one thread can modify them and in absence of readers. Can we name them 'concurrent'? No. While practical concurrent containers (see tbb for example) allow at least two or more threads to work with the container (including modification) at the same time.
And one more point. You could implement std::queue so that methods push() and pop() will not trigger a failure when used by different threads. But does this make it a concurrent_queue? No. Because, queue::front() and queue::pop() do not provide a way to get elements simultaneously by two or more threads without external synchronization. To become a concurrent_queue it needs different interface, which takes care of atomicity by combining pop() with returning the value.
These terms do not have a rigorous definition, but in general it can be asserted that the answer to your questions is yes.
Both refer to data structures that are stored in shared memory (http://en.wikipedia.org/wiki/Shared_memory) and manipulated by several threads or processes.
In fact [1] asserts the same when it refers to threads explicitly:
The primary source of this additional difficulty is concurrency: Because threads are executed concurrently on different processors, ..."

Are "data races" and "race condition" actually the same thing in context of concurrent programming

I often find these terms being used in context of concurrent programming . Are they the same thing or different ?
No, they are not the same thing. They are not a subset of one another. They are also neither the necessary, nor the sufficient condition for one another.
The definition of a data race is pretty clear, and therefore, its discovery can be automated. A data race occurs when 2 instructions from different threads access the same memory location, at least one of these accesses is a write and there is no synchronization that is mandating any particular order among these accesses.
A race condition is a semantic error. It is a flaw that occurs in the timing or the ordering of events that leads to erroneous program behavior. Many race conditions can be caused by data races, but this is not necessary.
Consider the following simple example where x is a shared variable:
Thread 1 Thread 2
lock(l) lock(l)
x=1 x=2
unlock(l) unlock(l)
In this example, the writes to x from thread 1 and 2 are protected by locks, therefore they are always happening in some order enforced by the order with which the locks are acquired at runtime. That is, the writes' atomicity cannot be broken; there is always a happens before relationship between the two writes in any execution. We just cannot know which write happens before the other a priori.
There is no fixed ordering between the writes, because locks cannot provide this. If the programs' correctness is compromised, say when the write to x by thread 2 is followed by the write to x in thread 1, we say there is a race condition, although technically there is no data race.
It is far more useful to detect race conditions than data races; however this is also very difficult to achieve.
Constructing the reverse example is also trivial. This blog post also explains the difference very well, with a simple bank transaction example.
According to Wikipedia, the term "race condition" has been in use since the days of the first electronic logic gates. In the context of Java, a race condition can pertain to any resource, such as a file, network connection, a thread from a thread pool, etc.
The term "data race" is best reserved for its specific meaning defined by the JLS.
The most interesting case is a race condition that is very similar to a data race, but still isn't one, like in this simple example:
class Race {
static volatile int i;
static int uniqueInt() { return i++; }
}
Since i is volatile, there is no data race; however, from the program correctness standpoint there is a race condition due to the non-atomicity of the two operations: read i, write i+1. Multiple threads may receive the same value from uniqueInt.
TL;DR: The distinction between data race and race condition depends on the nature of problem formulation, and where to draw the boundary between undefined behavior and well-defined but indeterminate behavior. The current distinction is conventional and best reflects the interface between processor architect and programming language.
1. Semantics
Data race specifically refers to the non-synchronized conflicting "memory accesses" (or actions, or operations) to the same memory location. If there is no conflict in the memory accesses, while there is still indeterminate behavior caused by operation ordering, that is a race condition.
Note "memory accesses" here have specific meaning. They refer to the "pure" memory load or store actions, without any additional semantics applied. For example, a memory store from one thread does not (necessarily) know how long it takes for the data to be written into the memory, and finally propagates to another thread. For another example, a memory store to one location before another store to another location by the same thread does not (necessarily) guarantee the first data written in the memory be ahead of the second. As a result, the order of those pure memory accesses are not (necessarily) able to be "reasoned" , and anything could happen, unless otherwise well defined.
When the "memory accesses" are well defined in terms of ordering through synchronization, additional semantics can ensure that, even if the timing of the memory accesses are indeterminate, their order can be "reasoned" through the synchronizations. Note, although the ordering between the memory accesses can be reasoned, they are not necessarily determinate, hence the race condition.
2. Why the difference?
But if the order is still indeterminate in race condition, why bother to distinguish it from data race? The reason is in practical rather than theoretical. It is because the distinction does exist in the interface between the programming language and processor architecture.
A memory load/store instruction in modern architecture is usually implemented as "pure" memory access, due to the nature of out-of-order pipeline, speculation, multi-level of cache, cpu-ram interconnection, especially multi-core, etc. There are lots of factors leading to indeterminate timing and ordering. To enforce ordering for every memory instruction incurs huge penalty, especially in a processor design that supports multi-core. So the ordering semantics are provided with additional instructions like various barriers (or fences).
Data race is the situation of processor instruction execution without additional fences to help reasoning the ordering of conflicting memory accesses. The result is not only indeterminate, but also possibly very weird, e.g., two writes to the same word location by different threads may result with each writing half of the word, or may only operate upon their locally cached values. -- These are undefined behavior, from the programmer's point of view. But they are (usually) well defined from the processor architect's point of view.
Programmers have to have a way to reason their code execution. Data race is something they cannot make sense, therefore should always avoid (normally). That is why the language specifications that are low level enough usually define data race as undefined behavior, different from the well-defined memory behavior of race condition.
3. Language memory models
Different processors may have different memory access behavior, i.e., processor memory model. It is awkward for programmers to study the memory model of every modern processor and then develop programs that can benefit from them. It is desirable if the language can define a memory model so that the programs of that language always behave as expected as the memory model defines. That is why Java and C++ have their memory models defined. It is the burden of the compiler/runtime developers to ensure the language memory models are enforced across different processor architectures.
That said, if a language does not want to expose the low level behavior of the processor (and is willing to sacrifice certain performance benefits of the modern architectures), they can choose to define a memory model that completely hide the details of "pure" memory accesses, but apply ordering semantics for all their memory operations. Then the compiler/runtime developers may choose to treat every memory variable as volatile in all processor architectures. For these languages (that support shared memory across threads), there are no data races, but may still be race conditions, even with a language of complete sequential consistence.
On the other hand, the processor memory model can be stricter (or less relaxed, or at higher level), e.g., implementing sequential consistency as early-days processor did. Then all memory operations are ordered, and no data race exists for any languages running in the processor.
4. Conclusion
Back to the original question, IMHO it is fine to define data race as a special case of race condition, and race condition at one level may become data race at a higher level. It depends on the nature of problem formulation, and where to draw the boundary between undefined behavior and well-defined but indeterminate behavior. Just the current convention defines the boundary at language-processor interface, does not necessarily mean that is always and must be the case; but the current convention probably best reflects the state-of-the-art interface (and wisdom) between processor architect and programming language.
No, they are different & neither of them is a subset of one or vice-versa.
The term race condition is often confused with the related term data
race, which arises when synchronization is not used to coordinate all
access to a shared nonfinal field. You risk a data race whenever a
thread writes a variable that might next be read by another thread or
reads a variable that might have last been written by another thread
if both threads do not use synchronization; code with data races has
no useful defined semantics under the Java Memory Model. Not all race
conditions are data races, and not all data races are race conditions,
but they both can cause concurrent programs to fail in unpredictable
ways.
Taken from the excellent book - Java Concurrency in Practice by Brian Goetz & Co.
Data races and Race condition
[Atomicity, Visibility, Ordering]
In my opinion definitely it is two different things.
Data races is a situation when same memory is shared between several threads(at least one of them change it (write access)) without synchronoization
Race condition is a situation when not synchronized blocks of code(may be the same) which use same shared resource are run simultaneously on different threads and result of which is unpredictable.
Race condition examples:
//increment variable
1. read variable
2. change variable
3. write variable
//cache mechanism
1. check if exists in cache and if not
2. load
3. cache
Solution:
Data races and Race condition are problem with atomicity and they can be solved by synchronization mechanism.
Data races - When write access to shared variable will be synchronized
Race condition - When block of code is run as an atomic operation

Does multithreading make sense for IO-bound operations?

When performing many disk operations, does multithreading help, hinder, or make no difference?
For example, when copying many files from one folder to another.
Clarification: I understand that when other operations are performed, concurrency will obviously make a difference. If the task was to open an image file, convert to another format, and then save, disk operations can be performed concurrently with the image manipulation. My question is when the only operations performed are disk operations, whether concurrently queuing and responding to disk operations is better.
Most of the answers so far have had to do with the OS scheduler. However, there is a more important factor that I think would lead to your answer. Are you writing to a single physical disk, or multiple physical disks?
Even if you parallelize with multiple threads...IO to a single physical disk is intrinsically a serialized operation. Each thread would have to block, waiting for its chance to get access to the disk. In this case, multiple threads are probably useless...and may even lead to contention problems.
However, if you are writing multiple streams to multiple physical disks, processing them concurrently should give you a boost in performance. This is particularly true with managed disks, like RAID arrays, SAN devices, etc.
I don't think the issue has much to do with the OS scheduler as it has more to do with the physical aspects of the disk(s) your writing to.
That depends on your definition of "I/O bound" but generally multithreading has two effects:
Use multiple CPUs concurrently (which won't necessarily help if the bottleneck is the disk rather than the CPU[s])
Use a CPU (with a another thread) even while one thread is blocked (e.g. waiting for I/O completion)
I'm not sure that Konrad's answer is always right, however: as a counter-example, if "I/O bound" just means "one thread spends most of its time waiting for I/O completion instead of using the CPU", but does not mean that "we've hit the system I/O bandwidth limit", then IMO having multiple threads (or asynchronous I/O) might improve performance (by enabling more than one concurrent I/O operation).
I would think it depends on a number of factors, like the kind of application you are running, the number of concurrent users, etc.
I am currently working on a project that has a high degree of linear (reading files from start to finish) operations. We use a NAS for storage, and were concerned about what happens if we run multiple threads. Our initial thought was that it would slow us down because it would increase head seeks. So we ran some tests and found out that the ideal number of threads is the same as the number of cores in the computer.
But your mileage may vary.
It can do, simply because whenever there is more work for a thread to do (identifying the next file to copy) the OS wakes it up, so threads are a simple way to hook into the OS scheduler and yet still write code in a traditional sequential way, instead of having to break it up into a state machine with callbacks.
This is mainly an assistance with clear programming rather than performance.
In most cases, using multi-thread for disk IO will not benefit efficiency. Let's imagine 2 circumstances:
Lock-Free File: We can split the file for each thread by giving them different IO offset. For instance, a 1024B bytes file is split into n pieces and each thread writes the 1024/n respectively. This will cause a lot of verbose disk head movement because of the different offset.
Lock File: Actually lock the IO operation for each critical section. This will cause a lot of verbose thread switches and it turns out that only one thread can write the file simultaneously.
Correct me if I' wrong.
No, it makes no sense. At some point, the operations have to be serialized (by the OS). On the other hand, since modern OS's have to cope with multiple processes anyway I doubt that there's an added overhead.
I'd think it would hinder the operations... You only have one controller and one drive.
You could use a second thread to do the operation, and a main thread that shows an updated UI.
I think it could worsen the performance, because the multiple threads will compete for the same resources.
You can test the impact of doing concurrent IO operations on the same device by copying a set of files from one place to another and measuring the time, then split the set in two parts and make the copies in parallel... the second option will be sensibly slower.

Resources