Handling concurrent reads?

Handling concurrent reads? - multithreading

I'm new to concurrent programming and I have a specific situation in mind that I'd like some input on. If I have a variable that I will be accessing from multiple threads but only to read the value (the only reason it's wouldn't be a constant is because I'd need to set it at runtime), do I need a mutex for it? Or do you only need to worry about race conditions when there are also writes going out to a shared resource?

If you set the value before you start up the threads, you do not need a mutex.
If you set the value after you start up the threads, you will need a mutex to ensure they all the threads read the same value.

Logically,if you are only reading a shared data then you may not need to use mutex.But,in case of large programmes you must use it to avoid confusions.

That depends on what language and machine architecture you're talking about and what "reading the variable" means in that language. When "reading the variable" translates into only reading from memory at the machine-level, concurrent reads are in themselves generally safe. You need to be sure, of course, that nothing else in your program translates into writing to those same memory areas.
Many mainstream languages (Java, C#, C, C++) gives only very weak guarantees about how your program translates into memory accesses. At the same time, the guarantees you get tend to take the form of very complex rules, say, about which sequences of statements may be re-ordered when. To avoid introducing really difficult to find bugs, it's a very often better to require the synchronisation properties you need in as un-subtle and concrete a form as possible, that is, use mutexes.

Related

Perl ithreads :shared variables - multiprocessor kernel threads - visibility

perlthrtut excerpt:
Note that a shared variable guarantees that if two or more threads try
to modify it at the same time, the internal state of the variable will
not become corrupted. However, there are no guarantees beyond this, as
explained in the next section.
Working on Linux supporting multiprocessor kernel threads.
Is there a guarantee that all threads will see the updated shared variable value ?
Consulting the perlthrtut doc as stated above there is no such guarantee.
Now the question: What can be done programmatically to guarantee that?

You ask
Is there a guarantee that all threads will see the updated shared variable value ?
Yes. :shared is that guarantee. The value will be safely and consistently and freshly updated.
The problem is simply that, without other synchronization, you don't know the order of these updates.
Consulting the perlthrtut doc as stated above there is no such guarantee.
You didn't read far enough. :)
The very next section in perlthrtut explains the kind of pitfalls you do face with perl threads: data races, which is to say, application logic races concerning shared data. Again, the shared data will be consistent and fresh and immune to corruption from (more-or-less) atomic perl opcodes. However, the high-level perl operations you perform on that shared data are not guaranteed to be atomic. $shared_var++, for instance, might be more than one atomic operation.
(If I may hazard a guess, you are perhaps thinking too much about other languages' lower level threading interfaces with their cache inconsistencies, torn words, reordered instructions, and lions and tigers and bears. Perl's model takes care of those low-level concerns for you.)

Using :shared on a variable causes all threads to reference it in the same physical memory address, so it doesn't matter which processor/core/hyper-thread they happen to be executing in. The perlthrtut talk of guarantees is in reference to race conditions, and in short, that you need to take into account that shared variables can be modified by any thread at any time. If this is a problem you'll need to make use of synchronization functions (e.g. lock() and cond_wait()) to control access.

You seem to be confused as to what :shared does. It makes it so a variable is shared by all threads.
A variable is indeed guaranteed to have the value it has, no matter which thread accesses it. It's a tautology, so nothing can be done to programmatically guarantee that.

Usage of registers by the compiler in multithreaded program

It is a general question but:
In a multithreaded program, is it safe for the compiler to use registers to temporarily store global variables?
I think its not, since storing global variables in registers may change saved values
for other threads.
And how about using registers to store local variables defined within a function?
I think it is ok,since no other thread will be able to get these variables.
Please correct me if im wrong.
Thank you!

Things are much more complicated than you think they are.
Even if the compiler stores a value to memory, the CPU generally does not immediately push the data out to RAM. It stores it in a cache (and some systems have 2 or 3 levels of caches between the processor and the memory).
To make things worse, the order of instructions that the compiler decides, may not be what actually gets executed as many processors can reorder instructions (and even sub-parts of instructions) in their own pipelines.
In general, in a multithreaded environment you should personally take care to never access (either read or write) the same memory from two separate threads unless one of the following is true:
you are using one of several special atomic operations that ensure proper synchronization.
you have used one of several synchronization operations to "reserve" access to shared data and then to "relinquish" it. These do include the required memory barriers that also guarantee the data is what it's supposed to be.
You may want to read http://en.wikipedia.org/wiki/Memory_ordering#Memory_barrier_types and http://en.wikipedia.org/wiki/Memory_barrier
If you are ready for a little headache and want to see how complicated things can actually get, here is your evening lecture Memory Barriers: a Hardware View for Software Hackers.

'Safe' is not really the right word to use. Many higher level languages (eg. C) do not have a threading model and so the language specification says nothing about mutli-threaded interactions.
If you are not using any kind of locking primitives then you have no guarantees what so ever about how the different threads interact. So the compiler is within its rights to use registers for global variables.
Even if you are using locking the behaviour can still be tricky: if you read a variable, then grab a lock and then read the variable again the compiler still has no way of knowing if it has to read the variable from memory again, or can use the earlier value it stored in a register.
In C/C++ declaring a variable as volatile will force the compiler to always reload the variable from memory and solve this particular instance.
There are also 'Interlocked*' primitives on most systems that have guaranteed atomicity semantics which can be used to ensure certain operations are threadsafe. Locking primitives are typically built on these low level operations.

In a multithreaded program, you have one of two cases: if it's running on a uniprocessor (single core, single CPU), then switching between threads is handled like switching between processes (although it's not quite as much work since the threads operate in the same virtual memory space) - all registers of one thread are saved during the transition to another thread, so using registers for whatever purpose is fine. This is the job of the context switch routines that the OS uses, and the register set is considered part of a threads (or processes) context. If you have a multiprocessor system - either multiple CPUs or multiple cores on a single CPU - each processor has its own distinct set of registers, so again, using registers for storing things is fine. On top of that, of course, context switching on a particular CPU will save the registers of the old thread/process before switching to the new one, so everything is preserved.
That said, on some architectures and/or with some OSes, there might be specific exceptions to that, because certain registers are reserved by the ABI for specific uses by the OS or by the libraries that provide an interface to the OS, but your compiler(s) generally have that type of knowledge of your platform built in. You need to be aware of them, though, if you're doing inline assembly or certain other "low-level" things...

are simultaneous reads of a variable thread-safe?

Assuming that the variable isn't in any risk of being modified during the reads, are there any inherent problems in a variable being read by 2 or more threads at the same time?

No this operation is not inherently thread safe.
Even though the variable is not currently being written to, previous writes to the variable may not yet be visible to all threads. This means two threads can read the same value and get different results creating a race condition.
This can be prevented though memory barriers, correct use of volatile or a few other mechanisms. We'd need to know more about your environment, in particular the language, to give a complete explanation.
A slight restating of your question though yields a better answer. Assuming there are no more writes and all previous writes are visible to the current thread, then yes reading the value from multiple threads is safe.

If your assumption holds, then there are no problems.

As long as it's a plain variable, it's no risk.
If it is a property, reading it can possibly have side effects, so is not guaranteed to be thread safe.

Yes in three characters.
Edit:
Whoops. Yes, it's thread safe. No, there are no problems. People usually ask if something is thread-safe, not if it's thread unsafe.

Given that databases can commonly use shared read locks, where any number of clients may read the same block, I would suggest that there are no direct inherent problems.

How to define threadsafe?

Threadsafe is a term that is thrown around documentation, however there is seldom an explanation of what it means, especially in a language that is understandable to someone learning threading for the first time.
So how do you explain Threadsafe code to someone new to threading?
My ideas for options are the moment are:
Do you use a list of what makes code
thread safe vs. thread unsafe
The book definition
A useful metaphor

Multithreading leads to non-deterministic execution - You don't know exactly when a certain piece of parallel code is run.
Given that, this wonderful multithreading tutorial defines thread safety like this:
Thread-safe code is code which has no indeterminacy in the face of any multithreading scenario. Thread-safety is achieved primarily with locking, and by reducing the possibilities for interaction between threads.
This means no matter how the threads are run in particular, the behaviour is always well-defined (and therefore free from race conditions).

Eric Lippert says:
When I'm asked "is this code thread safe?" I always have to push back and ask "what are the exact threading scenarios you are concerned about?" and "exactly what is correct behaviour of the object in every one of those scenarios?".
It is unhelpful to say that code is "thread safe" without somehow communicating what undesirable behaviors the utilized thread safety mechanisms do and do not prevent.

G'day,
A good place to start is to have a read of the POSIX paper on thread safety.
Edit: Just the first few paragraphs give you a quick overview of thread safety and re-entrant code.
HTH
cheers,

i maybe wrong but one of the criteria for being thread safe is to use local variables only. Using global variables can have undefined result if the same function is called from different threads.

A thread safe function / object (hereafter referred to as an object) is an object which is designed to support multiple concurrent calls. This can be achieved by serialization of the parallel requests or some sort of support for intertwined calls.
Essentially, if the object safely supports concurrent requests (from multiple threads), it is thread safe. If it is not thread safe, multiple concurrent calls could corrupt its state.
Consider a log book in a hotel. If a person is writing in the book and another person comes along and starts to concurrently write his message, the end result will be a mix of both messages. This can also be demonstrated by several threads writing to an output stream.

I would say to understand thread safe, start with understanding difference between thread safe function and reentrant function.
Please check The difference between thread-safety and re-entrancy for details.

Tread-safe code is code that won't fail because the same data was changed in two places at once. Thread safe is a smaller concept than concurrency-safe, because it presumes that it was in fact two threads of the same program, rather than (say) hardware modifying data, or the OS.

A particularly valuable aspect of the term is that it lies on a spectrum of concurrent behavior, where thread safe is the strongest, interrupt safe is a weaker constraint than thread safe, and reentrant even weaker.
In the case of thread safe, this means that the code in question conforms to a consistent api and makes use of resources such that other code in a different thread (such as another, concurrent instance of itself) will not cause an inconsistency, so long as it also conforms to the same use pattern. the use pattern MUST be specified for any reasonable expectation of thread safety to be had.
The interrupt safe constraint doesn't normally appear in modern userland code, because the operating system does a pretty good job of hiding this, however, in kernel mode this is pretty important. This means that the code will complete successfully, even if an interrupt is triggered during its execution.
The last one, reentrant, is almost guaranteed with all modern languages, in and out of userland, and it just means that a section of code may be entered more than once, even if execution has not yet preceeded out of the code section in older cases. This can happen in the case of recursive function calls, for instance. It's very easy to violate the language provided reentrancy by accessing a shared global state variable in the non-reentrant code.

How can I write a lock free structure?

In my multithreaded application and I see heavy lock contention in it, preventing good scalability across multiple cores. I have decided to use lock free programming to solve this.
How can I write a lock free structure?

Short answer is:
You cannot.
Long answer is:
If you are asking this question, you do not probably know enough to be able to create a lock free structure. Creating lock free structures is extremely hard, and only experts in this field can do it. Instead of writing your own, search for an existing implementation. When you find it, check how widely it is used, how well is it documented, if it is well proven, what are the limitations - even some lock free structure other people published are broken.
If you do not find a lock free structure corresponding to the structure you are currently using, rather adapt the algorithm so that you can use some existing one.
If you still insist on creating your own lock free structure, be sure to:
start with something very simple
understand memory model of your target platform (including read/write reordering constraints, what operations are atomic)
study a lot about problems other people encountered when implementing lock free structures
do not just guess if it will work, prove it
heavily test the result
More reading:
Lock free and wait free algorithms at Wikipedia
Herb Sutter: Lock-Free Code: A False Sense of Security

Use a library such as Intel's Threading Building Blocks, it contains quite a few lock -free structures and algorithms. I really wouldn't recommend attempting to write lock-free code yourself, it's extremely error prone and hard to get right.

Writing thread-safe lock free code is hard; but this article from Herb Sutter will get you started.

As sblundy pointed out, if all objects are immutable, read-only, you don't need to worry about locking, however, this means you may have to copy objects a lot. Copying usually involves malloc and malloc uses locking to synchronize memory allocations across threads, so immutable objects may buy you less than you think (malloc itself scales rather badly and malloc is slow; if you do a lot of malloc in a performance critical section, don't expect good performance).
When you only need to update simple variables (e.g. 32 or 64 bit int or pointers), perform simply addition or subtraction operations on them or just swap the values of two variables, most platforms offer "atomic operations" for that (further GCC offers these as well). Atomic is not the same as thread-safe. However, atomic makes sure, that if one thread writes a 64 bit value to a memory location for example and another thread reads from it, the reading one either gets the value before the write operation or after the write operation, but never a broken value in-between the write operation (e.g. one where the first 32 bit are already the new, the last 32 bit are still the old value! This can happen if you don't use atomic access on such a variable).
However, if you have a C struct with 3 values, that want to update, even if you update all three with atomic operations, these are three independent operations, thus a reader might see the struct with one value already being update and two not being updated. Here you will need a lock if you must assure, the reader either sees all values in the struct being either the old or the new values.
One way to make locks scale a lot better is using R/W locks. In many cases, updates to data are rather infrequent (write operations), but accessing the data is very frequent (reading the data), think of collections (hashtables, trees). In that case R/W locks will buy you a huge performance gain, as many threads can hold a read-lock at the same time (they won't block each other) and only if one thread wants a write lock, all other threads are blocked for the time the update is performed.
The best way to avoid thread-issues is to not share any data across threads. If every thread deals most of the time with data no other thread has access to, you won't need locking for that data at all (also no atomic operations). So try to share as little data as possible between threads. Then you only need a fast way to move data between threads if you really have to (ITC, Inter Thread Communication). Depending on your operating system, platform and programming language (unfortunately you told us neither of these), various powerful methods for ITC might exist.
And finally, another trick to work with shared data but without any locking is to make sure threads don't access the same parts of the shared data. E.g. if two threads share an array, but one will only ever access even, the other one only odd indexes, you need no locking. Or if both share the same memory block and one only uses the upper half of it, the other one only the lower one, you need no locking. Though it's not said, that this will lead to good performance; especially not on multi-core CPUs. Write operations of one thread to this shared data (running one core) might force the cache to be flushed for another thread (running on another core) and these cache flushes are often the bottle neck for multithread applications running on modern multi-core CPUs.

As my professor (Nir Shavit from "The Art of Multiprocessor Programming") told the class: Please don't. The main reason is testability - you can't test synchronization code. You can run simulations, you can even stress test. But it's rough approximation at best. What you really need is mathematical correctness proof. And very few capable understanding them, let alone writing them.
So, as others had said: use existing libraries. Joe Duffy's blog surveys some techniques (section 28). The first one you should try is tree-splitting - break to smaller tasks and combine.

Immutability is one approach to avoid locking. See Eric Lippert's discussion and implementation of things like immutable stacks and queues.

in re. Suma's answer, Maurice Herlithy shows in The Art of Multiprocessor Programming that actually anything can be written without locks (see chapter 6). iirc, This essentially involves splitting tasks into processing node elements (like a function closure), and enqueuing each one. Threads will calculate the state by following all nodes from the latest cached one. Obviously this could, in worst case, result in sequential performance, but it does have important lockless properties, preventing scenarios where threads could get scheduled out for long peroids of time when they are holding locks. Herlithy also achieves theoretical wait-free performance, meaning that one thread will not end up waiting forever to win the atomic enqueue (this is a lot of complicated code).
A multi-threaded queue / stack is surprisingly hard (check the ABA problem). Other things may be very simple. Become accustomed to while(true) { atomicCAS until I swapped it } blocks; they are incredibly powerful. An intuition for what's correct with CAS can help development, though you should use good testing and maybe more powerful tools (maybe SKETCH, upcoming MIT Kendo, or spin?) to check correctness if you can reduce it to a simple structure.
Please post more about your problem. It's difficult to give a good answer without details.
edit immutibility is nice but it's applicability is limited, if I'm understanding it right. It doesn't really overcome write-after-read hazards; consider two threads executing "mem = NewNode(mem)"; they could both read mem, then both write it; not the correct for a classic increment function. Also, it's probably slow due to heap allocation (which has to be synchronized across threads).

Inmutability would have this effect. Changes to the object result in a new object. Lisp works this way under the covers.
Item 13 of Effective Java explains this technique.

Cliff Click has dome some major research on lock free data structures by utilizing finite state machines and also posted a lot of implementations for Java. You can find his papers, slides and implementations at his blog: http://blogs.azulsystems.com/cliff/

Use an existing implementation, as this area of work is the realm of domain experts and PhDs (if you want it done right!)
For example there is a library of code here:
http://www.cl.cam.ac.uk/research/srg/netos/lock-free/

Most lock-free algorithms or structures start with some atomic operation, i.e. a change to some memory location that once begun by a thread will be completed before any other thread can perform that same operation. Do you have such an operation in your environment?
See here for the canonical paper on this subject.
Also try this wikipedia article article for further ideas and links.

The basic principle for lock-free synchronisation is this:
whenever you are reading the structure, you follow the read with a test to see if the structure was mutated since you started the read, and retry until you succeed in reading without something else coming along and mutating while you are doing so;
whenever you are mutating the structure, you arrange your algorithm and data so that there is a single atomic step which, if taken, causes the entire change to become visible to the other threads, and arrange things so that none of the change is visible unless that step is taken. You use whatever lockfree atomic mechanism exists on your platform for that step (e.g. compare-and-set, load-linked+store-conditional, etc.). In that step you must then check to see if any other thread has mutated the object since the mutation operation began, commit if it has not and start over if it has.
There are plenty of examples of lock-free structures on the web; without knowing more about what you are implementing and on what platform it is hard to be more specific.

If you are writing your own lock-free data structures for a multi-core cpu, do not forget about memory barriers! Also, consider looking into Software Transaction Memory techniques.

Well, it depends on the kind of structure, but you have to make the structure so that it carefully and silently detects and handles possible conflicts.
I doubt you can make one that is 100% lock-free, but again, it depends on what kind of structure you need to build.
You might also need to shard the structure so that multiple threads work on individual items, and then later on synchronize/recombine.

As mentioned, it really depends on what type of structure you're talking about. For instance, you can write a limited lock-free queue, but not one that allows random access.

Reduce or eliminate shared mutable state.

In Java, utilize the java.util.concurrent packages in JDK 5+ instead of writing your own. As was mentioned above, this is really a field for experts, and unless you have a spare year or two, rolling your own isn't an option.

Can you clarify what you mean by structure?
Right now, I am assuming you mean the overall architecture. You can accomplish it by not sharing memory between processes, and by using an actor model for your processes.

Take a look at my link ConcurrentLinkedHashMap for an example of how to write a lock-free data structure. It is not based on any academic papers and doesn't require years of research as others imply. It simply takes careful engineering.
My implementation does use a ConcurrentHashMap, which is a lock-per-bucket algorithm, but it does not rely on that implementation detail. It could easily be replaced with Cliff Click's lock-free implementation. I borrowed an idea from Cliff, but used much more explicitly, is to model all CAS operations with a state machine. This greatly simplifies the model, as you'll see that I have psuedo locks via the 'ing states. Another trick is to allow laziness and resolve as needed. You'll see this often with backtracking or letting other threads "help" to cleanup. In my case, I decided to allow dead nodes on the list be evicted when they reach the head, rather than deal with the complexity of removing them from the middle of the list. I may change that, but I didn't entirely trust my backtracking algorithm and wanted to put off a major change like adopting a 3-node locking approach.
The book "The Art of Multiprocessor Programming" is a great primer. Overall, though, I'd recommend avoiding lock-free designs in the application code. Often times it is simply overkill where other, less error prone, techniques are more suitable.

If you see lock contention, I would first try to use more granular locks on your data structures rather than completely lock-free algorithms.
For example, I currently work on multithreaded application, that has a custom messaging system (list of queues for each threads, the queue contains messages for thread to process) to pass information between threads. There is a global lock on this structure. In my case, I don't need speed so much, so it doesn't really matter. But if this lock would become a problem, it could be replaced by individual locks at each queue, for example. Then adding/removing element to/from the specific queue would didn't affect other queues. There still would be a global lock for adding new queue and such, but it wouldn't be so much contended.
Even a single multi-produces/consumer queue can be written with granular locking on each element, instead of having a global lock. This may also eliminate contention.

If you read several implementations and papers regarding the subject, you'll notice there is the following common theme:
1) Shared state objects are lisp/clojure style inmutable: that is, all write operations are implemented copying the existing state in a new object, make modifications to the new object and then try to update the shared state (obtained from a aligned pointer that can be updated with the CAS primitive). In other words, you NEVER EVER modify an existing object that might be read by more than the current thread. Inmutability can be optimized using Copy-on-Write semantics for big, complex objects, but thats another tree of nuts
2) you clearly specify what allowed transitions between current and next state are valid: Then validating that the algorithm is valid become orders of magnitude easier
3) Handle discarded references in hazard pointer lists per thread. After the reference objects are safe, reuse if possible
See another related post of mine where some code implemented with semaphores and mutexes is (partially) reimplemented in a lock-free style:
Mutual exclusion and semaphores

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string