Thread Locking in Large Parallel Applications

I have a somewhat general question about parallelisation and lock-based synchronisation in large applications. I am working on an application with a large number of object types in a deep architecture, one that also parallelises most key tasks. At present, synchronisation is done with locks managed inside each object of the system. The problem is that the locking scope is only as large as each object, whereas object attributes are passed through many other objects, where they lose their synchronisation protection.
What is best practice for thread management, 'synchronisation contexts', etc. in large applications? It seems the only foolproof solution is to make data synchronisation application-wide, so that data can be consumed safely by any object at any time, but this seems to violate object-oriented design principles.
How is this problem best managed?

One approach is to make your objects read-only; a read-only object doesn't need any synchronization, because there is no chance of one thread reading it while another thread writes to it (no thread ever writes to it). Object lifetime issues can be handled using lock-free reference counting (with atomic counters for thread safety).
Of course, the downside is that if you actually want to change an object's state, you can't; you have to create a new object that is a copy of the old one except for the changed part. Depending on what your application does, that overhead may or may not be acceptable.
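A minimal sketch of the idea in Java (the Point class and its accessors are hypothetical, not from the question): all fields are final, so instances can be shared across threads without locking, and "changing" one means building a modified copy. In a garbage-collected language the lifetime management comes for free; in C++, a std::shared_ptr would play the lock-free reference-counting role.

```java
// A read-only object: all fields are final, so it can be shared
// freely between threads with no synchronization at all.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // "Changing" the object means building a modified copy; threads
    // still holding the old instance keep a valid, consistent view.
    public Point withX(int newX) {
        return new Point(newX, y);
    }
}
```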


Can a singleton object have a (thread-safe) method executing in different threads simultaneously?

If so, how will different threads share the same instance or chunk of memory that represents the object? Will different threads somehow "copy" the single-instance object method code to be run on its own CPU resources?
EDIT - to clarify this question further:
I understand that different threads can be in the "process" of executing a singleton object's method at the same time without all of them actively executing - some may be waiting to be scheduled by the OS and be at various points of execution within the method. This question is specifically about multiple active threads, each executing on a different processor. Can multiple threads be actively executing the same exact code path (the same region of memory) at the same time?
Can a singleton object have a (thread-safe) method executing in different threads simultaneously?
Yes, of course. Note that this is no different from multiple threads running a method on the same non-singleton object instance.
how will different threads share the same instance or chunk of memory that represents the object?
(Ignoring NUMA and processor caches), the object exists in only one place in memory (hence, it's a singleton) so the different threads are all reading from the same memory addresses.
Now, if the object is immutable (and doesn't have external side-effects, like IO) then that's okay: multiple threads reading from the same memory and never changing it doesn't introduce any problems.
By analogy, this is like having a single (single-sided) piece of paper on a desk and 10 people reading it simultaneously: no problem.
If the object is mutable (or does have external side-effects like IO) then that's a problem :) You need to synchronize each thread's changes, otherwise they'll overwrite each other - and that is bad.
Note that in many languages and platforms, like C#/.NET and C++, documentation may claim that a method is "thread-safe", but this is not something the runtime can absolutely guarantee (except where a function is provably "const-correct" and performs no IO, which C++ can express but C# currently cannot).
Note that just because a single method is thread-safe doesn't mean an entire logical operation (e.g. calling multiple object methods in sequence) is thread-safe. For example, consider a Dictionary<K,V>: many threads can simultaneously call TryGetValue without issues - but as soon as one thread calls .Add, that breaks all of the other threads calling TryGetValue, because Dictionary<K,V> does not guarantee that .Add is atomic or that it blocks concurrent TryGetValue calls. This is why we use ConcurrentDictionary, which has a different API - or wrap our logical business/domain operations in lock statements (i.e. Monitor).
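The same pitfall, translated to Java as a hedged sketch (the unsafePutIfAbsent/safePutIfAbsent names are mine): even when every individual map call is synchronized, a check-then-act sequence spanning two calls is still a race, which is why the concurrent collection exposes the whole logical operation as one atomic method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CompositeOpDemo {
    // Every individual call on this map is synchronized...
    private static final Map<String, Integer> syncMap =
            Collections.synchronizedMap(new HashMap<>());

    // ...but this two-step logical operation is still a race:
    // another thread can insert the key between the check and the put.
    static void unsafePutIfAbsent(String key) {
        if (!syncMap.containsKey(key)) { // check
            syncMap.put(key, 42);        // act - not atomic with the check
        }
    }

    private static final ConcurrentHashMap<String, Integer> concurrentMap =
            new ConcurrentHashMap<>();

    // The concurrent collection makes the whole logical operation atomic.
    static void safePutIfAbsent(String key) {
        concurrentMap.putIfAbsent(key, 42);
    }
}
```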
Will different threads somehow "copy" the single-instance object method code to be run on its own CPU resources?
No.
You can make that happen with ThreadLocal<T> and [ThreadStatic] (and AsyncLocal<T>), but proponents of functional programming techniques (like myself) will strongly recommend against it for many reasons. If you want a thread to "own" something, then pass the object on the stack as a parameter, not in hidden static state.
(Also note this does not apply to value types (e.g. struct), which are always copied between uses - but I assume you're referring to class object instances. .NET discourages mutable structs anyway.)
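To make the yes/no above concrete, here is a hedged Java sketch (the Config class is made up): many threads can actively execute the same method code of the one instance at the same time, and whether that is safe depends entirely on what state the method touches.

```java
// One instance, one copy of the method code - many threads may be
// executing that code simultaneously on different processors.
public final class Config {
    public static final Config INSTANCE = new Config("production");

    private final String name; // immutable: safe to read concurrently
    private int hits;          // mutable: needs synchronization

    private Config(String name) { this.name = name; }

    // Thread-safe with no locking: it only reads immutable state.
    public String describe() {
        return "config: " + name;
    }

    // Must be synchronized: an unsynchronized ++ on shared state is a data race.
    public synchronized void recordHit() { hits++; }

    public synchronized int hits() { return hits; }
}
```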

How to synchronize insert/removal of elements to/from a data structure, the Functional Way?

I have a data structure, say a MinHeap. It has methods like peek(), removeElement() and addElement(). removeElement() and addElement() can produce inconsistent states if they are not made thread safe (because they involve increasing/decreasing the currentHeapSize).
Now, I want to implement this data structure, the functional way. I have read that in functional programming immutability is the key which leads to thread safety. How do I implement that here? Should I avoid incrementing/decrementing the currentHeapSize? If so, how? I would like some direction with this.
Edit #1
@YuvalItzchakov and @Dima have pointed out that I need to return a new collection every time I do an insert/delete, which makes sense. But wouldn't that critically hamper performance?
My use case is that I will be getting a stream of data and I keep adding it to the heap. Whenever someone requests data, the root of the min-heap is returned. So insertion happens very rapidly here. Wouldn't creating a new heap for every insert prove to be costly? I think it would. If so, how does functional programming really help? Is it just a theoretical concept, or does it have practical implications as well?
The problem of parallel access to the same data structure is twofold. First, we need to serialize parallel updates; @Tim gave a comprehensive answer to this. Second, when there are many readers, we may want to allow them to read in parallel with writing, and this is where immutability plays its role: without it, writers have to wait for the readers to finish.
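To address the performance worry directly: a persistent (immutable) heap does not copy the whole structure on insert; it shares most of it. Below is a hedged Java sketch of a persistent leftist heap (the class and method names are mine, not from any library). insert builds only O(log n) new nodes and reuses the rest, and a reader holding an old reference keeps a consistent snapshot while writers build new versions.

```java
// A persistent min-heap (leftist heap): insert/deleteMin return a
// *new* heap that shares most nodes with the old one, so updates cost
// O(log n), not O(n). Old references stay valid immutable snapshots.
public final class PersistentHeap {
    public static final PersistentHeap EMPTY = null; // empty heap is null

    private final int value;
    private final int rank; // length of the rightmost spine
    private final PersistentHeap left, right;

    private PersistentHeap(int value, PersistentHeap a, PersistentHeap b) {
        this.value = value;
        // keep the higher-rank child on the left (the leftist property)
        if (rank(a) >= rank(b)) { left = a; right = b; rank = rank(b) + 1; }
        else                    { left = b; right = a; rank = rank(a) + 1; }
    }

    private static int rank(PersistentHeap h) { return h == null ? 0 : h.rank; }

    public static PersistentHeap insert(PersistentHeap h, int v) {
        return merge(h, new PersistentHeap(v, null, null));
    }

    public static int findMin(PersistentHeap h) { return h.value; } // h must be non-empty

    public static PersistentHeap deleteMin(PersistentHeap h) {
        return merge(h.left, h.right);
    }

    // Merges walk only the rightmost spines, which are O(log n) long.
    private static PersistentHeap merge(PersistentHeap a, PersistentHeap b) {
        if (a == null) return b;
        if (b == null) return a;
        return (a.value <= b.value)
                ? new PersistentHeap(a.value, a.left, merge(a.right, b))
                : new PersistentHeap(b.value, b.left, merge(a, b.right));
    }
}
```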
There isn't really a "functional" way to have a data structure that can be updated by multiple threads. In fact, one reason functional programming works so well in a multi-threaded environment is that there aren't any shared data structures like this.
However, in the real world this problem comes up all the time, so you need some way to serialise access to the shared data structure. The crudest way is simply to put a big lock around the whole code and only allow one thread to run at once (e.g. with a Mutex). With clever design this can be made reasonably efficient, but it can be difficult to get right and complicated to maintain.
A more sophisticated approach is to have a thread-safe queue of requests to your data structure and a single worker thread that processes those requests one by one. One popular framework that supports this model is Akka Actors: you wrap your data structure in an Actor, which then receives requests to read or modify it, and the Akka framework ensures that only one message is processed at a time.
In your case, the actor would manage the heap and receive updates from the stream, which would go into the heap. Other threads can then make requests that will be processed in a sequential, thread-safe way. It is best if these requests perform specific queries on the heap, rather than just returning the whole heap every time.
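A minimal Java stand-in for this pattern (this is not actual Akka code; the HeapActor name is mine): a single-threaded executor acts as the actor's mailbox, so the heap is only ever touched by one thread, and callers interact with it purely through messages.

```java
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A poor man's actor: all heap access is funnelled through a single-
// threaded executor, so requests are processed strictly one at a time
// in submission order - the heap itself needs no locks.
public class HeapActor {
    private final PriorityQueue<Integer> heap = new PriorityQueue<>();
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Fire-and-forget update, e.g. fed from the incoming data stream.
    public void add(int value) {
        mailbox.execute(() -> heap.add(value));
    }

    // A specific query; the caller gets a Future rather than the heap itself.
    public Future<Integer> peekMin() {
        return mailbox.submit(heap::peek);
    }

    public void shutdown() { mailbox.shutdown(); }
}
```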
You may use the cats-effect Ref type:
https://typelevel.org/cats-effect/concurrency/ref.html
Under the hood it is essentially an AtomicReference; alternatively, you could write a wrapper around java.util.concurrent.ConcurrentHashMap.
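The AtomicReference idea can also be sketched directly in Java, reusing the hypothetical PersistentHeap from the sketch above: keep the current immutable heap behind an AtomicReference and retry the swap on contention.

```java
import java.util.concurrent.atomic.AtomicReference;

// A lock-free holder for an immutable heap: writers build a new version
// and compare-and-set it in; readers just grab the current reference
// and get a consistent snapshot for free.
public class HeapRef {
    private final AtomicReference<PersistentHeap> current =
            new AtomicReference<>(PersistentHeap.EMPTY);

    public void insert(int value) {
        PersistentHeap oldHeap, newHeap;
        do {
            oldHeap = current.get();
            newHeap = PersistentHeap.insert(oldHeap, value);
        } while (!current.compareAndSet(oldHeap, newHeap)); // retry if another writer won
    }

    public PersistentHeap snapshot() {
        return current.get(); // always a complete, immutable view
    }
}
```

This is essentially what a Ref gives you, with the retry loop handled by the library.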

Does garbage collection lead to bad software design?

I have heard that garbage collection leads to bad software design. Is that true? It is true that we don't care about the lifetime of objects in garbage-collected languages, but does that have an effect on program design?
If an object asks other objects to do something on its behalf until further notice, the object's owner should notify it when its services are not required. Having code abandon such objects without notifying them that their services won't be required anymore would be bad design, but that would be just as true in a garbage-collected framework as in a non-GC framework.
Garbage-collected frameworks, used properly, offer two advantages:
In many cases, objects are created for the purpose of encapsulating values therein, and references to the objects are passed around as proxies for that data. Code receiving such references shouldn't care about whether other references exist to those same objects or whether it holds the last surviving references. As long as someone holds a reference to an object, the data should be kept. Once nobody needs the data anymore, it should cease to exist, but nobody should particularly notice.
In non-GC frameworks, an attempt to use a disposed object will usually generate Undefined Behavior that cannot be reliably trapped (and may allow code to violate security policies). In many GC frameworks, it's possible to ensure that attempts to use disposed resources will be trapped deterministically and cannot undermine security.
In some cases, garbage collection will allow a programmer to "get away with" designs that are sloppier than would be tolerable in a non-GC system. A GC-based framework will, however, also allow the use of many good programming patterns which could not be implemented as efficiently in a non-GC system. For example, if a program uses multiple worker threads to find the optimal solution to a problem, and has a UI thread which periodically wants to show the best solution found so far, the UI thread will want to know that when it asks for a status update it will get a solution that has been found, but won't want to burden the worker threads with the synchronization necessary to ensure it has the absolute latest solution.
In a non-GC system, thread synchronization would be unavoidable, since the UI thread and worker thread would have to coordinate who was going to delete a status object that becomes obsolete while it's being shown. In a GC-based system, however, the GC can tell whether the UI thread managed to grab a reference to a status object before it got replaced, and thus resolve whether the object needs to be kept alive long enough for the UI thread to display it. The GC sometimes has to force thread synchronization to find all reachable references, but occasional synchronization for the GC may pose less of a performance drain than the frequent thread synchronization required in a non-GC system.
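A hedged Java sketch of that worker/UI pattern (all names are illustrative): workers publish improved solutions through one atomic reference, the UI reads whatever is current without coordinating with the workers, and the GC reclaims each superseded Solution once the last thread drops its reference.

```java
import java.util.concurrent.atomic.AtomicReference;

// Workers publish their best solution so far; the UI reads the latest.
// Nobody has to negotiate who "deletes" a superseded Solution - the GC
// keeps each one alive exactly as long as some thread still references it.
public class BestSoFar {
    public record Solution(double score, String description) {} // lower score = better

    private final AtomicReference<Solution> best = new AtomicReference<>();

    // Called by worker threads; atomically keeps whichever solution is better.
    public void offer(Solution candidate) {
        best.accumulateAndGet(candidate, (cur, cand) ->
                cur == null || cand.score() < cur.score() ? cand : cur);
    }

    // Called by the UI thread: some recently found solution, not
    // necessarily the absolute latest - and that's fine.
    public Solution current() {
        return best.get();
    }
}
```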

Are "benaphores" worth implementing on modern OS's?

Back in my days as a BeOS programmer, I read this article by Benoit Schillings describing how to create a "benaphore": a method of using an atomic variable to enforce a critical section that avoids the need to acquire/release a mutex in the common (no-contention) case.
I thought that was rather clever, and it seems like you could do the same trick on any platform that supports atomic-increment/decrement.
On the other hand, this looks like something that could just as easily be included in the standard mutex implementation itself... in which case implementing this logic in my program would be redundant and wouldn't provide any benefit.
Does anyone know if modern locking APIs (e.g. pthread_mutex_lock()/pthread_mutex_unlock()) use this trick internally? And if not, why not?
What your article describes is in common use today. Most often it's called a "Critical Section", and it consists of an interlocked variable, a bunch of flags, and an internal synchronization object (a mutex, if I remember correctly). Generally, in scenarios with little contention, the Critical Section executes entirely in user mode, without involving the kernel synchronization object. This guarantees fast execution. When contention is high, the kernel object is used for waiting, which releases the time slice and so allows faster turnaround.
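For reference, the benaphore trick itself is only a few lines. A hedged Java sketch (the class is mine, following the scheme the article describes): an atomic counter tracks contention, and the backing semaphore is touched only when two threads actually collide.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// The classic benaphore: in the uncontended case, lock() and unlock()
// are a single atomic increment/decrement and never touch the
// (potentially kernel-backed) semaphore.
public class Benaphore {
    private final AtomicInteger count = new AtomicInteger(0);
    private final Semaphore sem = new Semaphore(0);

    public void lock() throws InterruptedException {
        if (count.incrementAndGet() > 1) {
            sem.acquire(); // contention: wait until the holder releases
        }
    }

    public void unlock() {
        if (count.decrementAndGet() > 0) {
            sem.release(); // wake exactly one waiting thread
        }
    }
}
```

Note the sketch is not reentrant and assumes every lock() is paired with exactly one unlock().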
Generally, there is very little sense in implementing synchronization primitives yourself in this day and age. Operating systems come with a big variety of such objects, and they are optimized and tested in a significantly wider range of scenarios than a single programmer can imagine. It literally takes years to invent, implement and test a good synchronization mechanism. That's not to say that there is no value in trying :)
Java's AbstractQueuedSynchronizer (and its sibling AbstractQueuedLongSynchronizer) works similarly, or at least it could be implemented similarly. These types form the basis for several concurrency primitives in the Java library, such as ReentrantLock and FutureTask.
It works by way of using an atomic integer to represent state. A lock may define the value 0 as unlocked, and 1 as locked. Any thread wishing to acquire the lock attempts to change the lock state from 0 to 1 via an atomic compare-and-set operation; if the attempt fails, the current state is not 0, which means that the lock is owned by some other thread.
AbstractQueuedSynchronizer also facilitates waiting on locks and notification of conditions by maintaining CLH queues, which are lock-free linked lists representing the line of threads waiting either to acquire the lock or to receive notification via a condition. Such notification moves one or all of the threads waiting on the condition to the head of the queue of those waiting to acquire the related lock.
Most of this machinery can be implemented in terms of an atomic integer representing the state as well as a couple of atomic pointers for each waiting queue. The actual scheduling of which threads will contend to inspect and change the state variable (via, say, AbstractQueuedSynchronizer#tryAcquire(int)) is outside the scope of such a library and falls to the host system's scheduler.
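As a concrete illustration, here is roughly the canonical way to build a simple non-reentrant lock on top of AbstractQueuedSynchronizer (closely following the pattern shown in the class's own documentation; the SimpleLock name is mine):

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// A minimal non-reentrant lock on AQS: state 0 = unlocked, 1 = locked.
// AQS supplies the CLH wait queue and all parking/wake-up machinery;
// we only define what acquiring and releasing mean.
public class SimpleLock {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int ignored) {
            // one atomic compare-and-set decides who gets the lock
            return compareAndSetState(0, 1);
        }

        @Override
        protected boolean tryRelease(int ignored) {
            setState(0);
            return true;
        }
    }

    private final Sync sync = new Sync();

    public void lock()   { sync.acquire(1); }
    public void unlock() { sync.release(1); }
}
```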

Thread safety... what's my "best" course of action?

I'm wondering what is the "best" way to make data thread-safe.
Specifically, I need to protect a linked list across multiple threads - one thread might try to read from it while another adds/removes data from it, or even frees the entire list. I've been reading about locks; they seem to be the most commonly used approach, but apparently they can be problematic (deadlocks). I've also read about atomic operations and thread-local storage.
In your opinion, what would be my best course of action? What's the approach that most programmers use, and for what reason?
One approach that is not heavily used, but quite sound, is to designate one special purpose thread to own every "shared" structure. That thread generally sits waiting on a (thread-safe;-) queue, e.g. in Python a Queue.Queue instance, for work requests (reading or changing the shared structure), including both ones that request a response (they'll pass their own queue on which the response is placed when ready) and ones that don't. This approach entirely serializes all access to the shared resource, remaps easily to a multi-process or distributed architecture (almost brainlessly, in Python, with multiprocessing;-), and absolutely guarantees soundness and lack of deadlocks as well as race conditions as long as the underlying queue object is well-programmed once and for all.
It basically turns the hell of shared data structures into the paradise of message-passing concurrency architectures.
OTOH, it may be a tad higher-overhead than slugging it out the hard way with locks &c;-).
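A hedged Java sketch of the owner-thread idea (the ListOwner name is mine): one thread owns the linked list, every other thread sends it requests through a blocking queue, and requests that need an answer carry a CompletableFuture as their "response queue".

```java
import java.util.LinkedList;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// One dedicated thread owns the list; all access is serialized through
// its inbox, so the list itself needs no locks and cannot deadlock.
public class ListOwner {
    private final LinkedList<String> list = new LinkedList<>(); // owned data
    private final LinkedBlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();

    public ListOwner() {
        Thread owner = new Thread(() -> {
            try {
                while (true) inbox.take().run(); // process requests one by one
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop the owner thread
            }
        });
        owner.setDaemon(true);
        owner.start();
    }

    // A request that needs no response.
    public void add(String item) {
        inbox.add(() -> list.add(item));
    }

    // A request with a response: the future is completed by the owner thread.
    public CompletableFuture<String> first() {
        CompletableFuture<String> reply = new CompletableFuture<>();
        inbox.add(() -> reply.complete(list.peekFirst()));
        return reply;
    }
}
```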
You could consider an immutable collection. Much like how a string in .NET has methods such as Replace, Insert, etc. that don't modify the string but instead create a new one, a LinkedList collection can be designed to be immutable as well. In fact, a linked list is fairly simple to implement this way compared to some other collection data structures.
Here's a link to a blog post discussing immutable collections and some implementations in .NET:
http://blogs.msdn.com/jaredpar/archive/2009/04/06/immutable-vs-mutable-collection-performance.aspx
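A hedged sketch of why a linked list is especially easy to make immutable (the class is mine, in Java rather than .NET): prepending creates a single new node and reuses the entire old list as its tail, so no existing node is ever mutated.

```java
// An immutable singly linked "cons" list: prepend() allocates one node
// and shares the old list as the tail, so threads can pass lists around
// freely - nothing they hold can ever change underneath them.
public final class ImmutableList<T> {
    private final T head;
    private final ImmutableList<T> tail;

    private ImmutableList(T head, ImmutableList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public static <T> ImmutableList<T> empty() { return null; } // empty list is null

    public static <T> ImmutableList<T> prepend(T value, ImmutableList<T> list) {
        return new ImmutableList<>(value, list); // O(1), old list untouched
    }

    public T head() { return head; }
    public ImmutableList<T> tail() { return tail; }
}
```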
Always remember the most important rule of thread safety: know all the critical sections of your code inside out - know them like your ABCs. Only if you can identify them at once when asked will you know which areas to apply your thread-safety mechanisms to.
After that, remember the rules of thumb:
Look out for all your global variables / variables on the heap.
Make sure your subroutines are re-entrant.
Make sure access to shared data is serialized.
Make sure there are no indirect accesses through pointers.
(I'm sure others can add more.)
The "best" way, from a safety point of view, is to put a lock on the entire data structure, so that only one thread can touch it at a time.
Once you decide to lock less than the entire structure, presumably for performance reasons, the details of doing this are messy and differ for every data structure, and even variants of the same structure.
My suggestion is to
Start with a global lock on your data structure (see the sketch after this list). Profile your program to see if it's really a problem.
If it is a problem, consider whether there's some other way to distribute the problem. Can you minimize the amount of data in the data structure in question, so that it need not be accessed so often or for so long? If it's a queuing system, for example, perhaps you can keep a local queue per thread, and only move things into or out of a global queue when a local queue becomes over- or under-loaded.
Look at data structures designed to help reduce contention for the particular type of thing you're doing, and implement them carefully and precisely, erring on the side of safety. For the queuing example, work-stealing queues might be what you need.
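A hedged Java sketch of step 1 (the LockedList wrapper is mine): one lock guards every operation on the list. It is trivially safe, immune to the lock-ordering deadlocks that come with finer-grained schemes, and a perfectly good baseline until profiling says otherwise.

```java
import java.util.LinkedList;

// The coarse-grained baseline: a single lock around the whole structure.
// Only one thread can touch the list at a time, so it is always consistent.
public class LockedList<T> {
    private final LinkedList<T> list = new LinkedList<>();
    private final Object lock = new Object();

    public void add(T item) {
        synchronized (lock) { list.add(item); }
    }

    public T removeFirst() {
        synchronized (lock) { return list.isEmpty() ? null : list.removeFirst(); }
    }

    public int size() {
        synchronized (lock) { return list.size(); }
    }
}
```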
