Hazard pointers are a technique for safely reclaiming memory in lock-free code without garbage collection.
The idea is that before accessing an object that can be deleted concurrently, a thread sets its hazard pointer to point to that object. A thread that wants to delete an object will first check whether any hazard pointers are set to point to that object. If so, deletion will be postponed, so that the accessing thread does not end up reading deleted data.
Now, imagine our deleting thread starts to iterate the list of hazard pointers and gets preempted at element i+1. Another thread then sets the hazard pointer at element i to the object that the deleting thread is currently trying to delete. Afterwards, the deleting thread resumes, checks the rest of the list, and deletes the object, even though there is now a hazard pointer at position i pointing to it.
So clearly just setting the hazard pointer is not enough, as a deleting thread might already have checked our hazard pointer and decided that our thread does not want to access the object. How can I make sure, after setting a hazard pointer, that the object I'm trying to access won't be deleted from under my hands?
The Authoritative Answer
The original paper by Maged M. Michael places this important restriction on algorithms using hazard pointers:
The methodology requires lock-free algorithms to guarantee that no thread can access a dynamic node at a time when it is possibly removed from the object, unless at least one of the thread’s associated hazard pointers has been pointing to that node continuously, from a time when the node was guaranteed to be reachable from the object’s roots. The methodology prevents the freeing of any retired node continuously pointed to by one or more hazard pointers of one or more threads from a point prior to its removal.
What it means for the deleting thread
As pointed out in Anton's answer, deletion is a two-phase operation: first you have to 'unpublish' the node, i.e. remove it from the data structure so that it can no longer be reached through the public interface.
At this point, the node is possibly removed, in Michael's terms. It is no longer safe for concurrent threads to access it (unless they already have been holding a hazard pointer to it throughout).
Thus, once a node is possibly removed, it is safe for the deleting thread to iterate the list of hazard pointers. Even if the deleting thread gets preempted, a concurrent thread may not access the node anymore. After verifying that no hazard pointers are set to the node, the deleting thread can safely proceed to the second phase of deletion: The actual deallocation.
In summary, the order of operations for the deleting thread is
D-1. Remove the node from the data structure.
D-2. Iterate the list of hazard pointers.
D-3. If no hazards were found, delete the node.
The real algorithm is slightly more involved, as we need to maintain a list of those nodes that cannot be reclaimed and ensure that they get deleted eventually. This has been skipped here, as it is not relevant to explain the issue raised in the question.
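Still, a minimal sketch of the basic scan may help. It assumes D-1 (unlinking the node) has already happened and that hazard pointers live in a fixed global table; the names (Node, g_hazards, kMaxThreads, retire) are illustrative, not from Michael's paper:

    #include <atomic>
    #include <vector>

    struct Node { int value; };

    constexpr int kMaxThreads = 64;
    std::atomic<Node*> g_hazards[kMaxThreads];    // one hazard slot per thread

    thread_local std::vector<Node*> retired;      // nodes awaiting reclamation

    // Called after the node has been unlinked from the data structure (D-1).
    void retire(Node* node) {
        retired.push_back(node);
        for (auto it = retired.begin(); it != retired.end();) {
            bool hazardous = false;
            for (int i = 0; i < kMaxThreads; ++i) // D-2: scan all hazard slots
                if (g_hazards[i].load(std::memory_order_acquire) == *it) {
                    hazardous = true;
                    break;
                }
            if (!hazardous) {                     // D-3: no hazard, reclaim
                delete *it;
                it = retired.erase(it);
            } else {
                ++it;                             // retry on a later scan
            }
        }
    }

In a real implementation the scan is usually amortized by running it only once the retired list exceeds some threshold.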
What it means for the accessing thread(s)
Setting the hazard pointer alone is not enough to guarantee safe access to the node. After all, the node might already have been possibly removed by the time we set our hazard pointer.
The only way to ensure safe access is if we can guarantee that our hazard pointer has been pointing to that node continuously, from a time when the node was guaranteed to be reachable from the object’s roots.
Since the code is supposed to be lock-free, there is only one way to achieve this: we optimistically set our hazard pointer to the node and afterwards check whether it has been possibly removed (that is, whether it is no longer reachable from the object's roots).
Thus the order of operations for the accessing thread is
A-1. Obtain a pointer to the node by traversing the data structure.
A-2. Set the hazard pointer to point to the node.
A-3. Check that the node is still part of the data structure; that is, it has not been possibly removed in the meantime.
A-4. If the node is still valid, access it.
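A minimal sketch of A-1 through A-4, for the common case of protecting the head node of a list. Node, head and hp are illustrative names; the sequentially consistent ordering on the hazard store is needed so it cannot be reordered past the re-read of head:

    #include <atomic>

    struct Node { int value; Node* next; };

    // Returns a safely protected pointer to the current head, or nullptr.
    Node* acquire(std::atomic<Node*>& head, std::atomic<Node*>& hp) {
        Node* p = head.load();                       // A-1: obtain a pointer
        while (p) {
            hp.store(p, std::memory_order_seq_cst);  // A-2: publish the hazard
            Node* q = head.load();                   // A-3: still reachable?
            if (p == q)
                return p;                            // A-4: safe to access
            p = q;                                   // it changed; retry
        }
        hp.store(nullptr, std::memory_order_release);
        return nullptr;
    }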
Potential Races affecting the deleting thread
After a node has been possibly removed (D-1), the deleting thread could be preempted. Thus it is still possible for concurrent threads to optimistically set their hazard pointers to it (A-2), even though they are no longer allowed to access it.
Therefore, the deleting thread might detect a spurious hazard, preventing it from deleting the node right away, even though none of the other threads will access the node anymore. This will simply delay deletion of the node in the same way a legitimate hazard would.
The important point is that the node will still be deleted eventually.
Potential Races affecting the accessing thread
An accessing thread may get preempted by a deleting thread before verifying that the node has not been possibly removed (A-3). In such a case, it is no longer allowed to access the object.
Note that if the preemption occurs after A-2, it would even be safe for the accessing thread to access the node (since a hazard pointer was pointing to the node throughout), but since the accessing thread cannot distinguish this case, it must fail spuriously.
The important point is that a node will only ever be accessed if it has not been deleted.
Anton's answer
A thread that wants to delete an object will first check whether any hazard pointers are set to point to that object.
Here is the problem: 'delete' is actually a two-phase operation:
1. Remove the node from a container or any other public structure; generally speaking, unpublish it.
2. Deallocate the memory.
So the iteration through the hazard pointers must happen between these two phases, to prevent the situation you described as:
another thread sets the hazard pointer at i to the object that the deleting thread is currently trying to delete
because there must be no way for another thread to acquire the object being deleted.
Related
I have a vector of entities. In each update cycle I iterate through the vector and update each entity: read its position, calculate its current speed, write its updated position. During the update I may also change some other objects in other parts of the program, but each such object is related only to the current entity, and the other entities will not touch it.
So I want to run this code in threads. I split the vector into a few chunks and update each chunk in a different thread. As I see it, the threads are fully independent: on each iteration, each thread works with its own memory region and does not affect the other threads' work.
Do I need any locks here? I assume that everything should work without any mutexes, etc. Am I right?
Short answer
No, you do not need any lock or synchronization mechanism, as your problem appears to be an embarrassingly parallel task.
Longer answer
A race condition can only appear if two threads might access the same memory at the same time and at least one of the accesses is a write operation. If your program has this characteristic, then you need to make sure that threads access the memory in an ordered fashion. One way to do that is by using locks (it is not the only one, though). Otherwise the result is undefined behaviour (UB).
It seems that you found a way to split the work among your threads such that each thread can work independently of the others. This is the best-case scenario for concurrent programming, as it does not require any synchronization. The complexity of the code decreases dramatically, and the speedup is usually substantial.
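For instance, here is a minimal sketch of that chunked update, assuming an illustrative Entity type; each thread works on a disjoint half of the vector, so no locking is required:

    #include <thread>
    #include <vector>

    struct Entity {
        float position = 0, speed = 1;
        void update() { position += speed; }    // read, compute, write back
    };

    void update_range(std::vector<Entity>& es, size_t begin, size_t end) {
        for (size_t i = begin; i < end; ++i)
            es[i].update();
    }

    int main() {
        std::vector<Entity> entities(10000);
        size_t mid = entities.size() / 2;
        std::thread t1(update_range, std::ref(entities), size_t{0}, mid);
        std::thread t2(update_range, std::ref(entities), mid, entities.size());
        t1.join();   // join() also synchronises memory: after it returns,
        t2.join();   // the threads' writes are visible to the main thread
    }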
Please note that, as @acelent pointed out in the comment section, if you need changes made by one thread to be visible in another thread, then you might need some sort of synchronization, because depending on the memory model and on the hardware, changes made in one thread might not be immediately visible in the other.
This means that you might write to a variable from Thread 1, and some time later read the same memory from Thread 2, and still not see the write made by Thread 1.
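A minimal sketch of that visibility problem and one way to fix it, using std::atomic with release/acquire ordering as the synchronisation; all names are illustrative:

    #include <atomic>

    int payload = 0;                            // plain data written by Thread 1
    std::atomic<bool> ready{false};

    void thread1() {
        payload = 42;
        ready.store(true, std::memory_order_release);   // publish the write
    }

    void thread2() {
        while (!ready.load(std::memory_order_acquire))  // wait for the publish
            ;                                           // spin
        // payload is now guaranteed to read 42; without the atomic flag,
        // thread2 could legally never observe the write at all
    }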
"I separate vector into few chunks and update each chunk in different threads" - in this case you do not need any lock or synchronization mechanism, however, the system performance might degrade considerably due to false sharing depending on how the chunks are allocated to threads. Note that the compiler may eliminate false sharing using thread-private temporal variables.
You can find plenty of information in books and online; here is one article: https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads
There is also a Stack Overflow post on this: "Does false sharing occur when data is read in OpenMP?"
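A minimal sketch of how to avoid false sharing, by padding each thread's data to a full cache line; the 64-byte size is an assumption about typical x86 hardware:

    #include <cstdint>

    struct alignas(64) PaddedCounter {     // each counter gets its own cache line
        uint64_t value = 0;
    };

    PaddedCounter per_thread_counters[8];  // one slot per worker thread; updates
                                           // no longer invalidate neighbours' lines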
Example:
A thread finishes writing to a shared variable, and then unlocks the mutex, but continues to use that variable's value (without changing it).
And immediately, another thread successfully locks() that mutex and reads the shared variable.
From my (mis-)understanding, several things could be happening in this situation:
On the WRITER thread:
A compiler optimization could make the write occur only at some later point
The written value could be retained in the current CPU core's cache, and flushed to the memory at some later point
On the READER thread:
The value of the variable may have been read before the mutex lock(), and, because of some compiler optimization or just the usual working of the CPU cache, still be considered "already read from memory" and thus not fetched from memory again.
Thus, the value we have here is not the updated one from the other thread.
Does the pthread mutex lock/unlock() functions execute any code to "flush" the current cache to the memory and anything else needed to make sure the current thread is synchronized with everything else (I cannot think of anything else than the cache), or is it just not needed (at least in all known architectures)?
Because if all a mutex does is what its name says (mutual exclusion on its referent), then, if I have thousands of threads dealing with the same data and, from my algorithm's point of view, I already know that when one thread is using a variable no other thread will try to use it at the same time, does that mean I don't need a mutex? Or would my code then be missing some low-level, architecture-specific method(s) implemented inside the pthread library that avoid the problems above?
The pthreads mutex lock and unlock functions are among the list of functions in POSIX "...that synchronize thread execution and also synchronize memory with respect to other threads". So yes, they do more than just interlock execution.
Whether or not they need to issue additional instructions to the hardware is of course architecture dependent (noting that almost every modern CPU architecture will at least happily reorder reads with respect to each other unless told otherwise), but in every case those functions must act as "compiler barriers" - that is, they ensure that the compiler won't reorder, coalesce or omit memory accesses in situations where it would otherwise be allowed to.
It is allowed to have multiple threads reading a shared value without mutual exclusion though - all you need to ensure is that both the writing and reading threads executed some synchronising function between the write and the read. For example, an allowable situation is to have many reading threads that defer reading the shared state until they have passed a barrier (pthread_barrier_wait()) and a writing thread that performs all its writes to the shared state before it passes the barrier. Reader-writer locks (pthread_rwlock_*) are also built around this idea.
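A minimal sketch of that barrier pattern, compiling as C++ against POSIX threads: the writer publishes before the barrier, the reader reads after it, and no mutex is involved. The variable names are illustrative:

    #include <pthread.h>

    int shared_state = 0;
    pthread_barrier_t barrier;

    void* writer(void*) {
        shared_state = 42;                // all writes happen before the barrier
        pthread_barrier_wait(&barrier);   // synchronises memory with the reader
        return nullptr;
    }

    void* reader(void*) {
        pthread_barrier_wait(&barrier);   // after this, the write is visible
        int local = shared_state;         // safe: the writer is done
        (void)local;
        return nullptr;
    }

    int main() {
        pthread_barrier_init(&barrier, nullptr, 2);   // 1 writer + 1 reader
        pthread_t w, r;
        pthread_create(&w, nullptr, writer, nullptr);
        pthread_create(&r, nullptr, reader, nullptr);
        pthread_join(w, nullptr);
        pthread_join(r, nullptr);
        pthread_barrier_destroy(&barrier);
    }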
I've been playing around with glib, which
utilizes reference counting to manage memory for its objects;
supports multiple threads.
What I can't understand is how they play together.
Namely:
In glib, each thread doesn't seem to increase the refcount of objects passed on its input, AFAIK (I'll call them thread-shared objects). Is that true? (Or have I just failed to find the right piece of code?) Is it common practice not to increase refcounts of thread-shared objects for each thread that shares them, besides the main thread (which is responsible for refcounting them)?
Still, each thread increases the reference counts of the objects it dynamically creates itself. Should the programmer take care not to give the same names to variables in each thread, in order to prevent name collisions and memory leaks? (E.g. in my picture, thread2 shouldn't create a heap variable called output_object, or it will collide with thread1's heap variable of the same name.)
UPDATE: the answer to question 2 is no, because the visibility scopes of those variables don't intersect:
Is dynamically allocated memory (heap), local to a function or can all functions in a thread have access to it even without passing pointer as an argument.
(The original question included an illustration of this setup.)
I think that threads are irrelevant to understanding the use of reference counters. The point is rather ownership and lifetime, and a thread is just one thing that is affected by this. This is a bit difficult to explain, hopefully I'll make this clearer using examples.
Now, let's look at the given example where main() creates an object and starts two threads using that object. The question is, who owns the created object? The simple answer is that main() and both threads share this object, so this is shared ownership. In order to model this, you should increment the refcounter before each call to pthread_create(). If the call fails, you must decrement it again, otherwise it is the responsibility of the started thread to do that when it is done with the object. Then, when main() terminates, it should also release ownership, i.e. decrement the refcounter. The general rule is that when adding an owner, increment the refcounter. When an owner is done with the object, it decrements the refcounter and the last one destroys the object with that.
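A minimal sketch of that rule with GObject and pthreads; worker and spawn_worker are illustrative names, while g_object_ref()/g_object_unref() are GObject's actual refcounting calls:

    #include <glib-object.h>
    #include <pthread.h>

    void* worker(void* arg) {
        GObject* obj = static_cast<GObject*>(arg);
        /* ... use obj ... */
        g_object_unref(obj);       // this thread is done with its reference
        return nullptr;
    }

    void spawn_worker(GObject* obj) {
        g_object_ref(obj);         // add an owner *before* the thread exists
        pthread_t t;
        if (pthread_create(&t, nullptr, worker, obj) != 0)
            g_object_unref(obj);   // thread never started: give the reference back
        else
            pthread_detach(t);
    }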
Now, why does the code not do this? Firstly, you could get away with adding the first thread as an owner and then passing main()'s ownership to the second thread, which would save one increment/decrement pair. That still isn't what's happening, though. Instead, no reference counting is done at all, and the simple reason is that it isn't needed here. The point of refcounting is to coordinate the lifetime of a dynamically allocated object between different owners that are peers. Here, though, the object is created and owned by main(); the two threads are not peers but rather slaves of main(). Since main() is the master that controls the start and stop of the threads, it doesn't have to coordinate the lifetime of the object with them.
Lastly, though that might be due to the example-ness of your code, I think that main simply leaks the reference, relying on the OS to clean up. While this isn't beautiful, it doesn't hurt. In general, you can allocate objects once and then use them forever without any refcounting in some cases. An example for this is the main window of an application, which you only need once and for the whole runtime. You shouldn't repeatedly allocate such objects though, because then you have a significant memory leak that will increase over time. Both cases will be caught by tools like valgrind though.
Concerning your second question, about the heap variable name clash you expect: it doesn't exist. Function-local variable names cannot collide. This is not because they are used by different threads; even if the same function is called twice by the same thread (think recursion!), the local variables in each call are distinct. Also, variable names are for the human reader; the compiler completely eradicates them.
UPDATE:
As matthias says below, GObject is not thread-safe, only reference counting functions are.
Original content:
GObject is supposed to be thread safe, but I've never played with that myself…
I am messing with multiple threads accessing a resource (probably memory). What does "readback" mean in this context?
Any guides will be helpful... Google didn't give me any good results.
I can think of several possible meanings for "readback". Here's the most likely: in a multithreaded environment, a lot can happen between your thread reading a value from memory and writing a changed value back to that memory. A simple yet effective way to detect changes is to read the value from memory again just before writing; if it has changed from the value you started with, you know someone else changed it while you were working.
"Readback" may also refer to "repeatable reads", in which a locking mechanism is used to ensure that within the scope of an atomic set of operations, only the thread that obtained the lock on the resource can read OR write to it, ensuring that no other thread can change the value from what would be expected by the task if it ran single-threaded. That way, a thread doesn't have to detect external changes; the locking mechanism prevents such a thing from happening.
When I've encountered that term, it's usually in the context of writing a value to a register or memory location that may also be accessed by some other software or hardware. To check whether someone else has changed it, you might keep a private copy of the data you wrote, and some time later read that shared register or memory location to compare its current value to the stored private copy. That's the "readback".
I am writing code in VS2005 using its STL.
I have one UI thread that reads a vector, and a worker thread that writes to it.
I use ::boost::shared_ptr as the vector element type.
vector<shared_ptr<Class>> vec;
But I find that if I manipulate the vec in both threads at the same time (I can guarantee they do not visit the same area; the UI thread always reads the area that already holds information),
vec.clear() seems unable to release the resources. The problem happens in the shared_ptr: it cannot release its resource.
What is the problem?
Is it because, when the vector reaches its current capacity, it reallocates its memory, so that the original storage is invalidated?
As far as I know, iterators become invalid when reallocating, so why did problems also happen when I used vec[i]?
//-----------------------------------------------
What kind of lock is needed?
I mean: if the vector's element is a shared_ptr and thread A gets the pointer smart_p, will the other thread B wait until A finishes its operation on smart_p?
Or should I simply add a lock while a thread is reading the pointer, so that once the read operation is finished, thread B can continue with its work?
When you're accessing the same resource from more than one thread, locking is necessary. If you don't, you have all sorts of strange behaviour, like you're seeing.
Since you're using Boost, an easy way to use locking is to use the Boost.Thread library. The best kind of locks you can use for this scenario are reader/writer locks; they're called shared_mutex in Boost.Thread.
But yes, what you're seeing is essentially undefined behaviour, due to the lack of synchronisation between the threads. Hope this helps!
Edit to answer OP's second question: You should use a reader lock when reading the smart pointer out of the vector, and a writer lock when writing or adding an item to the vector (so, the mutex is for the vector only). If multiple threads will be accessing the pointed-to object (i.e., what the smart pointer points to), then separate locks should be set up for them. In that case, you're better off putting a mutex object in the object class as well.
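A minimal sketch of that locking scheme, using std::shared_mutex (the standard library's equivalent of Boost.Thread's shared_mutex); Class stands in for the element type from the question:

    #include <memory>
    #include <mutex>
    #include <shared_mutex>
    #include <vector>

    struct Class { int data = 0; };

    std::shared_mutex vec_mutex;                  // protects the vector only
    std::vector<std::shared_ptr<Class>> vec;

    std::shared_ptr<Class> read_element(size_t i) {
        std::shared_lock lock(vec_mutex);         // many readers may hold this
        return i < vec.size() ? vec[i] : nullptr; // copying the shared_ptr out
    }                                             // keeps the object alive

    void add_element(std::shared_ptr<Class> p) {
        std::unique_lock lock(vec_mutex);         // writers get exclusive access
        vec.push_back(std::move(p));
    }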
Another alternative is to eliminate the locking altogether by ensuring that the vector is accessed in only one thread. For example, by having the worker thread send a message to the main thread with the element(s) to add to the vector.
It is possible to do simultaneous access to a list or array like this. However, std::vector is not a good choice because of its resize behavior. Doing it right needs a fixed-size array, or special locking or copy-update behavior on resize. It also needs independent front and back pointers, again with locking or atomic updates.
Another answer mentioned message queues. A shared array as I described is a common and efficient way to implement those.
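A minimal sketch of such a shared array: a single-producer/single-consumer ring buffer with atomic head and tail indices. The capacity and element type are illustrative:

    #include <atomic>
    #include <cstddef>
    #include <optional>

    template <typename T, size_t N>
    class SpscRing {
        T buf[N];
        std::atomic<size_t> head{0};   // next slot to read (consumer only)
        std::atomic<size_t> tail{0};   // next slot to write (producer only)
    public:
        bool push(const T& v) {        // called by the producer thread
            size_t t = tail.load(std::memory_order_relaxed);
            size_t next = (t + 1) % N;
            if (next == head.load(std::memory_order_acquire))
                return false;          // full
            buf[t] = v;
            tail.store(next, std::memory_order_release);
            return true;
        }
        std::optional<T> pop() {       // called by the consumer thread
            size_t h = head.load(std::memory_order_relaxed);
            if (h == tail.load(std::memory_order_acquire))
                return std::nullopt;   // empty
            T v = buf[h];
            head.store((h + 1) % N, std::memory_order_release);
            return v;
        }
    };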