I have a cache object designed to hold variables shared across modules.
Concurrent threads read from and write to that cache object through a cache handler that uses per-variable locking to preserve cache integrity.
The cache holds all variables under a unique session id. When a new session id is created, it has to reference the old one, copying some of the variables into a new session.
Some session ids will be running concurrently too.
I need to clear out the cache as soon as all concurrent threads are done referencing it, and all new sessions have copied the variables from it into their session.
The problem...
I don't know when it's safe to clear the cache.
I have hundreds of threads making API calls of varying duration. New sessions will spawn new threads. I can't just look at active threads to determine when to clear the cache.
My cache will grow without bound and eventually crash the program.
I believe this must be a common problem, and those far smarter than me have thought it through.
Any ideas how to best solve this?
You can solve this issue with locking... create a Cleaner thread that periodically locks the cache (takes all the locks that you have on it), clears it, then releases the locks...
(If you have many locks for different parts of the cache, you can also lock & clear the cache piece by piece...)
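An alternative worth considering (a sketch, not part of the answer above) is to let lifetime itself do the bookkeeping: every thread and every child session holds a reference-counted handle to the session's cache entry, and the entry is destroyed automatically once the last handle is dropped, so there is no separate "safe to clear" decision. All names here (SessionCache, SessionData) are illustrative:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical sketch: sessions hand out shared_ptr handles to their
// cache entry; the entry is freed when the last handle is released.
struct SessionData {
    std::map<std::string, int> vars;
};

class SessionCache {
    std::mutex m_;
    std::map<int, std::weak_ptr<SessionData>> sessions_;  // non-owning index
public:
    // Create a session, optionally copying variables from a parent session.
    std::shared_ptr<SessionData> create(int id, int parent_id = -1) {
        std::lock_guard<std::mutex> lock(m_);
        auto data = std::make_shared<SessionData>();
        if (auto parent = sessions_[parent_id].lock())
            data->vars = parent->vars;  // copy forward while parent is alive
        sessions_[id] = data;
        return data;  // the caller's shared_ptr keeps the entry alive
    }
};
```

Each worker thread and each child session simply keeps its own `shared_ptr` copy; when all of them go out of scope, the parent's entry is reclaimed, which is exactly the "all threads done referencing it, all new sessions copied from it" condition.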
Related
While looking at some codebases on GitHub, I have seen that sometimes a mutex table is used instead of local mutexes: a mutex is selected from this table based on, for example, a pointer or the hash of some other variable, plus an integer to differentiate the "order" of locks.
My guess is that it is done to avoid creating & destroying mutexes all over the place to improve efficiency, but wouldn't that also create situations where multiple unrelated objects are trying to lock the same mutex & thus need to wait for an unrelated lock to be released as well as waiting for their own lock?
This could potentially be solved by using a very large table, but wouldn't that potentially waste a lot of memory and potentially system resources to just have hundreds of unused mutexes sitting around? Or is this not a big issue overall?
I tried looking it up but all that google gives me is "how to use a mutex" and similar stuff.
EDIT: By "local" I mean, a mutex that is not from a global table and is individually created whenever it is needed instead of picking an existing one from a table.
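What you are describing is commonly called lock striping (that term should also make searching easier). A minimal sketch of such a table, with illustrative names, might look like this:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>

// Minimal sketch of a striped mutex table: unrelated objects may hash to
// the same stripe and contend, but no mutexes are ever created or
// destroyed at runtime. The stripe count (64 here) trades a fixed, small
// amount of memory against the probability of such collisions.
class MutexTable {
    static constexpr std::size_t kStripes = 64;
    std::array<std::mutex, kStripes> stripes_;
public:
    std::mutex& for_ptr(const void* p) {
        // The pointer hash picks a stripe; two objects can collide.
        return stripes_[std::hash<const void*>{}(p) % kStripes];
    }
};
```

Usage is `std::lock_guard<std::mutex> g(table.for_ptr(&obj));`. Your intuition is right on both counts: collisions do cause waiting on unrelated locks, but since an uncontended mutex is tiny (typically tens of bytes), even a table of hundreds costs only kilobytes, so the memory concern is usually minor compared to the contention trade-off.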
I have a vector of entities. During the update cycle I iterate through the vector and update each entity: read its position, calculate its current speed, write the updated position. During the update I may also change some other objects in other parts of the program, but each such object relates only to the current entity, and other entities will not touch it.
So, I want to run this code in threads. I separate the vector into a few chunks and update each chunk in a different thread. As far as I can see, the threads are fully independent: on each iteration, each thread works with its own memory region and doesn't affect the other threads' work.
Do I need any locks here? I assume that everything should work without any mutexes, etc. Am I right?
Short answer
No, you do not need any lock or synchronization mechanism, as your problem appears to be an embarrassingly parallel task.
Longer answer
A race condition can only appear if two threads might access the same memory at the same time and at least one of the accesses is a write operation. If your program exhibits this characteristic, then you need to make sure that threads access the memory in an ordered fashion. One way to do it is by using locks (it is not the only one, though). Otherwise the result is undefined behavior (UB).
It seems that you found a way to split the work among your threads such that each thread can work independently from the others. This is the best-case scenario for concurrent programming, as it does not require any synchronization. The complexity of the code is dramatically decreased, and the speedup is usually substantial.
Please note that, as @acelent pointed out in the comment section, if you need changes made by one thread to be visible in another thread, then you might need some sort of synchronization: depending on the memory model and on the hardware, changes made in one thread might not be immediately visible in the other.
This means that you might write to a variable from Thread 1, and some time later read the same memory from Thread 2, and still not see the write made by Thread 1.
"I separate vector into few chunks and update each chunk in different threads" - in this case you do not need any lock or synchronization mechanism, however, the system performance might degrade considerably due to false sharing depending on how the chunks are allocated to threads. Note that the compiler may eliminate false sharing using thread-private temporal variables.
You can find plenty of information in books and wikis. Here is some info: https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads
There is also a Stack Overflow post on this: does false sharing occur when data is read in openmp?
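The chunked update described in the question can be sketched as follows (names are illustrative; this assumes the per-entity side objects really are as independent as stated). Note that joining the worker threads is itself the synchronization point that makes their writes visible to the main thread:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch of the lock-free chunked update: each thread owns a disjoint
// index range, so no mutex is needed during the update itself.
struct Entity { double pos = 0.0, speed = 1.0; };

void update_range(std::vector<Entity>& es, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        es[i].pos += es[i].speed;  // touches only this thread's chunk
}

void parallel_update(std::vector<Entity>& es, unsigned nthreads) {
    std::vector<std::thread> workers;
    std::size_t chunk = (es.size() + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t b = std::min(es.size(), t * chunk);
        std::size_t e = std::min(es.size(), b + chunk);
        workers.emplace_back(update_range, std::ref(es), b, e);
    }
    for (auto& w : workers) w.join();  // synchronizes-with each thread's writes
}
```

Using contiguous ranges like this (rather than interleaving, e.g. thread t taking every nthreads-th element) also keeps each thread's writes on separate cache lines for all but the chunk boundaries, which mitigates the false-sharing problem mentioned above.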
I have a program that requires that several processes access a shared resource. This shared resource does not exist when they all start, so one of them needs to create it. Once the shared resource is created, important infrastructure is installed for later use. There is, however, a possibility that if the "creator" process is scheduled out before it can install the infrastructure in the shared resource, other processes will try to use the uninitialized data (leading to undefined behavior).
In order to control this, I've created a named semaphore (sem_t *sem_init). Any process that is not the creator "downs" or "waits" on this zero-initialized semaphore. When the creator process has finished setup, it "up's" or "posts" the semaphore, releasing the processes. However, there remains one problem. I do not know exactly how many processes are waiting on it.
In order to solve this problem, I have the following options:
I create a counting semaphore. Each process "up's" or "posts" on this semaphore before blocking on the initialization semaphore. This way, I can know how many processes to release.
I just "post" on the initialization semaphore until it is the maximum allowed value.
I don't like these "solutions" though. For one, I am limited by the maximum value of a semaphore when it comes to the number of processes that I can count. It also seems like "posting" so many times would incur a nasty overhead. My question, then, is whether there is any way I can instruct a semaphore to release all blocked processes, without having to do any explicit bookkeeping on my end. I'd also rather not be constrained by the maximum value of a semaphore.
Something like: sem_releaseAll (sem_t *sem_p); would be ideal.
Note: I would greatly prefer a Linux-native solution.
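One Linux-friendly way to get the sem_releaseAll behavior (a sketch, not a standard API) is a process-shared condition variable guarded by a ready flag: the creator broadcasts once, and every waiter is released regardless of how many there are, with no counting and no semaphore-value limit. For brevity this uses MAP_ANONYMOUS, which only covers processes forked after the mapping; unrelated processes would back the struct with shm_open instead:

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/mman.h>

// Sketch: a one-shot "gate" in shared memory. gate_open releases all
// current and future waiters; the flag guards against lost wakeups.
struct SharedGate {
    pthread_mutex_t mtx;
    pthread_cond_t cond;
    int ready;
};

SharedGate* gate_create() {
    auto* g = static_cast<SharedGate*>(mmap(nullptr, sizeof(SharedGate),
        PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0));
    pthread_mutexattr_t ma; pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&g->mtx, &ma);
    pthread_condattr_t ca; pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&g->cond, &ca);
    g->ready = 0;
    return g;
}

void gate_wait(SharedGate* g) {            // called by every non-creator
    pthread_mutex_lock(&g->mtx);
    while (!g->ready)                      // re-check: spurious wakeups
        pthread_cond_wait(&g->cond, &g->mtx);
    pthread_mutex_unlock(&g->mtx);
}

void gate_open(SharedGate* g) {            // called once by the creator
    pthread_mutex_lock(&g->mtx);
    g->ready = 1;
    pthread_cond_broadcast(&g->cond);      // releases ALL waiters at once
    pthread_mutex_unlock(&g->mtx);
}
```

A nice property of the flag is that latecomers (processes that start after the broadcast) sail straight through gate_wait without blocking, which a bare semaphore cannot express.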
Hazard pointers are a technique for safely reclaiming memory in lock-free code without garbage-collection.
The idea is that before accessing an object that can be deleted concurrently, a thread sets its hazard pointer to point to that object. A thread that wants to delete an object will first check whether any hazard pointers are set to point to that object. If so, deletion will be postponed, so that the accessing thread does not end up reading deleted data.
Now, imagine our deleting thread starts to iterate the list of hazard pointers and at the i+1 element it gets preempted. Now another thread sets the hazard pointer at i to the object that the deleting thread is currently trying to delete. Afterwards, the deleting thread resumes, checks the rest of the list, and deletes the object, even though there is now a hazard pointer at position i pointing to the object.
So clearly just setting the hazard pointer is not enough, as a deleting thread might already have checked our hazard pointer and decided that our thread does not want to access the object. How can I make sure, after setting a hazard pointer, that the object I'm trying to access won't be deleted from under my hands?
The Authoritative Answer
The original paper by Maged M. Michael places this important restriction on algorithms using hazard pointers:
The methodology requires lock-free algorithms to guarantee that no thread can access a dynamic node at a time when it is possibly removed from the object, unless at least one of the thread's associated hazard pointers has been pointing to that node continuously, from a time when the node was guaranteed to be reachable from the object's roots. The methodology prevents the freeing of any retired node continuously pointed to by one or more hazard pointers of one or more threads from a point prior to its removal.
What it means for the deleting thread
As pointed out in Anton's answer, deletion is a two-phase operation: first you have to 'unpublish' the node, i.e. remove it from the data structure so that it can no longer be reached from the public interface.
At this point, the node is possibly removed, in Michael's terms. It is no longer safe for concurrent threads to access it (unless they already have been holding a hazard pointer to it throughout).
Thus, once a node is possibly removed, it is safe for the deleting thread to iterate the list of hazard pointers. Even if the deleting thread gets preempted, a concurrent thread may not access the node anymore. After verifying that no hazard pointers are set to the node, the deleting thread can safely proceed to the second phase of deletion: The actual deallocation.
In summary, the order of operations for the deleting thread is
D-1. Remove the node from the data structure.
D-2. Iterate the list of hazard pointers.
D-3. If no hazards were found, delete the node.
The real algorithm is slightly more involved, as we need to maintain a list of those nodes that cannot be reclaimed and ensure that they get deleted eventually. This has been skipped here, as it is not relevant to explain the issue raised in the question.
What it means for the accessing thread(s)
Setting the hazard pointer is not enough to guarantee safe access to the node. After all, the node might already be possibly removed by the time we set our hazard pointer.
The only way to ensure safe access is if we can guarantee that our hazard pointer has been pointing to that node continuously, from a time when the node was guaranteed to be reachable from the object’s roots.
Since the code is supposed to be lock-free, there is only one way to achieve this: We optimistically set our hazard pointer to the node and then check whether that node has been marked as possibly deleted (that is, it is no longer reachable from the public root) afterwards.
Thus the order of operations for the accessing thread is
A-1. Obtain a pointer to the node by traversing the data structure.
A-2. Set the hazard pointer to point to the node.
A-3. Check that the node is still part of the data structure.
That is, it has not been possibly removed in the meantime.
A-4. If the node is still valid, access it.
Potential Races affecting the deleting thread
After a node has been possibly removed (D-1), the deleting thread could be preempted. Thus it is still possible for concurrent threads to optimistically set their hazard pointer to it (even though they are not allowed to access it) (A-2).
Therefore, the deleting thread might detect a spurious hazard, preventing it from deleting the node right away, even though none of the other threads will access the node anymore. This will simply delay deletion of the node in the same way a legitimate hazard would.
The important point is that the node will still be deleted eventually.
Potential Races affecting the accessing thread
An accessing thread may be preempted, and the deleting thread may run to completion, before the accessing thread has verified that the node has not been possibly removed (A-3). In that case, it is no longer allowed to access the object.
Note that in case the preemption occurs after A-2, it would even be safe for the accessing thread to access the node (since there was a hazard pointer pointing to the node throughout), but since it is impossible for the accessing thread to distinguish this case, it must fail spuriously.
The important point is that a node will only ever be accessed if it has not been deleted.
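The D-1..D-3 and A-1..A-4 orderings above can be sketched for the simplest possible case: one shared node reachable from a single root, and one reader slot. All names are illustrative; a real implementation needs per-thread hazard slots and a proper retire list, as already noted:

```cpp
#include <atomic>
#include <cassert>

// Minimal single-slot hazard-pointer sketch. `head` is the data
// structure's only public root; `hazard` is one reader's hazard slot;
// `retired` stands in for the retire list of deferred nodes.
struct Node { int value; };

std::atomic<Node*> head{nullptr};
std::atomic<Node*> hazard{nullptr};
std::atomic<Node*> retired{nullptr};

// Accessing thread: A-1 read, A-2 publish hazard, A-3 re-check, A-4 use.
Node* acquire() {
    for (;;) {
        Node* n = head.load();              // A-1: reach node from the root
        hazard.store(n);                    // A-2: optimistically publish
        if (n == head.load()) return n;     // A-3: still reachable -> safe
        // Otherwise the node was possibly removed in between; retry.
    }
}

void release() { hazard.store(nullptr); }

// Deleting thread: D-1 unpublish, D-2 scan hazards, D-3 free or defer.
bool try_delete() {
    Node* n = head.exchange(nullptr);       // D-1: unpublish the node
    if (!n) n = retired.exchange(nullptr);  // or retry a deferred node
    if (!n) return false;
    if (hazard.load() == n) {               // D-2: hazard found
        retired.store(n);                   // D-3 deferred: retire the node
        return false;
    }
    delete n;                               // D-3: no hazards, reclaim
    return true;
}
```

Note how the A-3 re-check is exactly what closes the race from the question: a hazard pointer published after D-1 has already happened may be ignored by the scan in D-2, but that reader will then fail its own A-3 check and never dereference the node.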
A thread that wants to delete an object will first check whether any hazard pointers are set to point to that object.
Here is the problem: 'delete' is actually a two-phase operation:
Remove it from a container or any other public structure; generally speaking, unpublish it.
Deallocate the memory.
So, the iteration through the hazard pointers must happen between these two phases to prevent the situation you described as:
another thread sets the hazard pointer at i to the object that the deleting thread is currently trying to delete
because there must be no way for another thread to acquire the object being deleted.
I had implemented a few methods that were each handled by an individual background thread. I understand the complexity of doing things this way, but when I tested, the results all seemed fine. Each thread accesses the same variables at times, and there is a maximum of 5 threads working at any given time. I guess I should have used SyncLock, but my question is whether there is any way the threads could have been executing without overwriting the variable contents. I was under the impression that each thread is allocated its own site in memory for that variable: even though it is named the same, in memory it is a different location mapped to a specific thread, right? So if there were collisions, you should get an error saying the variable cannot be accessed while it is in use by another thread.
Am I wrong on this?
If you are talking about local variables of a function - no, each thread has its own copy of those on its stack.
If you are talking about member variables of a class being accessed from different threads - yes, you need to protect them (unless they are read-only).
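A small sketch contrasting the two cases (names are illustrative; the lock shown is C++'s std::mutex, the equivalent of SyncLock in the question's setting):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Stack locals are per-thread; a shared member needs a lock.
struct Counter {
    int shared_total = 0;        // member: one location, shared by all threads
    std::mutex m;

    void work() {
        int local = 0;           // local: each thread gets its own copy
        for (int i = 0; i < 1000; ++i) ++local;   // no lock needed here
        std::lock_guard<std::mutex> g(m);         // lock guards the member
        shared_total += local;
    }
};
```

Without the lock around `shared_total += local`, two threads can read the same old value and each write back its own sum, silently losing one update; there is no runtime error for such collisions, which is why the tests in the question "seemed fine".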