I'm having trouble figuring out a key point in wait-free algorithm design. Suppose a data structure has a pointer to another data structure (e.g. a linked list, a tree, etc.). How can the right time for releasing that data structure be determined?
The problem is this: there are separate operations that can't be executed atomically without a lock. For example, one thread reads the pointer to some memory and then increments the use count for that memory, to prevent it from being freed while the thread is using the data. The use might take long, and even if it doesn't, there is a race condition. What prevents another thread from reading the pointer, decrementing the use count, determining that the memory is no longer used, and freeing it before the first thread has incremented the use count?
The main issue is that current CPUs only have a single-word CAS (compare-and-swap). Alternatively, the problem is that I'm clueless about wait-free algorithms and data structures, and after reading some papers I'm still not seeing the light.
IMHO garbage collection can't be the answer. Either the GC would have to be prevented from running while any single thread is inside an atomic block (which would mean there is no guarantee that the GC will ever run again), or the problem is simply pushed to the GC, in which case, please explain how the GC would figure out that the data is in the silly state (a pointer has been read [e.g. stored in a local variable] but the use count hasn't been incremented yet).
PS, references to advanced tutorials on wait-free algorithms for morons are welcome.
Edit: You should assume that the problem is being solved in a non-managed language, like C or C++. After all, if it were Java, we'd have no need to worry about releasing memory. Further assume that the compiler may generate code that stores temporary references to objects in registers (invisible to other threads) right before the usage counter increment, and that a thread can be interrupted between loading the object address and incrementing the counter. This of course doesn't mean that the solution must be limited to C or C++, rather that the solution should give a set of primitives that allow the implementation of wait-free algorithms on linked data structures. I'm interested in the primitives and how they solve the problem of designing wait-free algorithms. With such primitives a wait-free algorithm can be implemented equally well in C++ and Java.
After some research I learned this.
The problem is not trivial to solve, and there are several solutions, each with advantages and disadvantages. The complexity comes from inter-CPU synchronization issues. If not done right, a solution might appear to work correctly 99.9% of the time, which isn't enough, or it might fail under load.
The solutions that I found are: 1) hazard pointers, 2) quiescence period based reclamation (used by the Linux kernel in the RCU implementation), 3) reference counting techniques, 4) other approaches, and 5) combinations of these.
Hazard pointers work by saving the currently active references in a well-known per-thread location, so any thread deciding to free memory (when the counter appears to be zero) can check whether the memory is still in use by anyone. An interesting improvement is to buffer requests to release memory in a small array and free them in a batch when the array is full. The advantage of using hazard pointers is that it can actually guarantee an upper bound on unreclaimed memory. The disadvantage is that it places an extra burden on the reader.
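To make the hazard pointer idea concrete, here is a minimal sketch in Java. It is only an illustration under several assumptions: Java has a garbage collector, so "freeing" is simulated by a callback; each thread is assumed to own exactly one slot; and the names `HazardDomain`, `protect`, `clear` and `scan` are made up for this example. The important part is the reader side: publish the pointer, then re-read the source to confirm the object was still reachable after publishing.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

// Sketch only: one hazard slot per thread, "free" is simulated by a callback.
final class HazardDomain<T> {
    private final AtomicReferenceArray<T> slots; // slot i belongs to thread i
    private final Consumer<T> free;              // stands in for free()/delete

    HazardDomain(int threads, Consumer<T> free) {
        this.slots = new AtomicReferenceArray<>(threads);
        this.free = free;
    }

    // Reader: publish the pointer, then re-read the source to validate it.
    // The loop retries only if the source changed in between.
    T protect(int slot, AtomicReference<T> source) {
        while (true) {
            T p = source.get();
            slots.set(slot, p);
            if (source.get() == p) {
                return p; // still reachable after being published
            }
        }
    }

    // Reader: done with the object.
    void clear(int slot) {
        slots.set(slot, null);
    }

    // Reclaimer: free every retired object that no slot protects.
    // Objects must be unlinked from the shared structure before they are
    // added to `retired`, and `retired` is private to the calling thread.
    void scan(List<T> retired) {
        for (Iterator<T> it = retired.iterator(); it.hasNext(); ) {
            T candidate = it.next();
            if (!isProtected(candidate)) {
                it.remove();
                free.accept(candidate);
            }
        }
    }

    private boolean isProtected(T candidate) {
        for (int i = 0; i < slots.length(); i++) {
            if (slots.get(i) == candidate) {
                return true;
            }
        }
        return false;
    }
}
```

This closes the window from the question: a reclaimer that unlinked the object before the reader's second read makes the validation fail, and one that unlinks it afterwards will see the published slot during `scan`. Note that the `protect` loop as written can retry, so this sketch by itself is lock-free rather than wait-free.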
Quiescence period based reclamation works by delaying the actual release of the memory until it's known that each thread has had a chance to finish working on any data that may need to be released. The way to know that this condition is satisfied is to check whether each thread has passed through a quiescent period (not in a critical section) after the object was removed. In the Linux kernel this means something like each task making a voluntary task switch. In a user space application it would be the end of a critical section. This can be achieved with a simple per-thread counter: while the counter is even the thread is not in a critical section (reading shared data), and while it is odd the thread is inside one. To enter or leave a critical section, all the thread needs to do is atomically increment the number. Based on this the "garbage collector" can determine whether each thread has had a chance to finish. There are several approaches. A simple one is to queue up the requests to free memory (e.g. in a linked list or an array), each tagged with the current generation (managed by the GC). When the GC runs, it checks the state of the threads (their state counters) to see whether each has passed to the next generation (its counter is higher than last time, or is the same and even). Any memory can be reclaimed one generation after it was freed. The advantage of this approach is that it places the least burden on the reading threads. The disadvantage is that it can't guarantee an upper bound on the memory waiting to be released (e.g. one thread spending 5 minutes in a critical section while the data keeps changing and memory isn't released), but in practice it works out all right.
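The even/odd counter scheme can be sketched roughly as follows, again in Java with "free" simulated by a `Runnable`. The class and method names (`QuiescentReclaimer`, `enter`, `exit`, `retire`, `collect`) are invented for this example, and it assumes a fixed number of threads and a single reclaimer thread. Instead of a global generation number, each retired object carries a snapshot of all thread counters, which is one simple way to apply the "higher than last time, or the same and even" test.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch only: fixed thread count, one reclaimer thread, "free" is a Runnable.
final class QuiescentReclaimer {
    // Per-thread state counter: even = quiescent, odd = inside a critical section.
    private final AtomicLongArray state;

    private static final class Retired {
        final Runnable free;
        final long[] snapshot; // counters observed when the object was retired
        Retired(Runnable free, long[] snapshot) {
            this.free = free;
            this.snapshot = snapshot;
        }
    }

    // Touched only by the reclaimer thread.
    private final Deque<Retired> pending = new ArrayDeque<>();

    QuiescentReclaimer(int threads) {
        state = new AtomicLongArray(threads);
    }

    void enter(int thread) { state.incrementAndGet(thread); } // even -> odd
    void exit(int thread)  { state.incrementAndGet(thread); } // odd -> even

    // Call only after the object has been unlinked from the shared structure.
    void retire(Runnable free) {
        long[] snapshot = new long[state.length()];
        for (int i = 0; i < snapshot.length; i++) {
            snapshot[i] = state.get(i);
        }
        pending.add(new Retired(free, snapshot));
    }

    // Frees everything whose grace period has elapsed; returns how many were freed.
    int collect() {
        int freed = 0;
        for (Iterator<Retired> it = pending.iterator(); it.hasNext(); ) {
            Retired r = it.next();
            if (graceElapsed(r.snapshot)) {
                it.remove();
                r.free.run();
                freed++;
            }
        }
        return freed;
    }

    // A thread is safe if it was quiescent at retire time (even counter),
    // or if its counter has moved on since (it left that critical section).
    private boolean graceElapsed(long[] snapshot) {
        for (int i = 0; i < snapshot.length; i++) {
            boolean wasInside = (snapshot[i] & 1L) == 1L;
            if (wasInside && state.get(i) == snapshot[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A reader wraps its accesses in `enter(id)` and `exit(id)`, which costs two atomic increments and nothing else. The price is the one described above: a thread that stays inside its critical section keeps every later `collect()` from freeing anything retired in the meantime.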
There are a number of reference counting solutions. Many of them require a double compare-and-swap, which some CPUs don't support, so it can't be relied upon. The key problem remains though: taking a reference before updating the counter. I didn't find enough information to explain how this can be done simply and reliably. So .....
There are of course a number of "Other" solutions, it's a very important topic of research with tons of papers out there. I didn't examine all of them. I only need one.
And of course the various approaches can be combined; for example, hazard pointers can solve the problems of reference counting. But there's a nearly infinite number of combinations, and in some cases a spin lock might theoretically break wait-freedom yet not hurt performance in practice. This is somewhat like another tidbit I found in my research: a plain compare-and-swap retry loop is theoretically not wait-free, because in theory (purely in theory) a CAS-based update might keep failing an unbounded number of times (imagine a million threads on a million cores each trying to increment and decrement the same counter using CAS). In reality, however, it rarely fails more than a few times (I suspect it's because the CPUs spend more clocks away from the CAS than there are CPUs, but I think if the algorithm returned to the same CAS on the same location every 50 clocks and there were 64 cores there could be a chance of a major problem; then again, who knows, I don't have a hundred-core machine to try this). Another result of my research is that designing and implementing wait-free algorithms and data structures is VERY challenging (even if some of the heavy lifting is outsourced, e.g. to a garbage collector [e.g. Java]), and the result might perform less well than a similar algorithm with carefully placed locks.
So, yeah, it's possible to free memory even without delays. It's just tricky. And if you forget to make the right operations atomic, or to place the right memory barrier, oh, well, you're toast. :-) Thanks everyone for participating.
I think atomic operations for increment/decrement and compare-and-swap would solve this problem.
Idea:
All resources have a counter which is modified with atomic operations. The counter is initially zero.
Before using a resource: "Acquire" it by atomically incrementing its counter. The resource can be used if and only if the incremented value is greater than zero.
After using a resource: "Release" it by atomically decrementing its counter. The resource should be disposed/freed if and only if the decremented value is equal to zero.
Before disposing: Atomically compare-and-swap the counter value with the minimum (negative) value. Dispose will not happen if a concurrent thread "Acquired" the resource in between.
You haven't specified a language for your question. Here is an example in C#:
    using System.Diagnostics; // Debug.Assert
    using System.Threading;   // Interlocked

    class MyResource
    {
        // Counter is initially zero. Resource will not be disposed until it has
        // been acquired and released.
        private int _counter;

        public bool Acquire()
        {
            // Atomically increment counter.
            int c = Interlocked.Increment(ref _counter);
            // Resource is available if the resulting value is greater than zero.
            return c > 0;
        }

        public bool Release()
        {
            // Atomically decrement counter.
            int c = Interlocked.Decrement(ref _counter);
            // We should never reach a negative value.
            Debug.Assert(c >= 0, "Resource was released without being acquired");
            // Dispose when we reach zero.
            if (c == 0)
            {
                // Mark as disposed by setting the counter to its minimum value.
                // Only do this if the counter remains at zero (atomic compare-and-swap).
                if (Interlocked.CompareExchange(ref _counter, int.MinValue, c) == c)
                {
                    // TODO: Run dispose code (free stuff)
                    return true; // tell caller that resource is disposed
                }
            }
            return false; // released but still in use
        }
    }
Usage:
    // "r" is an instance of MyResource
    bool acquired = false;
    try
    {
        if (acquired = r.Acquire())
        {
            // TODO: Use resource
        }
    }
    finally
    {
        if (acquired)
        {
            if (r.Release())
            {
                // Resource was disposed.
                // TODO: Nullify variable or similar to let GC collect it.
            }
        }
    }
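For comparison, the same counter idea can be sketched in Java with `AtomicInteger`. This is a variation rather than a port, and the differences are my own assumptions: acquire uses a CAS loop so that a failed attempt leaves the counter untouched, and release folds "decrement to zero" and "mark as disposed" into a single CAS. The names `CountedResource`, `tryAcquire` and `release` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a use counter that can be "closed" atomically.
final class CountedResource {
    private static final int DISPOSED = Integer.MIN_VALUE;
    private final AtomicInteger counter = new AtomicInteger(0);

    // Returns true if the caller now holds a reference.
    boolean tryAcquire() {
        while (true) {
            int c = counter.get();
            if (c < 0) {
                return false; // already disposed, counter is left unchanged
            }
            if (counter.compareAndSet(c, c + 1)) {
                return true;
            }
        }
    }

    // Returns true if this call dropped the last reference and the caller
    // is now responsible for running the dispose code.
    boolean release() {
        while (true) {
            int c = counter.get();
            if (c <= 0) {
                throw new IllegalStateException("released without being acquired");
            }
            int next = (c == 1) ? DISPOSED : c - 1;
            if (counter.compareAndSet(c, next)) {
                return next == DISPOSED;
            }
        }
    }
}
```

Like the C# version, this only decides who disposes the resource. The counter lives inside the object, so the memory holding the counter itself must stay valid for as long as any thread may still call `tryAcquire`; in an unmanaged language that still requires something like the deferred reclamation schemes discussed above.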
I know this is not the best way, but it works for me.
For shared dynamic data-structure lists I use a usage counter per item.
For example:
    struct _data
    {
        DWORD cnt;      // usage counter
        bool deleted;   // true while this slot is free
        // here add your data
        _data() { cnt=0; deleted=true; }
    };
    const int MAX = 1024;
    _data data[MAX];
Now, when an item starts being used somewhere:
    // start use of data[i]
    data[i].cnt++;
When it is no longer used:
    // stop use of data[i]
    data[i].cnt--;
If you want to add a new item to the list:
    // add item
    for (int i=0;i<MAX;i++) // find first deleted item
        if (data[i].deleted)
        {
            data[i].deleted=false;
            data[i].cnt=0;
            // copy/set your data
            break;
        }
And now, in the background once in a while (on a timer or whatever),
scan data[] and set all undeleted items with cnt == 0 as deleted (and free their dynamic memory if they have any).
[Note]
To avoid multi-thread access problems, implement a single global lock per data list
and program it so that you cannot scan the data while any data[i].cnt is changing.
One bool and one DWORD suffice for this if you do not want to use OS locks.
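    // globals (atomic, so that changes are visible to the other threads)
    #include <atomic>
    std::atomic<bool>  data_cnt_locked(false); // true while the delete scan runs
    std::atomic<DWORD> data_cnt(0);            // threads currently changing some cnt
Now modify any change of data[i].cnt like this:
    // start use of data[i]
    while (data_cnt_locked) Sleep(1);
    // note: the scan could start right here; only the Sleep(1) in the scan covers this window
    data_cnt++;
    data[i].cnt++;
    data_cnt--;
and modify the delete scan like this:
    while (data_cnt) Sleep(1);
    data_cnt_locked=true;
    Sleep(1);
    if (data_cnt==0) // just to be sure
        for (int i=0;i<MAX;i++) // here scan for items to delete ...
            if (!data[i].cnt)
                if (!data[i].deleted)
                {
                    data[i].deleted=true;
                    data[i].cnt=0;
                    // release your dynamic data ...
                }
    data_cnt_locked=false;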
PS.
Do not forget to play with the sleep times a little to suit your needs;
suitable sleep times sometimes depend on the OS task scheduler.
This is not really a lock-free implementation,
because while the GC is at work everything is locked.
But other than that, multiple accesses do not block each other,
so if you do not run the GC too often you are fine.
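If blocking during the scan is acceptable anyway, the hand-made flag and counter can be replaced by a reader-writer lock, which avoids depending on sleep times. The following Java sketch shows that substitution; it is not the code above. `SlotTable` and its methods are hypothetical names, and as a further assumption `add()` returns the slot already counted as in use, so that a sweep cannot delete it before its first user gets to it.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: usage counters guarded by a reader-writer lock.
final class SlotTable {
    private final AtomicIntegerArray cnt; // per-slot usage counters
    private final boolean[] deleted;      // only touched under the write lock
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    SlotTable(int max) {
        cnt = new AtomicIntegerArray(max);
        deleted = new boolean[max];
        Arrays.fill(deleted, true); // all slots start out free
    }

    // Many threads may change counters at once (shared lock).
    void startUse(int i) {
        lock.readLock().lock();
        try { cnt.incrementAndGet(i); } finally { lock.readLock().unlock(); }
    }

    void stopUse(int i) {
        lock.readLock().lock();
        try { cnt.decrementAndGet(i); } finally { lock.readLock().unlock(); }
    }

    // Takes the first free slot; returns its index (already in use once) or -1.
    int add() {
        lock.writeLock().lock();
        try {
            for (int i = 0; i < deleted.length; i++) {
                if (deleted[i]) {
                    deleted[i] = false;
                    cnt.set(i, 1);
                    return i;
                }
            }
            return -1;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The "GC" pass: exclusive, so no counter can change while it scans.
    int sweep() {
        lock.writeLock().lock();
        try {
            int freed = 0;
            for (int i = 0; i < deleted.length; i++) {
                if (!deleted[i] && cnt.get(i) == 0) {
                    deleted[i] = true; // release the slot's dynamic data here
                    freed++;
                }
            }
            return freed;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

As with the original, this is a locking design, not a lock-free one: counter updates run concurrently with each other, but everything waits while `sweep()` or `add()` holds the write lock.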
I'm having trouble understanding how any data structure can be "nonblocking".
Say you're making a "nonblocking" hashtable. At some point or another, your hashtable gets too full, so you have to re-hash into a larger table.
This implies you need to allocate memory, which is a global resource. So it seems that you must obtain some sort of lock to prevent global corruption of the heap... irrespective of possible problems with your data structure itself!
But then that means every other thread must block while you allocate your memory...
What am I missing here?
(How) can you allocate memory without blocking another thread which is doing the same?
Two examples of non-blocking designs are optimistic design and transactional memory.
The idea is that in most cases blocking is redundant, since two operations can occur concurrently without interfering with each other. In the cases where two operations do occur concurrently and the data becomes corrupted because of it, you can roll back to your previous state and retry.
There might still be locks in these designs, but the time the data is locked is significantly shorter, and is limited to the critical time in which the effect of the operation takes place.
Just for some definitions, additional information and to distinguish between non-blocking, lock-free and wait-free terms, I recommend reading the following article (I won't copy the relevant passages here as it's too long):
Definitions of Non-blocking, Lock-free and Wait-free
Most strategies have one fundamental pattern in common. They use a compare and swap (CAS) operation in a loop until it succeeds.
For example, let's consider a stack implemented with a linked list. I chose a linked list implementation because it is easy to make concurrent with a CAS, but there are other ways to do it. I will use C-like pseudocode.
    Push(T item)
    {
        Node node = new Node(); // allocate node memory
        node.Value = item;      // set once, outside the retry loop
        Node initial;
        do
        {
            initial = head;
            node.Next = initial;
        }
        while (CompareAndSwap(head, node, initial) != initial);
    }
    Pop()
    {
        Node node;
        Node initial;
        do
        {
            initial = head;
            if (initial == null) return null; // stack is empty
            node = initial.Next;
        }
        while (CompareAndSwap(head, node, initial) != initial);
        T value = initial.Value;
        // deallocate node memory; only safe if no other thread can still be
        // reading this node (e.g. one that has just loaded it from head)
        delete initial;
        return value;
    }
In the above code CompareAndSwap(location, newValue, expected) is a non-blocking atomic operation that replaces the value in a memory address with a new value only if the current value equals the expected one, and returns the old value in either case. If the old value does not match the expected value, nothing was written, so you spin through the loop and try it all again.
All that non-blocking means is that you never wait indefinitely, not that you never wait at all. As long as your heap is also implemented using a non-blocking algorithm, you can implement other non-blocking algorithms on top of it.
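As a runnable counterpart to the pseudocode, here is a sketch of the same CAS-loop stack in Java using `AtomicReference`. The class name `TreiberStack` follows the usual name for this design; returning `null` from `pop()` on an empty stack is a choice made for this example. Because Java is garbage collected, the node deallocation step disappears, which also sidesteps the question of when a popped node may be freed.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: a lock-free stack built on a single CAS-updated head pointer.
final class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T item) {
        Node<T> initial;
        Node<T> node;
        do {
            initial = head.get();
            node = new Node<>(item, initial);
        } while (!head.compareAndSet(initial, node)); // retry if head moved
    }

    // Returns the top value, or null if the stack is empty.
    T pop() {
        Node<T> initial;
        do {
            initial = head.get();
            if (initial == null) {
                return null;
            }
        } while (!head.compareAndSet(initial, initial.next)); // retry if head moved
        return initial.value;
    }
}
```

Each failed `compareAndSet` means some other thread's push or pop succeeded in the meantime, which is why the structure as a whole always makes progress even though an individual call may retry.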
Recently I tried to access a textbox from a thread (other than the UI thread) and an exception was thrown. It said something about the "code not being thread safe", so I ended up writing a delegate (a sample from MSDN helped) and calling it instead.
But even so I didn't quite understand why all the extra code was necessary.
Update:
Will I run into any serious problems if I check
Controls.CheckForIllegalCrossThread..blah =true
Eric Lippert has a nice blog post entitled What is this thing you call "thread safe"? about the definition of thread safety as found on Wikipedia.
3 important things extracted from the links :
“A piece of code is thread-safe if it functions correctly during
simultaneous execution by multiple threads.”
“In particular, it must satisfy the need for multiple threads to
access the same shared data, …”
“…and the need for a shared piece of data to be accessed by only one
thread at any given time.”
Definitely worth a read!
In the simplest of terms threadsafe means that it is safe to be accessed from multiple threads. When you are using multiple threads in a program and they are each attempting to access a common data structure or location in memory several bad things can happen. So, you add some extra code to prevent those bad things. For example, if two people were writing the same document at the same time, the second person to save will overwrite the work of the first person. To make it thread safe then, you have to force person 2 to wait for person 1 to complete their task before allowing person 2 to edit the document.
Wikipedia has an article on Thread Safety.
This definitions page (you have to skip an ad - sorry) defines it thus:
In computer programming, thread-safe describes a program portion or routine that can be called from multiple programming threads without unwanted interaction between the threads.
A thread is an execution path of a program. A single-threaded program will only have one thread, and so this problem doesn't arise. Virtually all GUI programs have multiple execution paths and hence threads: there are at least two, one for processing the display of the GUI and handling user input, and at least one other for actually performing the operations of the program.
This is done so that the UI is still responsive while the program is working by offloading any long running process to any non-UI threads. These threads may be created once and exist for the lifetime of the program, or just get created when needed and destroyed when they've finished.
As these threads will often need to perform common actions - disk i/o, outputting results to the screen etc. - these parts of the code will need to be written in such a way that they can handle being called from multiple threads, often at the same time. This will involve things like:
Working on copies of data
Adding locks around the critical code
Opening files in the appropriate mode - so if reading, don't open the file for write as well.
Coping with not having access to resources because they're locked by other threads/processes.
Simply, thread-safe means that a method or class instance can be used by multiple threads at the same time without any problems occurring.
Consider the following method:
    private int myInt = 0;
    public int AddOne()
    {
        int tmp = myInt;
        tmp = tmp + 1;
        myInt = tmp;
        return tmp;
    }
Now thread A and thread B both would like to execute AddOne(), but A starts first and reads the value of myInt (0) into tmp. Now, for some reason, the scheduler decides to halt thread A and defer execution to thread B. Thread B now also reads the value of myInt (still 0) into its own variable tmp. Thread B finishes the entire method, so in the end myInt = 1, and 1 is returned. Now it's thread A's turn again. Thread A continues, adds 1 to tmp (tmp was 0 for thread A), and then saves this value in myInt. myInt is again 1.
So in this case the method AddOne() was called two times, but because the method was not implemented in a thread-safe way the value of myInt is not 2, as expected, but 1 because the second thread read the variable myInt before the first thread finished updating it.
Creating thread-safe methods is very hard in non-trivial cases, and there are quite a few techniques. In Java you can mark a method as synchronized, which means that only one thread can execute that method at a given time. The other threads wait in line. This makes a method thread-safe, but if there is a lot of work to be done in the method, then this wastes a lot of time, because the waiting threads do nothing. Another technique is to mark only a small part of a method as synchronized by creating a lock or semaphore and locking only this small part (usually called the critical section). There are even some methods that are implemented as lock-less thread-safe, which means that they are built in such a way that multiple threads can race through them at the same time without ever causing problems; this can be the case when a method only executes one atomic call. Atomic calls are calls that can't be interrupted and can only be done by one thread at a time.
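Two of these techniques applied to the AddOne example might look like this in Java: one version guarded by `synchronized`, and one that relies on a single atomic call through `AtomicInteger.incrementAndGet()`. The class `SafeCounters` and the helper `hammer` are hypothetical names used only to demonstrate that no increments are lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: the AddOne example made thread-safe in two ways.
final class SafeCounters {
    private int lockedInt = 0;                                   // guarded by "this"
    private final AtomicInteger atomicInt = new AtomicInteger(); // lock-free

    // Only one thread at a time can run the read-modify-write sequence.
    synchronized int addOneLocked() {
        int tmp = lockedInt;
        tmp = tmp + 1;
        lockedInt = tmp;
        return tmp;
    }

    // The whole read-modify-write is one atomic call.
    int addOneAtomic() {
        return atomicInt.incrementAndGet();
    }

    // Calls both methods from several threads and returns {locked, atomic} totals.
    static int[] hammer(int threads, int callsPerThread) {
        SafeCounters c = new SafeCounters();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    c.addOneLocked();
                    c.addOneAtomic();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // All workers have been joined, so a plain read of lockedInt is safe here.
        return new int[] { c.lockedInt, c.atomicInt.get() };
    }
}
```

With either version, calling the method twice always ends with a value of 2, whatever the interleaving; with the original AddOne, a similar stress run can lose updates.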
A real-world example for the layman:
Let's suppose you have a bank account with internet and mobile banking, and your account has only $10.
You transfer the balance to another account using mobile banking and, in the meantime, you do some online shopping using the same bank account.
If this bank account is not thread-safe, the bank allows you to perform both transactions at the same time, and eventually the bank will go bankrupt.
Thread-safe means that an object's state does not become inconsistent when multiple threads access the object simultaneously.
You can get more explanation from the book "Java Concurrency in Practice":
A class is thread‐safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.
A module is thread-safe if it guarantees it can maintain its invariants in the face of multi-threaded and concurrent use.
Here, a module can be a data structure, class, object, method/procedure or function. Basically, a scoped piece of code and its related data.
The guarantee can potentially be limited to certain environments such as a specific CPU architecture, but must hold for those environments. If there is no explicit delimitation of environments, then it is usually taken to imply that it holds for all environments that the code can be compiled and executed.
Thread-unsafe modules may function correctly under multi-threaded and concurrent use, but this is often more down to luck and coincidence than careful design. Even if some module does not break for you in your environment, it may break when moved to other environments.
Multi-threading bugs are often hard to debug. Some of them only happen occasionally, while others manifest aggressively - this too, can be environment specific. They can manifest as subtly wrong results, or deadlocks. They can mess up data-structures in unpredictable ways, and cause other seemingly impossible bugs to appear in other remote parts of the code. It can be very application specific, so it is hard to give a general description.
Thread safety: A thread-safe program protects its data from memory consistency errors. In a highly multi-threaded program, a thread-safe program does not cause any side effects when multiple threads perform read/write operations on the same objects. Different threads can share and modify object data without consistency errors.
You can achieve thread safety by using advanced concurrency API. This documentation page provides good programming constructs to achieve thread safety.
Lock Objects support locking idioms that simplify many concurrent applications.
Executors define a high-level API for launching and managing threads. Executor implementations provided by java.util.concurrent provide thread pool management suitable for large-scale applications.
Concurrent Collections make it easier to manage large collections of data, and can greatly reduce the need for synchronization.
Atomic Variables have features that minimize synchronization and help avoid memory consistency errors.
ThreadLocalRandom (in JDK 7) provides efficient generation of pseudorandom numbers from multiple threads.
Refer to java.util.concurrent and java.util.concurrent.atomic packages too for other programming constructs.
Producing Thread-safe code is all about managing access to shared mutable states. When mutable states are published or shared between threads, they need to be synchronized to avoid bugs like race conditions and memory consistency errors.
I recently wrote a blog about thread safety. You can read it for more information.
You are clearly working in a WinForms environment. WinForms controls exhibit thread affinity, which means that the thread on which they are created is the only thread that can be used to access and update them. That is why you will find examples on MSDN and elsewhere demonstrating how to marshal the call back onto the main thread.
Normal WinForms practice is to have a single thread that is dedicated to all your UI work.
I find the concept of http://en.wikipedia.org/wiki/Reentrancy_%28computing%29 to be what I usually think of as unsafe threading, which is when a method has and relies on a side effect such as a global variable.
For example, I have seen code that formatted floating point numbers to strings. If two of these calls run in different threads, the global value of decimalSeparator can be permanently changed to '.', because the second call may save the '.' set by the first as the "original" value:
    //built-in global set to locale specific value (here a comma)
    decimalSeparator = ','
    function FormatDot(value : real):
        //save the current decimal character
        temp = decimalSeparator
        //set the global value to be a dot
        decimalSeparator = '.'
        //format() uses decimalSeparator behind the scenes
        result = format(value)
        //put the original value back
        decimalSeparator = temp
        return result
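One way to avoid this kind of bug is to keep the separator out of shared state altogether. The Java sketch below builds a formatter locally on every call, so no thread ever modifies anything another thread can see. The class name `DotFormatter` and the two-decimal pattern are assumptions made for this example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Sketch only: formatting with a dot separator without touching global state.
final class DotFormatter {
    static String formatDot(double value) {
        // The symbols and the formatter are local to this call and never shared.
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator('.');
        DecimalFormat format = new DecimalFormat("0.00", symbols);
        return format.format(value);
    }
}
```

For example, `DotFormatter.formatDot(3.5)` returns `"3.50"` regardless of the default locale or of what other threads are doing. Creating the formatter per call also matters for a second reason: `DecimalFormat` instances are documented as not synchronized, so sharing one between threads would reintroduce the problem.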
To understand thread safety, read the sections below:
4.3.1. Example: Vehicle Tracker Using Delegation
As a more substantial example of delegation, let's construct a version of the vehicle tracker that delegates to a thread-safe class. We store the locations in a Map, so we start with a thread-safe Map implementation, ConcurrentHashMap. We also store the location using an immutable Point class instead of MutablePoint, shown in Listing 4.6.
Listing 4.6. Immutable Point class used by DelegatingVehicleTracker.
    class Point {
        public final int x, y;

        public Point() {
            this.x = 0;
            this.y = 0;
        }

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }
Point is thread-safe because it is immutable. Immutable values can be freely shared and published, so we no longer need to copy the locations when returning them.
DelegatingVehicleTracker in Listing 4.7 does not use any explicit synchronization; all access to state is managed by ConcurrentHashMap, and all the keys and values of the Map are immutable.
Listing 4.7. Delegating Thread Safety to a ConcurrentHashMap.
    import java.util.Collections;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class DelegatingVehicleTracker {
        private final ConcurrentMap<String, Point> locations;
        private final Map<String, Point> unmodifiableMap;

        public DelegatingVehicleTracker(Map<String, Point> points) {
            this.locations = new ConcurrentHashMap<String, Point>(points);
            this.unmodifiableMap = Collections.unmodifiableMap(locations);
        }

        public Map<String, Point> getLocations() {
            return this.unmodifiableMap; // User cannot update point(x,y) as Point is immutable
        }

        public Point getLocation(String id) {
            return locations.get(id);
        }

        public void setLocation(String id, int x, int y) {
            // replace() returns null when there was no mapping for id
            if (locations.replace(id, new Point(x, y)) == null) {
                throw new IllegalArgumentException("invalid vehicle name: " + id);
            }
        }
    }
If we had used the original MutablePoint class instead of Point, we would be breaking encapsulation by letting getLocations publish a reference to mutable state that is not thread-safe. Notice that we've changed the behavior of the vehicle tracker class slightly; while the monitor version returned a snapshot of the locations, the delegating version returns an unmodifiable but “live” view of the vehicle locations. This means that if thread A calls getLocations and thread B later modifies the location of some of the points, those changes are reflected in the Map returned to thread A.
4.3.2. Independent State Variables
We can also delegate thread safety to more than one underlying state variable as long as those underlying state variables are independent, meaning that the composite class does not impose any invariants involving the multiple state variables.
VisualComponent in Listing 4.9 is a graphical component that allows clients to register listeners for mouse and keystroke events. It maintains a list of registered listeners of each type, so that when an event occurs the appropriate listeners can be invoked. But there is no relationship between the set of mouse listeners and key listeners; the two are independent, and therefore VisualComponent can delegate its thread safety obligations to two underlying thread-safe lists.
Listing 4.9. Delegating Thread Safety to Multiple Underlying State Variables.
    import java.awt.event.KeyListener;
    import java.awt.event.MouseListener;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class VisualComponent {
        private final List<KeyListener> keyListeners
            = new CopyOnWriteArrayList<KeyListener>();
        private final List<MouseListener> mouseListeners
            = new CopyOnWriteArrayList<MouseListener>();

        public void addKeyListener(KeyListener listener) {
            keyListeners.add(listener);
        }

        public void addMouseListener(MouseListener listener) {
            mouseListeners.add(listener);
        }

        public void removeKeyListener(KeyListener listener) {
            keyListeners.remove(listener);
        }

        public void removeMouseListener(MouseListener listener) {
            mouseListeners.remove(listener);
        }
    }
VisualComponent uses a CopyOnWriteArrayList to store each listener list; this is a thread-safe List implementation particularly suited for managing listener lists (see Section 5.2.3). Each List is thread-safe, and because there are no constraints coupling the state of one to the state of the other, VisualComponent can delegate its thread safety responsibilities to the underlying mouseListeners and keyListeners objects.
4.3.3. When Delegation Fails
Most composite classes are not as simple as VisualComponent: they have invariants that relate their component state variables. NumberRange in Listing 4.10 uses two AtomicIntegers to manage its state, but imposes an additional constraint—that the first number be less than or equal to the second.
Listing 4.10. Number Range Class that does Not Sufficiently Protect Its Invariants. Don't do this.
    import java.util.concurrent.atomic.AtomicInteger;

    public class NumberRange {
        // INVARIANT: lower <= upper
        private final AtomicInteger lower = new AtomicInteger(0);
        private final AtomicInteger upper = new AtomicInteger(0);

        public void setLower(int i) {
            // Warning - unsafe check-then-act
            if (i > upper.get()) {
                throw new IllegalArgumentException(
                    "Can't set lower to " + i + " > upper ");
            }
            lower.set(i);
        }

        public void setUpper(int i) {
            // Warning - unsafe check-then-act
            if (i < lower.get()) {
                throw new IllegalArgumentException(
                    "Can't set upper to " + i + " < lower ");
            }
            upper.set(i);
        }

        public boolean isInRange(int i) {
            return (i >= lower.get() && i <= upper.get());
        }
    }
NumberRange is not thread-safe; it does not preserve the invariant that constrains lower and upper. The setLower and setUpper methods attempt to respect this invariant, but do so poorly. Both setLower and setUpper are check-then-act sequences, but they do not use sufficient locking to make them atomic. If the number range holds (0, 10), and one thread calls setLower(5) while another thread calls setUpper(4), with some unlucky timing both will pass the checks in the setters and both modifications will be applied. The result is that the range now holds (5, 4)—an invalid state. So while the underlying AtomicIntegers are thread-safe, the composite class is not. Because the underlying state variables lower and upper are not independent, NumberRange cannot simply delegate thread safety to its thread-safe state variables.
NumberRange could be made thread-safe by using locking to maintain its invariants, such as guarding lower and upper with a common lock. It must also avoid publishing lower and upper to prevent clients from subverting its invariants.
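A minimal sketch of that fix, assuming the intrinsic lock of the object is used as the common lock, could look like the following. The class name `SafeNumberRange` is hypothetical and this is not a listing from the book. Since every access holds the same lock, plain `int` fields are enough and the `AtomicInteger`s are no longer needed.

```java
// Sketch only: both bounds guarded by one lock, so check-then-act is atomic.
final class SafeNumberRange {
    // INVARIANT: lower <= upper (both guarded by the intrinsic lock on "this")
    private int lower = 0;
    private int upper = 0;

    synchronized void setLower(int i) {
        if (i > upper) {
            throw new IllegalArgumentException(
                "Can't set lower to " + i + " > upper");
        }
        lower = i;
    }

    synchronized void setUpper(int i) {
        if (i < lower) {
            throw new IllegalArgumentException(
                "Can't set upper to " + i + " < lower");
        }
        upper = i;
    }

    synchronized boolean isInRange(int i) {
        return i >= lower && i <= upper;
    }
}
```

With the range at (0, 10), concurrent calls to `setLower(5)` and `setUpper(4)` can no longer both succeed: whichever acquires the lock second sees the other's update and throws. The fields are private and never returned, so clients cannot bypass the lock.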
If a class has compound actions, as NumberRange does, delegation alone is again not a suitable approach for thread safety. In these cases, the class must provide its own locking to ensure that compound actions are atomic, unless the entire compound action can also be delegated to the underlying state variables.
If a class is composed of multiple independent thread-safe state variables and has no operations that have any invalid state transitions, then it can delegate thread safety to the underlying state variables.
I'm attempting to learn Clojure from the API and documentation available on the site. I'm a bit unclear about mutable storage in Clojure and I want to make sure my understanding is correct. Please let me know if there are any ideas that I've gotten wrong.
Edit: I'm updating this as I receive comments on its correctness.
Disclaimer: All of this information is informal and potentially wrong. Do not use this post for gaining an understanding of how Clojure works.
Vars always contain a root binding and possibly a per-thread binding. They are comparable to regular variables in imperative languages and are not suited for sharing information between threads. (thanks Arthur Ulfeldt)
Refs are locations shared between threads that support atomic transactions that can change the state of any number of refs in a single transaction. Transactions are committed upon exiting sync expressions (dosync) and conflicts are resolved automatically with STM magic (rollbacks, queues, waits, etc.)
Agents are locations that enable information to be asynchronously shared between threads with minimal overhead by dispatching independent action functions to change the agent's state. Agents are returned immediately and are therefore non-blocking, although an agent's value isn't set until a dispatched function has completed.
Atoms are locations that can be synchronously shared between threads. They support safe manipulation between different threads.
Here's my friendly summary based on when to use these structures:
Vars are like regular old variables in imperative languages. (avoid when possible)
Atoms are like Vars but with thread-sharing safety that allows for immediate reading and safe setting. (thanks Martin)
An Agent is like an Atom but rather than blocking it spawns a new thread to calculate its value, only blocks if in the middle of changing a value, and can let other threads know that it's finished assigning.
Refs are shared locations that lock themselves in transactions. Instead of making the programmer decide what happens during race conditions for every piece of locked code, we just start up a transaction and let Clojure handle all the lock conditions between the refs in that transaction.
Also, a related concept is the function future. To me, it seems like a future object can be described as a synchronous Agent where the value can't be accessed at all until the calculation is completed. It can also be described as a non-blocking Atom. Are these accurate conceptions of future?
It sounds like you are really getting Clojure! Good job :)
Vars have a "root binding" visible in all threads, and each individual thread can change the value it sees without affecting the other threads. If my understanding is correct, a var cannot exist in just one thread without a root binding that is visible to all, and it can't be "rebound" until it has been defined with (def ... ) the first time.
Refs are committed at the end of the (dosync ... ) transaction that encloses the changes but only when the transaction was able to finish in a consistent state.
I think your conclusion about Atoms is wrong:
Atoms are like Vars but with thread-sharing safety that blocks until the value has changed
Atoms are changed with swap! or low-level with compare-and-set!. This never blocks anything. swap! works like a transaction with just one ref:
1. the old value is taken from the atom and stored thread-local
2. the function is applied to the old value to generate a new value
3. if this succeeds, compare-and-set is called with the old and new value; only if the value of the atom has not been changed by any other thread (it still equals the old value) is the new value written, otherwise the operation restarts at (1) until it eventually succeeds.
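The three steps above can be sketched in Java on top of java.util.concurrent.atomic.AtomicReference. This is an illustration of the retry loop, not Clojure's actual implementation; CasSwap and swap are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch only: a hypothetical helper that mimics the swap! retry loop.
final class CasSwap {
    static <T> T swap(AtomicReference<T> atom, UnaryOperator<T> f) {
        while (true) {
            // 1. read the current value into a local variable
            T oldValue = atom.get();
            // 2. compute the new value from the old one (f should be side-effect free,
            //    because it may run more than once)
            T newValue = f.apply(oldValue);
            // 3. publish only if the atom still holds the value read in step 1
            if (atom.compareAndSet(oldValue, newValue)) {
                return newValue;
            }
            // another thread changed the atom in between: start again at step 1
        }
    }
}
```

No thread ever holds a lock here; a thread that loses the race simply recomputes from the newer value.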
I've found two issues with your question.
You say:
If an agent is accessed while an action is occurring then the value isn't returned until the action has finished
http://clojure.org/agents says:
the state of an Agent is always immediately available for reading by any thread
I.e. you never have to wait to get the value of an agent (I assume the value changed by an action is proxied and changed atomically).
The code for the deref-method of an Agent looks like this (SVN revision 1382):
public Object deref() throws Exception {
    if (errors != null) {
        throw new Exception("Agent has errors", (Exception) RT.first(errors));
    }
    return state;
}
No blocking is involved.
Also, I don't understand what you mean (in your Ref section) by
Transactions are committed on calls to deref
Transactions are committed when all actions of the dosync block have been completed, no exceptions have been thrown and nothing has caused the transaction to be retried. I think deref has nothing to do with it, but maybe I misunderstand your point.
Martin is right when he says that an Atom's operation restarts at step 1 until it eventually succeeds.
It is also called spin waiting.
While it is not really blocking on a lock, the thread that performs the operation is held up until the operation succeeds, so it is a blocking operation and not an asynchronous one.
Also about Futures, Clojure 1.1 has added abstractions for promises and futures.
A promise is a synchronization construct that can be used to deliver a value from one thread to another. Until the value has been delivered, any attempt to dereference the promise will block.
(def a-promise (promise))
(deliver a-promise :fred)
Futures represent asynchronous computations. They are a way to get code to run in another thread, and obtain the result.
(def f (future (some-sexp)))
(deref f) ; blocks the thread that derefs f until value is available
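For readers coming from Java, a rough analogue of the two constructs can be sketched with java.util.concurrent.CompletableFuture. This is only an analogy under the assumption that "deliver" maps to complete and "deref" maps to join; PromiseFutureSketch and its methods are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;

// Sketch only: Java analogues of a Clojure promise and future.
final class PromiseFutureSketch {
    // Promise-like: created empty, filled in later by another thread.
    static String deliverFromAnotherThread() {
        CompletableFuture<String> promise = new CompletableFuture<>();
        Thread producer = new Thread(() -> promise.complete("fred"));
        producer.start();
        // join blocks the calling thread until a value has been delivered
        return promise.join();
    }

    // Future-like: the computation itself runs on another thread.
    static int computeAsync() {
        CompletableFuture<Integer> f = CompletableFuture.supplyAsync(() -> 6 * 7);
        // join blocks the calling thread until the result is available
        return f.join();
    }
}
```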
Vars don't always have a root binding. It's legal to create a var without a binding using
(def x)
or
(declare x)
Attempting to evaluate x before it has a value will result in
Var user/x is unbound.
[Thrown class java.lang.IllegalStateException]
Slightly modified version of canonical broken double-checked locking from Wikipedia:
class Foo {
    private Helper helper = null;

    public Helper getHelper() {
        if (helper == null) {
            synchronized(this) {
                if (helper == null) {
                    // Create new Helper instance and store reference on
                    // stack so other threads can't see it.
                    Helper myHelper = new Helper();
                    // Atomically publish this instance.
                    atomicSet(helper, myHelper);
                }
            }
        }
        return helper;
    }
}
Does simply making the publishing of the newly created Helper instance atomic make this double checked locking idiom safe, assuming that the underlying atomic ops library works properly? I realize that in Java, one could just use volatile, but even though the example is in pseudo-Java, this is supposed to be a language-agnostic question.
See also:
Double checked locking Article
It entirely depends on the exact memory model of your platform/language.
My rule of thumb: just don't do it. Lock-free (or reduced lock, in this case) programming is hard and shouldn't be attempted unless you're a threading ninja. You should only even contemplate it when you've got profiling proof that you really need it, and in that case you get the absolute best and most recent book on threading for that particular platform and see if it can help you.
I don't think you can answer the question in a language-agnostic fashion without getting away from code completely. It all depends on how synchronized and atomicSet work in your pseudocode.
The answer is language dependent - it comes down to the guarantees provided by atomicSet().
If the construction of myHelper can be reordered so that it completes after the atomicSet(), then it doesn't matter how the variable is assigned to the shared state.
i.e.
// Create new Helper instance and store reference on
// stack so other threads can't see it.
Helper myHelper = new Helper(); // ALLOCATE MEMORY HERE BUT DON'T INITIALISE
// Atomically publish this instance.
atomicSet(helper, myHelper); // ATOMICALLY POINT UNINITIALISED MEMORY from helper
// other thread gets run at this time and tries to use helper object
// AT THE PROGRAM'S LEISURE INITIALISE Helper object.
If this is allowed by the language then the double checking will not work.
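For Java specifically, one arrangement that rules out the reordering sketched above is a volatile field, assuming the Java 5 or later memory model: the volatile write of the reference happens-before any later volatile read that sees it, so the constructor's writes are visible too. A minimal sketch, where VolatileFoo and its nested Helper are stand-ins for the question's Foo and Helper:

```java
// Sketch only: assumes the Java 5+ memory model; class names are stand-ins.
final class VolatileFoo {
    static final class Helper {
        final int value = 42; // some state set during construction
    }

    // volatile prevents the reference from being published before construction finishes
    private volatile Helper helper;

    Helper getHelper() {
        Helper local = helper;          // first check, no lock taken
        if (local == null) {
            synchronized (this) {
                local = helper;         // second check, under the lock
                if (local == null) {
                    local = new Helper();
                    helper = local;     // volatile write publishes the finished object
                }
            }
        }
        return local;
    }
}
```

The local variable is only there to avoid reading the volatile field more than once on the fast path. Other languages need their own equivalent of this ordering guarantee.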
Using volatile would not prevent multiple instantiations; using synchronized, however, will prevent multiple instances from being created. With your code, though, it is possible that helper is returned before it has been set up: thread 'A' instantiates it, but before it is set up, thread 'B' comes along, sees that helper is non-null and returns it straight away. To fix that problem, remove the first if (helper == null).
Most likely it is broken, because the problem of a partially constructed object is not addressed.
To all the people worried about a partially constructed object:
As far as I understand, the problem of partially constructed objects is only a problem within constructors. In other words, within a constructor, if an object references itself (including its subclass) or its members, then there are possible issues with partial construction. Otherwise, when a constructor returns, the class is fully constructed.
I think you are confusing partial construction with the different problem of how the compiler optimizes the writes. The compiler can choose to A) allocate the memory for the new Helper object, B) write the address to myHelper (the local stack variable), and then C) invoke any constructor initialization. Anytime after point B and before point C, accessing myHelper would be a problem.
It is this compiler optimization of the writes, not partial construction that the cited papers are concerned with. In the original single-check lock solution, optimized writes can allow multiple threads to see the member variable between points B and C. This implementation avoids the write optimization issue by using a local stack variable.
The main scope of the cited papers is to describe the various problems with the double-check lock solution. However, unless the atomicSet method is also synchronizing against the Foo class, this solution is not a double-check lock solution. It is using multiple locks.
I would say this all comes down to the implementation of the atomic assignment function. The function needs to be truly atomic, it needs to guarantee that processor local memory caches are synchronized, and it needs to do all this at a lower cost than simply always synchronizing the getHelper method.
Based on the cited paper, in Java, it is unlikely to meet all these requirements. Also, something that should be very clear from the paper is that Java's memory model changes frequently. It adapts as the understanding of caching, garbage collection, etc. improves, and as the underlying real processor architecture that the VM runs on changes.
As a rule of thumb, if you optimize your Java code in a way that depends on the underlying implementation, as opposed to the API, you run the risk of having broken code in the next release of the JVM. (Although, sometimes you will have no choice.)
dsimcha:
If your atomicSet method is real, then I would try sending your question to Doug Lea (along with your atomicSet implementation). I have a feeling he's the kind of guy that would answer. I'm guessing that for Java he will tell you that it's cheaper to always synchronize and to look to optimize somewhere else.