How to make an atomic exchange -- the Scala way? - multithreading

Problem
I have such code
var ls = src.iter.toList
src.iter = ls.iterator
(This is part of the copy constructor of my iterator wrapper.) The first line reads the source iterator, and the next line sets it back. The problem is that those two lines have to be atomic (especially if you consider that I mutate the source inside a copy constructor -- I don't like it, but well...).
I've read about Actors but I don't see how they fit here -- they look more like a mechanism for asynchronous execution. I've read about Java solutions and using them in Scala, for example: http://naedyr.blogspot.com/2011/03/atomic-scala.html
My question is: what is the most Scala-like way to make some operations atomic? I don't want to bring in heavy artillery for this, and I would also not like to use external resources. In other words -- something that looks and feels "right".
I kind of like the solution presented in the link above, because it is exactly what I do -- exchange references. And if I understand correctly, I would guard only those 2 lines, and the other code would not have to be altered! But I will wait for a definitive answer.
Background
Because every Nth question gets "but why do you use..." instead of an answer, here it is:
How to copy iterator in Scala? :-)
I need to copy an iterator (make a fork), and that solution is the most "right" one I have read about. The problem is, it destroys the original iterator.
Solutions
Locks
For example here:
http://www.ibm.com/developerworks/java/library/j-scala02049/index.html
The only problem I see here is that I would have to put a lock around those two lines, and around every other usage of iter. It is a minor thing now, but when I add more code, it will be easy to forget to add a lock somewhere.
I am not saying "no", but I have no experience here, so I would like an answer from someone who is familiar with Scala, pointing out which solution is best for such a task in the long run.
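For concreteness, here is a minimal sketch of what I mean (the class and method names, like LockedIteratorWrapper and fork, are illustrative only). Keeping the raw iterator private confines all the locking to one class:
class LockedIteratorWrapper[A](private var iter: Iterator[A]) {
  private val lock = new AnyRef

  // The whole exchange happens under one lock: drain the source,
  // restore a fresh iterator over the drained list, return the fork.
  def fork(): Iterator[A] = lock.synchronized {
    val ls = iter.toList
    iter = ls.iterator
    ls.iterator
  }

  // Every other access to the raw iterator takes the same lock,
  // so no caller can forget to do it.
  def hasNext: Boolean = lock.synchronized(iter.hasNext)
  def next(): A = lock.synchronized(iter.next())
}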
Immutable iterator
While I appreciate the explanation by Paradigmatic, I don't see how such an approach fits my problem. The thing is, the IteratorWrapper class has to wrap an iterator -- i.e. the raw iterator should be hidden within the class (usually by making it private). Methods such as hasNext() and next() have to be wrapped as well. Normally next() alters the state of the object (the iterator), so in the case of an immutable IteratorWrapper it would have to return both a new IteratorWrapper and the status of next() (successful or not). Another option would be returning null when the raw next() fails; either way, this makes such an IteratorWrapper not very handy to use.
Worse, there is still no easy way to copy such an IteratorWrapper.
So either I am missing something, or the classic approach of making a piece of code atomic is actually cleaner, because all the burden is contained inside the class, and the user does not have to pay the price of the way IteratorWrapper handles the data (the raw iterator in this case).
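For concreteness, here is a rough sketch of the immutable-wrapper shape I am describing (all names are illustrative): next has to return the element together with the advanced wrapper:
// next returns the element paired with the advanced wrapper, or None
// when the underlying data is exhausted -- workable, but not very handy.
final class ImmutableIteratorWrapper[A] private (elems: List[A]) {
  def hasNext: Boolean = elems.nonEmpty
  def next: Option[(A, ImmutableIteratorWrapper[A])] = elems match {
    case h :: t => Some((h, new ImmutableIteratorWrapper(t)))
    case Nil    => None
  }
}

object ImmutableIteratorWrapper {
  def apply[A](iter: Iterator[A]) = new ImmutableIteratorWrapper(iter.toList)
}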

The Scala approach is to favor immutability whenever it is possible (and it's very often possible). Then you no longer need copy constructors, locks, mutexes, etc.
For example, you can convert the iterator to a List at object construction. Since lists are immutable, you can safely share them without having to lock:
class IteratorWrapper[A](iter: Iterator[A]) {
  val list = iter.toList
  def iteratorCopy = list.iterator
}
Here, the IteratorWrapper is also immutable. You can safely pass it around. But if you really need to change the wrapped iterator, you will need a more demanding approach. For instance you could:
Use locks
Transform the wrapper into an Actor
Use STM (Akka or other implementations).
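For completeness, the reference-exchange idea from the blog post linked in the question can be sketched with java.util.concurrent.atomic.AtomicReference. This is only a sketch (the class and method names are made up): getAndSet hands exactly one thread the live iterator, while other forkers retry until the fresh iterator has been restored:
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

class AtomicIteratorWrapper[A](iter: Iterator[A]) {
  // null marks the iterator as "checked out" while one thread drains it.
  private val ref = new AtomicReference[Iterator[A]](iter)

  @tailrec
  final def fork(): Iterator[A] = {
    val current = ref.getAndSet(null)
    if (current == null) fork() // another thread is mid-exchange; retry
    else {
      val ls = current.toList // drain the source exactly once
      ref.set(ls.iterator)    // restore a fresh iterator for later use
      ls.iterator             // the independent copy
    }
  }
}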
Clarifications: I lack information on your problem constraints. But here is how I understand it.
Several threads must traverse an Iterator simultaneously. A possible approach is to copy it before passing a reference to each thread. However, Scala practice aims at sharing immutable objects that do not need to be copied.
With the copy strategy, you would write something like:
//A single iterator producer
class Producer {
  val iterator: Iterator[Foo] = produceIterator(...)
}

//Several consumers, living on different threads
class Consumer(p: Producer) {
  def consumeIterator = {
    val iteratorCopy = copy(p.iterator) //BROKEN !!!
    while (iteratorCopy.hasNext) {
      doSomething(iteratorCopy.next)
    }
  }
}
However, it is difficult (or slow) to implement a copy method which is thread-safe. A possible solution using immutability would be:
class Producer {
  val list: List[Foo] = produceIterator(...).toList
  def iteratorCopy = list.iterator
}

class Consumer(p: Producer) {
  def consumeIterator = {
    val iteratorCopy = p.iteratorCopy
    while (iteratorCopy.hasNext) {
      doSomething(iteratorCopy.next)
    }
  }
}
The producer calls produceIterator once, at construction. It is immutable because its state is only a list, which is itself immutable. iteratorCopy is also thread-safe, because the list is not modified when creating the copy (so several threads can traverse it simultaneously without having to lock).
Note that calling list.iterator does not traverse the list, so it will not decrease performance in any way (as opposed to really copying the iterator each time).

Related

Is there an alternative to Arc<> wrapping for long-running threads which share data?

I have some threads, which are long running; they are fed by a Deque, which has data pushed into it by another long-running thread. Currently, I'm using std::thread::spawn, and have to wrap the Deque in an Arc<> to share it between the threads. If I use &deque, I run into the classic 'static lifetime issue, hence the Arc<>. I've looked at scoped threads; however, the closure which the threads run won't return for a very long time, so I don't think that will work for this case. Is anyone aware of an alternative solution -- short of using unsafe? I'm not satisfied with the Arc<> solution. Each time I touch the Deque, the code digs into Arc<>'s inner to get to the Deque, incurring overhead which I'd like to avoid. I've also considered making the Deque static; however, it would need to be a lazy static due to the allocation restriction on statics, and that comes with its own access overhead.
You can get a &Deque out of the Arc<Deque> just once at the beginning of your long-running thread and keep using that immutable reference throughout its life. Something like this:
use std::sync::Arc;
use std::thread;

let dq: Arc<Deque<T>> = ....;
....
{
    let dq2 = Arc::clone(&dq);
    thread::spawn(move || {
        let dq_ref: &Deque<T> = &*dq2;
        // long-running calculation using dq_ref
        // dq2 is dropped at the end of the closure
    });
}

Confused about safe publishing and visibility in Java, especially with Immutable Objects

When I read the Java Concurrency in Practice by Brian Goetz, I recall him saying "Immutable objects, on the other hand, can be safely accessed even when synchronization is not used to publish the object reference" in the chapter about visibility.
I thought that this implies that if you publish an immutable object, all fields (mutable final references included) are visible to other threads that might make use of them, and are at least up to date as of when the object finished construction.
Now, I read in https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html that
"Now, having said all of this, if, after a thread constructs an immutable object (that is, an object that only contains final fields), you want to ensure that it is seen correctly by all of the other thread, you still typically need to use synchronization. There is no other way to ensure, for example, that the reference to the immutable object will be seen by the second thread. The guarantees the program gets from final fields should be carefully tempered with a deep and careful understanding of how concurrency is managed in your code."
They seem to contradict each other and I am not sure which to believe.
I have also read that if all fields are final then we can ensure safe publication even if the object is not per se immutable.
For example, I always thought that this code in Brian Goetz's concurrency in practice was fine when publishing an object of this class due to this guarantee.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import net.jcip.annotations.GuardedBy;
import net.jcip.annotations.ThreadSafe;

@ThreadSafe
public class MonitorVehicleTracker {
    @GuardedBy("this")
    private final Map<String, MutablePoint> locations;

    public MonitorVehicleTracker(Map<String, MutablePoint> locations) {
        this.locations = deepCopy(locations);
    }

    public synchronized Map<String, MutablePoint> getLocations() {
        return deepCopy(locations);
    }

    public synchronized MutablePoint getLocation(String id) {
        MutablePoint loc = locations.get(id);
        return loc == null ? null : new MutablePoint(loc);
    }

    public synchronized void setLocation(String id, int x, int y) {
        MutablePoint loc = locations.get(id);
        if (loc == null)
            throw new IllegalArgumentException("No such ID: " + id);
        loc.x = x;
        loc.y = y;
    }

    private static Map<String, MutablePoint> deepCopy(Map<String, MutablePoint> m) {
        Map<String, MutablePoint> result = new HashMap<String, MutablePoint>();
        for (String id : m.keySet())
            result.put(id, new MutablePoint(m.get(id)));
        return Collections.unmodifiableMap(result);
    }
}

public class MutablePoint { /* Listing 4.5 */ }
For example, in this code sample, what if that final-field guarantee were false, and one thread constructed an instance of this class, and then another thread saw a non-null reference to the object but a null locations field?
Once again, I don't know which is correct, or whether I happened to misinterpret both the article and Goetz.
This question has been answered a few times before but I feel that many of those answers are inadequate. See:
https://stackoverflow.com/a/14617582
https://stackoverflow.com/a/35169705
https://stackoverflow.com/a/7887675
Effectively Immutable Object
etc...
In short, Goetz's statement in the linked JSR 133 FAQ page is more "correct", although not in the way that you are thinking.
When Goetz says that immutable objects are safe to use even when published without synchronization, he means to say that immutable objects that are visible to different threads are guaranteed to retain their original state/invariants, all else remaining the same. In other words, properly synchronized publication is not necessary to maintain state consistency.
In the JSR-133 FAQ, when he says that:
you want to ensure that it is seen correctly by all of the other thread (sic)
He is not referring to the state of the immutable object. He means that you must synchronize publication in order for another thread to see the reference to the immutable object. There's a subtle difference to what the two statements are talking about: while JCIP is referring to state consistency, the FAQ page is referring to access to a reference of an immutable object.
The code sample you provided has nothing, really, to do with anything that Goetz says here, but to answer your question: a correctly initialized final field will hold its expected value if the object is properly initialized (beware the difference between initialization and publication). The code sample also synchronizes access to the locations field so as to ensure that updates to the map it refers to are thread-safe.
In fact, to elaborate further, I suggest that you look at JCIP listing 3.13 (VolatileCachedFactorizer). Notice that even though OneValueCache is immutable, it is stored in a volatile field. To illustrate the FAQ statement, VolatileCachedFactorizer will not work correctly without volatile. Here, "synchronization" refers to using a volatile field in order to ensure that updates made to it are visible to other threads.
A good way to illustrate the first JCIP statement is to remove volatile. In this case, the CachedFactorizer won't work. Consider this: what if one thread set a new cache value, but another thread tried to read the value and the field was not volatile? The reader might not see the updated OneValueCache. BUT, recalling that Goetz refers to the state of the immutable object, IF the reader thread happened to see an up-to-date instance of OneValueCache stored at cache, then the state of that instance would be visible and correctly constructed.
So although it is possible to lose updates to cache, it is impossible to lose the state of the OneValueCache if it is read, because it is immutable. I suggest reading the accompanying text stating that "volatile reference used to ensure timely visibility."
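Since the listing itself is not reproduced here, the pattern roughly looks like this (a Scala rendering of the Java original, with approximated names; factor is a stand-in for the real factoring routine):
// Immutable holder: both constructor fields compile to final fields, so any
// thread that sees a OneValueCache reference sees it fully constructed.
final class OneValueCache(lastNumber: Option[BigInt], lastFactors: List[BigInt]) {
  def getFactors(i: BigInt): Option[List[BigInt]] =
    if (lastNumber.contains(i)) Some(lastFactors) else None
}

class VolatileCachedFactorizer {
  // The volatile write is what makes each new cache instance
  // visible to other threads in a timely way.
  @volatile private var cache = new OneValueCache(None, Nil)

  def factors(i: BigInt): List[BigInt] =
    cache.getFactors(i).getOrElse {
      val fs = factor(i)
      cache = new OneValueCache(Some(i), fs) // safe publication via volatile
      fs
    }

  private def factor(i: BigInt): List[BigInt] =
    ??? // stand-in; the real JCIP example factors a BigInteger
}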
As a final example, consider a singleton that uses FinalWrapper for thread safety. Note that FinalWrapper is effectively immutable (depending on whether the singleton is mutable), and that the helperWrapper field is in fact non-volatile. Recalling the second FAQ statement, that synchronization is required for access the reference, how can this "correct" implementation possibly be correct!?
In fact, it is possible to do this here because it is not necessary for threads to immediately see the up-to-date value for helperWrapper. If the value that is held by helperWrapper is non-null, then great! Our first JCIP statement guarantees that the state of FinalWrapper is consistent, and that we have a fully initialized Foo singleton that can be readily returned. If the value is actually null, there are 2 possibilities: firstly, it is possible that it is the first call and it has not been initialized; secondly, it could just be a stale value.
In the case that it is the first call, the field itself is checked again in a synchronized context, as suggested by the second FAQ statement. It will find that this value is still null, and will initialize a new FinalWrapper and publish with synchronization.
In the case that it is just a stale value, by entering the synchronized block, the thread can set up a happens-before order with a preceding write to the field. By definition, if a value is stale, then some writer has already written to the helperWrapper field and the current thread just has not seen it yet. By entering the synchronized block, a happens-before relationship is established with that previous write, since according to our first scenario, a truly uninitialized helperWrapper would be initialized under the same lock. Therefore, the thread can recover by rereading once the method has entered a synchronized context, obtaining the most up-to-date, non-null value.
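Sketched roughly (my own rendering, with Foo standing in for the singleton type; in Scala a constructor val compiles to a final field, which is what carries the JMM final-field guarantee):
class Foo // stand-in for the object being published

// value becomes a final field: a reader that sees a FinalWrapper
// reference is guaranteed to see a fully constructed value.
final class FinalWrapper[T](val value: T)

class FooProvider {
  // Deliberately NOT volatile: a stale null is harmless, it only
  // sends the reader into the synchronized re-check below.
  private var helperWrapper: FinalWrapper[Foo] = null

  def getHelper: Foo = {
    val w = helperWrapper
    if (w != null) w.value // non-null implies a fully constructed Foo
    else synchronized {
      if (helperWrapper == null)
        helperWrapper = new FinalWrapper(new Foo)
      helperWrapper.value
    }
  }
}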
I hope that my explanations and the accompanying examples that I have given will clear things up for you.

Use ArrayBuffer in sequentially executed Threads?

I have two Futures, the second of which starts after the first ended. Both write to the same ArrayBuffer instance, but since they are executed serially (not at the same time), I consider them not acting concurrently.
However, I know there is the @volatile annotation for variables shared among two or more threads (@volatile disables caching).
After the first thread finishes, there might be some caching going on inside the ArrayBuffer instance that makes it impossible for the second thread to see the ArrayBuffer's real state, so I am not sure whether it is safe to use ArrayBuffer this way.
Is it true that caching might be a problem in my situation, and if so: is there a recommended way to make ArrayBuffer use @volatile internally?
It should be fine iff (if-and-only-if) you propagate it [the array] through the future:
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val futureA = Future {
  val buf = ArrayBuffer(…)
  update(buf)
  buf
}

val futureB = futureA map { buf =>
  moreUpdates(buf); buf
}

futureB foreach println // print the result of the transformations
This is OK from a memory-safety point of view because the completion of futureA happens-before the onComplete callback is invoked; virtually all transformations on Future, map included, are implemented on top of onComplete.
The problem is not caching, per se, but the fact that an ArrayBuffer is a composite, with several subfields that have to be updated in concert to ensure correct operation. You will need to use thread synchronization tools to ensure this.
import scala.collection.mutable.ArrayBuffer

class ArrayBufferWrapper[T](ab: ArrayBuffer[T]) {
  // All mutation goes through the wrapper's lock.
  def add(item: T): Unit = this.synchronized {
    ab += item // ArrayBuffer appends with +=
  }
}
By wrapping the ArrayBuffer, its internal state is safely published to the thread holding the lock, and you ensure thread-safe add operations.
No, it is not safe.
This is exactly the reason why they invented functional programming. If you are using Scala anyway, you might as well take advantage of the paradigm it offers.
Avoid using mutable structures, or, at least, in the rare cases when you have to use them, do not let them escape the local scope. Then you won't ever have to deal with problems like this. They just will not exist anymore.
Tell us more about what you are trying to do, and I am sure someone will suggest a design or two that does not involve two threads mutating the same structure.

Is the Scala List's cons-operator "::" thread-safe?

Is the cons-operator :: on a given list thread-safe?
For example what happens if 2 threads use the cons-operator on the same list?
val listOne = 1 :: 2 :: 3 :: Nil
val listTwo = 4 :: 5 :: Nil
val combinedList = listOne ::: listTwo          // thread 1
val combinedList2 = listOne ::: 7 :: 8 :: Nil   // thread 2 at the same time
To add to the answer that Jim Collins gave:
All operations on immutable data structures are generally thread safe.
The list cons operator does not modify the original list, since every list is immutable.
Instead, it creates a new list that represents the changed state of the list.
Thread synchronization issues only arise when different threads want to change the same data in memory.
This problem is called "shared state".
Immutable objects are stateless, therefore there can be no shared state.
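A quick illustration of the structural sharing behind this (the vals are illustrative):
val base = List(2, 3)
val a = 1 :: base // allocates a single new cell that points at base
val b = 0 :: base // another new cell; base itself is never written to
// base is still List(2, 3); a and b share its cells, which is safe
// across threads precisely because the cells are immutable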
Scala also offers mutable data structures; look for the term "mutable" in the package name.
Those have all the same synchronization issues as Java collections.
By default, if you just type List inside a Scala program, without adding any special import statements, Scala will use an immutable list.
The same goes for Set, Map, etc.
Rule of thumb: if a class does not contain a var statement, and holds no reference to any other class that contains a var statement, it can be regarded as immutable.
That means it can be passed between threads safely.
(I know that experts can easily construct exceptions from this rule, but as long as you know nothing else, this is a pretty good rule to go by, edge cases set aside.)
It is thread safe. In your example both threads would execute all the statements.

Is this a safe version of double-checked locking?

Slightly modified version of canonical broken double-checked locking from Wikipedia:
class Foo {
    private Helper helper = null;

    public Helper getHelper() {
        if (helper == null) {
            synchronized(this) {
                if (helper == null) {
                    // Create new Helper instance and store reference on
                    // stack so other threads can't see it.
                    Helper myHelper = new Helper();
                    // Atomically publish this instance.
                    atomicSet(helper, myHelper);
                }
            }
        }
        return helper;
    }
}
Does simply making the publishing of the newly created Helper instance atomic make this double checked locking idiom safe, assuming that the underlying atomic ops library works properly? I realize that in Java, one could just use volatile, but even though the example is in pseudo-Java, this is supposed to be a language-agnostic question.
See also:
Double checked locking Article
It entirely depends on the exact memory model of your platform/language.
My rule of thumb: just don't do it. Lock-free (or reduced lock, in this case) programming is hard and shouldn't be attempted unless you're a threading ninja. You should only even contemplate it when you've got profiling proof that you really need it, and in that case you get the absolute best and most recent book on threading for that particular platform and see if it can help you.
I don't think you can answer the question in a language-agnostic fashion without getting away from code completely. It all depends on how synchronized and atomicSet work in your pseudocode.
The answer is language dependent - it comes down to the guarantees provided by atomicSet().
If the construction of myHelper can be spread out after the atomicSet() then it doesn't matter how the variable is assigned to the shared state.
i.e.
// Create new Helper instance and store reference on
// stack so other threads can't see it.
Helper myHelper = new Helper(); // ALLOCATE MEMORY HERE BUT DON'T INITIALISE
// Atomically publish this instance.
atomicSet(helper, myHelper); // ATOMICALLY POINT helper AT THE UNINITIALISED MEMORY
// another thread gets run at this time and tries to use the helper object
// AT THE PROGRAM'S LEISURE, INITIALISE THE Helper object.
If this is allowed by the language then the double checking will not work.
Using volatile would not prevent multiple instantiations -- however, using synchronized will prevent multiple instances being created. However, with your code it is possible that helper is returned before it has been set up (thread 'A' instantiates it, but before it is set up, thread 'B' comes along, sees that helper is non-null, and so returns it straight away). To fix that problem, remove the first if (helper == null).
Most likely it is broken, because the problem of a partially constructed object is not addressed.
To all the people worried about a partially constructed object:
As far as I understand, the problem of partially constructed objects is only a problem within constructors. In other words, within a constructor, if an object references itself (including its subclass) or its members, then there are possible issues with partial construction. Otherwise, when a constructor returns, the class is fully constructed.
I think you are confusing partial construction with the different problem of how the compiler optimizes the writes. The compiler can choose to A) allocate the memory for the new Helper object, B) write the address to myHelper (the local stack variable), and then C) invoke any constructor initialization. Anytime after point B and before point C, accessing myHelper would be a problem.
It is this compiler optimization of the writes, not partial construction that the cited papers are concerned with. In the original single-check lock solution, optimized writes can allow multiple threads to see the member variable between points B and C. This implementation avoids the write optimization issue by using a local stack variable.
The main scope of the cited papers is to describe the various problems with the double-check lock solution. However, unless the atomicSet method is also synchronizing against the Foo class, this solution is not a double-check lock solution. It is using multiple locks.
I would say this all comes down to the implementation of the atomic assignment function. The function needs to be truly atomic, it needs to guarantee that processor local memory caches are synchronized, and it needs to do all this at a lower cost than simply always synchronizing the getHelper method.
Based on the cited paper, in Java, it is unlikely to meet all these requirements. Also, something that should be very clear from the paper is that Java's memory model changes frequently. It adapts as better understanding of caching, garbage collection, etc. evolve, as well as adapting to changes in the underlying real processor architecture that the VM runs on.
As a rule of thumb, if you optimize your Java code in a way that depends on the underlying implementation, as opposed to the API, you run the risk of having broken code in the next release of the JVM. (Although, sometimes you will have no choice.)
dsimcha:
If your atomicSet method is real, then I would try sending your question to Doug Lea (along with your atomicSet implementation). I have a feeling he's the kind of guy that would answer. I'm guessing that for Java he will tell you that it's cheaper to always synchronize and to look to optimize somewhere else.
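For JVM readers: since the Java 5 memory model, the canonical repair is to make the field volatile, which is what atomicSet is gesturing at. A minimal sketch of the volatile variant, written in Scala for brevity (Helper is a stand-in class):
class Helper // stand-in for the lazily created object

class Foo {
  @volatile private var helper: Helper = null

  def getHelper: Helper = {
    var h = helper           // one volatile read
    if (h == null) {
      synchronized {
        h = helper           // re-check under the lock
        if (h == null) {
          h = new Helper
          helper = h         // the volatile write publishes the fully
                             // constructed object; no reordering past it
        }
      }
    }
    h
  }
}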
