In one of his videos (concerning Scala's lazy evaluation, namely lazy keyword), Martin Odersky shows the following implementation of cons operation used to construct a Stream:
def cons[T](hd: T, tl: => Stream[T]) = new Stream[T] {
def head = hd
lazy val tail = tl
...
}
So tail operation is written concisely using lazy evaluation feature of the language.
But in reality (in Scala 2.11.7), the implementation of tail is a bit less elegant:
#volatile private[this] var tlVal: Stream[A] = _
#volatile private[this] var tlGen = tl _
def tailDefined: Boolean = tlGen eq null
override def tail: Stream[A] = {
if (!tailDefined)
synchronized {
if (!tailDefined) {
tlVal = tlGen()
tlGen = null
}
}
tlVal
}
Double-checked locking and two volatile fields: that's roughly how you would implement a thread-safe lazy computation in Java.
So the questions are:
Doesn't lazy keyword of Scala provide any 'evaluated maximum once' guarantee in a multi-threaded case?
Is the pattern used in real tail implementation an idiomatic way to do a thread-safe lazy evaluation in Scala?
Doesn't lazy keyword of Scala provide any 'evaluated maximum once'
guarantee in a multi-threaded case?
Yes, it does, as others have stated.
Is the pattern used in real tail implementation an idiomatic way to do
a thread-safe lazy evaluation in Scala?
Edit:
I think I have the actual answer as to why not lazy val. Stream has public facing API methods such as hasDefinitionSize inherited from TraversableOnce. In order to know if a Stream has a finite size not, we need a way of checking without materializing the underlying Stream tail. Since lazy val doesn't actually expose the underlying bit, we can't do that.
This is backed by SI-1220
To strengthen this point, #Jasper-M points out that the new LazyList api in strawman (Scala 2.13 collection makeover) no longer has this issue, since the entire collection hierarchy has been reworked and there are no longer such concerns.
Performance related concerns
I would say "it depends" on which angle you're looking at this problem. From a LOB point of view, I'd say definitely go with lazy val for conciseness and clarity of implementation. But, if you look at it from the point of view of a Scala collections library author, things start to look differently. Think of it this way, you're creating a library which will be potentially be used by many people and ran on many machines across the world. This means that you should be thinking of the memory overhead of each structure, especially if you're creating such an essential data structure yourself.
I say this because when you use lazy val, by design you generate an additional Boolean field which flags if the value has been initialized, and I am assuming this is what the library authors were aiming to avoid. The size of a Boolean on the JVM is of course VM dependent, by even a byte is something to consider, especially when people are generating large Streams of data. Again, this is definitely not something I would usually consider and is definitely a micro optimization towards memory usage.
The reason I think performance is one of the key points here is SI-7266 which fixes a memory leak in Stream. Note how it is of importance to track the byte code to make sure no extra values are retained inside the generated class.
The difference in the implementation is that the definition of tail being initialized or not is a method implementation which checks the generator:
def tailDefined: Boolean = tlGen eq null
Instead of a field on the class.
Scala lazy values are evaluated only once in multi-threaded cases. This is because the evaluation of lazy members is actually wrapped in a synchronized block in the generated code.
Lets take a look at the simple claas,
class LazyTest {
lazy val x = 5
}
Now, lets compile this with scalac,
scalac -Xprint:all LazyTest.scala
This will result in,
package <empty> {
class LazyTest extends Object {
final <synthetic> lazy private[this] var x: Int = _;
#volatile private[this] var bitmap$0: Boolean = _;
private def x$lzycompute(): Int = {
LazyTest.this.synchronized(if (LazyTest.this.bitmap$0.unary_!())
{
LazyTest.this.x = (5: Int);
LazyTest.this.bitmap$0 = true
});
LazyTest.this.x
};
<stable> <accessor> lazy def x(): Int = if (LazyTest.this.bitmap$0.unary_!())
LazyTest.this.x$lzycompute()
else
LazyTest.this.x;
def <init>(): LazyTest = {
LazyTest.super.<init>();
()
}
}
}
You should be able to see... that the lazy evaluation is thread-safe. And you will also see some similarity to that "less elegant" implementation in Scala 2.11.7
You can also experiment with tests similar to following,
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
case class A(i: Int) {
lazy val j = {
println("calculating j")
i + 1
}
}
def checkLazyInMultiThread(): Unit = {
val a = A(6)
val futuresList = Range(1, 20).toList.map(i => Future{
println(s"Future $i :: ${a.j}")
})
Future.sequence(futuresList).onComplete(_ => println("completed"))
}
checkLazyInMultiThread()
Now, the implementation in standard library avoids using lazy because they are able to provide a more efficient solution than this generic lazy translation.
You are correct, lazy vals use locking precisely to guard against double evaluation when accessed at the same time by two threads. Future developments, furthermore, will give the same guarantees without locking.
What is idiomatic, in my humble opinion, is a highly debatable subject when it comes to a language that, by design, allows for a wide range of different idioms to be adopted. In general, however, application code tends to be considered idiomatic when going more into the direction of pure functional programming, as it gives a series of interesting advantages in terms of ease of testing and reasoning that would make sense to give up only in case of serious concerns. This concern can be one of performance, which is why the current implementation of the Scala Collection API, while exposing in most cases a functional interface, makes heavy use (internally and in restricted scopes) of vars, while loops and established patterns from imperative programming (as the one you highlighted in your question).
Related
The following scenario shows an abstraction that seems to me to be impossible to implement declaratively.
Suppose that I want to create a Symbol object which allows you to create objects with strings that can be compared, like Symbol.for() in JavaScript. A simple implementation in JS might look like this:
function MySymbol(text){//Comparable symbol object class
this.text = text;
this.equals = function(other){//Method to compare to other MySymbol
return this.text == other.text;
}
}
I could easily write this in a declarative language like Haskell:
data MySymbol = MySymbol String
makeSymbol :: String -> MySymbol
makeSymbol s = MySymbol s
compareSymbol :: MySymbol -> MySymbol -> Bool
compareSymbol (MySymbol s1) (MySymbol s2) = s1 == s2
However, maybe in the future I want to improve efficiency by using a global registry without changing the interface to the MySymbol objects. (The user of my class doesn't need to know that I've changed it to use a registry)
For example, this is easily done in Javascript:
function MySymbol(text){
if (MySymbol.registry.has(text)){//check if symbol already in registry
this.id = MySymbol.registry.get(text);//get id
} else {
this.id = MySymbol.nextId++;
MySymbol.registry.set(text, this.id);//Add new symbol with nextId
}
this.equals = function(other){//To compare, simply compare ids
return this.id == other.id;
}
}
//Setup initial empty registry
MySymbol.registry = new Map();//A map from strings to numbers
MySymbol.nextId = 0;
However, it is impossible to create a mutable global registry in Haskell. (I can create a registry, but not without changing the interface to my functions.)
Specifically, these three possible Haskell solutions all have problems:
Force the user to pass a registry argument or equivalent, making the interface implementation dependent
Use some fancy Monad stuff like Haskell's Control.Monad.Random, which would require either foreseeing the optimization from the start or changing the interface (and is basically just adding the concept of state into your program and therefore breaks referential transparency etc.)
Have a slow implementation which might not be practical in a given application
None of these solutions allow me to sufficiently abstract away implementation from my Haskell interface.
So, my question is: Is there a way to implement this optimization to a Symbol object in Haskell (or any declarative language) without causing one of the three problems listed above,
and are there any other situations where an imperative language can express an abstraction (for example an optimization like above) that a declarative language can't?
The intern package shows how. As discussed by #luqui, it uses unsafePerformIO at a few key moments, and is careful to hide the identifiers produced during interning.
I'm going through Learning Concurrent Programming in Scala, and encountered the following:
In current versions of Scala, however, certain collections that are
deemed immutable, such as List and Vector, cannot be shared without
synchronization. Although their external API does not allow you to
modify them, they contain non-final fields.
Tip: Even if an object
seems immutable, always use proper synchronization to share any object
between the threads.
From Learning Concurrent Programming in Scala by Aleksandar Prokopec, end of Chapter 2 (p.58), Packt Publishing, Nov 2014.
Can that be right?
My working assumption has always been that any internal mutability (to implement laziness, caching, whatever) in Scala library data structures described as immutable would be idempotent, such that the worst that might happen in a bad race is work would be unnecessarily duplicated. This author seems to suggest correctness may be imperiled by concurrent access to immutable structures. Is that true? Do we really need to synchronize access to Lists?
Much of my transition to an immutable-heavy style has been motivated by a desire to avoid synchronization and the potential contention overhead it entails. It would be an unhappy big deal to learn that synchronization cannot be eschewed for Scala's core "immutable" data structures. Is this author simply overconservative?
Scala's documentation of collections includes the following:
A collection in package scala.collection.immutable is guaranteed to be immutable for everyone. Such a collection will never change after it is created. Therefore, you can rely on the fact that accessing the same collection value repeatedly at different points in time will always yield a collection with the same elements.
That doesn't quite say that they are safe for concurrent access by multiple threads. Does anyone know of an authoritative statement that they are (or aren't)?
It depends on where you share them:
it's not safe to share them inside scala-library
it's not safe to share them with Java-code, reflection
Simply saying, these collections are less protected than objects with only final fields. Regardless that they're same on JVM level (without optimization like ldc) - both may be fields with some mutable address, so you can change them with putfield bytecode command. Anyway, var is still less protected by the compiler, in comparision with java's final, scala's final val and val.
However, it's still fine to use them in most cases as their behaviour is logically immutable - all mutable operations are encapsulated (for Scala-code). Let's look at the Vector. It requires mutable fields to implement appending algorithm:
private var dirty = false
//from VectorPointer
private[immutable] var depth: Int = _
private[immutable] var display0: Array[AnyRef] = _
private[immutable] var display1: Array[AnyRef] = _
private[immutable] var display2: Array[AnyRef] = _
private[immutable] var display3: Array[AnyRef] = _
private[immutable] var display4: Array[AnyRef] = _
private[immutable] var display5: Array[AnyRef] = _
which is implemented like:
val s = new Vector(startIndex, endIndex + 1, blockIndex)
s.initFrom(this) //uses displayN and depth
s.gotoPos(startIndex, startIndex ^ focus) //uses displayN
s.gotoPosWritable //uses dirty
...
s.dirty = dirty
And s comes to the user only after method returned it. So it's not even concern of happens-before guarantees - all mutable operations are performed in the same thread (thread where you call :+, +: or updated), it's just kind of initialization. The only problem here is that private[somePackage] is accessible directly from Java code and from scala-library itself, so if you pass it to some Java's method it could modify them.
I don't think you should worry about thread-safety of let's say cons operator. It also has mutable fields:
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
override def tail : List[B] = tl
override def isEmpty: Boolean = false
}
But they used only inside library methods (inside one-thread) without any explicit sharing or thread creation, and they always return a new collection, let's consider take as an example:
override def take(n: Int): List[A] = if (isEmpty || n <= 0) Nil else {
val h = new ::(head, Nil)
var t = h
var rest = tail
var i = 1
while ({if (rest.isEmpty) return this; i < n}) {
i += 1
val nx = new ::(rest.head, Nil)
t.tl = nx //here is mutation of t's filed
t = nx
rest = rest.tail
}
h
}
So here t.tl = nx is not much differ from t = nx in the meaning of thread-safety. They both are reffered only from the single stack (take's stack). Althrought, if I add let's say someActor ! t (or any other async operation), someField = t or someFunctionWithExternalSideEffect(t) right inside the while loop - I could break this contract.
A little addtion here about relations with JSR-133:
1) new ::(head, Nil) creates new object in the heap and puts its address (lets say 0x100500) into the stack(val h =)
2) as long as this address is in the stack, it's known only to the current thread
3) Other threads could be involved only after sharing this address by putting it into some field; in case of take it has to flush any caches (to restore the stack and registers) before calling areturn (return h), so returned object will be consistent.
So all operations on 0x100500's object are out of scope of JSR-133 as long as 0x100500 is a part of stack only (not heap, not other's stacks). However, some fields of 0x100500's object may point to some shared objects (which might be in scope JSR-133), but it's not the case here (as these objects are immutable for outside).
I think (hope) the author meant logical synchronization guarantees for library's developers - you still need to be careful with these things if you're developing scala-library, as these vars are private[scala], private[immutable] so, it's possible to write some code to mutate them from different threads. From scala-library developer's perspective, it usually means that all mutations on single instance should be applied in single thread and only on collection that invisible to a user (at the moment). Or, simply saying - don't open mutable fields for outer users in any way.
P.S. Scala had several unexpected issues with synchronization, which caused some parts of the library to be surprisely not thread-safe, so I wouldn't wonder if something may be wrong (and this is a bug then), but in let's say 99% cases for 99% methods immutable collections are thread safe. In worst case you might be pushed from usage of some broken method or just (it might be not just "just" for some cases) need to clone the collection for every thread.
Anyway, immutability is still a good way for thread-safety.
P.S.2 Exotic case which might break immutable collections' thread-safety is using reflection to access their non-final fields.
A little addition about another exotic but really terrifying way, as it pointed out in comments with #Steve Waldman and #axel22 (the author). If you share immutable collection as member of some object shared netween threads && if collection's constructor becomes physically (by JIT) inlined (it's not logically inlined by default) && if your JIT-implementation allows to rearrange inlined code with normal one - then you have to synchronize it (usually is enough to have #volatile). However, IMHO, I don't believe that last condition is a correct behaviour - but for now, can't neither prove nor disprove that.
In your question you are asking for an authoritative statement. I found the following in "Programming in Scala" from Martin Odersky et al:
"Third, there is no way for two threads concurrently accessing an immutable to corrupt its state once it has been properbly constructed, because no thread can change the state of an immutable"
If you look for example at the implementation you see that this is followed in the implementation, see below.
There are some fields inside vector which are not final and could lead to data races. But since they are only changed inside a method creating a new instance and since you need an Synchronization action to access the newly created instance in different threads anyway everyting is fine.
The pattern used here is to create and modify an object. Than make it visible to other threads, for example by assigning this instance to a volatile static or static final. And after that make sure that it is not changed anymore.
As an Example the creation of two vectors:
val vector = Vector(4,5,5)
val vector2 = vector.updated(1, 2);
The method updated uses the var field dirty inside:
private[immutable] def updateAt[B >: A](index: Int, elem: B): Vector[B] = {
val idx = checkRangeConvert(index)
val s = new Vector[B](startIndex, endIndex, idx)
s.initFrom(this)
s.dirty = dirty
s.gotoPosWritable(focus, idx, focus ^ idx) // if dirty commit changes; go to new pos and prepare for writing
s.display0(idx & 0x1f) = elem.asInstanceOf[AnyRef]
s
}
but since after creation of vector2 it is assigned to a final variable:
Bytecode of variable declaration:
private final scala.collection.immutable.Vector vector2;
Byte code of constructor:
61 invokevirtual scala.collection.immutable.Vector.updated(int, java.lang.Object, scala.collection.generic.CanBuildFrom) : java.lang.Object [52]
64 checkcast scala.collection.immutable.Vector [48]
67 putfield trace.agent.test.scala.TestVector$.vector2 : scala.collection.immutable.Vector [22]
Everything is o.k.
im am writing a simple emulator in go (should i? or should i go back to c?).
anyway, i am fetching the instruction and decoding it. at this point i have a byte like 0x81, and i have to execute the right function.
should i have something like this
func (sys *cpu) eval() {
switch opcode {
case 0x80:
sys.add(sys.b)
case 0x81:
sys.add(sys.c)
etc
}
}
or something like this
var fnTable = []func(*cpu) {
0x80: func(sys *cpu) {
sys.add(sys.b)
},
0x81: func(sys *cpu) {
sys.add(sys.c)
}
}
func (sys *cpu) eval() {
return fnTable[opcode](sys)
}
1.which one is better?
2.which one is faster?
also
3.can i declare a function inline?
4.i have a cpu struct in which i have the registers etc. would it be faster if i have the registers and all as globals? (without the struct)
thank you very much.
I did some benchmarks and the table version is faster than the switch version once you have more than about 4 cases.
I was surprised to discover that the Go compiler (gc, anyway; not sure about gccgo) doesn't seem to be smart enough to turn a dense switch into a jump table.
Update:
Ken Thompson posted on the Go mailing list describing the difficulties of optimizing switch.
The first version looks better to me, YMMV.
Benchmark it. Depends how good is the compiler at optimizing. The "jump table" version might be faster if the compiler doesn't try hard enough to optimize.
Depends on your definition of what is "to declare function inline". Go can declare and define functions/methods at the top level only. But functions are first class citizens in Go, so one can have variables/parameters/return values and structured types of function type. In all this places a function literal can [also] be assigned to the variable/field/element...
Possibly. Still I would suggest to not keep the cpu state in a global variable. Once you possibly decide to go emulating multicore, it will be welcome ;-)
If you have the ast of some expression, and you want to eval it for a big amount of data rows, then you may only once compile it into the tree of lambdas, and do not calculate any switches on each iteration at all;
For example, given such ast: {* (a, {+ (b, c)})}
Compile function (in very rough pseudo language) will be something like this:
func (e *evaluator) compile(brunch ast) {
switch brunch.type {
case binaryOperator:
switch brunch.op {
case *: return func() {compile(brunch.arg0) * compile(brunch.arg1)}
case +: return func() {compile(brunch.arg0) + compile(brunch.arg1)}
}
case BasicLit: return func() {return brunch.arg0}
case Ident: return func(){return e.GetIdent(brunch.arg0)}
}
}
So eventually compile returns the func, that must be called on each row of your data and there will be no switches or other calculation stuff at all.
There remains the question about operations with data of different types, that is for your own research ;)
This is an interesting approach, in situations, when there is no jump-table mechanism available :) but sure, func call is more complex operation then jump.
Can anyone explain why these iterators behave differently? I generally expect a String to act like an IndexedSeq[Char]. Is this documented anywhere?
val si: Iterator[Char] = "uvwxyz".iterator
val vi: Iterator[Char] = "uvwxyz".toIndexedSeq.iterator
val sr = for (i <- 1 to 3)
yield si take 2 mkString
//sr: scala.collection.immutable.IndexedSeq[String] = Vector(uv, uv, uv)
val vr = for (i <- 1 to 3)
yield vi take 2 mkString
//vr: scala.collection.immutable.IndexedSeq[String] = Vector(uv, wx, yz)
There are no guarantees about the state of the iterator after you invoke take on it.
The problem with iterators is that many useful operations can only be implemented by causing side effects. All these operations have a specified direct effect but may also have side effects that cannot be specified (or would complicate the implementation).
In the case of take there are implementations that clone the internal state of the iterator and others that advance the iterator. If you want to guarantee the absence of side-effects you will have to use immutable data structures, in any other case your code should only rely on direct effects.
Something that comes up a fair amount when dealing with heterogeneous data is the need to partially change simple objects that hold data. For instance, you might want to add, drop, or rename a property, or concatenate two objects. This is easy enough in dynamic languages, but I'm wondering if there are any clever solutions proposed by static languages?
To fix ideas, are there any languages that enable, perhaps through some sort of static mixin syntax, something like this (C#):
var hello = new { Hello = "Hello" };
var world = new { World = "World" };
var helloWorld = hello + world;
Console.WriteLine(helloWorld.ToString());
//outputs {Hello = Hello, World = World}
This certainly seems possible, since no runtime information is used. Are there static languages that have this capability?
Added:
A limited version of what I'm considering is F#'s copy-and-update record expression:
let myRecord3 = { myRecord2 with Y = 100; Z = 2 }
What you're describing is known in programming language research as record concatenation. There has been some work on static type systems for record concatenation, mostly in the context of automatic type inference a la Haskell or ML. To the best of my knowledge it has not yet made an impact on any mainstream programming languages.