Scope of variables inside threaded for loops?

In the following example
shared_arr = zeros(4000)
Threads.@threads for thread = 1:4
    tmp_arr = rand(1000)
    for i = 1:1000
        shared_arr[(thread - 1)*1000+i] = tmp_arr[i]
    end
end
I believe shared_arr is shared among all threads. Is tmp_arr allocated 4 times so that each thread has its own tmp_arr?

According to the scoping rules described in the documentation, a for loop introduces a new local scope, and variables first assigned inside the body are new for each iteration. Since tmp_arr isn't declared prior to the loop, it is a distinct variable, bound to a freshly allocated array, in each iteration of the for loop. Note, however, that rand might not be thread-safe, per @Lyndon White's comment.
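One way to check this empirically is sketched below. It assumes Julia 1.3 or later (the question does not state a version), wraps the loop in a hypothetical function fill_shared, and keeps a reference to each iteration's tmp_arr in an extra vector, tmp_arrays, which is not part of the original code. In those Julia versions the default random number generator is, as far as I know, safe to call from several threads, so the caveat about rand mainly concerns older releases.

```julia
# Hypothetical wrapper around the loop from the question.
function fill_shared()
    shared_arr = zeros(4000)
    # Extra bookkeeping (not in the original): one slot per iteration,
    # so every tmp_arr stays alive and can be compared afterwards.
    tmp_arrays = Vector{Vector{Float64}}(undef, 4)
    Threads.@threads for thread in 1:4
        tmp_arr = rand(1000)            # new local variable and new array per iteration
        tmp_arrays[thread] = tmp_arr
        offset = (thread - 1) * 1000
        for i in 1:1000
            shared_arr[offset + i] = tmp_arr[i]   # each iteration writes a disjoint range
        end
    end
    return shared_arr, tmp_arrays
end

shared_arr, tmp_arrays = fill_shared()
println(tmp_arrays[1] !== tmp_arrays[2])          # true: two distinct arrays
println(shared_arr[1001:2000] == tmp_arrays[2])   # true: iteration 2 filled its own range
```

If tmp_arr were assigned in the enclosing function before the loop, the loop body would capture that single variable instead, and all iterations would share it.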

Related

Can a race condition occur when executing such code on Julia?

I have a function below. Can a race condition occur when executing such code?
function thread_test(v)
    Threads.@threads for i = 1:length(v)
        @inbounds v[i] = rand()
    end
    sum(v)
end

If v is an Array, there will be no race condition: writing to different array elements from different threads is safe.
However, if v is, e.g., a Dict{Int, Float64}, you can have race conditions. Similarly, you are not guaranteed thread safety for every subtype of AbstractArray. A BitVector, for instance, packs many elements into each 64-bit chunk, so writes to different indices can modify the same word.
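For the Dict case, one common remedy is to serialize the insertions with a lock. The sketch below assumes Julia 1.3 or later; the function name fill_dict and the variable lk are made up for illustration.

```julia
# Hypothetical example: fill a Dict from a threaded loop without racing.
function fill_dict(n)
    d = Dict{Int,Float64}()
    lk = ReentrantLock()
    Threads.@threads for i in 1:n
        x = rand()          # do the per-element work outside the lock
        lock(lk) do
            d[i] = x        # only one task mutates the Dict at a time
        end
    end
    return d
end

d = fill_dict(1_000)
println(length(d))          # 1000
```

The lock removes the race at the cost of serializing the insertions, so for element-wise work a preallocated Array indexed by i, as in thread_test, is usually the cheaper choice.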

Non blocking reads with Julia

I would like to read user input without blocking the main thread, much like the getch() function from conio.h. Is it possible in Julia?
I tried with @async, but it looked like my input wasn't being read, although the main thread wasn't blocked.

The problem, I believe, is either that you are running in global scope, which makes @async create its own local variables (when it reads, it reads into a variable in another scope), or that you are using an old version of Julia.
The following examples read an integer from STDIN in a non-blocking fashion.
function foo()
    a = 0
    @async a = parse(Int64, readline())
    println("See, it is not blocking!")
    while (a == 0) # 0 is the sentinel meaning "nothing read yet"
        print("")
    end
    println(a)
end
The following example does the job in global scope, using an array. You can do the same trick with other types of mutable objects.
Array example:
function nonblocking_readInt()
    arr = [0]
    @async arr[1] = parse(Int64, readline())
    arr
end
r = nonblocking_readInt() # is an array
println("See, it is not blocking!")
while (r[1] == 0) # sentinel value check
    print("")
end
println(r[1])
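A variation on the same idea, offered only as a sketch, uses a Channel instead of a sentinel value, so an input of 0 is not mistaken for "nothing read yet". The helper names read_int_async and wait_while_working are hypothetical, and an IOBuffer stands in for the terminal so that the example runs unattended.

```julia
# Hypothetical helper: start reading one integer from `io` in a background
# task and return a Channel that will eventually hold the parsed value.
function read_int_async(io::IO = stdin)
    result = Channel{Int}(1)    # buffered, so put! does not wait for a taker
    @async put!(result, parse(Int, readline(io)))
    return result
end

# Hypothetical helper: keep doing other work until the value has arrived.
function wait_while_working(result::Channel{Int})
    while !isready(result)
        sleep(0.01)             # placeholder for useful work; sleep also yields to the reader task
    end
    return take!(result)
end

# Call read_int_async() with no argument to read from stdin instead.
ch = read_int_async(IOBuffer("42\n"))
println("See, it is not blocking!")
println(wait_while_working(ch))
```

If the input cannot be parsed as an integer, the background task fails and the channel never becomes ready, so real code would also need to handle that case.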

Evaluation order for always blocks triggered within always blocks in Verilog?

I understand that, for 2 always blocks with the same trigger, their order of evaluation is completely unpredictable.
However, suppose I have:
always @(a) begin : blockX
    c = 0;
    d = a + 2;
    if (c != 1) e = 2;
end
always @(a) begin : blockY
    e = 3;
end
always @(d) begin : blockZ
    c = 1;
    e = 1;
end
Suppose block X evaluates first. Does changing d in blockX immediately jump to blockZ? If not, when is blockZ evaluated with respect to blockY?
My programmer's instinct thinks of the sequence of events as a stack, where evaluating blockX is like a function call to blockZ and I immediately jump there in the code, then finish evaluating blockX.
However, because we call the active events queue, well, a queue, this suggests blockZ is enqueued at the back of the active events queue, and I'm 100% guaranteed it will be evaluated last (unless there are other triggered always blocks).
There's also the intermediate possibility, where it's neither first nor last but is also evaluated in a random and unpredictable order.
So in this example, are 1, 2, or 3 all possible final values for e, depending on how the compiler is feeling at run time?
Additionally, while I understand, of course, that this represents awful style, where might I find the specification for this kind of behavior?

Always blocks are not function calls. See a recent answer I just gave for a similar question. These blocks are concurrent processes. The LRM only guarantees the ordering of statements within a begin/end block. There is no defined ordering between concurrently executing begin/end blocks (see Section 4.7, Nondeterminism, in the 1800-2012 LRM), so a simulator is free to interleave the statements in any way as long as it honors the order within a single block.
So you are correct that e could have the final values 1, 2 or 3 depending on how a simulator decides to implement and optimize your code.

Must access to scala.collection.immutable.List and Vector be synchronized?

I'm going through Learning Concurrent Programming in Scala, and encountered the following:
In current versions of Scala, however, certain collections that are
deemed immutable, such as List and Vector, cannot be shared without
synchronization. Although their external API does not allow you to
modify them, they contain non-final fields.
Tip: Even if an object
seems immutable, always use proper synchronization to share any object
between the threads.
From Learning Concurrent Programming in Scala by Aleksandar Prokopec, end of Chapter 2 (p.58), Packt Publishing, Nov 2014.
Can that be right?
My working assumption has always been that any internal mutability (to implement laziness, caching, whatever) in Scala library data structures described as immutable would be idempotent, such that the worst that might happen in a bad race is work would be unnecessarily duplicated. This author seems to suggest correctness may be imperiled by concurrent access to immutable structures. Is that true? Do we really need to synchronize access to Lists?
Much of my transition to an immutable-heavy style has been motivated by a desire to avoid synchronization and the potential contention overhead it entails. It would be an unhappy big deal to learn that synchronization cannot be eschewed for Scala's core "immutable" data structures. Is this author simply overconservative?
Scala's documentation of collections includes the following:
A collection in package scala.collection.immutable is guaranteed to be immutable for everyone. Such a collection will never change after it is created. Therefore, you can rely on the fact that accessing the same collection value repeatedly at different points in time will always yield a collection with the same elements.
That doesn't quite say that they are safe for concurrent access by multiple threads. Does anyone know of an authoritative statement that they are (or aren't)?

It depends on where you share them:
it's not safe to share them inside scala-library
it's not safe to share them with Java-code, reflection
Simply put, these collections are less protected than objects that have only final fields. At the JVM level the two are the same (leaving aside optimizations such as ldc): both are fields holding an address that can be changed with the putfield bytecode instruction. Even so, the compiler protects a var less than it protects Java's final, Scala's final val, or val.
However, it's still fine to use them in most cases, as their behaviour is logically immutable: all mutating operations are encapsulated (for Scala code). Let's look at Vector. It requires mutable fields to implement the appending algorithm:
private var dirty = false
//from VectorPointer
private[immutable] var depth: Int = _
private[immutable] var display0: Array[AnyRef] = _
private[immutable] var display1: Array[AnyRef] = _
private[immutable] var display2: Array[AnyRef] = _
private[immutable] var display3: Array[AnyRef] = _
private[immutable] var display4: Array[AnyRef] = _
private[immutable] var display5: Array[AnyRef] = _
which is implemented like:
val s = new Vector(startIndex, endIndex + 1, blockIndex)
s.initFrom(this) //uses displayN and depth
s.gotoPos(startIndex, startIndex ^ focus) //uses displayN
s.gotoPosWritable //uses dirty
...
s.dirty = dirty
And s reaches the user only after the method has returned it. So happens-before guarantees are not even a concern here: all mutating operations are performed in the same thread (the thread in which you call :+, +: or updated), so it is just a kind of initialization. The only problem is that private[somePackage] members are directly accessible from Java code and from scala-library itself, so if you pass the collection to some Java method, that method could modify them.
I don't think you should worry about the thread-safety of, say, the cons operator. It also has a mutable field:
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
override def tail : List[B] = tl
override def isEmpty: Boolean = false
}
But these fields are used only inside library methods (within a single thread), without any explicit sharing or thread creation, and those methods always return a new collection. Let's consider take as an example:
override def take(n: Int): List[A] = if (isEmpty || n <= 0) Nil else {
  val h = new ::(head, Nil)
  var t = h
  var rest = tail
  var i = 1
  while ({if (rest.isEmpty) return this; i < n}) {
    i += 1
    val nx = new ::(rest.head, Nil)
    t.tl = nx // here is the mutation of t's field
    t = nx
    rest = rest.tail
  }
  h
}
So here t.tl = nx does not differ much from t = nx in terms of thread-safety. Both are referred to only from a single stack (take's stack). However, if I added, say, someActor ! t (or any other async operation), someField = t, or someFunctionWithExternalSideEffect(t) right inside the while loop, I could break this contract.
A little addition here about the relation to JSR-133:
1) new ::(head, Nil) creates a new object in the heap and puts its address (let's say 0x100500) onto the stack (val h =)
2) as long as this address is only on the stack, it is known only to the current thread
3) other threads can be involved only after this address is shared by putting it into some field; in the case of take, any caches have to be flushed (to restore the stack and registers) before areturn (return h) is called, so the returned object will be consistent.
So all operations on the object at 0x100500 are out of scope of JSR-133 as long as 0x100500 lives only on the stack (not in the heap, not on other threads' stacks). Some fields of that object may point to shared objects (which might be in scope of JSR-133), but that is not the case here (as these objects are immutable from the outside).
I think (hope) the author meant logical synchronization guarantees for the library's developers: you still need to be careful with these things if you're developing scala-library, because these vars are private[scala] or private[immutable], so it's possible to write code that mutates them from different threads. From a scala-library developer's perspective, this usually means that all mutations of a single instance should be applied in a single thread, and only to a collection that is not (yet) visible to the user. Or, simply put: don't expose mutable fields to outside users in any way.
P.S. Scala has had several unexpected issues with synchronization, which caused some parts of the library to be surprisingly non-thread-safe, so I wouldn't be surprised if something turned out to be wrong (and that would then be a bug), but in, let's say, 99% of cases for 99% of methods, immutable collections are thread-safe. In the worst case you might have to avoid some broken method or just (it might not be "just" in some cases) clone the collection for every thread.
Anyway, immutability is still a good way for thread-safety.
P.S.2 Exotic case which might break immutable collections' thread-safety is using reflection to access their non-final fields.
One more addition, about another exotic but really terrifying case, pointed out in the comments by @Steve Waldman and @axel22 (the author). If you share an immutable collection as a member of some object shared between threads, and the collection's constructor is physically inlined by the JIT (it is not logically inlined by default), and your JIT implementation is allowed to reorder the inlined code with the surrounding code, then you have to synchronize access (usually @volatile is enough). However, IMHO, that last condition is not correct behaviour, though for now I can neither prove nor disprove it.

In your question you are asking for an authoritative statement. I found the following in "Programming in Scala" by Martin Odersky et al.:
"Third, there is no way for two threads concurrently accessing an immutable to corrupt its state once it has been properly constructed, because no thread can change the state of an immutable"
If you look at the implementation of Vector, for example, you can see that this holds; see below.
There are some fields inside Vector which are not final and could lead to data races. But since they are only changed inside a method that creates a new instance, and since you need a synchronization action to access the newly created instance from different threads anyway, everything is fine.
The pattern used here is to create and modify an object, then make it visible to other threads, for example by assigning the instance to a volatile static or a static final field, and after that make sure that it is not changed anymore.
As an Example the creation of two vectors:
val vector = Vector(4,5,5)
val vector2 = vector.updated(1, 2);
The method updated uses the var field dirty inside:
private[immutable] def updateAt[B >: A](index: Int, elem: B): Vector[B] = {
val idx = checkRangeConvert(index)
val s = new Vector[B](startIndex, endIndex, idx)
s.initFrom(this)
s.dirty = dirty
s.gotoPosWritable(focus, idx, focus ^ idx) // if dirty commit changes; go to new pos and prepare for writing
s.display0(idx & 0x1f) = elem.asInstanceOf[AnyRef]
s
}
but since after creation of vector2 it is assigned to a final variable:
Bytecode of variable declaration:
private final scala.collection.immutable.Vector vector2;
Byte code of constructor:
61 invokevirtual scala.collection.immutable.Vector.updated(int, java.lang.Object, scala.collection.generic.CanBuildFrom) : java.lang.Object [52]
64 checkcast scala.collection.immutable.Vector [48]
67 putfield trace.agent.test.scala.TestVector$.vector2 : scala.collection.immutable.Vector [22]
Everything is o.k.

Multithread+Recursion strategies

I am just starting to learn the ins and outs of multithreaded programming and have a few basic questions that, once answered, should keep me occupied for quite some time. I understand that multithreading loses its effectiveness once you have created more threads than there are cores (due to context switching and cache flushing). With that understood, I can think of two ways to employ multithreading of a recursive function, but I am not quite sure which is the common way to approach the problem. One seems much more complicated, perhaps with a higher payoff, but that's what I hope you will be able to tell me.
Below is pseudo-code for two different methods of multithreading a recursive function. I have used the terminology of merge sort for simplicity, but it's not that important. It is easy to see how to generalize the methods to other problems. Also, I will personally be employing these methods using the pthreads library in C, so the thread syntax mildly reflects this.
Method 1:
main ()
{
    A = array of length N
    NUM_CORES = get number of functional cores
    chunk[NUM_CORES] = array of indices partitioning A into (N / NUM_CORES) sized chunks
    thread_id[NUM_CORES] = array of thread IDs
    thread[NUM_CORES] = array of thread type
    //start NUM_CORES threads on working on each chunk of A
    for i = 0 to (NUM_CORES - 1) {
        thread_id[i] = thread_start(thread[i], MergeSort, chunk[i])
    }
    //wait for all threads to finish
    //Merge chunks appropriately
    exit
}
MergeSort ( chunk )
{
    MergeSort ( lowerSubChunk )
    MergeSort ( higherSubChunk )
    Merge(lowerSubChunk, higherSubChunk)
}
//Merge(,) not shown
Method 2:
main ()
{
    A = array of length N
    NUM_CORES = get number of functional cores
    chunk = indices 0 and N
    thread_id[NUM_CORES] = array of thread IDs
    thread[NUM_CORES] = array of thread type
    //lock variable aka mutex
    THREADS_IN_USE = 1
    MergeSort( chunk )
    exit
}
MergeSort ( chunk )
{
    lock THREADS_IN_USE
    if ( THREADS_IN_USE < NUM_CORES ) {
        FREE_CORE = find index of unused core
        thread_id[FREE_CORE] = thread_start(thread[FREE_CORE], MergeSort, lowerSubChunk)
        THREADS_IN_USE++
        unlock THREADS_IN_USE
        MergeSort( higherSubChunk )
        //wait for thread_id[FREE_CORE] and current thread to finish
        lock THREADS_IN_USE
        THREADS_IN_USE--
        unlock THREADS_IN_USE
        Merge(lowerSubChunk, higherSubChunk)
    }
    else {
        unlock THREADS_IN_USE
        MergeSort( lowerSubChunk )
        MergeSort( higherSubChunk )
        Merge(lowerSubChunk, higherSubChunk)
    }
}
//Merge(,) not shown
Visually, one can think of the differences between these two methods as follows:
Method 1: creates NUM_CORES separate recursion trees, each one having a single core traversing it.
Method 2: creates a single recursion tree but has all cores traversing it. In particular, whenever there is a free core, it is set to work on the "left child subtree" of the first node where MergeSort is called after the core is freed.
The problem with Method 1 is that if it is the case that the running time of the recursive function varies with the distribution of values within each initial subchunk (i.e. the chunk[i]), one thread could finish much faster leaving a core sitting idle while the others finish. With Merge Sort this is not likely to be the case since the work of MergeSort happens in Merge whose runtime isn't affected much by the distribution of values in the (sorted) subchunks. However, with a more involved recursive function, the running time on one subchunk could be much longer!
With Method 2 it is possible to have the same problem. Again, with merge sort it's not clear, since the running time for each subchunk is likely to be similar, but the line //wait for thread_id[FREE_CORE] and current thread to finish would also require one core to wait for the other. However, with Method 2, all calls to Merge run ASAP, as opposed to Method 1, where one must wait for NUM_CORES calls to MergeSort to finish and then do NUM_CORES - 1 merges afterward (although you can multithread this as well, to an extent).
(though the syntax might not be completely correct)
Are both of these methods used in practice? Are there situations where one is more beneficial over the other? Is this the correct way to implement Method 2? (in this case, THREADS_IN_USE is a semaphore?)
Thanks so much for your help!
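No answer to this question is preserved here, so the following is only a sketch of the idea behind Method 2. It is written in Julia (the language of the threaded examples earlier on this page) rather than in C with pthreads, and it assumes Julia 1.3 or later for Threads.@spawn. Instead of the mutex-guarded THREADS_IN_USE counter, it uses a simpler recursion-depth limit, which is a substitution made for this sketch and not something from the question: with a limit of depth, at most 2^depth branches run concurrently. The names merge_sorted and parallel_mergesort are hypothetical.

```julia
# Merge two sorted vectors into a new sorted vector.
function merge_sorted(a::Vector{T}, b::Vector{T}) where {T}
    out = Vector{T}(undef, length(a) + length(b))
    i, j = 1, 1
    for k in eachindex(out)
        # Take from `a` when `b` is exhausted or a's head is not larger.
        if j > length(b) || (i <= length(a) && a[i] <= b[j])
            out[k] = a[i]
            i += 1
        else
            out[k] = b[j]
            j += 1
        end
    end
    return out
end

# Single recursion tree; new tasks are spawned only while depth > 0.
function parallel_mergesort(v::Vector{T}, depth::Int = 2) where {T}
    length(v) <= 1 && return copy(v)
    mid = length(v) ÷ 2
    if depth > 0
        # Sort the lower half in another task, the upper half in this one.
        lower_task = Threads.@spawn parallel_mergesort(v[1:mid], depth - 1)
        upper = parallel_mergesort(v[mid+1:end], depth - 1)
        lower = fetch(lower_task)::Vector{T}    # wait for the spawned half
    else
        lower = parallel_mergesort(v[1:mid], 0)
        upper = parallel_mergesort(v[mid+1:end], 0)
    end
    return merge_sorted(lower, upper)           # merge as soon as both halves are done
end

data = rand(1:100, 20)
println(parallel_mergesort(data) == sort(data))   # true
```

As in Method 2, each merge runs as soon as its two halves are sorted. The depth limit bounds how many tasks are created, but unlike the shared counter it does not reassign a freed core to a new subtree.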
