What's wrong with this Groovy for-loop of closures?

In the context of Jenkins pipelines, I have some Groovy code that enumerates a list, creates a closure for each item, and uses the item inside the closure as a key to look up another value in a map. Almost every run shows what looks like an anomaly or race condition.
This is a simplification of the code:
def tasks = [:]
for (platformName in platforms) {
    // ...
    tasks[platformName] = {
        def componentUploadPath = componentUploadPaths[platformName]
        echo "Uploading for platform [${platformName}] to [${componentUploadPath}]."
        // ...
    }
}
tasks.failFast = true
parallel(tasks)
platforms has two values. I usually see two iterations and two registered tasks, and the keys in tasks are correct, but the echo statement inside the closure shows that one of the platforms is simply run twice:
14:20:02 [platform2] Uploading for platform [platform1] to [some_path/platform1].
14:20:02 [platform1] Uploading for platform [platform1] to [some_path/platform1].
It's ridiculous.
What do I need to add or do differently?

It's the same issue you'd see in JavaScript.
When you create closures in a for loop, they are bound to the variable, not to the value the variable had at that moment.
By the time the loop has exited and the closures run, they all see the same value: the last one the loop variable held before the loop exited.
For example, you'd expect the following to print 1 2 3 4, but it doesn't:
def closures = []
for (i in 1..4) {
    closures << { -> println i }
}
closures.each { it() }
It prints 4 4 4 4.
To fix this, you need to do one of two things... First, you could capture the value in a locally scoped variable, then close over this variable:
for (i in 1..4) {
    def n = i
    closures << { -> println n }
}
The second option is to use Groovy's each or collect. Each time the closure passed to them is called, its parameter is a new variable, so every inner closure captures its own value:
(1..4).each { i ->
    closures << { -> println i }
}
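The same behavior and both fixes can be reproduced outside Groovy. The sketch below uses Python for illustration, where closures are also bound to variables rather than values; the names late, bound and make_printer are made up for the example. One difference worth noting: a plain assignment inside a Python loop does not create a new variable per iteration, so the "local copy" fix is written as a default argument instead.

```python
# Late binding: every closure reads the single loop variable `i` when it is called.
late = []
for i in range(1, 5):
    late.append(lambda: i)
late_results = [f() for f in late]

# Fix 1: copy the current value into a per-closure binding (a default argument).
bound = []
for i in range(1, 5):
    bound.append(lambda n=i: n)
bound_results = [f() for f in bound]

# Fix 2: build each closure inside a function call, so every call gets its own
# fresh variable (comparable to the parameter of Groovy's `each`).
def make_printer(n):
    return lambda: n

fresh_results = [f() for f in [make_printer(i) for i in range(1, 5)]]

print(late_results)   # [4, 4, 4, 4]
print(bound_results)  # [1, 2, 3, 4]
print(fresh_results)  # [1, 2, 3, 4]
```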
For your case, you can loop over platforms and collect into a map at the same time by using collectEntries:
def tasks = platforms.collectEntries { platformName ->
    [
        platformName,
        { ->
            def componentUploadPath = componentUploadPaths[platformName]
            echo "Uploading for platform [${platformName}] to [${componentUploadPath}]."
        }
    ]
}
Hope this helps!

Related

Dynamic variables and promises

It seems that dynamic variables don't always survive subroutine calls in threads:
sub foo($x, &y = &infix:<+>) {
    my &*z = &y;
    bar($x);
}
sub bar ($x) {
    say &*z($x,$x);
    my $promise = start { bar($x-1) if $x > 0 }
    await $promise;
    # bar($x-1) if $x > 0 # <-- provides the expected result: 6, 4, 2, 0
}
foo(3); # 6, 4, Dynamic variable &*z not found
Using a more globally scoped variable also works, so it's not that all variables are lost — it seems specific to dynamics:
our &b;
sub foo($a, &c = &infix:<+>) {
    &b = &c;
    bar($a);
}
sub bar ($a) {
    say &b($a,$a);
    my $promise = start { bar($a-1) if $a > 0 }
    await $promise;
}
foo(3); # 6, 4, 2, 0
Once the variable is set in foo(), it is read without problem in bar(). But when bar() is called from inside the promise, the value for &*z disappears not on the first layer of recursion but the second.
I'm sensing a bug, but maybe I'm doing something weird with the interaction between recursion, dynamic variables, and threading that's messing things up.
Under current semantics, start will capture the context it was invoked in. If dynamic variable lookup fails on the stack of the thread that the start executes on (one of those from the thread pool), then it will fall back to looking at the dynamic scope captured when the start block was scheduled.
When a start block is created during the execution of another start block, the same thing happens. However, there is no relationship between the two, so the context captured by the "outer" start block is not searched as well. While one could argue that it should be, doing so seems potentially problematic. Consider this example:
sub tick($n = 1 --> Nil) {
    start {
        await Promise.in(1);
        say $n;
        tick($n + 1);
    }
}
tick();
sleep;
This is a (not entirely idiomatic) way to produce a tick every second. Were the inner start to retain a reference back to the state of the outer one, for the purpose of dynamic variable lookup, then this program would build up a chain of ever increasing length in memory, which seems like an undesirable behavior.

Implementing pipes without using threads

I am working on a small language for fun and to try out some ideas. One of the ideas I am trying to implement is piping like in the shell but with arbitrary objects. An example might make this clearer.
The functions range and show_pipe can be defined like this:
range(n) => {
    x := 0;
    while x < n do {
        push x;
        x := x + 1;
    }
}
show_pipe() => {
    while true do {
        x := pull;
        if x = FinishedPipe then {
            return 0
        } else {
            print(x)
        };
    }
}
push pushes a value into the next part of the pipeline and suspends the function until another value is needed. pull pulls a value from the pipe and returns it, or returns FinishedPipe if the previous part of the pipeline has finished executing.
You can then pipe these two functions together with range(10) | show_pipe(), which shows the numbers 0 through 9 on the console.
I'm implementing this with one thread for each part of the pipeline and thread-safe queues for passing values from one part of the pipe to the next. I would really like to find a way to implement pipes without using threads. I am using Rust, so I can't use coroutines.
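For reference, the thread-per-stage design described here (not the thread-free one being asked for) might look roughly like the following. This is a sketch in Python rather than Rust, and range_stage, show_pipe_stage, run_pipeline and FINISHED_PIPE are hypothetical names. A one-slot queue only approximates "push suspends until another value is needed": the producer can run at most one value ahead.

```python
import queue
import threading

FINISHED_PIPE = object()  # sentinel standing in for FinishedPipe

def range_stage(n, out_q):
    """Producer stage: push 0..n-1, then signal that the pipe is finished."""
    x = 0
    while x < n:
        out_q.put(x)          # "push": blocks while the one-slot queue is full
        x += 1
    out_q.put(FINISHED_PIPE)

def show_pipe_stage(in_q, shown):
    """Consumer stage: pull values until the sentinel arrives."""
    while True:
        x = in_q.get()        # "pull": blocks until a value is available
        if x is FINISHED_PIPE:
            return
        shown.append(x)
        print(x)

def run_pipeline(n):
    link = queue.Queue(maxsize=1)   # bounded, so the producer cannot run far ahead
    shown = []
    stages = [
        threading.Thread(target=range_stage, args=(n, link)),
        threading.Thread(target=show_pipe_stage, args=(link, shown)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return shown

result = run_pipeline(10)
```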

SyncVar transfer producer/consumer threads in scala

Note: The problem I am solving is purely educational. I know the abstraction I want to create is error-prone and so on. I don't need a quick solution; I need an explanation.
The book I am reading has an exercise that asks me to implement a SyncVar with the following interface:
class SyncVar[T] {
def get(): T = ???
def put(x: T): Unit = ???
}
My comment: This seems understandable; I need a synchronized variable that I can put to or get from.
A SyncVar object is used to exchange values between two or more threads.
When created, the SyncVar object is empty:
° Calling get throws an exception
° Calling put adds a value to the SyncVar object
After a value is added to a SyncVar object, we can say that it is non-empty:
° Calling get returns the current value, and changes the state to empty
° Calling put throws an exception
My thoughts: This is a variable that throws an exception when get is called on an empty value or when put is called while it holds a value; calling get clears the stored value. It seems I need to use Option.
So I provide the following implementation:
class SyncVar[T] {
  var value: Option[T] = None
  def get(): T = value match {
    case Some(t) => this.synchronized {
      value = None
      t
    }
    case None => throw new IllegalArgumentException("error get")
  }
  def put(x: T): Unit = this.synchronized{
    value match {
      case Some(t) => throw new IllegalArgumentException("error put")
      case None => value = Some(x)
    }
  }
  def isEmpty = value.isEmpty
  def nonEmpty = value.nonEmpty
}
My comment:
put and get are synchronized, and I also added isEmpty and nonEmpty.
The next task confuses me:
The SyncVar object from the previous exercise can be cumbersome to use,
due to exceptions when the SyncVar object is in an invalid state. Implement
a pair of methods isEmpty and nonEmpty on the SyncVar object. Then,
implement a producer thread that transfers a range of numbers 0 until 15
to the consumer thread that prints them.
As I understand it, I need two threads:
// producer thread that produces the numbers 0 until 15
val producerThread = thread {
  for (i <- 0 until 15) {
    println(s"$i")
    if (syncVar.isEmpty) {
      println(s"put $i")
      syncVar.put(i)
    }
  }
}
// consumer thread that prints the numbers 0 until 15
val consumerThread = thread {
  while (true) {
    if (syncVar.nonEmpty) println(s"get ${syncVar.get()}")
  }
}
Question:
But this code is nondeterministic, so it gives a different result each time, while I need to print the numbers 0 until 15 in the right order. Could you explain what is wrong with my solution?
First, your synchronized in get is too narrow. It should surround the entire method, like in put (can you think why?).
After fixing, consider this scenario:
1. producerThread puts 0 into syncVar.
2. producerThread continues to run and tries to put 1. syncVar.isEmpty returns false, so it doesn't put 1 and moves on to the next i instead.
3. consumerThread gets 0.
4. producerThread puts 2.
Etc. So consumerThread can never get and print 1, because producerThread never puts it there.
Think what producerThread should do if syncVar is not empty and what consumerThread should do if it is.
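A minimal sketch of both hints, written in Python for illustration rather than the Scala from the exercise: every SyncVar method holds the lock for its whole body, and each thread waits instead of skipping a value. The names is_empty, non_empty, transfer and received are assumptions of the sketch. The check-then-act in the two loops is only safe because there is exactly one producer and one consumer.

```python
import threading
import time

class SyncVar:
    """One-slot exchange variable; every method holds the lock for its whole body."""
    _EMPTY = object()

    def __init__(self):
        self._lock = threading.Lock()
        self._value = SyncVar._EMPTY

    def get(self):
        with self._lock:                      # check and clear form one atomic step
            if self._value is SyncVar._EMPTY:
                raise RuntimeError("get on empty SyncVar")
            value, self._value = self._value, SyncVar._EMPTY
            return value

    def put(self, x):
        with self._lock:
            if self._value is not SyncVar._EMPTY:
                raise RuntimeError("put on non-empty SyncVar")
            self._value = x

    def is_empty(self):
        with self._lock:
            return self._value is SyncVar._EMPTY

    def non_empty(self):
        return not self.is_empty()

def transfer(count):
    sync_var = SyncVar()
    received = []

    def producer():
        for i in range(count):
            while sync_var.non_empty():       # wait instead of skipping i
                time.sleep(0)                 # busy wait, yielding to the other thread
            sync_var.put(i)

    def consumer():
        while len(received) < count:
            while sync_var.is_empty():        # wait until a value is available
                time.sleep(0)
            received.append(sync_var.get())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

received = transfer(15)
print(received)
```

Unlike the consumer in the question, this one stops after count values so that the example terminates.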
Thanks to @Alexey Romanov, I finally implemented the transfer method:
Explanation:
The producer thread checks whether syncVar is empty. If it is, the producer puts the value; otherwise it waits in while (syncVar.nonEmpty) {} (busy waiting, which is bad practice, but worth knowing about for educational purposes). Once that loop exits, the producer puts the value and moves on to the next i. Meanwhile, the consumer thread busy-waits forever and reads the value whenever syncVar is nonEmpty.
Solution:
def transfer() = {
  val syncVar = new SyncVar[Int]
  val producerThread = thread{
    log("producer thread started")
    for (i <- 0 until 15){
      if (syncVar.isEmpty) {
        syncVar.put(i)
      } else {
        while (syncVar.nonEmpty) {
          log("busy waiting")
        }
        if (syncVar.isEmpty) {
          syncVar.put(i)
        }
      }
    }
  }
  val consumerThread = thread{
    log("consumer thread started")
    while (true) {
      if (syncVar.nonEmpty) {
        syncVar.get()
      }
    }
  }
}

Fork/Join example with GPars

I found an example for fork/join in GPars here: Fork/Join
import static groovyx.gpars.GParsPool.runForkJoin
import static groovyx.gpars.GParsPool.withPool
withPool() {
    println """Number of files: ${
        runForkJoin(new File("./src")) {file ->
            long count = 0
            file.eachFile {
                if (it.isDirectory()) {
                    println "Forking a child task for $it"
                    forkOffChild(it) //fork a child task
                } else {
                    count++
                }
            }
            return count + (childrenResults.sum(0))
            //use results of children tasks to calculate and store own result
        }
    }"""
}
It works and returns the correct number of files, but unfortunately I don't understand this line:
return count + (childrenResults.sum(0))
How exactly do count and childrenResults work?
Why is a 0 passed as a parameter to sum()?
I'm not very familiar with GPars, but the link you provided says it is a divide-and-conquer algorithm and later clarifies what is implicit here: forkOffChild() does not wait, whereas getChildrenResults() does.
You may find the alternative approach on the same page easier to understand; it uses a more Java-like style, if that is more familiar to you.
childrenResults calls the method getChildrenResults(). This is the "join" in "Fork/Join": it waits for all children to finish and then returns a list of their results (or re-throws any exception a child may have thrown).
0 is just the initial value for the sum. If childrenResults is empty, that is what gets added to count:
groovy:000> [].sum(1)
===> 1
groovy:000> [1].sum(1)
===> 2
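The shape of that return statement can be mimicked without GPars. The sketch below is a sequential Python analogue, not the fork/join API: count_files and the nested dict src are hypothetical stand-ins for the task closure and the directory tree, and the recursion plays the role of forking and joining children. Python's sum takes the initial value as its second argument; it is written out explicitly here to mirror sum(0).

```python
def count_files(tree):
    """Count leaf entries; `tree` is a nested dict standing in for a directory."""
    count = 0
    children_results = []
    for entry in tree.values():
        if isinstance(entry, dict):          # a "directory": recurse (the fork)
            children_results.append(count_files(entry))
        else:                                # a "file"
            count += 1
    # 0 is the initial value, so a directory without subdirectories adds 0
    return count + sum(children_results, 0)

src = {"a.groovy": None, "util": {"b.groovy": None, "c.groovy": None, "empty": {}}}
total = count_files(src)
print(total)            # 3
print(sum([], 1))       # 1
print(sum([1], 1))      # 2
```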

Access to modified closure - ref int

int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
    item.ValidateAsync += (x, y) => this.HandleValidate(ref count);
}
private void HandleValidate(ref int x)
{
    --x;
    if (x == 0)
    {
        // All items are validated.
    }
}
For the above code, ReSharper complains "Access to Modified Closure". It doesn't do that if I change the type to object. Why is this a closure, even though I am passing by ref?
This happens all the time
ReSharper is warning you that count is implicitly captured by the lambdas that you are assigning as "validation complete" event handlers, and that its value may well change between the time the lambda is created (i.e. when you assign the event handler) and the time when it is invoked. If this happens, the lambda will not see the value one would intuitively expect.
An example:
int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
    item.ValidateAsync += (x, y) => this.HandleValidate(ref count);
}
// afterwards, at some point before the handlers get invoked:
count = 0;
In this instance the handlers will read the value of count as 0 instead of itemsToValidate.Count -- which might be called "obvious", but is surprising and counter-intuitive to many developers not familiar with the mechanics of lambdas.
And we usually solve it like this
The usual solution to "shut R# up" is to move the captured variable into an inner scope, where it is much less accessible and R# can prove that it cannot be modified before the lambda is evaluated:
int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
    int inner = count; // this makes inner impossible to modify
    item.ValidateAsync += (x, y) => this.HandleValidate(ref inner);
}
// now this will of course not affect what the lambdas do
count = 0;
But your case is special
Your particular case is a comparatively rare one where you specifically want this behavior, and using the above trick would actually make the program behave incorrectly (you need the captured references to point to the same count).
The correct solution: disable this warning using the special line comments that R# recognizes.
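Why the shared capture is intentional here can be seen in a small analogue. The sketch below uses Python instead of C#, with a hypothetical Item class standing in for the objects that raise ValidateAsync: every handler decrements the same captured count, so the "all validated" branch runs exactly once, after the last item. If each handler had its own copy (the inner trick), no copy would reach zero when there is more than one item.

```python
class Item:
    """Hypothetical stand-in for an object exposing a ValidateAsync-style event."""
    def __init__(self):
        self.handlers = []

    def validate(self):
        for handler in self.handlers:
            handler()

def wire_up(items, on_all_validated):
    count = len(items)            # one variable shared by every handler

    def handle_validate():
        nonlocal count            # mutate the captured variable, not a copy
        count -= 1
        if count == 0:
            on_all_validated()

    for item in items:
        item.handlers.append(handle_validate)

items = [Item() for _ in range(3)]
events = []
wire_up(items, lambda: events.append("all validated"))

progress = []
for item in items:
    item.validate()
    progress.append(len(events))  # how many times the callback has fired so far

print(progress)  # [0, 0, 1]
```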
