I found an example for fork/join in GPars here: Fork/Join
import static groovyx.gpars.GParsPool.runForkJoin
import static groovyx.gpars.GParsPool.withPool
withPool() {
    println """Number of files: ${
        runForkJoin(new File("./src")) { file ->
            long count = 0
            file.eachFile {
                if (it.isDirectory()) {
                    println "Forking a child task for $it"
                    forkOffChild(it)  // fork a child task
                } else {
                    count++
                }
            }
            return count + (childrenResults.sum(0))
            // use results of children tasks to calculate and store own result
        }
    }"""
}
It works and returns the correct number of files, but unfortunately I don't understand this line:
return count + (childrenResults.sum(0))
How exactly do count and childrenResults work?
Why is a 0 passed as a parameter to sum()?
I'm not very familiar with GPars, but the link you provided says it is a divide-and-conquer algorithm and, a bit further on, makes the implicit parts explicit: forkOffChild() does not wait for the child task to finish -- getChildrenResults() does.
You may find the alternative approach provided on the same page, which uses a more Java-ish style, easier to understand if you're more familiar with that.
childrenResults resolves to a call to the method getChildrenResults(). This is the "join" in "fork/join": it waits for all child tasks to finish and then returns a list of their results (or re-throws any exception a child may have thrown).
0 is just the initial value for the sum. If childrenResults is empty, that's all that gets added to count:
groovy:000> [].sum(1)
===> 1
groovy:000> [1].sum(1)
===> 2
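If it helps, here is a plain, single-threaded Groovy sketch of the same recursion (my own illustration, not from the GPars docs): each call counts the regular files in one directory and adds the counts returned for its subdirectories, which is exactly what count + (childrenResults.sum(0)) expresses per task.

// Sequential equivalent of the fork/join task above (no GPars involved).
long countFiles(File dir) {
    long ownCount = 0
    def childCounts = []                   // plays the role of childrenResults
    dir.eachFile { f ->
        if (f.isDirectory()) {
            childCounts << countFiles(f)   // the "forkOffChild" step, done inline
        } else {
            ownCount++
        }
    }
    return ownCount + childCounts.sum(0)   // sum(0) so an empty list contributes 0
}

println "Number of files: ${countFiles(new File('./src'))}"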
In the context of Jenkins pipelines, I have some Groovy code that iterates over a list, creates closures, and then uses the loop value inside each closure as a key to look up another value in a map. Almost every time, this behaves as if there is some sort of anomaly or race condition.
This is a simplification of the code:
def tasks = [:]
for (platformName in platforms) {
    // ...
    tasks[platformName] = {
        def componentUploadPath = componentUploadPaths[platformName]
        echo "Uploading for platform [${platformName}] to [${componentUploadPath}]."
        // ...
    }
}
tasks.failFast = true
parallel(tasks)
platforms has two values. I will usually see two iterations, two tasks registered, and the correct keys in tasks, but the echo statement inside the closure indicates that we're just running one of the platforms twice:
14:20:02 [platform2] Uploading for platform [platform1] to [some_path/platform1].
14:20:02 [platform1] Uploading for platform [platform1] to [some_path/platform1].
It's ridiculous.
What do I need to add or do differently?
It's the same issue as you'd see in JavaScript.
When you generate the closures in a for loop, they are bound to the variable, not the value of the variable.
When the loop exits and the closures are run, they will all be using the same value: the last value the loop variable held before the loop exited.
For example, you'd expect the following to print 1 2 3 4, but it doesn't
def closures = []
for (i in 1..4) {
    closures << { -> println i }
}
closures.each { it() }
It prints 4 4 4 4
To fix this, you need to do one of two things... First, you could capture the value in a locally scoped variable, then close over this variable:
for (i in 1..4) {
    def n = i
    closures << { -> println n }
}
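With that change, closures.each { it() } prints 1 2 3 4 as expected.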
The second thing you could do is use Groovy's each or collect: each time they invoke the closure, its parameter is a fresh variable, so closing over it works again:
(1..4).each { i ->
    closures << { -> println i }
}
For your case, you can loop over platforms and collect into a map at the same time by using collectEntries:
def tasks = platforms.collectEntries { platformName ->
    [
        platformName,
        { ->
            def componentUploadPath = componentUploadPaths[platformName]
            echo "Uploading for platform [${platformName}] to [${componentUploadPath}]."
        }
    ]
}
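After building the map this way, the failFast flag and the parallel call from your original snippet stay exactly as they were:

tasks.failFast = true
parallel(tasks)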
Hope this helps!
I want to give users the ability to stop a sorting routine if it takes too long.
I tried to use DispatchWorkItem.cancel. However, this does not actually stop processes that have already started.
let myArray = [String]() // potentially 230M elements!
...
workItem = DispatchWorkItem {
    let result = myArray.sorted()
    DispatchQueue.main.async {
        print("done")
    }
}
// if process is too long, user clicks "Cancel" =>
workItem.cancel() // does not stop sort
How can I kill the workItem's thread?
I don't have access to the sorted routine, therefore I cannot insert tests to check whether the current thread is in a "cancelled" state...
As you have deduced, one simply cannot kill a work item without periodically checking isCancelled and manually performing an early exit if it is set.
There are two options:
You can use sorted(by:), test for isCancelled there, and throw an error if it is cancelled. That achieves the desired early exit.
That might look like:
func cancellableSort() -> DispatchWorkItem {
    var item: DispatchWorkItem!

    item = DispatchWorkItem() {
        let unsortedArray = (0..<10_000_000).shuffled()
        let sortedArray = try? unsortedArray.sorted { (obj1, obj2) -> Bool in
            if item.isCancelled {
                throw SortError.cancelled
            }
            return obj1 < obj2
        }

        // do something with your sorted array

        item = nil
    }

    DispatchQueue.global().async(execute: item)

    return item
}
Where
enum SortError: Error {
    case cancelled
}
Be forewarned that even in release builds, this can have a dramatic impact on performance. So you might want to benchmark this.
You could just write your own sort routine, inserting your own test of isCancelled inside the algorithm. This gives you more control over where precisely you perform the test (i.e., you might not want to do it for every comparison, but rather at some higher level loop within the algorithm, thereby minimizing the performance impact). And given the number of records, this gives you a chance to pick an algorithm best suited for your data set.
Obviously, when benchmarking these alternatives, make sure you test optimized/release builds so that your results are not skewed by your build settings.
As an aside, you might consider using Operation, too, as its handling of cancellation is more elegant, IMHO. Plus you can have a dedicated object for the sort operation, which is cleaner.
For a JMH benchmark class, the number of threads is limited to 1 via @Threads(1).
However, when I get the number of threads using Thread.activeCount(), it shows that there are 2 threads.
The simplified version of the code is below:
@Fork(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Threads(1)
public class MyBenchmark {

    @State(Scope.Benchmark)
    public static class BState {
        @Setup(Level.Trial)
        public void initTrial() {
        }

        @TearDown(Level.Trial)
        public void tearDownTrial() {
        }
    }

    @Benchmark
    public List<Integer> get(BState state) {
        System.out.println("Thread number: " + Thread.activeCount());
        ...
        List<byte[]> l = new ArrayList<byte[]>(state.dict.get(k));
        ...
    }
}
In the benchmark I try to get a value from the dictionary by its key. However, when 2 threads exist, the key cannot be found in the dictionary, and the list l ends up empty ([]). Why is the key not found? That is the reason I limit the thread count to 1.
Thread.activeCount() returns the number of active threads in the system, not necessarily the number of benchmark threads. Using it to divide the work between the benchmark threads is dangerous because of this fundamental disconnect. ThreadParams may help to get the benchmark thread indexes, if needed; see the relevant JMH example.
If you want a more conclusive answer, you need to provide MCVE that clearly highlights your problem.
Note: the problem I am solving is purely educational. I know the abstraction I want to create is error prone and so on... I don't need a fast solution, I need an explanation.
In the book I am reading there is exercise that says that I need to implement SyncVar which has the following interface:
class SyncVar[T] {
  def get(): T = ???
  def put(x: T): Unit = ???
}
My comment: Alright, seems understandable; I need some synchronized variable that I can put to and get from.
A SyncVar object is used to exchange values between two or more threads.
When created, the SyncVar object is empty:
- Calling get throws an exception
- Calling put adds a value to the SyncVar object
After a value is added to a SyncVar object, we can say that it is non-empty:
- Calling get returns the current value, and changes the state to empty
- Calling put throws an exception
My thoughts: this is a variable that throws an exception when get is called while it is empty, or when put is called while it already holds a value; calling get also clears the previous value. Seems like I need to use Option.
So I provide the following implementation:
class SyncVar[T] {
  var value: Option[T] = None

  def get(): T = value match {
    case Some(t) => this.synchronized {
      value = None
      t
    }
    case None => throw new IllegalArgumentException("error get")
  }

  def put(x: T): Unit = this.synchronized {
    value match {
      case Some(t) => throw new IllegalArgumentException("error put")
      case None => value = Some(x)
    }
  }

  def isEmpty = value.isEmpty

  def nonEmpty = value.nonEmpty
}
My comment:
put and get are invoked with synchronization, and there are also isEmpty and nonEmpty.
The next task makes me confused:
The SyncVar object from the previous exercise can be cumbersome to use,
due to exceptions when the SyncVar object is in an invalid state. Implement
a pair of methods isEmpty and nonEmpty on the SyncVar object. Then,
implement a producer thread that transfers a range of numbers 0 until 15
to the consumer thread that prints them.
As I understand it, I need two threads:
// producer thread that produces the numbers 0 until 15
val producerThread = thread {
  for (i <- 0 until 15) {
    println(s"$i")
    if (syncVar.isEmpty) {
      println(s"put $i")
      syncVar.put(i)
    }
  }
}

// consumer that prints the values 0 until 15
val consumerThread = thread {
  while (true) {
    if (syncVar.nonEmpty) println(s"get ${syncVar.get()}")
  }
}
Question:
But this code is nondeterministic, so it produces a different result each time, while I need to print the numbers 0 until 15 in the right order. Could you explain to me what is wrong with my solution?
First, your synchronized in get is too narrow. It should surround the entire method, like in put (can you think why?).
After fixing, consider this scenario:
producerThread puts 0 into syncVar.
producerThread continues to run and tries to put 1. syncVar.isEmpty returns false, so it doesn't put 1. It continues the loop with the next i instead.
consumerThread gets 0.
producerThread puts 2.
Etc. So consumerThread can never get and print 1, because producerThread never puts it there.
Think what producerThread should do if syncVar is not empty and what consumerThread should do if it is.
Thanks to @Alexey Romanov, I finally implemented the transfer method:
Explanation:
The idea is that the producer thread checks whether syncVar is empty. If it is, it puts the value; otherwise it waits with while (syncVar.nonEmpty) {} (busy waiting, which is bad practice, but worth knowing about for educational purposes). When we leave that loop (stop busy waiting), we put the value and continue the for loop with the next i. Meanwhile the consumer thread busy-waits forever and reads the variable whenever it is nonEmpty.
Solution:
def transfer() = {
val syncVar = new SyncVar[Int]
val producerThread = thread{
log("producer thread started")
for (i <- 0 until 15){
if (syncVar.isEmpty) {
syncVar.put(i)
} else {
while (syncVar.nonEmpty) {
log("busy wating")
}
if (syncVar.isEmpty) {
syncVar.put(i)
}
}
}
}
val consumerThread = thread{
log("consumer thread started")
while (true) {
if (syncVar.nonEmpty) {
syncVar.get()
}
}
}
}
I am very new to functional programming concepts and was watching a presentation by Neil Ford on YouTube. There he talks about a counter to demonstrate a piece of code without using global state (at 20:04). Coming from the Java world, I have some difficulty understanding the concept here and how the counter is incremented. Below is the relevant code
def makeCounter() {
    def very_local_variable = 0
    return { very_local_variable += 1 }
}
c1 = makeCounter()
c1()
c1()
c1()
c2 = makeCounter()
println "C1 = ${c1()}, C2 = ${c2()}"
He goes on to say that C1 = 4 and C2 = 1 will be printed. How does this happen? I am sure my lack of understanding here stems from a conceptual failure in how Groovy works, or probably there is something general to functional languages like Groovy, Scala, etc. Does a local variable within a method maintain its state until the function is called again and assigned to another variable? (A Google search with "functional counter groovy | scala" brings up nothing.)
I don't know why this question doesn't have an answer, but it probably deserves at least one for future visitors.
What you have written above is a simple example of a closure. Closures are a language-agnostic idea, not tied to Groovy at all. You use them all the time, e.g. in JavaScript.
A closure is essentially a function or reference to a function together with a referencing environment.
In your example each call to makeCounter introduces a new local very_local_variable; that part is clear, I think. What you do in the second line of that function is return a closure ({ .. } syntax) that takes no arguments and returns the local variable incremented by 1. The closure has access to that variable as long as it exists (it's part of the referencing environment).
Each call to makeCounter creates a new local variable and returns a new closure, so each of them operates on its own local variable / environment.
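Tracing your example with that in mind shows where 4 and 1 come from; note that the interpolations inside the println are themselves calls to the closures:

c1 = makeCounter()   // c1 closes over its own counter, starting at 0
c1()                 // counter -> 1
c1()                 // counter -> 2
c1()                 // counter -> 3
c2 = makeCounter()   // c2 closes over a brand new counter, starting at 0
println "C1 = ${c1()}, C2 = ${c2()}"   // c1() -> 4, c2() -> 1, so it prints "C1 = 4, C2 = 1"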
Actually, if you come from Java island it shouldn't be that new to you. You have anonymous classes, which can access final variables, e.g.
Runnable makeCounter() {
    final int[] very_local_variable = {0};
    return new Runnable() {
        @Override
        public void run() {
            very_local_variable[0] += 1;
        }
    };
}
Runnable c1 = makeCounter();
c1.run();
c1.run();
c1.run();
Declaring the local variable as an array, returning the increment wrapped in a Runnable functor and so on... it all comes from Java's design and limits, but in essence it's really close to your code. Java 8 closes the gap even further:
Runnable makeCounter() {
    int[] very_local_variable = {0};
    return () -> { very_local_variable[0] += 1; };
}