Lazy evaluation of chained functional methods in Groovy - groovy

What I've seen in Java
Java 8 allows lazy evaluation of chained functions in order to avoid performance penalties.
For instance, I can have a list of values and process it like this:
someList.stream()
.filter( v -> v > 0)
.map( v -> v * 4)
.filter( v -> v < 100)
.findFirst();
I pass a number of closures to the methods called on a stream to process the values in a collection and then only grab the first one.
This looks as if the code had to iterate over the entire collection, filter it, then iterate over the entire result and apply some logic, then filter the whole result again and finally grab just a single element.
In reality, the compiler handles this in a smarter way and optimizes the number of iterations required.
This is possible because no actual processing is done until findFirst is called. This way the compiler knows what I want to achieve and it can figure out how to do it in an efficient manner.
Take a look at this video of a presentation by Venkat Subramaniam for a longer explanation.
What I'd like to do in Groovy
While answering a question about Groovy here on StackOverflow I figured out a way to perform the task the OP was trying to achieve in a more readable manner. I refrained from suggesting it because it meant a performance decrease.
Here's the example:
collectionOfSomeStrings.inject([]) { list, conf -> if (conf.contains('homepage')) { list } else { list << conf.trim() } }
Semantically, this could be rewritten as
collectionOfSomeStrings.grep{ !it.contains('homepage')}.collect{ it.trim() }
I find it easier to understand but the readability comes at a price. This code requires a pass of the original collection and another iteration over the result of grep. This is less than ideal.
It doesn't look like the GDK's grep, collect and findAll methods are lazily evaluated like the methods in Java 8's streams API. Is there any way to have them behave like this? Is there any alternative library in Groovy that I could use?
I imagine it might be possible to use Java 8 somehow in Groovy and have this functionality. I'd welcome an explanation on the details but ideally, I'd like to be able to do that with older versions of Java.
I found a way to combine closures but it's not really what I want to do. I'd like to chain not only closures themselves but also the functions I pass them to.
Googling for Groovy and Streams mostly yields I/O related results. I haven't found anything of interest by searching for lazy evaluation, functional and Groovy as well.

Adding the suggestion as an answer taking cfrick's comment as an example:
#Grab( 'com.bloidonia:groovy-stream:0.8.1' )
import groovy.stream.Stream
List integers = [ -1, 1, 2, 3, 4 ]
//.first() or .last() whatever is needed
Stream.from integers filter{ it > 0 } map{ it * 4 } filter{ it < 15 }.collect()
Tim, I still know what you did few summers ago. ;-)

Groovy 2.3 supports jdk8 groovy.codehaus.org/Groovy+2.3+release+notes. your example works fine using groovy closures:
[-1,1,2,3,4].stream().filter{it>0}.map{it*4}.filter{it < 100}.findFirst().get()
If you can't use jdk8, you can follow the suggestion from the other answer or achieve "the same" using RxJava/RxGroovy:
#Grab('com.netflix.rxjava:rxjava-groovy:0.20.7')
import rx.Observable
Observable.from( [-1, 1, 2, 3, 4, 666] )
.filter { println "f1 $it"; it > 0 }
.map { println "m1 $it"; it * 4 }
.filter { println "f2 $it"; it < 100 }
.subscribe { println "result $it" }

Related

Which is best: collection.each or for each? [duplicate]

This question already has an answer here:
Groovy collections performance considerations regarding space/time
(1 answer)
Closed 6 years ago.
Which of these two ways of transversing a list is recommended:
List list = [1, 2, 3]
list.each { num ->
// Do something
}
or
List list = [1, 2, 3]
for (num in list) {
// Do something
}
?
Does any of these make any difference in performance? I think, at least in terms of readability, they are both the same.
It depends on what you are doing inside of the block of code. You might want to be taking advantage of the closure's delegate or you might have a closure you want to re-use in different contexts, which would be arguments for using each. If none of that applies, the for loop probably makes more sense and will perform a small bit better.
From the usage POV, the main difference between them is the way how you abnormally terminate the loop.
With for-each it's straight-forward:
for (num in list) {
if( someCond ) break
}
whereas with each you would have to throw an exception:
list.each{
if( someCond ) throw new Exception( "" )
}
which is not so nice.

GPars syntax unfamiliar

I'm coming from a Java background and I'm stuck learning Groovy and Gradle at the same time, since my purpose for one is the other. :-/ I'm also in need of the GPars stuff since speed and parallelism are an issue. Anyway, I see this GPars example and I have some questions which I think are linguistic nuances, rather than library issues, that I don't yet understand.
//check whether all elements within a collection meet certain criteria
GParsPool.withPool(5) { ForkJoinPool pool ->
assert [1, 2, 3, 4, 5].everyParallel {it > 0}
assert ![1, 2, 3, 4, 5].everyParallel {it > 1}
}
I see ForkJoinPool pool ->... Why aren't the two lines wrapped in braces like so. Seems like you would loose track of the scope if it were just an optional omission, like for semicolons:
//check whether all elements within a collection meet certain criteria
GParsPool.withPool(5) { ForkJoinPool pool -> {
assert [1, 2, 3, 4, 5].everyParallel {it > 0}
assert ![1, 2, 3, 4, 5].everyParallel {it > 1}
}
}
What is it? Is it an iterator? Where did it come from?
By what means is it possible to call .everyParallel on an Object when it's never been explicitly wrapped by something that has that function as far as I can tell?
I'll start with the disclaimer that I am by no means a GPars expert, but I have used it in a couple of situations, so hopefully there can be something helpful here (updates from the community are welcome).
Closures
Groovy Closures are blocks of code that can be passed around. When a parameter is passed into the block, it will come in before the -> notation. For example:
GParsPool.withPool(5) { ForkJoinPool pool ->
// Here the `pool` object is available to use for processing.
}
In the case that you do not provide a defined variable, there is an implicit it object included. The above closure could be written in the following ways:
GParsPool.withPool(5) { Object it ->
// Generically stating that a single object will be passed in, called "it". In this example it is a `ForkJoinPool` object.
}
GParsPool.withPool(5) {
// No "it" object is specified, but you can still use "it" because it is implied.
}
everyParallel()
The GParsPool class enables a ParallelArray-based (from JSR-166y)
concurrency DSL for collections and objects. Source
If I understand this correctly, there is functionality automatically added when using GParsPool.withPool(), which allows you to use methods like everyParallel(). Dynamic programming FTW! I'm guessing it uses the Groovy metaClass capability to add methods dynamically at runtime, so that you can call them without adding them yourself.

What is closure and why to use it?

What is closure in groovy?
Why we use this closure?
Are you asking about Closure annotation parameters?
[...
An interesting feature of annotations in Groovy is that you can use a closure as an annotation value. Therefore annotations may be used with a wide variety of expressions and still have IDE support. For example, imagine a framework where you want to execute some methods based on environmental constraints like the JDK version or the OS. One could write the following code:
class Tasks {
Set result = []
void alwaysExecuted() {
result << 1
}
#OnlyIf({ jdk>=6 })
void supportedOnlyInJDK6() {
result << 'JDK 6'
}
#OnlyIf({ jdk>=7 && windows })
void requiresJDK7AndWindows() {
result << 'JDK 7 Windows'
}
}
...]
Source:http://docs.groovy-lang.org/
Closures are a powerful concept with which you can implement a variety of things and which enable specifying DSLs. They are sort of like Java ( lambdas, but more powerful and versatile. You dont need to use closures, but they can make many things easier.
Since you didnt really specify a concrete question, I'll just point you to the startegy pattern example in the groovy docs:
http://docs.groovy-lang.org/latest/html/documentation/#_strategy_pattern
Think of the closure as an executable unit on its own, like a method or function, except that you can pass it around like a variable, but can do a lot of things that you would normally do with a class, for example.
An example: You have a list of numbers and you either want to add +1 to each number, or you want to double each number, so you say
def nums = [1,2,3,4,5]
def plusone = { item ->
item + 1
}
def doubler = { item ->
item * 2
}
println nums.collect(plusone)
println nums.collect(doubler)
This will print out
[2, 3, 4, 5, 6]
[2, 4, 6, 8, 10]
So what you achieved is that you separated the function, the 'what to do' from the object that you did it on. Your closures separate an action that can be passed around and used by other methods, that are compatible with the closure's input and output.
What we did in the example is that we had a list of numbers and we passed each of them to a closure that did something with it. Either added +1 or doubled the value, and collected them into another list.
And this logic opens up a whole lot of possibilities to solve problems smarter, cleaner, and write code that represents the problem better.

What are the benefits of `while(condition) { //work }` and `do { //work } while(condition)`?

I found myself confronted with an interview question where the goal was to write a sorting algorithm that sorts an array of unsorted int values:
int[] unsortedArray = { 9, 6, 3, 1, 5, 8, 4, 2, 7, 0 };
Now I googled and found out that there are so many sorting algorithms out there!
Finally I could motivate myself to dig into Bubble Sort because it seemed pretty simple to start with.
I read the sample code and came to a solution looking like this:
static int[] BubbleSort(ref int[] array)
{
long lastItemLocation = array.Length - 1;
int temp;
bool swapped;
do
{
swapped = false;
for (int itemLocationCounter = 0; itemLocationCounter < lastItemLocation; itemLocationCounter++)
{
if (array[itemLocationCounter] > array[itemLocationCounter + 1])
{
temp = array[itemLocationCounter];
array[itemLocationCounter] = array[itemLocationCounter + 1];
array[itemLocationCounter + 1] = temp;
swapped = true;
}
}
} while (swapped);
return array;
}
I clearly see that this is a situation where the do { //work } while(cond) statement is a great help to be and prevents the use of another helper variable.
But is this the only case that this is more useful or do you know any other application where this condition has been used?
In general:
use do...while when you want the body to be executed at least once.
use while... when you may not want the body to be executed at all.
EDIT: I'd say the first option comes up about 10% of the time and the second about 90%. You can always re-factor to use either, in either circumstance. Use the one that's closest to what you want to say.
do...while guarantees that the body of code inside the loop executes at least once. This can be handy under certain conditions; when coding a REPL loop, for example.
Anytime you need to loop through some code until a condition is met is a good example of when to use do...while or while...
A good example of when to use do...while or while... is if you have a game or simulation where the game engine is continuously running the various components until some condition occurs like you win or lose.
Of course this is only one example.
The above posts are correct regarding the two conditional looping forms. Some languages have a repeat until form instead of do while. Also there's a minimalist view where only the necessary control structures should exist in a language. The while do is necessary but the do while isn't. And as for bubble sort you'll want to avoid going there as it's the slowest of the commonly known sorting algorithms. Look at selection sort or insertion sort instead. Quicksort and merge sort are fast but are hard to write without using recursion and perform badly if you happen to chose a poor pivot value.

is it possible to break out of closure in groovy

is there a way to 'break' out of a groovy closure.
maybe something like this:
[1, 2, 3].each {
println(it)
if (it == 2)
break
}
I often forget that Groovy implements an "any" method.
[1, 2, 3].any
{
println it
return (it == 2)
}​
12/05/2013 Heavily Edited.
Answering the question that was asked.
Is it possible to break out of a Closure?
You would "break" out of a closure by issuing the return keyword. However that isn't helpful in the example that is given. The reason for this is that the closure (think of it as a method) is called by the each method for every item in the collection.
If you run this example you will see it will print 1 then 3.
[1, 2, 3].each {
if (it == 2) return
println(it)
}
Why break in the context of each doesn't make sense.
To understand why you cannot break out of the each method like you could break out of a for loop you need to understand a bit of what is actually happening. Here is a gross simplification what the each method on a collection does.
myEach([0,1,3])
void myEach(List things) {
for (i in things) {
myEachMethod(i)
}
}
void myEachMethod(Object it) { // this is your Closure
if(it == 2) return
println it
}
As you can see the closure is basically a method that can be passed around. Just as in java you cannot break from within method call or closure.
What to do instead of breaking from each.
In Groovy you are supposed to express your code using high level abstractions as such primitive looping is not idiomatic. For the example that you gave I would consider making use of findAll. For example:
[1,2,3].findAll { it < 2 }.each { println it }
I hope this helps you understand what is going on.
Answering the implied question.
Can you break out of the Collection.each iterations against your supplied closure?
You cannot break out of the each method without throwing and catching an exception as John Wagenleitner has said. Although I would argue that throwing and catching an exception in the name of flow control is a code smell and a fellow programmer might slap your hands.
You can throw an exception:
try {
[1, 2, 3].each {
println(it)
if (it == 2)
throw new Exception("return from closure")
}
} catch (Exception e) { }
Use could also use "findAll" or "grep" to filter out your list and then use "each".
[1, 2, 3].findAll{ it < 3 }.each{ println it }
Take a look at Best pattern for simulating continue in groovy closure for an extensive discussion.
Try to use any instead of each
def list = [1, 2, 3, 4, 5, -1, -2]
list.any { element ->
if (element > 3)
return true // break
println element
}
The result : 1, 2, 3
Just using special Closure
// declare and implement:
def eachWithBreak = { list, Closure c ->
boolean bBreak = false
list.each() { it ->
if (bBreak) return
bBreak = c(it)
}
}
def list = [1,2,3,4,5,6]
eachWithBreak list, { it ->
if (it > 3) return true // break 'eachWithBreak'
println it
return false // next it
}
There is an other solution. Although, that groovy stuff like each/find/any is quite cool: if it doesn't fit, don't use it. You can still use the plain old
for (def element : list)
Especially, if you want to leave the method, too. Now you are free to use continue/break/return as you like. The resulting code might not be cool, but it is easy and understandable.
This is in support of John Wagenleiter's answer. Tigerizzy's answer is plain wrong. It can easily be disproved practically by executing his first code sample, or theoretically by reading Groovy documentation. A return returns a value (or null without an argument) from the current iteration, but does not stop the iteration. In a closure it behaves rather like continue.
You won't be able to use inject without understanding this.
There is no way to 'break the loop' except by throwing an exception. Using exceptions for this purpose is considered smelly. So, just as Wagenleiter suggests, the best practice is to filter out the elements you want to iterate over before launching each or one of its cousins.
With rx-java you can transform an iterable in to an observable.
Then you can replace continue with a filter and break with takeWhile
Here is an example:
import rx.Observable
Observable.from(1..100000000000000000)
.filter { it % 2 != 1}
.takeWhile { it<10 }
.forEach {println it}

Resources