Spock: Data Pipes and Variable Assignments - groovy

From Spock documentation:
Data tables, data pipes, and variable assignments can be combined as needed:
...
where:
a | _
3 | _
7 | _
0 | _
b << [5, 0, 0]
c = a > b ? a : b
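For these values the derived variable c works out to 5, 7, and 0. A quick plain-Groovy check of that arithmetic (an illustration only, not part of the quoted docs):
assert [[3, 5], [7, 0], [0, 0]].collect { a, b -> a > b ? a : b } == [5, 7, 0]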
However, this simple example produces a MissingPropertyException for y:
def test() {
    expect:
    x == 42

    where:
    y = 12
    x << [y + 30, 54 - y]
}
What's wrong with this example?

You can't mix a data pipe and a variable assignment that way. Let's take a look at the source code. Put a breakpoint in the org.spockframework.runtime.ParameterizedSpecRunner class at line 49 and run the test with the debugger. You will see that currentFeature holds two parameters, named y and x. You will also notice that only a single dataProvider is defined, for the variable x.
This data provider exists because we defined x as a data pipe, so Spock has to iterate over the provided list and evaluate it in the context of the other data pipes. In that context it expects the y variable to also be defined as a data pipe, so that it can take the value associated with the same index.
If you define your where: as:
where:
y << [12, 12]
x << [y + 30, 54 - y]
your test will succeed, because now y exists as a data provider, and evaluating the values for x can access the values of y through that second data provider.
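A plain-Groovy analogy of that same-index lookup (an illustration only; the names here are mine, not Spock internals):
def yValues = [12, 12]
def xValues = (0..<yValues.size()).collect { i ->
    def y = yValues[i]              // same-index value from the y provider
    [y + 30, 54 - y][i]             // the x pipe expression, evaluated with that y
}
assert xValues == [42, 42]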
How to combine variable assignment with data pipes?
Consider the following example:
@Unroll
def "variable assignment example"() {
    expect:
    x == 42

    and:
    c

    where:
    y << [12, 12]
    x << [y + 30, 54 - y]
    c = y < x
}
In this case the variable c is evaluated twice, once for each element in the data pipes y and x. When the first unroll happens, y = 12, x = y + 30 => 12 + 30 => 42, and c evaluates to true because y < x. When the second unroll happens, the value 12 is assigned to y again, x evaluates to 42 (54 - y => 54 - 12 => 42), and c evaluates to true again.
Conclusion
I see that it may look like the simple y = 12 variable assignment should be discovered by the data pipe evaluation. Unfortunately, it does not work that way: when a data pipe gets evaluated, it can only use values from other data pipes, and a plain variable assignment is not visible in that context.

I think the assignment y = 12 is not allowed in a where block.
For example, this version works; two tests run as expected:
class SampleTest extends Specification {
    @Unroll
    def "test"() {
        expect:
        x == 42

        where:
        x << [12 + 30, 54 - 12]
    }
}
If you absolutely need an assignment, the following will also work:
class SampleTest extends Specification {
    @Unroll
    def "test"() {
        expect:
        x == 42

        where:
        x << myfunc()
    }

    def myfunc() {
        def y = 12
        [y + 30, 54 - y]
    }
}

Related

Pick elements on (un)even index from an array in Groovy

I have a Groovy array containing digits of a number. I need to create two new arrays containing only the digits at even resp. uneven positions from that array.
The best way that I could find is this, but I feel there's quite a lot of room for improvement here:
def evenDigits = digits
    .indexed(1)
    .findAll { i, v -> i % 2 == 0 }
    .collect { it.value }
Obviously the unevenDigits variant would be to simply check the modulus in the findAll closure against 1 instead of 0.
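For reference, that unevenDigits variant (the same pipeline with the modulus checked against 1) would be:
def unevenDigits = digits
    .indexed(1)
    .findAll { i, v -> i % 2 == 1 }
    .collect { it.value }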
Does anyone know if this code can be improved or compacted?
A "less smarter" (and definitely more performant) solution:
def evens = [], odds = []
digits.eachWithIndex{ v, ix -> ( ix & 1 ? odds : evens ) << v }
You can use groupBy to separate the results into odd/even items, e.g.
groovy:000> ["a","b","c"].indexed(1).groupBy{ i, _ -> i & 1 }.collectEntries{ k, v -> [k as Boolean, v.values()] }
===> [true:[a, c], false:[b]]
One more "Groovy" solution that uses withIndex() and findResults() combination.
withIndex() transforms a List<T> to List<Tuple2<T,Integer>> - a list of value-index tuples.
findResults(closure) runs filtering transformation - the closure it receives is a transforming predicate. In our case, it checks if the index value is odd or even and extracts the value from tuple if the predicate matches. (All null values are filtered out.)
Short and concise. Requires a minimal number of transformations: List<T> to List<Tuple2<T,Integer>> and then a single iteration to produce the final result.
def numbers = [1,2,3,4,5,6,2,3,1] // Some test data
def even = { t -> t.second % 2 == 0 ? t.first : null } // "Even" transforming predicate
def odd = { t -> t.second % 2 == 1 ? t.first : null } // "Odd" transforming predicate
def evens = numbers.withIndex(1).findResults even
def odds = numbers.withIndex(1).findResults odd
// And some assertions to test the implementation
assert evens == [2,4,6,3]
assert odds == [1,3,5,2,1]
Another option, for a single pass (but still with the intermediate collection due to indexed), would be a reduce:
def (odd,even) = digits.indexed().inject([[],[]]){ acc, it -> acc[it.key&1] << it.value; acc }
I came up with this, but it's probably not the cleverest way.
def isEven = { int x -> x % 2 == 0 ? x : null }
def (digits, evens, odds) = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [], []]
digits.each {
    if (isEven(it))
        evens.add(isEven(it))
}
odds = digits - evens
assert evens == [2, 4, 6, 8]
assert odds == [1, 3, 5, 7, 9]

How do I write multiple function outputs to a single csv file

I am scraping multiple websites, using one function per website script. Each function returns 4 values; I want to put them in a dataframe and write them to a csv, but I am facing this problem. I may be asking something too odd or basic, but please help.
Otherwise I will have to write the whole script in one block, and that will be very nasty to handle, so I'm hoping to find a way around it. This is just a sample of the problem I am facing:
def a1(x):
    z = x + 1
    r = x + 2
    print(z, r)

def a2(x):
    y = x + 4
    t = x + 3
    print(y, t)

x = 2
a1(x)
a2(x)
This prints:
3 4
6 5
Then I tried:
data = pd.DataFrame({'first' : [z],
                     'second' : [r],
                     'third' : [y],
                     'fourth' : [t]
                     })
data
but this fails with the error: 'z' is not defined
You may find it convenient to write functions that return a list of dicts.
For example:
rows = [dict(a=1, b=2, c=3),
        dict(a=4, b=5, c=6)]
df = pd.DataFrame(rows)
The variables are only defined in the local scope of your functions. You'd either need to declare them globally or - the better way - return them, so you can use them outside of the functions by assigning the return values to new variables:
import pandas as pd

def a1(x):
    z = x + 1
    r = x + 2
    return (z, r)

def a2(x):
    y = x + 4
    t = x + 3
    return (y, t)

x = 2
z, r = a1(x)
y, t = a2(x)

data = pd.DataFrame({'first' : [z],
                     'second' : [r],
                     'third' : [y],
                     'fourth' : [t]
                     })

How does Spock determine which iterable is greater than the other?

I'm reading this doc in Spock: http://spockframework.org/spock/docs/1.1/data_driven_testing.html and came across the Data Variable Assignment part, which has this snippet of code:
a = 3
b = Math.random() * 100
c = a > b ? a : b
but what if I try using iterables:
a << [6, 2, 0]
b << [4, 10, 10]
c = a > b ? a : b
It doesn't compare the iterables at all.
Spock does heavy rewriting of your test specifications (by registering AST transformations with the Groovy compiler).
For the where: clause it generates code that iterates over the provided lists for a and b (assigning each value in turn to these local variables) and runs the code for c = a > b ? a : b on each iteration.
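Conceptually (a sketch for illustration only, not the code Spock actually generates), each iteration ends up working with plain values:
[[6, 4], [2, 10], [0, 10]].each { a, b ->
    def c = a > b ? a : b      // a and b are single values here, not iterables
    assert c == Math.max(a, b)
}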
The code transformation for the where: clause is located in the class org.spockframework.compiler.WhereBlockRewriter (https://github.com/spockframework/spock/blob/master/spock-core/src/main/java/org/spockframework/compiler/WhereBlockRewriter.java)

Python 3 named tuple position

If I have the coordinate tuple (10.1, 15.2), how can I make it so that I can call 10.1 as simply x instead of coordinates[0], and y instead of coordinates[1]?
I want to do this so that I can pass a tuple from function to function while still being able to call x and y easily. I could just:
x = coordinates[0]
y = coordinates[1]
but that seems like a bad idea -- lengthy and I'd have to repeat it for each function.
Use namedtuple:
>>> from collections import namedtuple
>>> c = namedtuple('Coords', ['x', 'y'])
>>> xy = c(5, 6)
>>> xy
Coords(x=5, y=6)
>>> xy.x
5
>>> xy.y
6

Unpack multiple variables from sequence

I am expecting the code below to print chr7.
import strutils
var splitLine = "chr7 127471196 127472363 Pos1 0 +".split()
var chrom, startPos, endPos = splitLine[0..2]
echo chrom
Instead it prints @[chr7, 127471196, 127472363].
Is there a way to unpack multiple values from sequences at the same time?
And what would the tersest way to do the above be if the elements weren't contiguous? For example:
var chrom, startPos, strand = splitLine[0..1, 5]
Gives the error:
read_bed.nim(8, 40) Error: type mismatch: got (seq[string], Slice[system.int], int literal(5))
but expected one of:
system.[](a: array[Idx, T], x: Slice[system.int])
system.[](s: string, x: Slice[system.int])
system.[](a: array[Idx, T], x: Slice[[].Idx])
system.[](s: seq[T], x: Slice[system.int])
var chrom, startPos, strand = splitLine[0..1, 5]
^
This can be accomplished using macros.
import macros

macro `..=`*(lhs: untyped, rhs: tuple|seq|array): auto =
  # Check that the lhs is a tuple of identifiers.
  expectKind(lhs, nnkPar)
  for i in 0..len(lhs)-1:
    expectKind(lhs[i], nnkIdent)
  # Result is a statement list starting with an
  # assignment to a tmp variable of rhs.
  let t = genSym()
  result = newStmtList(quote do:
    let `t` = `rhs`)
  # Assign each component to the corresponding
  # variable.
  for i in 0..len(lhs)-1:
    let v = lhs[i]
    # Skip assignments to _.
    if $v.toStrLit != "_":
      result.add(quote do:
        `v` = `t`[`i`])

macro headAux(count: int, rhs: seq|array|tuple): auto =
  let t = genSym()
  result = quote do:
    let `t` = `rhs`
    ()
  for i in 0..count.intVal-1:
    result[1].add(quote do:
      `t`[`i`])

template head*(count: static[int], rhs: untyped): auto =
  # We need to redirect this through a template because
  # of a bug in the current Nim compiler when using
  # static[int] with macros.
  headAux(count, rhs)

var x, y: int
(x, y) ..= (1, 2)
echo x, y
(x, _) ..= (3, 4)
echo x, y
(x, y) ..= @[4, 5, 6]
echo x, y
let z = head(2, @[4, 5, 6])
echo z
(x, y) ..= head(2, @[7, 8, 9])
echo x, y
The ..= macro unpacks tuple or sequence assignments. You can accomplish the same with var (x, y) = (1, 2), for example, but ..= works for seqs and arrays, too, and allows you to reuse variables.
The head template/macro extracts the first count elements from a tuple, array, or seq and returns them as a tuple (which can then be used like any other tuple, e.g. for destructuring with let or var).
For anyone that's looking for a quick solution, here's a nimble package I wrote called unpack.
You can do sequence and object destructuring/unpacking with syntax like this:
someSeqOrTupleOrArray.lunpack(a, b, c)
[a2, b2, c2] <- someSeqOrTupleOrArray
{name, job} <- tim
tom.lunpack(job, otherName = name)
{job, name: yetAnotherName} <- john
Currently, pattern matching in Nim only works with tuples. This also makes sense, because pattern matching requires a statically known arity. For instance, what should happen in your example if the seq does not have a length of three? Note that in your example the length of the sequence can only be determined at runtime, so the compiler does not know whether it is actually possible to extract three variables.
Therefore I think the solution that was linked by @def- was going in the right direction. That example uses arrays, which do have a statically known size. In this case the compiler knows the tuple arity, i.e., the extraction is well defined.
If you want an alternative (maybe convenient but unsafe) approach you could do something like this:
import macros

macro extract(args: varargs[untyped]): typed =
  ## assumes that the first expression is an expression
  ## which can take a bracket expression. Let's call it
  ## `arr`. The generated AST will then correspond to:
  ##
  ##   let <second_arg> = arr[0]
  ##   let <third_arg> = arr[1]
  ##   ...
  result = newStmtList()
  # the first vararg is the "array"
  let arr = args[0]
  var i = 0
  # all other varargs are now used as "injected" let bindings
  for arg in args.children:
    if i > 0:
      var rhs = newNimNode(nnkBracketExpr)
      rhs.add(arr)
      rhs.add(newIntLitNode(i-1))
      let assign = newLetStmt(arg, rhs) # could be replaced by newVarStmt
      result.add(assign)
    i += 1
  #echo result.treerepr

let s = @["X", "Y", "Z"]
s.extract(a, b, c)
# this essentially produces:
# let a = s[0]
# let b = s[1]
# let c = s[2]

# check if it works:
echo a, b, c
I have not included a check for the seq length yet, so you would simply get an out-of-bounds error if the seq does not have the required length. Another warning: if the first expression is not a literal, it would be evaluated/calculated several times.
Note that the _ literal is allowed in let bindings as a placeholder, which means that you could do things like this:
s.extract(a, b, _, _, _, x)
This would address your splitLine[0..1, 5] example, which, by the way, is simply not valid indexing syntax.
Yet another option is the definesugar package:
import strutils, definesugar

# need to use splitWhitespace instead of split to prevent empty string elements in the sequence
var splitLine = "chr7 127471196 127472363 Pos1 0 +".splitWhitespace()
echo splitLine

block:
  (chrom, startPos, endPos) := splitLine[0..2]
  echo chrom    # chr7
  echo startPos # 127471196
  echo endPos   # 127472363

block:
  (chrom, startPos, strand) := splitLine[0..1] & splitLine[5] # splitLine[0..1, 5] not supported
  echo chrom
  echo startPos
  echo strand # +

# alternative syntax
block:
  (chrom, startPos, *_, strand) := splitLine
  echo chrom
  echo startPos
  echo strand
see https://forum.nim-lang.org/t/7072 for recent discussion
