How can I compare two lists in Groovy?

How can I compare the items in two lists and create a new list with the difference in Groovy?

I'd just use the arithmetic operators; I think it's much more obvious what's going on:
def a = ["foo", "bar", "baz", "baz"]
def b = ["foo", "qux"]
assert ["bar", "baz", "baz", "qux"] == ((a - b) + (b - a))

Groovy's Collection.intersect() might help you with that, even if it is a little tricky to invert the result. Maybe something like this:
def collection1 = ["test", "a"]
def collection2 = ["test", "b"]
def commons = collection1.intersect(collection2)
def difference = collection1.plus(collection2)
difference.removeAll(commons)
assert ["a", "b"] == difference

I assume the OP is asking for the exclusive disjunction between two lists.
(Note: neither of the previous solutions handles duplicates!)
If you want to code it yourself in Groovy, do the following:
def a = ['a','b','c','c','c'] // diff is [b, c, c]
def b = ['a','d','c'] // diff is [d]
// for quick comparison
assert (a.sort() == b.sort()) == false
// to get the differences, remove the intersection from both
a.intersect(b).each{a.remove(it);b.remove(it)}
assert a == ['b','c','c']
assert b == ['d']
assert (a + b) == ['b','c','c','d'] // all diffs
One gotcha when using lists/arrays of ints: you may have problems due to the polymorphic methods remove(int) vs. remove(Object). See here for an (untested) solution.
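To make the gotcha concrete, here is a small illustrative sketch (my own example, not from the linked solution): with integer elements, remove(1) resolves to the index-based overload, so you remove a position rather than a value, and a value-based removal needs a different call:
def nums = [10, 20, 30]
nums.remove(1)                 // resolves to remove(int index): removes the element at index 1, i.e. 20
assert nums == [10, 30]

def nums2 = [10, 20, 30]
nums2.removeAll { it == 20 }   // removes by value; removeElement(20) also works on Groovy 2.4+
assert nums2 == [10, 30]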
Rather than reinventing the wheel, you should just use a library (e.g. commons-collections):
#Grab('commons-collections:commons-collections:3.2.1')
import static org.apache.commons.collections.CollectionUtils.*
def a = ['a','b','c','c','c'] // diff is [b, c, c]
def b = ['a','d','c'] // diff is [d]
assert disjunction(a, b) == ['b', 'c', 'c', 'd']

If it is a list of numbers, you can do this:
def before = [0, 0, 1, 0]
def after = [0, 1, 1, 0]
def difference = []
for (int i = 0; i < before.size(); i++) {
    difference << after[i] - before[i]
}
println difference //[0, 1, 0, 0]
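Alternatively (this is my own variation, not from the original answer), the same element-wise difference can be written with transpose(), assuming both lists have the same length:
def before = [0, 0, 1, 0]
def after = [0, 1, 1, 0]
// transpose() pairs the elements up by index; the closure subtracts each pair
def difference = [after, before].transpose().collect { x, y -> x - y }
assert difference == [0, 1, 0, 0]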

Creating a list of tuples based on successive items of an initial list

I sometimes need to iterate a list in Python looking at the "current" element and the "next" element. I have, till now, done so with code like:
for current, next in zip(the_list, the_list[1:]):
# Do something
This works and does what I expect, but is there a more idiomatic or efficient way to do the same thing?
Some answers simplify the problem by addressing only the specific case of taking two elements at a time. For the general case of N elements at a time, see Rolling or sliding window iterator?.
The itertools documentation for Python 3.8 provides this recipe:
import itertools
def pairwise(iterable):
    "s -> (s0, s1), (s1, s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)
For Python 2, use itertools.izip instead of zip to get the same kind of lazy iterator (zip will instead create a list):
import itertools
def pairwise(iterable):
    "s -> (s0, s1), (s1, s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)
How this works:
First, two parallel iterators, a and b, are created (the tee() call), both pointing to the first element of the original iterable. The second iterator, b, is moved one step forward (the next(b, None) call). At this point a points to s0 and b points to s1. Both a and b can traverse the original iterator independently; the zip (or izip) call takes the two iterators and makes pairs of the returned elements, advancing both iterators at the same pace.
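To make that concrete, here is a small standalone walk-through of the same mechanism (the example values are mine):
import itertools

# two independent iterators over the same data (what tee() does inside pairwise)
a, b = itertools.tee([10, 20, 30, 40])
next(b, None)             # advance b one step: a now points at 10, b at 20
print(list(zip(a, b)))    # [(10, 20), (20, 30), (30, 40)]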
Since tee() can take an n parameter (the number of iterators to produce), the same technique can be adapted to produce a larger "window". For example:
def threes(iterator):
    "s -> (s0, s1, s2), (s1, s2, s3), (s2, s3, s4), ..."
    a, b, c = itertools.tee(iterator, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)
Caveat: If one of the iterators produced by tee advances further than the others, then the implementation needs to keep the consumed elements in memory until every iterator has consumed them (it cannot 'rewind' the original iterator). Here it doesn't matter because one iterator is only 1 step ahead of the other, but in general it's easy to use a lot of memory this way.
Roll your own!
def pairwise(iterable):
    it = iter(iterable)
    a = next(it, None)
    for b in it:
        yield (a, b)
        a = b
Starting in Python 3.10, this is the exact role of the pairwise function:
from itertools import pairwise
list(pairwise([1, 2, 3, 4, 5]))
# [(1, 2), (2, 3), (3, 4), (4, 5)]
or simply pairwise([1, 2, 3, 4, 5]) if you don't need the result as a list.
I'm just putting this out there; I'm surprised no one has thought of enumerate().
for index, thing in enumerate(the_list):
    if index < len(the_list) - 1:
        current, next_ = thing, the_list[index + 1]
        # do something
Since the_list[1:] actually creates a copy of the whole list (excluding its first element), and in Python 2 zip() creates a list of tuples immediately when called, in total three copies of your list are created. If your list is very large, you might prefer
from itertools import izip, islice
for current_item, next_item in izip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)
which does not copy the list at all.
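In Python 3, where zip() is already lazy, the same idea needs only islice; a small sketch of that adaptation (mine, using the question's example list):
from itertools import islice

the_list = [1, 2, 3, 4]
for current_item, next_item in zip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)   # prints 1 2, then 2 3, then 3 4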
Iterating by index can do the same thing (Python 2 shown; in Python 3 use range instead of xrange):
#!/usr/bin/python
the_list = [1, 2, 3, 4]
for i in xrange(len(the_list) - 1):
    current_item, next_item = the_list[i], the_list[i + 1]
    print(current_item, next_item)
Output:
(1, 2)
(2, 3)
(3, 4)
I am really surprised nobody has mentioned the shorter, simpler and most importantly general solution:
Python 3:
from itertools import islice
def n_wise(iterable, n):
    return zip(*(islice(iterable, i, None) for i in range(n)))
Python 2:
from itertools import izip, islice
def n_wise(iterable, n):
    return izip(*(islice(iterable, i, None) for i in xrange(n)))
It works for pairwise iteration by passing n=2, but can handle any higher number:
>>> for a, b in n_wise('Hello!', 2):
...     print(a, b)
H e
e l
l l
l o
o !
>>> for a, b, c, d in n_wise('Hello World!', 4):
...     print(a, b, c, d)
H e l l
e l l o
l l o
l o W
o W o
W o r
W o r l
o r l d
r l d !
As of 16 May 2020, this is now a simple import:
from more_itertools import pairwise
for current, nxt in pairwise(your_iterable):
    print(f'Current = {current}, next = {nxt}')
Docs for more-itertools
Under the hood this code is the same as that in the other answers, but I much prefer imports when available.
If you don't already have it installed then:
pip install more-itertools
Example
For instance, if you had the Fibonacci sequence, you could calculate the ratios of subsequent pairs as:
from more_itertools import pairwise
fib = [1, 1, 2, 3, 5, 8, 13]
for current, nxt in pairwise(fib):
    ratio = current / nxt
    print(f'Current = {current}, next = {nxt}, ratio = {ratio}')
As others have pointed out, itertools.pairwise() is the way to go on recent versions of Python. However, for 3.8+, a fun and somewhat more concise option (compared to the other solutions that have been posted) that does not require an extra import comes via the walrus operator:
def pairwise(iterable):
    it = iter(iterable)    # accept any iterable, not just iterators
    a = next(it, None)
    yield from ((a, a := b) for b in it)
A basic solution:
def neighbors(seq):
    i = 0
    while i + 1 < len(seq):
        yield (seq[i], seq[i + 1])
        i += 1

the_list = [1, 2, 3, 4]
for x, y in neighbors(the_list):
    print(x, y)
Pairs from a list using a list comprehension
the_list = [1, 2, 3, 4]
pairs = [[the_list[i], the_list[i + 1]] for i in range(len(the_list) - 1)]
for current_item, next_item in pairs:
    print(current_item, next_item)
Output:
(1, 2)
(2, 3)
(3, 4)
code = '0016364ee0942aa7cc04a8189ef3'
# Getting the current and next item (overlapping pairs)
print([code[idx] + code[idx + 1] for idx in range(len(code) - 1)])
# Getting non-overlapping pairs
print([code[idx * 2] + code[idx * 2 + 1] for idx in range(len(code) // 2)])

How to create two lists with one generator in Python

I am trying to create two separate lists from a base list with only one generator, but I do not know how to do it.
This is the idea: I am wondering if there is a way to create the lists b and c below while only looping through a once.
a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
b = [x[:2] for x in a]
c = [x[2:] for x in a]
What I did before this was just loop through a with a for loop and append x[:2] and x[2:] to b and c on every iteration, but after using the timeit module I found that using a generator was actually faster. So I moved on to using two separate generators, but now, timing the code above with timeit, it seems to be just as slow as before. I suspect that is because I am iterating through the list a twice.
So basically my question is: what is the most efficient way to create b and c given a two-dimensional list? For my application the base list a is quite large, so I need this to be as efficient as possible.
TL;DR: I would suggest keeping the list comprehensions and benchmarking before making any optimizations (if you really think you need them).
I tried four ways:
Using loops:
def use_loop(a):
    b = []
    c = []
    for item in a:
        b.append(item[:2])
        c.append(item[2:])
    return (b, c)
Using list comprehension twice:
def use_comprehension(a):
    b = [x[:2] for x in a]
    c = [x[2:] for x in a]
    return (b, c)
Using list comprehension with zip
def use_comprehension_with_zip(a):
    b, c = zip(*[(x[:2], x[2:]) for x in a])
    return (list(b), list(c))
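The zip(*...) call effectively transposes the list of (head, tail) pairs into two tuples; a tiny illustration with made-up values:
pairs = [([1, 2], [3, 4]), ([5, 6], [7, 8])]
heads, tails = zip(*pairs)    # unpack the pairs and re-zip them column-wise
print(heads)                  # ([1, 2], [5, 6])
print(tails)                  # ([3, 4], [7, 8])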
Using threads is overkill, and it will definitely increase your time:
import threading

def get_shorter_list(a, index, ans):
    if index == 0:
        for item in a:
            ans.append(item[:2])
    else:
        for item in a:
            ans.append(item[2:])

def use_threads(a):
    b = []
    c = []
    data = {0: b, 1: c}
    threads = []
    for x in range(2):
        thread = threading.Thread(target=get_shorter_list, args=(a, x, data.get(x)))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return (b, c)
Then I used timeit to time the four methods:
import timeit

function_list = [
    use_loop,
    use_comprehension,
    use_comprehension_with_zip,
    use_threads,
]
assert use_loop(a) == use_comprehension(a) == use_comprehension_with_zip(a) == use_threads(a)
for function in function_list:
    print(f'Time taken by {function.__name__}: {timeit.timeit("function(a)", globals = globals(), number = 10000)}')
The results did not show a huge penalty for using list comprehensions, which I always prefer for clarity, brevity and speed.
Time taken by use_loop: 0.01102178400469711
Time taken by use_comprehension: 0.011585937994823325
Time taken by use_comprehension_with_zip: 0.0187399349961197
Time taken by use_threads: 2.036599840997951

Split list into sub-lists, similar to collate but with a max size of the result

Let's say I have a list
def letters = 'a' .. 'g'
I know that I can use collate to create a list of sub-lists of equal size (plus the remainder).
assert letters.collate(3) == [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
But what I want is a list with a specific number of sub-lists, where the items of the original list are distributed so that the sub-list sizes are as equal as possible. Example:
def numbers = 1..7
assert numbers.collateIntoFixedSizeList(5) == [[1,2], [3,4], [5], [6], [7]]
// the elements that contain two items could be at the end of the list as well
// doesn't matter much to me
assert numbers.collateIntoFixedSizeList(5) == [[1], [2], [3], [4,5], [6,7]]
Lists that are smaller than the requested size would produce a list of single-element lists, the same length as the original:
def numbers = 1..7
assert numbers.collateIntoFixedSizeList(10) == [[1],[2],[3],[4],[5],[6],[7]]
Does anybody know whether such magic exists or will I have to code this up myself?
There's nothing built in to Groovy to do this, but you could write your own:
def fancyCollate(Collection collection, int groupCount) {
    collection.indexed().groupBy { i, v -> i % groupCount }.values()*.values()
}
Or, you could do this, which creates less intermediate objects:
def fancyCollate(Collection collection, int groupCount) {
    (0..<collection.size()).inject([[]] * groupCount) { l, v ->
        l[v % groupCount] += collection[v]
        l
    }
}
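For what it's worth, a quick hypothetical check of the inject-based version above against the question's data; note that both variants distribute elements round-robin rather than in contiguous runs:
def numbers = 1..7
println fancyCollate(numbers, 5)   // should print something like [[1, 6], [2, 7], [3], [4], [5]]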
Try #2 ;-)
def fancyCollate(Collection collection, int size) {
    int stride = Math.ceil((double) collection.size() / size)
    (1..size).collect { [(it - 1) * stride, Math.min(it * stride, collection.size())] }
        .collect { a, b -> collection.subList(a, b) }
}
assert fancyCollate('a'..'z', 3) == ['a'..'i', 'j'..'r', 's'..'z']
Try #3 (with your example)
Collection.metaClass.collateIntoFixedSizeList = { int size ->
    int stride = Math.ceil((double) delegate.size() / size)
    (1..Math.min(size, delegate.size())).collect { [(it - 1) * stride, Math.min(it * stride, delegate.size())] }
        .collect { a, b -> delegate.subList(a, b) }
}
def numbers = (1..7)
assert numbers.collateIntoFixedSizeList(10) == [[1],[2],[3],[4],[5],[6],[7]]

Groovy or Java equivalent of sumproduct?

Before I write my own, does anyone know if Groovy or Java has something pre-built which is similar to Excel's SUMPRODUCT function?
The quasi-syntax for SUMPRODUCT is something like:
def list1 = [2,3,4]
def list2 = [5,10,20]
SUMPRODUCT(list1, list2 ...) = 120
You will get 120 ((2*5) + (3*10) + (4*20) = 120)
You can transpose(), collect() and sum the result:
def list1 = [2,3,4]
def list2 = [5,10,20]
assert [list1, list2]
.transpose()
.collect { it[0] * it[1] }
.sum() == 120
Not really an out-of-the-box SUMPRODUCT substitute, but still a one-liner:
def list1 = [2,3,4]
def list2 = [5,10,20]
assert 120 == GroovyCollections.transpose( list1, list2 ).sum{ it[ 0 ] * it[ 1 ] }
Here's a version of sumproduct that isn't limited to two input lists:
def sumproduct(List... lists) {
    (lists as List).transpose().sum { it.inject(1) { prod, val -> prod * val } }
}
Calling it with sumproduct([2,3,4], [5,10,20], [1,2,3]) returns 310.

How do I implement a comparator for a map in Groovy?

I have a map in Groovy:
['keyOfInterest' : 1, 'otherKey': 2]
There is a list containing a number of these maps. I want to know if a map exists in the list with keyOfInterest of a certain value.
If the data types were simple objects, I could use indexOf(), but I don't know how to do this with a more complicated type. E.g. (taken from the docs)
assert ['a', 'b', 'c', 'd', 'c'].indexOf('z') == -1 // 'z' is not in the list
I'd like to do something like:
def mapA = ['keyOfInterest' : 1, 'otherKey': 2]
def mapB = ['keyOfInterest' : 3, 'otherKey': 2]
def searchMap = ['keyOfInterest' : 1, 'otherKey': 5]
def list = [mapA, mapB]
assert list.indexOf(searchMap) == 0 // keyOfInterest == 1 for both mapA and searchMap
Is there a way to do this with more complicated objects, such as a map, easily?
While #dmahapatro is correct that you can use find() to get the map in the list of maps with the matching keyOfInterest, that's not quite what you asked for. So I'll show how you can get either the index of that entry in the list, or just whether a map with a matching keyOfInterest exists.
def mapA = ['keyOfInterest' : 1, 'otherKey': 2]
def mapB = ['keyOfInterest' : 3, 'otherKey': 2]
def searchMap = ['keyOfInterest':1, 'otherKey': 55 ]
def list = [mapA, mapB]
// findIndexOf() returns the first index of the map that matches in the list, or -1 if none match
assert list.findIndexOf { it.keyOfInterest == searchMap.keyOfInterest } == 0
assert list.findIndexOf { it.keyOfInterest == 33 } == -1
// any() returns a boolean OR of all the closure results for each entry in the list.
assert list.any { it.keyOfInterest == searchMap.keyOfInterest } == true
assert list.any { it.keyOfInterest == 33 } == false
Note that there is no performance penalty for using one over the other as they all stop as soon as one match is found. find() gives you the most information, but if you're actually looking for the index or a boolean result, these others can also be used.
The simplest implementation would be to use find(). It returns null when no element satisfies the criteria in the supplied closure.
def mapA = ['keyOfInterest' : 1, 'otherKey': 2]
def mapB = ['keyOfInterest' : 3, 'otherKey': 2]
def list = [mapA, mapB]
assert list.find { it.keyOfInterest == 1 } == ['keyOfInterest':1, 'otherKey':2]
assert !list.find { it.keyOfInterest == 7 }
