Groovy: Find duplicates in multiple arrays

I have three arrays, shown below, and need to find the matching/duplicate values present in all of them:
`def Ids_AS = ['04-04350', '21-005676', 'REGU-132644681']
def Ids_AO = ['04-04350', '04-04356', 'REGU-132644681']
def Ids_AV = ['04-04350', 'AB-132644681', 'REGU-132644681']`
println(Ids_AS.intersect(Ids_AV))
I used intersect, but it only works on two arrays/lists at a time.
Another case: I need to handle an empty array, as below, and it should match across the remaining arrays instead of returning null or an error.
`def Ids_AS = ['04-04350', '21-005676', 'REGU-132644681']
def Ids_AO = ['04-04350', '04-04356', 'REGU-132644681']
def Ids_AV = []`
Is there a way to find the duplicates across multiple arrays? Please help.

Just intersect the result with the third array:
def duplicates = Ids_AS.intersect(Ids_AO).intersect(Ids_AV)
If you want to get clever, or you have many lists, you can make a list of your lists and then use inject (a fold) to intersect them one after another:
def all = [Ids_AS, Ids_AO, Ids_AV]
def duplicates = all.inject { a, b -> a.intersect(b) }
Both methods will result in
['04-04350', 'REGU-132644681']
For the second question, sort the list of lists so the longest one comes first (so inject never starts from an empty list unless all of them are empty), and then skip empty lists:
def duplicates = all.sort { -it.size() }.inject { a, b -> b.empty ? a : a.intersect(b) }
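For comparison, here is a minimal sketch of the same fold in Python (the other language on this page), using functools.reduce. The helper names common_elements and _intersect_pair are hypothetical. Instead of sorting the lists by size as the Groovy answer does, it filters out empty lists before folding, and it keeps the order of the first non-empty list.

```python
from functools import reduce


def _intersect_pair(acc, nxt):
    """Keep the items of acc that also occur in nxt (hypothetical helper)."""
    lookup = set(nxt)  # O(1) average membership tests
    return [x for x in acc if x in lookup]


def common_elements(lists):
    """Items present in every non-empty list; empty lists are ignored."""
    non_empty = [lst for lst in lists if lst]
    if not non_empty:
        return []
    # Deduplicate the seed while preserving its order, then fold over the rest.
    seed = list(dict.fromkeys(non_empty[0]))
    return reduce(_intersect_pair, non_empty[1:], seed)


ids_as = ['04-04350', '21-005676', 'REGU-132644681']
ids_ao = ['04-04350', '04-04356', 'REGU-132644681']
ids_av = ['04-04350', 'AB-132644681', 'REGU-132644681']
print(common_elements([ids_as, ids_ao, ids_av]))  # ['04-04350', 'REGU-132644681']
print(common_elements([ids_as, ids_ao, []]))      # ['04-04350', 'REGU-132644681']
```

Filtering first avoids relying on sort order to pick a non-empty seed; the cost is one extra pass over the outer list.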

Related

Get index of a list with tuples in which the first element of the tuple matches pattern

I have a list of tuples:
countries = [('Netherlands','31'),
('US','1'),
('Brazil','55'),
('Russia','7')]
Now I want to find the index of an entry in the list based on the first item of its tuple.
I tried countries.index('Brazil') and would like the output to be 2, but instead it raises a ValueError:
ValueError: 'Brazil' is not in list
I am aware that I could convert this list into a pandas DataFrame and then search for a match within the first column. However, I suspect there is a faster way to do this.
You can use enumerate() with next() to find the index:
idx = next(i for i, (v, *_) in enumerate(countries) if v == "Brazil")
print(idx)
Prints:
2
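The next() call above raises StopIteration if no tuple matches. A minimal sketch of a safer variant, assuming -1 is an acceptable "not found" value (index_of_first and index_by_name are hypothetical names), passes a default to next(); for repeated lookups, a dictionary built once avoids rescanning the list each time.

```python
countries = [('Netherlands', '31'), ('US', '1'), ('Brazil', '55'), ('Russia', '7')]


def index_of_first(items, key, default=-1):
    """Index of the first tuple whose first element equals key, else default."""
    # Passing a default to next() avoids StopIteration when nothing matches.
    return next((i for i, (first, *_) in enumerate(items) if first == key), default)


# For many lookups, build a name -> index dict once; setdefault keeps the first occurrence.
index_by_name = {}
for i, (name, *_) in enumerate(countries):
    index_by_name.setdefault(name, i)

print(index_of_first(countries, 'Brazil'))  # 2
print(index_of_first(countries, 'Spain'))   # -1
print(index_by_name['Russia'])              # 3
```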

Appending Elements to a List Creates List of List in Groovy

I am parsing each element of a list one by one.
def List1 = (String[]) Data[2].split(',')
Some elements of this list contain the delimiter !:
List1 = [TEST1!01.01.01, TEST2!02.02.02]
I tried to iterate over each element of this list, split it on !, and obtain a single flat list.
def List2 = []
List1.each { List2.add(it.split('!'))}
However, the result was a list of lists:
[[TEST1, 01.01.01], [TEST2, 02.02.02]]
instead of [TEST1, 01.01.01, TEST2, 02.02.02].
How do I avoid this and obtain a list as shown above?
How about this?
def list1 = ['TEST1!01.01.01', 'TEST2!02.02.02']
println list1.collect{it.split('!')}.flatten()
When you do List2.add(it.split('!')), you are adding a whole array to List2 instead of individual strings, because .split() returns an array (String[]) built from the string.
You should first split the string with .split() and then add each element of the result to List2.
Here is a solution:
def List1 = ["TEST1!01.01.01", "TEST2!02.02.02"]
def List2 = []
List1.each { List1member ->
    def subList = List1member.split('!')
    subList.each { subListMember ->
        List2.add(subListMember)
    }
}
println(List2)
split() returns an array, which is why I got a list of lists. I found that split() can also handle multiple delimiters, because its argument is a regular expression and | means "or".
The following returns the desired output.
def List1 = (String[]) Data[2].split(',|!')
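The same two ideas, sketched in Python for comparison: re.split with an alternation mirrors split(',|!'), and the nested comprehension mirrors collect { it.split('!') }.flatten(). The string raw is a hypothetical stand-in for Data[2].

```python
import re

raw = "TEST1!01.01.01,TEST2!02.02.02"  # hypothetical input shaped like Data[2]

# Option 1: a single regex split on either delimiter, like split(',|!') in Groovy.
flat_regex = re.split(r",|!", raw)

# Option 2: split on ',' first, then flatten each element's '!' parts.
flat_nested = [part for item in raw.split(",") for part in item.split("!")]

print(flat_regex)   # ['TEST1', '01.01.01', 'TEST2', '02.02.02']
print(flat_nested)  # ['TEST1', '01.01.01', 'TEST2', '02.02.02']
```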

comparing two arrays and get the values which are not common

I am working on a problem a friend gave me: you are given two arrays, say a = [1,2,3,4] and b = [8,7,9,2,1], and you have to find the elements that are not common to both.
The expected output is [3,4,8,7,9]. My code is below.
def disjoint(e,f):
    c = e[:]
    d = f[:]
    for i in range(len(e)):
        for j in range(len(f)):
            if e[i] == f[j]:
                c.remove(e[i])
                d.remove(d[j])
    final = c + d
    print(final)
print(disjoint(a,b))
I tried nested loops, making copies of the given arrays so I can modify them and then concatenate the results. But when I edit one line:
def disjoint(e,f):
    c = e[:] # list copies
    d = f[:]
    for i in range(len(e)):
        for j in range(len(f)):
            if e[i] == f[j]:
                c.remove(c[i]) # edited this line
                d.remove(d[j])
    final = c + d
    print(final)
print(disjoint(a,b))
When I remove the common element from the list copies by position (c[i]) instead, I get a different output, [2,4,8,7,9]. Why?
This is my first question on this site. I'll be thankful if anyone can clear up my doubts.
Using sets you can do:
a = [1,2,3,4]
b = [8,7,9,2,1]
diff = (set(a) | set(b)) - (set(a) & set(b))
(set(a) | set(b)) is the union, set(a) & set(b) is the intersection, and finally you take the difference between the two sets with -. Note that the result is an unordered set, not a list.
Your bug is in the lines c.remove(c[i]) and d.remove(d[j]). The common elements are e[i] and f[j], whereas c and d are the lists you are shrinking: once an element has been removed, the positions in c and d shift, so c[i] and d[j] no longer refer to the matching values.
To fix your bug, you only need to change these lines to c.remove(e[i]) and d.remove(f[j]).
Note also that your method of deleting items from both lists will not work if a list may contain duplicates.
Consider for instance the case a = [1,1,2,3,4] and b = [8,7,9,2,1]: the second 1 in a matches the 1 in b again, and d.remove(1) raises a ValueError because that 1 has already been removed from d.
You can simplify your code to make it work:
def disjoint(e,f):
    c = e.copy() # [:] works also, but I think this is clearer
    d = f.copy()
    for i in e: # no need for an index, just walk each item in the array
        for j in f:
            if i == j: # if there is a match, remove the match.
                c.remove(i)
                d.remove(j)
    return c + d
print(disjoint([1,2,3,4],[8,7,9,2,1]))
There are many more efficient ways to achieve this; check the Stack Overflow question "Get difference between two lists" to discover them. My favorite way is to use a set (as in #newbie's answer). What is a set? Let's check the documentation:
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)
emphasis mine
Symmetric difference is perfect for our needs!
Returns a new set with elements in either the set or the specified iterable but not both.
OK, here is how to use it in your case:
def disjoint(e,f):
    return list(set(e).symmetric_difference(set(f)))
print(disjoint([1,2,3,4],[8,7,9,2,1]))
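The set-based answers lose the original order, while the expected output [3,4,8,7,9] is ordered. A minimal sketch that keeps the order and still runs in roughly linear time, assuming the elements are hashable (not_common is a hypothetical name), uses sets only for the membership tests. With duplicates it follows set semantics: every copy of a value that appears in both lists is dropped, so a = [1,1,2,3,4] no longer causes an error.

```python
def not_common(a, b):
    """Items of a missing from b, then items of b missing from a, in original order."""
    in_a, in_b = set(a), set(b)  # built once; O(1) average membership tests
    return [x for x in a if x not in in_b] + [x for x in b if x not in in_a]


print(not_common([1, 2, 3, 4], [8, 7, 9, 2, 1]))     # [3, 4, 8, 7, 9]
print(not_common([1, 1, 2, 3, 4], [8, 7, 9, 2, 1]))  # [3, 4, 8, 7, 9]
```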

Return multiple values from map in Groovy?

Let's say I have a map like this:
def map = [name: 'mrhaki', country: 'The Netherlands', blog: true, languages: ['Groovy', 'Java']]
Now I can return "submap" with only "name" and "blog" like this:
def keys = ['name', 'blog']
map.subMap(keys)
// Will return a map with entries name=mrhaki and blog=true
But is there a way to easily return just the values, instead of a map of entries?
Update:
I'd like to do something like this (which doesn't work):
def values = map.{'name','blog'}
which would yield, for example, values = ['mrhaki', true] (a list, tuple, or some other data structure).
map.subMap(keys)*.value
The Spread Operator (*.) is used to invoke an action on all items of
an aggregate object. It is equivalent to calling the action on each
item and collecting the result into a list
You can iterate over the submap and collect the values:
def values = map.subMap(keys).collect {it.value}
// Result: [mrhaki, true]
Or, iterate over the list of keys, returning the map value for that key:
def values = keys.collect {map[it]}
I would guess the latter is more efficient, since it doesn't have to create the submap.
A more long-winded way is to iterate over the whole map with inject:
def values = map.inject([]) { acc, key, value ->
    if (keys.contains(key)) { acc << value }
    acc
}
For completeness I'll add another way of accomplishing this using Map.findResults:
map.findResults { k, v -> k in keys ? v : null }
Flexible, but more long-winded than some of the previous answers.
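For comparison, the Python equivalent of picking several values out of a map (a dict) is either a comprehension over the keys, like keys.collect { map[it] }, or operator.itemgetter, which returns a tuple when given more than one key (with a single key it returns the bare value). The variable names below are illustrative only.

```python
from operator import itemgetter

data = {'name': 'mrhaki', 'country': 'The Netherlands', 'blog': True,
        'languages': ['Groovy', 'Java']}
keys = ['name', 'blog']

values_list = [data[k] for k in keys]   # like keys.collect { map[it] }
values_tuple = itemgetter(*keys)(data)  # a tuple, because len(keys) > 1

print(values_list)   # ['mrhaki', True]
print(values_tuple)  # ('mrhaki', True)
```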

set from the union of elements contained in two lists

This is for a pre-interview questionnaire. I believe I have the answer; I just want confirmation that I'm right.
Part 1 - Tell me what this code does, and its big-O performance
Part 2 - Re-write it yourself and tell me the big-O performance of your solution
def foo(a, b):
    """ a and b are both lists """
    c = []
    for i in a:
        if is_bar(b, i):
            c.append(i)
    return unique(c)

def is_bar(a, b):
    for i in a:
        if i == b:
            return True
    return False

def unique(arr):
    b = {}
    for i in arr:
        b[i] = 1
    return b.keys()
ANSWERS:
It creates a set from the union of elements contained in two lists. Its big-O performance is O(n^2).
My solution, which I believe achieves O(n):
Set<Object> A = getSetA();
Set<Object> B = getSetB();
Set<Object> UnionAB = new HashSet<>(A);
UnionAB.addAll(B);
for (Object inA : A)
    if (B.contains(inA))
        UnionAB.remove(inA);
It seems the original code computes an intersection, not a union. It traverses every element of the first list (a), checks with a linear scan (is_bar) whether it exists in the second list (b), and if so appends it to list c. Then it returns the unique elements of c. Because of the nested scan, the cost is O(len(a) × len(b)), so O(n^2) seems right.
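For Part 2, one possible rewrite (a sketch, not the interviewer's reference answer; foo_fast is a hypothetical name) replaces the linear is_bar scan with a set lookup, assuming the elements are hashable. Building the set costs O(len(b)) and each membership test is O(1) on average, so the whole function runs in expected O(len(a) + len(b)). Like the original under Python 3.7+ dict ordering, it returns the unique common elements in the order they first appear in a.

```python
def foo_fast(a, b):
    """Unique elements of a that also occur in b (an intersection), in first-seen order."""
    lookup = set(b)  # O(len(b)) to build
    seen = set()
    result = []
    for x in a:  # O(len(a)) iterations, O(1) average work each
        if x in lookup and x not in seen:
            seen.add(x)
            result.append(x)
    return result


print(foo_fast([1, 2, 2, 3, 5], [2, 3, 4, 3]))  # [2, 3]
```

If order does not matter, set(a) & set(b) gives the same elements with the same expected complexity.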
