I need to merge two maps while performing some calculation. For example, given the following maps, which will always be the same size:
def map1 = [
[name: 'Coord1', quota: 200],
[name: 'Coord2', quota: 300]
]
def map2 = [
[name: 'Coord1', copiesToDate: 60],
[name: 'Coord2', copiesToDate: 30]
]
I want to get this map
def map3 = [
[name: 'Coord1', quota: 200, copiesToDate: 60, balance: 140],
[name: 'Coord2', quota: 300, copiesToDate: 30, balance: 270]
]
Right now I am trying this solution, and it works:
def map4 = map1.collect { m1 ->
[
name: m1.name,
quota: m1.quota,
copiesToDate: map2.find { m2 ->
m1.name == m2.name
}.copiesToDate,
balanceToDate: m1.quota - map2.find { m2 ->
m1.name == m2.name
}.copiesToDate
]}
Could you please share a groovier way to do this task? Thanks.
Grooviest code I could come up with:
def map3 = [map1, map2].transpose()*.sum().each { m ->
m.balance = m.quota - m.copiesToDate
}
edit: as noted by Tim, this code works as long as the two input lists (map1 and map2) are of the same size and have the maps in order. If this is not the case I would recommend Tim's answer which handles those cases.
The above returns the map as defined in your question. The following code:
def list1 = [
[name: 'Coord1', quota: 200],
[name: 'Coord2', quota: 300]
]
def list2 = [
[name: 'Coord1', copiesToDate: 60],
[name: 'Coord2', copiesToDate: 30]
]
def x = [list1, list2].transpose()*.sum().each { m ->
m.balance = m.quota - m.copiesToDate
}
x.each {
println it
}
demonstrates the idea and prints:
[name:Coord1, quota:200, copiesToDate:60, balance:140]
[name:Coord2, quota:300, copiesToDate:30, balance:270]
I have renamed map1 and map2 into list1 and list2 since they are in fact two lists containing inner maps.
The code is somewhat concise and might need a bit of explanation if you're not used to transpose and the groovy spread and map operations.
Explanation:
[list1, list2] - first we create a new list where the two existing lists are elements. So we now have a list of lists where the elements in the inner lists are maps.
.transpose() - we then call transpose which might need a bit of effort to grasp when you see it for the first time. If you have a list of lists, you can see transpose as flipping the lists "into the other direction".
In our case the two lists:
[[name:Coord1, quota:200], [name:Coord2, quota:300]]
[[name:Coord1, copiesToDate:60], [name:Coord2, copiesToDate:30]]
become:
[[name:Coord1, quota:200], [name:Coord1, copiesToDate:60]]
[[name:Coord2, quota:300], [name:Coord2, copiesToDate:30]]
i.e. after transpose, everything relating to Coord1 is in the first list and everything relating to Coord2 is in the second.
Each of the lists we have now is a list of Maps. But what we want is just one map for Coord1 and one map for Coord2. So for each of the above lists, we now need to coalesce or merge the contained maps into one map. We do this using the fact that in groovy map+map returns a merged map. Using the groovy spread operator *. we therefore call sum() on each list of maps.
i.e.:
[[name:Coord1, quota:200], [name:Coord1, copiesToDate:60]].sum()
computes into:
[name:Coord1, quota:200, copiesToDate:60]
and:
[[name:Coord2, quota:300], [name:Coord2, copiesToDate:30]].sum()
into:
[name:Coord2, quota:300, copiesToDate:30]
Lastly, we want to add the balance property to the maps, so we iterate through what is now a list of two maps and add balance, computed as quota - copiesToDate. The each construct returns the list it is working on, which is what we assign to x.
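For comparison, the same transpose-then-merge idea can be sketched in Python 3 (not Groovy). Here zip plays the role of transpose() and dict unpacking plays the role of map + map. The function name merge_by_position is made up for this illustration, and, like the Groovy version, the sketch assumes both lists have the same length and order.

```python
def merge_by_position(list1, list2):
    """Merge the dicts at matching positions and add a computed balance."""
    merged = []
    for m1, m2 in zip(list1, list2):        # zip pairs items up, like transpose()
        row = {**m1, **m2}                   # builds a new dict, like map + map
        row["balance"] = row["quota"] - row["copiesToDate"]
        merged.append(row)
    return merged


list1 = [{"name": "Coord1", "quota": 200}, {"name": "Coord2", "quota": 300}]
list2 = [{"name": "Coord1", "copiesToDate": 60}, {"name": "Coord2", "copiesToDate": 30}]
result = merge_by_position(list1, list2)
for row in result:
    print(row)
```

Because each row is a fresh dict, the input lists are left untouched.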
Don't call find twice. Use the Map.plus() method to append new entries. Handle missing names from map2.
def map3 = map1.collect {m1 ->
def m2 = map2.find {it.name == m1.name} ?: [copiesToDate: 0]
m1 + m2 + [balance: m1.quota - m2.copiesToDate]
}
Another option for fun :-)
def result = (map1 + map2).groupBy { it.name }
.values()
*.sum()
.collect { it << ['balance': it.quota - it.copiesToDate] }
add the lists together
group by the name
get the grouped values and concatenate them
then for each of them, work out the balance
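The four steps can also be sketched in Python 3 for readers who want to compare. This is only an illustration: merge_by_name is a hypothetical helper, and it assumes every name appears in both lists (a missing quota or copiesToDate would raise a KeyError).

```python
from collections import defaultdict


def merge_by_name(*lists):
    """Group rows by name, merge each group, then compute the balance."""
    grouped = defaultdict(dict)
    for rows in lists:                       # "add the lists together"
        for row in rows:
            grouped[row["name"]].update(row)  # group by name and merge
    result = []
    for row in grouped.values():
        row["balance"] = row["quota"] - row["copiesToDate"]
        result.append(row)
    return result


quotas = [{"name": "Coord1", "quota": 200}, {"name": "Coord2", "quota": 300}]
# Deliberately in a different order to show that position does not matter.
copies = [{"name": "Coord2", "copiesToDate": 30}, {"name": "Coord1", "copiesToDate": 60}]
merged = merge_by_name(quotas, copies)
```

Unlike the transpose approach, grouping by name does not depend on the two lists being in the same order.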
I have an array like this:
def arr = [
"v3.1.20161004.0",
"v3.1.20161004.1",
"v3.1.20161004.10",
"v3.1.20161004.11",
"v3.1.20161004.2",
"v3.1.20161004.3",
"v3.1.20161004.30",
]
I need to get this:
def arr = [
"v3.1.20161004.0",
"v3.1.20161004.1",
"v3.1.20161004.2",
"v3.1.20161004.3",
"v3.1.20161004.10",
"v3.1.20161004.11",
"v3.1.20161004.30",
]
How can I sort it by the last number ('.x')?
You can tokenize each string on . and then grab the last element as an Integer, and sort on this (passing false to return a new list)
def newArray = arr.sort(false) { it.tokenize('.')[-1] as Integer }
When sorting a list you can pass a comparator closure. In this case you can tokenize on the dot and compare the last elements with the spaceship operator (note that this form sorts the list in place):
arr.sort { a, b -> a.tokenize('.').last().toInteger() <=> b.tokenize('.').last().toInteger() }
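The key idea in both answers is to compare the last dot-separated field as a number rather than as text. A minimal Python 3 sketch of the same idea, with last_component as a made-up helper name:

```python
def last_component(version):
    """Return the text after the final dot as an integer."""
    return int(version.rsplit(".", 1)[-1])


versions = [
    "v3.1.20161004.0",
    "v3.1.20161004.1",
    "v3.1.20161004.10",
    "v3.1.20161004.11",
    "v3.1.20161004.2",
    "v3.1.20161004.3",
    "v3.1.20161004.30",
]
# sorted() returns a new list and leaves `versions` unchanged.
sorted_versions = sorted(versions, key=last_component)
```

This sorts on the last field only, so it assumes the earlier fields are identical, as they are in the question.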
How can I convert a string like this
'[["dfd","ewer","errr","ggg"],["yyy","ttt","rrr","ggg"]]'
into a list?
I don't want to use GroovyShell().evaluate()
Thanks
You can use Eval.me like so:
String input = '[["dfd","ewer","errr","ggg"],["yyy","ttt","rrr","ggg"]]'
List output = Eval.me( input )
assert output.size() == 2
assert output*.size() == [ 4, 4 ]
(though of course, under the covers, Groovy just calls GroovyShell.evaluate())
Then, for pure Groovy without evaluating code, there's JsonSlurper:
output = new groovy.json.JsonSlurper().parseText( input )
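The reason a JSON parser is attractive here is that it only reads data and never executes the input, whereas an eval-style call runs whatever the string contains. The same distinction exists in other languages; as a rough Python 3 analogy (not Groovy), the standard json module parses this exact string:

```python
import json

text = '[["dfd","ewer","errr","ggg"],["yyy","ttt","rrr","ggg"]]'
parsed = json.loads(text)   # parses data only; no code is evaluated

print(len(parsed))                      # number of inner lists
print([len(inner) for inner in parsed])  # size of each inner list
```

This only works because the input happens to be valid JSON (double-quoted strings, square brackets).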
Is there any way to remove the variable "i" in the following example and still get access to the index of the item being printed?
def i = 0;
"one two three".split().each {
println ("item [ ${i++} ] = ${it}");
}
=============== EDIT ================
I found that one possible solution is to use the "eachWithIndex" method:
"one two three".split().eachWithIndex { it, i ->
println ("item [ ${i} ] = ${it}");
}
Please do let me know if there are other solutions.
You can use eachWithIndex():
"one two three four".split().eachWithIndex() { entry, index ->
println "${index} : ${entry}" }
this will result in
0 : one
1 : two
2 : three
3 : four
Not sure what 'other solutions' you are looking for... The only other thing you can do that I can think of (with Groovy 1.8.6), is something like:
"one two three".split().with { words ->
[words,0..<words.size()].transpose().collect { word, index ->
word * index
}
}
As you can see, this allows you to use collect with an index as well (as there is no collectWithIndex method).
Another approach, if you want to use the index of the collection with methods other than each, is to define an enumerate method that returns [index, element] pairs, analogous to Python's enumerate:
Iterable.metaClass.enumerate = { start = 0 ->
def index = start
delegate.collect { [index++, it] }
}
So, for example:
assert 'un dos tres'.tokenize().enumerate() == [[0,'un'], [1,'dos'], [2,'tres']]
(notice that I'm using tokenize instead of split because the former returns an Iterable, while the latter returns a String[])
And we can use this new collection with each, as you wanted:
'one two three'.tokenize().enumerate().each { index, word ->
println "$index: $word"
}
Or we can use it with other iteration methods :D
def repetitions = 'one two three'.tokenize().enumerate(1).collect { n, word ->
([word] * n).join(' ')
}
assert repetitions == ['one', 'two two', 'three three three']
Note: Another way of defining the enumerate method, following tim_yates' more functional approach is:
Iterable.metaClass.enumerate = { start = 0 ->
def end = start + delegate.size() - 1
[start..end, delegate].transpose()
}
I am looking for ways to find matching patterns in lists or arrays of strings, specifically in .NET, but algorithms or logic from other languages would be helpful.
Say I have 3 arrays (or in this specific case List(Of String))
Array1
"Do"
"Re"
"Mi"
"Fa"
"So"
"La"
"Ti"
Array2
"Mi"
"Fa"
"Jim"
"Bob"
"So"
Array3
"Jim"
"Bob"
"So"
"La"
"Ti"
I want to report on the occurrences of the matches of
("Mi", "Fa") In Arrays (1,2)
("So") In Arrays (1,2,3)
("Jim", "Bob", "So") in Arrays (2,3)
("So", "La", "Ti") in Arrays (1, 3)
...and any others.
I am using this to troubleshoot an issue, not to make a commercial product of it specifically, and would rather not do it by hand (there are 110 lists of about 100-200 items).
Are there any algorithms, existing code, or ideas that will help me accomplish finding the results described?
The simplest way to code this would be to build a Dictionary, then loop through each item in each array. For each item, do this:
Check whether the item is in the dictionary; if so, add the current list to that item's entry.
If the item is not in the dictionary, add it along with the current list.
Since, as you said, this is non-production code, performance doesn't matter, so this approach should work fine.
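A minimal sketch of that dictionary approach, written in Python 3 rather than .NET for brevity. The name index_items is invented for this example, and arrays are numbered from 1 to match the question.

```python
from collections import defaultdict


def index_items(arrays):
    """Map each item to the sorted 1-based numbers of the arrays containing it."""
    seen = defaultdict(set)
    for number, array in enumerate(arrays, start=1):
        for item in array:
            seen[item].add(number)      # creates the entry on first sight
    return {item: sorted(numbers) for item, numbers in seen.items()}


arrays = [
    ["Do", "Re", "Mi", "Fa", "So", "La", "Ti"],
    ["Mi", "Fa", "Jim", "Bob", "So"],
    ["Jim", "Bob", "So", "La", "Ti"],
]
index = index_items(arrays)
shared = {item: nums for item, nums in index.items() if len(nums) > 1}
```

Note that this reports individual items only; it does not detect runs such as ("Mi", "Fa") appearing in the same order.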
Here's a solution using SuffixTree module to locate subsequences:
#!/usr/bin/env python
from SuffixTree import SubstringDict
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
import sys

def main(stdout=sys.stdout):
    """
    >>> import StringIO
    >>> s = StringIO.StringIO()
    >>> main(stdout=s)
    >>> print s.getvalue()
    [['Mi', 'Fa']] In Arrays (1, 2)
    [['So', 'La', 'Ti']] In Arrays (1, 3)
    [['Jim', 'Bob', 'So']] In Arrays (2, 3)
    [['So']] In Arrays (1, 2, 3)
    <BLANKLINE>
    """
    # array of arrays of strings
    arr = [
        ["Do", "Re", "Mi", "Fa", "So", "La", "Ti",],
        ["Mi", "Fa", "Jim", "Bob", "So",],
        ["Jim", "Bob", "So", "La", "Ti",],
    ]
####    # 28 seconds (27 seconds without lesser substrs inspection (see below))
####    N, M = 100, 100
####    import random
####    arr = [[random.randrange(100) for _ in range(M)] for _ in range(N)]

    # convert to ASCII alphabet (for SubstringDict)
    letter2item = {}
    item2letter = {}
    c = 1
    for item in (i for a in arr for i in a):
        if item not in item2letter:
            c += 1
            if c == 128:
                raise ValueError("too many unique items; "
                                 "use a less restrictive alphabet for SuffixTree")
            letter = chr(c)
            letter2item[letter] = item
            item2letter[item] = letter
    arr_ascii = [''.join(item2letter[item] for item in a) for a in arr]

    # populate substring dict (based on SuffixTree)
    substring_dict = SubstringDict()
    for i, s in enumerate(arr_ascii):
        substring_dict[s] = i+1

    # enumerate all substrings, save those that occur more than once
    substr2indices = {}
    indices2substr = defaultdict(list)
    for str_ in arr_ascii:
        for start in range(len(str_)):
            for size in reversed(range(1, len(str_) - start + 1)):
                substr = str_[start:start + size]
                if substr not in substr2indices:
                    indices = substring_dict[substr] # O(n) SuffixTree
                    if len(indices) > 1:
                        substr2indices[substr] = indices
                        indices2substr[tuple(indices)].append(substr)
####                        # inspect all lesser substrs
####                        # it could diminish size of indices2substr[ind] list
####                        # but it has no effect for input 100x100x100 (see above)
####                        for i in reversed(range(len(substr))):
####                            s = substr[:i]
####                            if s in substr2indices: continue
####                            ind = substring_dict[s]
####                            if len(ind) > len(indices):
####                                substr2indices[s] = ind
####                                indices2substr[tuple(ind)].append(s)
####                                indices = ind
####                            else:
####                                assert set(ind) == set(indices), (ind, indices)
####                                substr2indices[s] = None
####                        break # all sizes inspected, move to next `start`

    for indices, substrs in indices2substr.iteritems():
        # remove substrs that are substrs of other substrs
        substrs = sorted(substrs, key=len) # sort by size
        substrs = [p for i, p in enumerate(substrs)
                   if not any(p in q for q in substrs[i+1:])]
        # convert letters to items and print
        items = [map(letter2item.get, substr) for substr in substrs]
        print >>stdout, "%s In Arrays %s" % (items, indices)

if __name__=="__main__":
    # test
    import doctest; doctest.testmod()
    # measure performance
    import timeit
    t = timeit.Timer(stmt='main(stdout=s)',
                     setup='from __main__ import main; from cStringIO import StringIO as S; s = S()')
    N = 1000
    milliseconds = min(t.repeat(repeat=3, number=N))
    print("%.3g milliseconds" % (1e3*milliseconds/N))
It takes about 30 seconds to process 100 lists of 100 items each. SubstringDict in the above code might be emulated by grep -F -f.
Old solution:
In Python (save it to 'group_patterns.py' file):
#!/usr/bin/env python
from collections import defaultdict
from itertools import groupby

def issubseq(p, q):
    """Return whether `p` is a contiguous subsequence of `q`."""
    return any(p == q[i:i + len(p)] for i in range(len(q) - len(p) + 1))

arr = (("Do", "Re", "Mi", "Fa", "So", "La", "Ti",),
       ("Mi", "Fa", "Jim", "Bob", "So",),
       ("Jim", "Bob", "So", "La", "Ti",))

# store all patterns that occur at least twice
d = defaultdict(list) # a map: pattern -> indexes of arrays it's within
for i, a in enumerate(arr[:-1]):
    for j, q in enumerate(arr[i+1:]):
        for k in range(len(a)):
            for size in range(1, len(a)+1-k):
                p = a[k:k + size] # a pattern
                if issubseq(p, q): # `p` occurs at least twice
                    d[p] += [i+1, i+2+j]

# group patterns by arrays they are within
inarrays = lambda pair: sorted(set(pair[1]))
for key, group in groupby(sorted(d.iteritems(), key=inarrays), key=inarrays):
    patterns = sorted((pair[0] for pair in group), key=len) # sort by size
    # remove patterns that are subsequences of other patterns
    patterns = [p for i, p in enumerate(patterns)
                if not any(issubseq(p, q) for q in patterns[i+1:])]
    print "%s In Arrays %s" % (patterns, key)
The following command:
$ python group_patterns.py
prints:
[('Mi', 'Fa')] In Arrays [1, 2]
[('So',)] In Arrays [1, 2, 3]
[('So', 'La', 'Ti')] In Arrays [1, 3]
[('Jim', 'Bob', 'So')] In Arrays [2, 3]
The solution is terribly inefficient.
As others have mentioned, the function you want is Intersect. If you are using .NET 3.5 or later, consider using LINQ's Intersect function.
See the following post for more information
Consider using LinqPAD to experiment.
www.linqpad.net
I hacked the program below in about 10 minutes of Perl. It's not perfect, it uses a global variable, and it just prints out the counts of every element seen by the program in each list, but it's a good approximation to what you want to do that's super-easy to code.
Do you actually want all combinations of all subsets of the elements common to each array? You could enumerate all of the elements in a smarter way if you wanted, but if you just wanted all elements that exist at least once in each array you could use the Unix command "grep -v 0" on the output below and that would show you the intersection of all elements common to all arrays. Your question is missing a little bit of detail, so I can't perfectly implement something that solves your problem.
If you do more data analysis than programming, scripting can be very useful for asking questions from textual data like this. If you don't know how to code in a scripting language like this, I would spend a month or two reading about how to code in Perl, Python or Ruby. They can be wonderful for one-off hacks such as this, especially in cases when you don't really know what you want. The time and brain cost of writing a program like this is really low, so that (if you're fast) you can write and re-write it several times while still exploring the definition of your question.
#!/usr/bin/perl -w
use strict;

my @Array1 = ( "Do", "Re", "Mi", "Fa", "So", "La", "Ti");
my @Array2 = ( "Mi", "Fa", "Jim", "Bob", "So" );
my @Array3 = ( "Jim", "Bob", "So", "La", "Ti" );
my %counts;

sub count_array {
    my $array = shift;
    my $name = shift;
    foreach my $e (@$array) {
        $counts{$e}{$name}++;
    }
}

count_array( \@Array1, "Array1" );
count_array( \@Array2, "Array2" );
count_array( \@Array3, "Array3" );

my @names = qw/ Array1 Array2 Array3 /;
print join ' ', ('element',@names);
print "\n";

my @unique_names = keys %counts;
foreach my $unique_name (@unique_names) {
    my @counts = map {
        if ( exists $counts{$unique_name}{$_} ) {
            $counts{$unique_name}{$_};
        } else {
            0;
        }
    }
    @names;
    print join ' ', ($unique_name,@counts);
    print "\n";
}
The program's output is:
element Array1 Array2 Array3
Ti 1 0 1
La 1 0 1
So 1 1 1
Mi 1 1 0
Fa 1 1 0
Do 1 0 0
Bob 0 1 1
Jim 0 1 1
Re 1 0 0
Looks like you want to use an intersection function on sets of data. Intersection picks out elements that are common in both (or more) sets.
The problem with this viewpoint is that a set cannot contain more than one of each element (no more than one Jim per set). It also cannot recognize several elements in a row as a pattern; you can, however, modify the comparison function to look further ahead for exactly that.
There may be functions like intersect that work on bags (which are like sets but tolerate identical elements).
These functions should be standard in most languages or pretty easy to write yourself.
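To make the set-versus-bag distinction concrete, here is a small Python 3 sketch. It uses made-up sample data (not the arrays from the question) so that a duplicate is present, with collections.Counter standing in for a bag.

```python
from collections import Counter

first = ["Jim", "Bob", "Jim", "So"]
second = ["Jim", "Jim", "Jim", "So"]

# Set intersection: duplicates collapse, so "Jim" is reported once.
set_common = set(first) & set(second)

# Bag (multiset) intersection: keeps the smaller count of each element.
bag_common = Counter(first) & Counter(second)

print(sorted(set_common))
print(dict(bag_common))
```

Neither form looks at element order, so runs such as ("Jim", "Bob", "So") still need a separate comparison.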
I'm sure there's a MUCH more elegant way, but...
Since this isn't production code, why not just hack it and convert each array into a delimited string, then search each string for the pattern you want? i.e.
private void button1_Click(object sender, EventArgs e)
{
    string[] array1 = { "do", "re", "mi", "fa", "so" };
    string[] array2 = { "mi", "fa", "jim", "bob", "so" };
    string[] pattern1 = { "mi", "fa" };
    MessageBox.Show(FindPatternInArray(array1, pattern1).ToString());
    MessageBox.Show(FindPatternInArray(array2, pattern1).ToString());
}
private bool FindPatternInArray(string[] AArray, string[] APattern)
{
    // Wrap both strings in the delimiter so "mi" cannot match inside "semi".
    // Assumes no element contains '~'.
    string haystack = "~" + string.Join("~", AArray) + "~";
    string needle = "~" + string.Join("~", APattern) + "~";
    return haystack.IndexOf(needle, StringComparison.Ordinal) >= 0;
}
First, start by counting each item.
You make a temp list: "Do" = 1, "Mi" = 2, "So" = 3, etc.
You can remove from the temp list all the items whose count is 1 (e.g. "Do").
The temp list now contains the non-unique items (save it somewhere).
Now, try to make lists of two items: take one item from the temp list and the item that follows it in the original lists.
"So" + "La" = 2, "Bob" + "So" = 2, etc.
Remove the ones with a count of 1.
You now have the list of pairs that appear at least twice (save it somewhere).
Now, try to make lists of three items by taking a pair from the temp list and the item that follows it in the original lists.
("Mi", "Fa") + "So" = 1, ("Mi", "Fa") + "Jim" = 1, ("So", "La") + "Ti" = 2
Remove the ones with a count of 1.
You now have the list of three-item runs that appear at least twice (save it).
Continue like that until the temp list is empty.
At the end, take all the saved lists and merge them.
This algorithm is not optimal (I think we can do better with suitable data structures), but it is easy to implement :)
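The level-by-level growth described above might look roughly like this in Python 3. It is a sketch under a few assumptions: common_runs is a hypothetical name, and a run is kept when it appears in at least two different arrays (tracking array numbers rather than raw occurrence counts, since the question asks which arrays share a run).

```python
def common_runs(arrays):
    """Return {run: array numbers} for every contiguous run found in 2+ arrays."""
    saved = {}
    kept = None      # runs of the previous length that survived
    length = 1
    while True:
        found = {}
        for number, array in enumerate(arrays, start=1):
            for start in range(len(array) - length + 1):
                run = tuple(array[start:start + length])
                # Only grow runs whose prefix survived the previous round.
                if kept is None or run[:-1] in kept:
                    found.setdefault(run, set()).add(number)
        kept = {run for run, numbers in found.items() if len(numbers) > 1}
        if not kept:
            return saved          # nothing left to extend
        for run in kept:
            saved[run] = sorted(found[run])
        length += 1


arrays = [
    ["Do", "Re", "Mi", "Fa", "So", "La", "Ti"],
    ["Mi", "Fa", "Jim", "Bob", "So"],
    ["Jim", "Bob", "So", "La", "Ti"],
]
runs = common_runs(arrays)
```

The result also includes shorter runs contained in longer ones, for example ("La", "Ti"); filtering those out would be a separate step.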
Suppose a password consisted of a string of nine characters from the English alphabet (26 characters). If each possible password could be tested in a millisecond, how long would it take to test all possible passwords?
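One way to work this out is to count the passwords and convert milliseconds to larger units. The Python 3 sketch below assumes single-case letters (26 choices per position), strictly one test per millisecond with no parallelism, and a 365-day year; those assumptions are mine, not part of the question.

```python
ALPHABET_SIZE = 26      # letters in the English alphabet, single case
PASSWORD_LENGTH = 9
MS_PER_TEST = 1         # one password tested per millisecond

total_passwords = ALPHABET_SIZE ** PASSWORD_LENGTH      # 26^9 combinations
total_seconds = total_passwords * MS_PER_TEST / 1000
total_days = total_seconds / 86_400                     # seconds per day
total_years = total_days / 365                          # assumes 365-day years

print(f"{total_passwords:,} passwords")
print(f"{total_seconds:,.0f} seconds")
print(f"{total_days:,.0f} days, or about {total_years:.0f} years")
```

So 26^9 = 5,429,503,678,976 passwords take roughly 5.43 billion seconds, which is about 62,841 days or around 172 years under these assumptions.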