Compare two maps and find differences using Groovy or Java - groovy

I would like to find the differences between two lists of maps and create a new CSV file with the differences (each differing value placed between **), like below:
Map 1
[
[cuInfo:"T12",service:"3",startDate:"14-01-16 13:22",appId:"G12355"],
[cuInfo:"T13",service:"3",startDate:"12-02-16 13:00",appId:"G12356"],
[cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12300"],
[cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]
Map 2
[
[name:"Apple", cuInfo:"T12",service:"3",startDate:"14-02-16 10:00",appId:"G12351"],
[name:"Apple",cuInfo:"T13",service:"3",startDate:"14-01-16 13:00",appId:"G12352"],
[name:"Apple",cuInfo:"T16",service:"3",startDate:"14-01-16 13:00",appId:"G12353"],
[name:"Google",cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12301"],
[name:"Microsoft",cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"],
[name:"Microsoft",cuInfo:"T18",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]
How can I get an output CSV like the one below?
Map 1 data | Map 2 data
service 3;name Apple;
cuInfo;startDate;appId | cuInfo;startDate;appId
T12;*14-02-16 10:00*;*G12351* | T12;*14-01-16 13:22*;*G12355*
T13;*14-01-16 13:00*;*G12352* | T13;*12-02-16 13:00*;*G12356*
service 9;name Google;
T14;*10-01-16 11:20*;*G12301* | T12;*10-01-16 11:20*;*G12300*
Thanks

In the following I'm assuming that both lists of maps are sorted appropriately so that the comparison is fair, and that they have the same length.
First, create an Iterator to traverse both lists simultaneously:
@groovy.transform.TupleConstructor
class DualIterator implements Iterator<List> {
    Iterator iter1
    Iterator iter2

    boolean hasNext() {
        iter1.hasNext() && iter2.hasNext()
    }

    List next() {
        [iter1.next(), iter2.next()]
    }

    void remove() {
        throw new UnsupportedOperationException()
    }
}
Next, process the lists to get rows for the CSV file:
def rows = new DualIterator(list1.iterator(), list2.iterator())
    .findAll { it[0] != it[1] }   // Grab the non-matching lines.
    .collect {                    // Mark the non-matching values.
        def (m1, m2) = it
        m1.keySet().each { key ->
            if (m1[key] != m2[key]) {
                m1[key] = "*${m1[key]}*"
                m2[key] = "*${m2[key]}*"
            }
        }
        [m1, m2]
    }
    .collect {                    // Merge the map values into a List of String arrays.
        [it[0].values(), it[1].values()].flatten() as String[]
    }
Finally, write the header and rows out in CSV format using opencsv's CSVWriter. NOTE: I'm writing a proper CSV; your example output is actually invalid because the number of columns is inconsistent:
def writer = new CSVWriter(new FileWriter('blah.csv'))
writer.writeNext(['name1', 'cuInfo1', 'service1', 'startDate1', 'appId1', 'name2', 'cuInfo2', 'service2', 'startDate2', 'appId2'] as String[])
writer.writeAll(rows)
writer.close()
The output looks like this:
"name1","cuInfo1","service1","startDate1","appId1","name2","cuInfo2","service2","startDate2","appId2"
"Apple","T12","3","*14-02-16 10:00*","*G12351*","Apple","T12","3","*14-01-16 13:22*","*G12355*"
"Apple","T13","3","*14-01-16 13:00*","*G12352*","Apple","T13","3","*12-02-16 13:00*","*G12356*"
"Google","T14","9","10-01-16 11:20","*G12301*","Google","T14","9","10-01-16 11:20","*G12300*"

Related

Filter map key length in Groovy

I am trying to create a new map by extracting from another map only the entries whose keys are less than 13 characters long. I have tried to filter like this, but I can't get it to work.
Map<String, String> maps = entriesOnlyOnLeft.findAll { it.key }.each { it.key.length() < 13 }
Output:
log.info("Key Map ---> "+maps.keySet())
[431486899900600, 280799200020001, 251206899900600, 080196899900604, 350166899900600, 180876899900600, 260896899900600, 372746899900600, 442166899900600, 330446899900600, 401946899900600, 110126899900600, 200696899900600, 410916899900600, 060156899900600, 210416899900600, 040136899900600, 290676899900600, 140216899900600, 020036899900600, 360386899900600, 312016899900600, 280796800220073, 451686899900600, 150306899900600, 280796800110071, 280796899900604, 320546899900600, 492756899900600, 240896899900600, 380386899900600, 000000E04921301, 100376899900600, 480206899900600, 000000E00004101, 280796800330051, 280796800330064, 050196899900600, 170796899900600, 390756899900600, 520016899900600, 000000E04921701, 280796899900075, 280796899900074, 280796899900077, 280796899900076, 280796899900079, 280796899900078, 280796899900065, 280796899900053, 280796899900057, 280796899900040, 280796899900041, 280796899900046, 280796899900045, 280796899900048, 280796899900049, 191306899900600, 030146899900600, 280796899900032, 280796899900035, 280796899900034, 280796899900036, 280796899900039, 280796899900038, 280796800440072, 341206899900600, 160786899900600, 130346899900600, 120406899900600, 510016899900600, 502976899900600, 471866899900600, 270286899900600, 300306899900600, 090596899900600, 000000E04924801, 230506899900600, 462506899900600, 070406899900600]
You almost had it. Here's the solution:
Map<String,String> results = entriesOnlyOnLeft.findAll { it.key.length() < 13 }
The closure passed to findAll must return a truthy result for the Map.Entry to be included in the results. findAll already invokes the closure for each entry of the given map (i.e. entriesOnlyOnLeft), so there is no need for the extra call to each. Moreover, each simply returns the collection it was invoked on, so the length check inside its closure has no effect on what is returned to the caller.
Alternatively, you can do it in a somewhat more mutating way:
Map<String, String> maps = new HashMap<>()
entriesOnlyOnLeft.each { entry ->
    if (entry.key.length() < 13) {
        maps.put(entry.key, entry.value)
    }
}
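The equivalent entry filtering in plain Java streams, as a sketch (the `entriesOnlyOnLeft` contents here are made-up stand-ins for the map in the question):

```java
import java.util.*;
import java.util.stream.Collectors;

public class KeyFilter {
    // Keeps only entries whose key is shorter than 13 characters.
    static Map<String, String> shortKeys(Map<String, String> src) {
        return src.entrySet().stream()
                .filter(e -> e.getKey().length() < 13)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, String> entriesOnlyOnLeft = Map.of(
                "431486899900600", "left",  // 15 chars: filtered out
                "short-key", "kept");       // 9 chars: retained
        System.out.println(shortKeys(entriesOnlyOnLeft)); // {short-key=kept}
    }
}
```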

Transforming list to map in Groovy

I'm trying to transform a list of strings:
def description = """user:some.mail#gmail.com;groups:Team-1, Team-2
user:some.othermail#gmail.com;groups:Team-2, Team-3
user:another.mail#gmail.com;groups:Team-1, Team-3
some other text"""
description = description.split('\\r\\n|\\n|\\r').findAll { it.startsWith('user') }
into a map so it looked like this:
[some.mail#gmail.com: "Team-1, Team-2", some.othermail#gmail.com: "Team-2, Team-3", another.mail#gmail.com: "Team-1, Team-3"]
so I could later iterate getting an e-mail address and corresponding teams.
Sadly, with the code below I was only able to achieve it for a single item of the list. I'm stuck on looping over all the items to get the full result.
def userData = [:]
userData = description[0].split(';').inject([:]) { map, token ->
token.split(':').with {
map[it[0].trim()] = it[1].trim()
}
map
}
Can you give me a hint as for how I could get a map with all the items from the list?
You can use collectEntries method on a list:
def description = """user:some.mail#gmail.com;groups:Team-1, Team-2
user:some.othermail#gmail.com;groups:Team-2, Team-3
user:another.mail#gmail.com;groups:Team-1, Team-3
some other text"""
description = description.split('\\r\\n|\\n|\\r').findAll { it.startsWith('user') }
def map = description.collectEntries {
    // split "user:some.mail#gmail.com;groups:Team-1, Team-2"
    def split = it.split(';')
    // remove the "user:" prefix
    def email = split[0].split(':')[1]
    // remove the "groups:" prefix
    def groups = split[1].split(':')[1]
    // create a map entry
    [(email), groups]
}
Then running map.forEach { k, v -> println "key: '${k}', value: '${v}'" } prints the following (the standard map toString may be a little chaotic in this case):
key: 'some.mail#gmail.com', value: 'Team-1, Team-2'
key: 'some.othermail#gmail.com', value: 'Team-2, Team-3'
key: 'another.mail#gmail.com', value: 'Team-1, Team-3'
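The same transformation can be sketched in plain Java, assuming the line format from the question (a hypothetical `parseUsers` helper; the `split` limit of 2 keeps the rest of the line intact after the first separator):

```java
import java.util.*;
import java.util.stream.Collectors;

public class UserParser {
    // Turns "user:<email>;groups:<teams>" lines into an email -> teams map.
    static Map<String, String> parseUsers(String description) {
        return description.lines()
                .filter(line -> line.startsWith("user"))
                .map(line -> line.split(";", 2))
                .collect(Collectors.toMap(
                        parts -> parts[0].substring("user:".length()),
                        parts -> parts[1].substring("groups:".length()),
                        (a, b) -> b,            // keep the last entry on duplicate emails
                        LinkedHashMap::new));   // preserve input order
    }

    public static void main(String[] args) {
        String description = String.join("\n",
                "user:some.mail#gmail.com;groups:Team-1, Team-2",
                "user:another.mail#gmail.com;groups:Team-1, Team-3",
                "some other text");
        System.out.println(parseUsers(description).get("some.mail#gmail.com")); // Team-1, Team-2
    }
}
```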

Filter JavaRDD into multiple JavaRDD based on Condtion

I have one JavaRDD<MyClass> records.
I would like to create 3 JavaRDDs from records depending on a condition:
JavaRDD<MyClass> records1 = records.filter(record -> "A".equals(record.getName()));
JavaRDD<MyClass> records2 = records.filter(record -> "B".equals(record.getName()));
JavaRDD<MyClass> records3 = records.filter(record -> "C".equals(record.getName()));
The problem is that I can do it as shown above, but my data may have millions of records and I don't want to scan all of them 3 times.
So I want to do it in one iteration over the records.
I need something like this (pseudocode):
records
    .foreach(record -> {
        if ("A".equals(record.getName())) {
            records1(record);
        } else if ("B".equals(record.getName())) {
            records2(record);
        } else if ("C".equals(record.getName())) {
            records3(record);
        }
    });
How can I achieve this in Spark using JavaRDD?
You can use mapToPair and create a new Tuple2 in each branch of your if/else. The key of the Tuple2 then tells you which logical group each record belongs to; in other words, the key encodes the type of record you wanted in a separate RDD, and the value carries the record itself.
Your code would look something like this:
JavaPairRDD<String, MyClass> keyedRecords = records.mapToPair(record -> {
    String key = "";
    if ("A".equals(record.getName())) {
        key = "A";
    } else if ("B".equals(record.getName())) {
        key = "B";
    } else if ("C".equals(record.getName())) {
        key = "C";
    }
    return new Tuple2<>(key, record);
});
The resulting pair RDD can then be split into separate RDDs by filtering on the keys assigned in mapToPair.
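Outside Spark, the same one-pass "key, then split" idea can be sketched with plain Java streams (a minimal sketch; `Rec` is a hypothetical stand-in for MyClass):

```java
import java.util.*;
import java.util.stream.Collectors;

public class SinglePassSplit {
    record Rec(String name, int payload) {}  // hypothetical stand-in for MyClass

    // One pass over the data: bucket records by name, like keying them in mapToPair.
    static Map<String, List<Rec>> splitByName(List<Rec> records) {
        return records.stream().collect(Collectors.groupingBy(Rec::name));
    }

    public static void main(String[] args) {
        List<Rec> records = List.of(
                new Rec("A", 1), new Rec("B", 2), new Rec("A", 3), new Rec("C", 4));
        Map<String, List<Rec>> buckets = splitByName(records);
        System.out.println(buckets.get("A").size()); // 2
    }
}
```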

Spark: count two fields together

I am trying to count some parameters with Spark. I used the word count example.
In this example, we can count a word but I wonder how I can count two fields at the same time.
Here is what I want to do:
Input files
{
"redundancy":1,
"deviceID":"dv1"
}
{
"redundancy":1,
"deviceID":"dv2"
}
{
"redundancy":2,
"deviceID":"dv1"
}
{
"redundancy":1,
"deviceID":"dv1"
}
{
"redundancy":2,
"deviceID":"dv5"
}
Output files
{
"redundancy":1,
"count":3,
"nbDevice":2
}
{
"redundancy":2,
"count":2,
"nbDevice":2
}
I wonder if there is already an example of this use case, or if you have any documentation or links; I would be very thankful.
You can use pairs as keys.
The solution can look like:
rdd.map(record => (record.firstField, record.secondField) -> 1)
.reduceByKey(_ + _)
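To get exactly the output shape from the question (total count plus distinct-device count per redundancy), the aggregation itself can be sketched in plain Java, with a hypothetical `Event` record standing in for the parsed JSON:

```java
import java.util.*;
import java.util.stream.Collectors;

public class RedundancyCount {
    record Event(int redundancy, String deviceID) {}  // hypothetical parsed record

    // For each redundancy value, compute [count, number of distinct devices].
    static Map<Integer, long[]> aggregate(List<Event> events) {
        return events.stream()
                .collect(Collectors.groupingBy(Event::redundancy,
                        Collectors.collectingAndThen(Collectors.toList(), list -> new long[] {
                                list.size(),
                                list.stream().map(Event::deviceID).distinct().count() })));
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1, "dv1"), new Event(1, "dv2"), new Event(2, "dv1"),
                new Event(1, "dv1"), new Event(2, "dv5"));
        long[] r1 = aggregate(events).get(1);
        System.out.println("count=" + r1[0] + " nbDevice=" + r1[1]); // count=3 nbDevice=2
    }
}
```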

Groovier way to parse tsv file into map

I have a tsv file in the form of "key \t value", and I need to read it into a map. Currently I do it like this:
referenceFile.eachLine { line ->
    def (name, reference) = line.split(/\t/)
    referencesMap[name.toLowerCase()] = reference
}
Is there a shorter/nicer way to do it?
It's already quite short. Two answers I can think of:
The first one avoids declaring the map variable beforehand:
referenceFile.inject([:]) { map, line ->
    def (name, reference) = line.split(/\t/)
    map[name.toLowerCase()] = reference
    map
}
Second one is more functional:
referenceFile.collect { it.split(/\t/) }.inject([:]) { map, val -> map[val[0].toLowerCase()] = val[1]; map }
The only other way I can think of doing it would be with an Iterator like you'd find in Commons IO:
@Grab('commons-io:commons-io:2.4')
import org.apache.commons.io.FileUtils

referencesMap = FileUtils.lineIterator(referenceFile, 'UTF-8')
    .collectEntries { line ->
        line.tokenize('\t').with { k, v ->
            [(k.toLowerCase()): v]
        }
    }
Or with a CSV parser:
@Grab('com.xlson.groovycsv:groovycsv:1.0')
import static com.xlson.groovycsv.CsvParser.parseCsv

referencesMap = referenceFile.withReader { r ->
    parseCsv([separator: '\t', readFirstLine: true], r).collectEntries {
        [(it[0].toLowerCase()): it[1]]
    }
}
But neither of them is shorter, and not necessarily nicer either...
Though I prefer the CSV-parser option, as it deals with quoted strings and can therefore handle cases such as:
"key\twith\ttabs"\tvalue
This is the comment tim_yates added to melix's answer, and I think it's the shortest/clearest answer:
referenceFile.collect { it.tokenize( '\t' ) }.collectEntries { k, v -> [ k.toLowerCase(), v ] }
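For reference, the same key-tab-value parse in plain Java, as a sketch (this splits on the first tab only, so unlike the CSV-parser option it does not handle quoted keys containing tabs):

```java
import java.util.*;
import java.util.stream.Collectors;

public class TsvToMap {
    // Parses "key<TAB>value" lines into a map with lowercased keys.
    static Map<String, String> parse(List<String> lines) {
        return lines.stream()
                .map(line -> line.split("\t", 2))   // split on the first tab only
                .collect(Collectors.toMap(
                        parts -> parts[0].toLowerCase(),
                        parts -> parts[1],
                        (a, b) -> b,                // last one wins on duplicate keys
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Alpha\tref1", "Beta\tref2");
        System.out.println(parse(lines)); // {alpha=ref1, beta=ref2}
    }
}
```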
