I am trying to create a new map by extracting the keys from another map and keeping only the ones that are less than 13 characters long. I have tried to filter like this, but I can't get it to work.
Map<String, String> maps = entriesOnlyOnLeft.findAll { it.key }.each { it.key.length() < 13 }
Printing the keys with
log.info("Key Map ---> " + maps.keySet())
gives the following output (note that the keys are still 15 characters long, so nothing was filtered):
[431486899900600, 280799200020001, 251206899900600, 080196899900604, 350166899900600, 180876899900600, 260896899900600, 372746899900600, 442166899900600, 330446899900600, 401946899900600, 110126899900600, 200696899900600, 410916899900600, 060156899900600, 210416899900600, 040136899900600, 290676899900600, 140216899900600, 020036899900600, 360386899900600, 312016899900600, 280796800220073, 451686899900600, 150306899900600, 280796800110071, 280796899900604, 320546899900600, 492756899900600, 240896899900600, 380386899900600, 000000E04921301, 100376899900600, 480206899900600, 000000E00004101, 280796800330051, 280796800330064, 050196899900600, 170796899900600, 390756899900600, 520016899900600, 000000E04921701, 280796899900075, 280796899900074, 280796899900077, 280796899900076, 280796899900079, 280796899900078, 280796899900065, 280796899900053, 280796899900057, 280796899900040, 280796899900041, 280796899900046, 280796899900045, 280796899900048, 280796899900049, 191306899900600, 030146899900600, 280796899900032, 280796899900035, 280796899900034, 280796899900036, 280796899900039, 280796899900038, 280796800440072, 341206899900600, 160786899900600, 130346899900600, 120406899900600, 510016899900600, 502976899900600, 471866899900600, 270286899900600, 300306899900600, 090596899900600, 000000E04924801, 230506899900600, 462506899900600, 070406899900600]
You almost had it. Here's the solution:
Map<String,String> results = entriesOnlyOnLeft.findAll { it.key.length() < 13 }
The closure passed to findAll must return a truthy result for an entry to be included in the results; your findAll { it.key } returns the key itself, and any non-empty string is truthy, so nothing was filtered out. Also, findAll already invokes the closure for each entry in the given map (i.e. entriesOnlyOnLeft), so there is no need for a call to each. On top of that, each simply returns the collection it was called on and discards the closure's result, which is why the length check in your each closure had no effect.
You can also do it the following, somewhat more mutating, way:
Map<String, String> maps = new HashMap<>()
entriesOnlyOnLeft.each { it ->
    if (it.key.length() < 13) {
        maps.put(it.key, it.value)
    }
}
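As an aside, if you ever need the same filter from plain Java (the name entriesOnlyOnLeft hints at Guava's Maps.difference, though that is just a guess), the Stream API does it in one pass. The sample map below is made up for illustration:

```java
import java.util.Map;
import java.util.stream.Collectors;

public class FilterKeys {
    public static void main(String[] args) {
        // Stand-in for entriesOnlyOnLeft; the 15-character key should be dropped
        Map<String, String> entriesOnlyOnLeft = Map.of(
                "shortKey", "a",
                "431486899900600", "b");

        // Keep only entries whose key is shorter than 13 characters
        Map<String, String> maps = entriesOnlyOnLeft.entrySet().stream()
                .filter(e -> e.getKey().length() < 13)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

        System.out.println(maps.keySet());
    }
}
```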
I'm trying to transform a list of strings:
def description = """user:some.mail#gmail.com;groups:Team-1, Team-2
user:some.othermail#gmail.com;groups:Team-2, Team-3
user:another.mail#gmail.com;groups:Team-1, Team-3
some other text"""
description = description.split('\\r\\n|\\n|\\r').findAll { it.startsWith('user') }
into a map so it looked like this:
[some.mail#gmail.com: "Team-1, Team-2", some.othermail#gmail.com: "Team-2, Team-3", another.mail#gmail.com: "Team-1, Team-3"]
so I could later iterate getting an e-mail address and corresponding teams.
Sadly, with the code below I was only able to partly achieve it, and only for one item of the list. I'm stuck on turning it into a loop to get the full result.
def userData = [:]
userData = description[0].split(';').inject([:]) { map, token ->
    token.split(':').with {
        map[it[0].trim()] = it[1].trim()
    }
    map
}
Can you give me a hint as for how I could get a map with all the items from the list?
You can use the collectEntries method on a list:
def description = """user:some.mail#gmail.com;groups:Team-1, Team-2
user:some.othermail#gmail.com;groups:Team-2, Team-3
user:another.mail#gmail.com;groups:Team-1, Team-3
some other text"""
description = description.split('\\r\\n|\\n|\\r').findAll { it.startsWith('user') }
def map = description.collectEntries {
    // split "user:some.mail#gmail.com;groups:Team-1, Team-2"
    def split = it.split(';')
    // remove "user:" prefix
    def email = split[0].split(':')[1]
    // remove "groups:" prefix
    def groups = split[1].split(':')[1]
    // create a map entry
    [(email), groups]
}
Then running map.forEach { k, v -> println "key: '${k}', value: '${v}'" } prints the following (the default map-to-String rendering may be a little chaotic in this case):
key: 'some.mail#gmail.com', value: 'Team-1, Team-2'
key: 'some.othermail#gmail.com', value: 'Team-2, Team-3'
key: 'another.mail#gmail.com', value: 'Team-1, Team-3'
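For anyone outside Groovy, the same parse can be sketched in plain Java. The line layout (a "user:" and a "groups:" part separated by ';') is taken from the sample above, and the # in the addresses is kept exactly as written in the question:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseUsers {
    public static void main(String[] args) {
        String description = "user:some.mail#gmail.com;groups:Team-1, Team-2\n"
                + "user:some.othermail#gmail.com;groups:Team-2, Team-3\n"
                + "user:another.mail#gmail.com;groups:Team-1, Team-3\n"
                + "some other text";

        Map<String, String> map = new LinkedHashMap<>();
        for (String line : description.split("\\r\\n|\\n|\\r")) {
            if (!line.startsWith("user")) continue;  // keep only "user" lines
            String[] split = line.split(";");
            String email = split[0].split(":")[1];   // strip "user:" prefix
            String groups = split[1].split(":")[1];  // strip "groups:" prefix
            map.put(email, groups);
        }

        map.forEach((k, v) -> System.out.println("key: '" + k + "', value: '" + v + "'"));
    }
}
```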
I have one JavaRDD called records.
I would like to create 3 JavaRDDs from records depending on a condition:
JavaRDD<MyClass> records1 = records.filter(record -> "A".equals(record.getName()));
JavaRDD<MyClass> records2 = records.filter(record -> "B".equals(record.getName()));
JavaRDD<MyClass> records3 = records.filter(record -> "C".equals(record.getName()));
The problem is that while I can do it as shown above, my RDD may contain millions of records and I don't want to scan all of them 3 times.
So I want to do it in one iteration over the records.
I need something like this:
records.forEach(record -> {
    if ("A".equals(record.getName())) {
        records1(record);
    } else if ("B".equals(record.getName())) {
        records2(record);
    } else if ("C".equals(record.getName())) {
        records3(record);
    }
});
How can I achieve this in Spark using JavaRDD?
My idea: you can use mapToPair and create a new Tuple2 object in each of your if-condition blocks. The key of the Tuple2 will then help you find each object's type. In other words, the Tuple2's key marks which RDD the object belongs in, and its value carries your actual data.
Your code would look something like this:
JavaPairRDD<String, MyClass> keyedRecords = records.mapToPair(record -> {
    String key = "";
    if ("A".equals(record.getName())) {
        key = "A";
    } else if ("B".equals(record.getName())) {
        key = "B";
    } else if ("C".equals(record.getName())) {
        key = "C";
    }
    return new Tuple2<>(key, record);
});
The resulting JavaPairRDD can then be divided into separate RDDs by the different keys assigned in mapToPair (for example with a filter on the key).
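Since the snippet above is a sketch rather than runnable Spark code, here is the same key-then-split idea demonstrated with plain Java collections, using a stand-in MyClass that has only a name field (in real Spark you would follow mapToPair with one filter per key, ideally on a cached RDD so the source is scanned once):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SplitByName {
    // Minimal stand-in for MyClass, for illustration only
    record MyClass(String name, int payload) {
        String getName() { return name; }
    }

    public static void main(String[] args) {
        List<MyClass> records = List.of(
                new MyClass("A", 1), new MyClass("B", 2),
                new MyClass("A", 3), new MyClass("C", 4));

        // One pass over records: bucket each element by its name
        Map<String, List<MyClass>> byName = records.stream()
                .collect(Collectors.groupingBy(MyClass::getName));

        List<MyClass> records1 = byName.get("A");
        List<MyClass> records2 = byName.get("B");
        List<MyClass> records3 = byName.get("C");
        System.out.println(records1.size() + " " + records2.size() + " " + records3.size());
    }
}
```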
I am trying to count some parameters with Spark. I used the word count example.
In that example we can count a single word, but I wonder how I can count on two fields at the same time.
Here is what I want to do:
Input files
{
"redundancy":1,
"deviceID":"dv1"
}
{
"redundancy":1,
"deviceID":"dv2"
}
{
"redundancy":2,
"deviceID":"dv1"
}
{
"redundancy":1,
"deviceID":"dv1"
}
{
"redundancy":2,
"deviceID":"dv5"
}
Output files
{
"redundancy":1,
"count":3,
"nbDevice":2
}
{
"redundancy":2,
"count":2,
"nbDevice":2
}
I wonder if there is already an example of this use case. If you have any documentation or links, I would be very thankful.
You can use pairs as keys.
The solution can look like:
rdd.map(record => (record.firstField, record.secondField) -> 1)
.reduceByKey(_ + _)
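The pair-key reduce above gives the count per (redundancy, deviceID) combination; the question also wants nbDevice, the number of distinct devices per redundancy value. One way to get both, sketched here in plain Java so it runs without a cluster (the same groupings translate to reduceByKey and distinct in Spark; the field names come from the sample JSON):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RedundancyStats {
    // Minimal stand-in for one parsed input record
    record Rec(int redundancy, String deviceID) {}

    public static void main(String[] args) {
        // The five sample records from the question
        List<Rec> records = List.of(
                new Rec(1, "dv1"), new Rec(1, "dv2"),
                new Rec(2, "dv1"), new Rec(1, "dv1"),
                new Rec(2, "dv5"));

        // count: how many records share each redundancy value
        Map<Integer, Long> count = records.stream()
                .collect(Collectors.groupingBy(Rec::redundancy, Collectors.counting()));

        // nbDevice: how many distinct devices share each redundancy value
        Map<Integer, Long> nbDevice = records.stream()
                .collect(Collectors.groupingBy(Rec::redundancy,
                        Collectors.mapping(Rec::deviceID,
                                Collectors.collectingAndThen(Collectors.toSet(),
                                        s -> (long) s.size()))));

        System.out.println("count=" + count + " nbDevice=" + nbDevice);
    }
}
```

On the sample data this reproduces the expected output: redundancy 1 has count 3 and 2 distinct devices, redundancy 2 has count 2 and 2 distinct devices.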
I have a tsv file in the form of "key \t value", and I need to read it into a map. Currently I do it like this:
referenceFile.eachLine { line ->
    def (name, reference) = line.split(/\t/)
    referencesMap[name.toLowerCase()] = reference
}
Is there a shorter/nicer way to do it?
It's already quite short. Two answers I can think of:
The first one avoids the creation of a temporary map object:
referenceFile.inject([:]) { map, line ->
    def (name, reference) = line.split(/\t/)
    map[name.toLowerCase()] = reference
    map
}
Second one is more functional:
referenceFile.collect { it.split(/\t/) }.inject([:]) { map, val -> map[val[0].toLowerCase()] = val[1]; map }
The only other way I can think of doing it would be with an Iterator like you'd find in Commons IO:
#Grab( 'commons-io:commons-io:2.4' )
import org.apache.commons.io.FileUtils

referencesMap = FileUtils.lineIterator( referenceFile, 'UTF-8' )
    .collectEntries { line ->
        line.tokenize( '\t' ).with { k, v ->
            [ (k.toLowerCase()): v ]
        }
    }
Or with a CSV parser:
#Grab('com.xlson.groovycsv:groovycsv:1.0')
import static com.xlson.groovycsv.CsvParser.parseCsv

referencesMap = referenceFile.withReader { r ->
    parseCsv( [ separator:'\t', readFirstLine:true ], r ).collectEntries {
        [ (it[ 0 ].toLowerCase()): it[ 1 ] ]
    }
}
But neither of them is shorter, and not necessarily nicer either...
Though I prefer option 2, as it deals with quoted strings and so can handle cases such as:
"key\twith\ttabs"\tvalue
This is the comment tim_yates added to melix's answer, and I think it's the shortest/clearest answer:
referenceFile.collect { it.tokenize( '\t' ) }.collectEntries { k, v -> [ k.toLowerCase(), v ] }
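For anyone wanting the same one-liner from Java, a rough Stream equivalent; the sample lines below stand in for referenceFile's contents:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TsvToMap {
    public static void main(String[] args) {
        // Stand-in for referenceFile.readLines()
        List<String> lines = List.of("Foo\tref1", "BAR\tref2");

        // Split each line on the first tab, lower-case the key, collect to a map
        Map<String, String> referencesMap = lines.stream()
                .map(line -> line.split("\t", 2))
                .collect(Collectors.toMap(p -> p[0].toLowerCase(), p -> p[1]));

        System.out.println(referencesMap);
    }
}
```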