I have the line below, which splits the text into words on spaces. In the map step I want to pass my own custom function, purely for learning and to understand this better.
val wordList = dataRdd.flatMap(x => x.split(" ")).map(x => myMap(x))
def myMap(word: String) {
  if (wordMap.contains(word)) {
    var value: Int = wordMap.get(word).get
    wordMap += (word -> (value + 1))
  } else {
    wordMap += (word -> 1)
  }
}
I have a map in which I look up each word: if the word is present I increment its count by 1, otherwise I insert it with a count of 1.
What should the return type of the myMap function be, and how is the result stored in wordList?
Can somebody explain, please?
It cannot be done like this (read the section on closures in the Spark Programming Guide).
And the problem you are trying to solve is just a word count:
dataRdd.flatMap(_.split(" ").map((_, 1L))).reduceByKey(_ + _).collectAsMap
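To see the shape of that pipeline without a cluster, here is a minimal sketch on plain Scala collections. It is not Spark code: LocalWordCount and count are hypothetical names, and groupBy plus a sum stands in for reduceByKey. The point it illustrates is that each word is turned into a (word, 1) pair and the pairs are combined per key, instead of mutating a shared map from inside map (in Spark, such a closure is shipped to the executors, so updates to a captured driver-side variable are not sent back).

```scala
object LocalWordCount {
  // Same flatMap -> (word, 1) -> combine-per-key shape as the Spark one-liner,
  // but on an in-memory Seq. groupBy + sum is a stand-in for reduceByKey.
  def count(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.split(" "))          // split every line into words
      .filter(_.nonEmpty)             // drop empty tokens from repeated spaces
      .map(word => (word, 1L))        // pair each word with a count of 1
      .groupBy(_._1)                  // collect the pairs of each word
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }
}
```

For example, LocalWordCount.count(Seq("a b a", "b c")) yields a map with a -> 2, b -> 2 and c -> 1.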
I cannot figure out why my function invokeAll does not give out the correct output/work properly. Any solutions? (No futures or parallel collections allowed and the return type needs to be Seq[Int])
def invokeAll(work: Seq[() => Int]): Seq[Int] = {
  //this is what we should return as an output "return res.toSeq"
  //res cannot be changed!
  val res = new Array[Int](work.length)
  var list = mutable.Set[Int]()
  var n = res.size
  val procedure = (0 until n).map(work =>
    new Runnable {
      def run {
        //add the finished element/Int to list
        list += work
      }
    }
  )
  val threads = procedure.map(new Thread(_))
  threads.foreach(x => x.start())
  threads.foreach (x => (x.join()))
  res ++ list
  //this should be the final output ("return res.toSeq")
  return res.toSeq
}
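OMG, I know a Java programmer when I see one :)
Don't do this, it's not Java!
// needs an implicit ExecutionContext in scope, e.g. scala.concurrent.ExecutionContext.Implicits.global
val results: Future[Seq[Int]] = Future.traverse(work)(task => Future(task()))
This is how you do it in Scala.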
This gives you a Future with the results of all executions, that will be satisfied when all work is finished. You can use .map, .flatMap etc. to access and transform those results. For example
val sumOfAll: Future[Int] = results.map(_.sum)
Or (in the worst case, when you just want to hand the result back to imperative code), you could block and wait on the future to get hold of the actual result (don't do this unless you are absolutely desperate): Await.result(results, Duration.Inf)
If you want the results as an array, results.map(_.toArray) will do that ... but you really should not: arrays aren't a good choice for the vast majority of use cases in Scala. Just stick with Seq.
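Put together, the Future-based approach might look like the sketch below. Future.traverse takes the collection plus a function that turns each element into a Future. The names TraverseDemo and invokeAllAsync, the use of the global ExecutionContext, and the 10-second timeout are assumptions made for the example, not part of the original answer.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TraverseDemo {
  // Wraps every task in its own Future and gathers the results in input order.
  def invokeAllAsync(work: Seq[() => Int]): Future[Seq[Int]] =
    Future.traverse(work)(task => Future(task()))

  def main(args: Array[String]): Unit = {
    val work: Seq[() => Int] = Seq(() => 1, () => 2, () => 3)
    // Transform the results without blocking.
    val sumOfAll: Future[Int] = invokeAllAsync(work).map(_.sum)
    // Blocking here is only for the demo; the timeout is an arbitrary choice.
    println(Await.result(sumOfAll, 10.seconds))
  }
}
```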
The main problem in your code is that you are using a fixed-size array and trying to add elements to it with the ++ (concatenate) operator: res ++ list. That produces a new collection, but you never store it in a val.
You could remove the last line, return res.toSeq, and see that res ++ list becomes the return value: your res array of work.length zeros followed by the elements of list. Try reading more about Scala collections: most of them are immutable, and using immutable data structures is good practice. In Scala, the ++ operator does not accumulate values into its left operand, and arrays are fixed size.
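If futures really are off the table, one way to stay within the question's constraints is to let each thread write its result straight into its own slot of res. Note also that in the question's code the lambda parameter named work shadows the task list with an Int index, so no task is ever invoked. The following is a sketch under those assumptions (ThreadInvoke is a hypothetical wrapper name), not the only possible fix:

```scala
object ThreadInvoke {
  def invokeAll(work: Seq[() => Int]): Seq[Int] = {
    val res = new Array[Int](work.length)
    // One thread per task; thread i only ever writes to res(i),
    // so the threads never touch the same slot.
    val threads = work.zipWithIndex.map { case (task, i) =>
      new Thread(new Runnable {
        def run(): Unit = { res(i) = task() }
      })
    }
    threads.foreach(_.start())
    // join() waits for every thread and makes its write visible here.
    threads.foreach(_.join())
    res.toSeq
  }
}
```

Writing by index keeps the results in the same order as the tasks, which a shared Set cannot guarantee (and a Set would also drop duplicate results).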
I have a sequence of code points as Sequence<Int>.
I want to get this into a String.
What I currently do is this:
val string = codePoints
.map { codePoint -> String(intArrayOf(codePoint), 0, 1) }
.joinToString()
But it feels extremely hairy to create a string for each code point just to concatenate them immediately after. Is there a more direct way to do this?
So far the best I was able to do was something like this:
val string2 = codePoints.toList().toIntArray()
.let { codePoints -> String(codePoints, 0, codePoints.size) }
The amount of code isn't really any better, and it has a toList().toIntArray() which I'm not completely fond of. But it at least avoids the packaging of everything into dozens of one-code-point strings, and the logic is still written in the logical order.
You can either go for the simple:
val string = codePoints.joinToString("") { Character.toString(it) }
// or
val string = codePoints.joinToString("", transform = Character::toString)
Or use a string builder:
fun Sequence<Int>.codePointsToString(): String = buildString {
    this@codePointsToString.forEach { cp ->
        appendCodePoint(cp)
    }
}
This second one expresses exactly what you want, and may benefit from future optimizations in the string builder.
"it feels extremely hairy to create a string for each code point just to concatenate them immediately after"
Did you really measure a performance issue with the extra string objects created here? Using toList() would also create a bunch of object arrays behind the scenes (one for each resize), which is a bit less, but not tremendously better. And as you pointed out toIntArray on top of that is yet another array creation.
Unless you know the number of elements in the sequence up front, I don't believe there is much you can do about that (the string builder approach will also likely use a resizable array behind the scenes, but at least you don't need extra array copies).
val result = codePoints.map { Character.toString(it) }.joinToString("")
Edit, based on Joffrey's comment below:
val result = codePoints.joinToString("") { Character.toString(it) }
Additional edit, full example:
val codePoints: Sequence<Int> = sequenceOf(
    'a'.code,
    Character.toCodePoint(0xD83D.toChar(), 0xDE03.toChar()),
    Character.toCodePoint(0xD83D.toChar(), 0xDE04.toChar()),
    Character.toCodePoint(0xD83D.toChar(), 0xDE05.toChar())
)
val result = codePoints.joinToString("") { Character.toString(it) }
println(result)
This will print: a😃😄😅
What would be the kotlin way to handle multiple string concatenation?
--edit--
Adding the piece of code that led me to this question:
fun getNCharsFromRange(n: Int, range: CharRange): String {
    val chars = range.toList()
    val buffer = StringBuffer()
    while (buffer.length < n) {
        val randomInt = Random.Default.nextInt(0, chars.lastIndex)
        val newchar = chars[randomInt]
        val lastChar = buffer.lastOrNull() ?: ""
        if (newchar != lastChar) {
            buffer.append(newchar)
        }
    }
    return buffer.toString()
}
A StringBuilder is the standard way to construct a String in Kotlin, as in Java.
(Unless it can be done in one line, of course, where a string template is usually better than Java-style concatenation.)
Kotlin has one improvement, though: you can use buildString to handle that implicitly, which can make the code a little more concise. For example, your code can be written as:
fun getNCharsFromRange(n: Int, range: CharRange): String {
    val chars = range.toList()
    return buildString {
        while (length < n) {
            // nextInt's upper bound is exclusive: use size, not lastIndex,
            // so the last character of the range can be picked too
            val randomInt = Random.Default.nextInt(0, chars.size)
            val newChar = chars[randomInt]
            val lastChar = lastOrNull() ?: ""
            if (newChar != lastChar)
                append(newChar)
        }
    }
}
That has no mention of buffer: instead, buildString creates a StringBuilder, makes it available as this, and then returns the resulting String. (So length, lastOrNull(), and append refer to the StringBuilder.)
For short code, this can be significantly more concise and clearer; though the benefits are much less clear with longer code. (Your code may be in the grey area between…)
Worth pointing out that the function name is misleading: it avoids repeated characters, but allows duplicates that are not consecutive. If that's deliberate, then it would be worth making clear in the function name (and/or its doc comment). Alternatively, if the intent is to avoid all duplicates, then there's an approach which is much simpler and/or more efficient: to shuffle the range (or at least part of it).
Using existing library functions, and making it an extension function on CharRange, the whole thing could be as simple as:
fun CharRange.randomChars(n: Int) = shuffled().take(n).joinToString("")
That shuffles the whole list, even if only a few characters are needed. So it would be even more efficient to shuffle just the part needed. But there's no library function for that, so you'd have to write it manually. I'll leave it as an exercise!
"When you've found the treasure, stop digging!"
I'm wanting to use more functional programming in Groovy, and thought rewriting the following method would be good training. It's harder than it looks because Groovy doesn't appear to build short-circuiting into its more functional features.
Here's an imperative function to do the job:
fullyQualifiedNames = ['a/b/c/d/e', 'f/g/h/i/j', 'f/g/h/d/e']
String shortestUniqueName(String nameToShorten) {
    def currentLevel = 1
    String shortName = ''
    def separator = '/'
    while (fullyQualifiedNames.findAll { fqName ->
        shortName = nameToShorten.tokenize(separator)[-currentLevel..-1].join(separator)
        fqName.endsWith(shortName)
    }.size() > 1) {
        ++currentLevel
    }
    return shortName
}
println shortestUniqueName('a/b/c/d/e')
Result: c/d/e
It scans a list of fully-qualified filenames and returns the shortest unique form. There are potentially hundreds of fully-qualified names.
As soon as the method finds a short name with only one match, that short name is the right answer, and the iteration can stop. There's no need to scan the rest of the name or do any more expensive list searches.
But turning to a more functional flow in Groovy, neither return nor break can drop you out of the iteration:
return simply returns from the present iteration, not from the whole .each so it doesn't short-circuit.
break isn't allowed outside of a loop, and .each {} and .eachWithIndex {} are not considered loop constructs.
I can't use .find() instead of .findAll() because my program logic requires that I scan all elements of the list, not just stop at the first.
There are plenty of reasons not to use try..catch blocks, but the best I've read is from here:
Exceptions are basically non-local goto statements with all the
consequences of the latter. Using exceptions for flow control
violates the principle of least astonishment, make programs hard to read
(remember that programs are written for programmers first).
Some of the usual ways around this problem are detailed here including a solution based on a new flavour of .each. This is the closest to a solution I've found so far, but I need to use .eachWithIndex() for my use case (in progress.)
Here's my own poor attempt at a short-circuiting functional solution:
fullyQualifiedNames = ['a/b/c/d/e', 'f/g/h/i/j', 'f/g/h/d/e']
def shortestUniqueName(String nameToShorten) {
    def found = ''
    def final separator = '/'
    def nameComponents = nameToShorten.tokenize(separator).reverse()
    nameComponents.eachWithIndex { String _, int i ->
        if (!found) {
            def candidate = nameComponents[0..i].reverse().join(separator)
            def matches = fullyQualifiedNames.findAll { String fqName ->
                fqName.endsWith candidate
            }
            if (matches.size() == 1) {
                found = candidate
            }
        }
    }
    return found
}
println shortestUniqueName('a/b/c/d/e')
Result: c/d/e
Please shoot me down if there is a more idiomatic way to short-circuit in Groovy that I haven't thought of. Thank you!
There's probably a cleaner looking (and easier to read) solution, but you can do this sort of thing:
String shortestUniqueName(String nameToShorten) {
    // Split the name to shorten, and make a list of all sequential combinations of elements
    nameToShorten.split('/').reverse().inject([]) { agg, l ->
        if(agg) agg + [agg[-1] + l] else agg << [l]
    }
    // Starting with the smallest element
    .find { elements ->
        fullyQualifiedNames.findAll { name ->
            name.endsWith(elements.reverse().join('/'))
        }.size() == 1
    }
    ?.reverse()
    ?.join('/')
    ?: ''
}
I might be missing something but recently I came across a task to get last symbols according to some condition. For example I have a string: "this_is_separated_values_5". Now I want to extract 5 as Int.
Note: number of parts separated by _ is not defined.
If I had a method takeRightWhile(f: Char => Boolean) on a string, it would be trivial: takeRightWhile(ch => ch != '_'). Moreover, it would be efficient: a straightforward implementation would involve finding the last index of _ and then taking a substring, while this method would save the first step and provide better average time complexity.
UPDATE: Guys, all the variations of str.reverse.takeWhile(_!='_').reverse are quite inefficient, as they use additional O(n) space. If you want to implement takeRightWhile efficiently, you could iterate starting from the right, accumulating the result in a string builder or whatever else, and return the result. I am asking about this kind of method, not the implementation that was already described and rejected in the question itself.
Question: Does this kind of method exist in scala standard library? If no, is there method combination from the standard library to achieve the same in minimum amount of lines?
Thanks in advance.
Possible solution:
str.reverse.takeWhile(_!='_').reverse
Update
You can go from right to left with following expression using foldRight:
str.toList.foldRight(List.empty[Char]) {
  case (item, acc) => item::acc
}
Here you need to check the condition and stop adding items once it has been met. To do this, you can carry a flag in the accumulated value:
val (_, list) = str.toList.foldRight((false, List.empty[Char])) {
  case (item, (false, list)) if item!='_' => (false, item::list)
  case (_, (_, list)) => (true, list)
}
val res = list.mkString.toInt
This solution is even less efficient than the one with the double reverse:
The implementation of foldRight uses a combination of List reverse and foldLeft.
You cannot break out of foldRight, so you need the flag to skip all items after the condition is met.
I'd go with this:
val s = "string_with_following_number_42"
s.split("_").reverse.head
// res:String = 42
This is a naive attempt and by no means optimized. It splits the String into an Array of Strings, reverses that array, and takes the first element. Note that, because the reversing happens after the splitting, the order of the characters is correct.
I am not exactly sure about the problem you are facing. My understanding is that you have a string of the format xxx_xxx_xx_...._xxx_123 and you want to extract the part at the end as an Int.
import scala.util.Try
val yourStr = "xxx_xxx_xxx_xx...x_xxxxx_123"
val yourInt = yourStr.split('_').last.toInt
// But remember that the above is unsafe so you may want to take it as Option
val yourIntOpt = Try(yourStr.split('_').last.toInt).toOption
Or... let's say your requirement is to collect the right suffix for as long as some boolean condition remains true.
import scala.util.Try
val yourStr = "xxx_xxx_xxx_xx...x_xxxxx_123"
val rightSuffix = yourStr.reverse.takeWhile(c => c != '_').reverse
val yourInt = rightSuffix.toInt
// but above is unsafe so
val yourIntOpt = Try(rightSuffix.toInt).toOption
Comment if your requirement is different from this.
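You can use StringBuilder and lastIndexWhere to find the last separator, then take everything after it.
val str = "this_is_separated_values_5"
val sb = new StringBuilder(str)
// index of the last '_' (or -1 if there is none)
val lastIdx = sb.lastIndexWhere(ch => ch == '_')
// everything after the last '_'
val lastPart = str.substring(lastIdx + 1)
val number = lastPart.toInt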
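As far as the answers here show, the standard library has no takeRightWhile on String, but one can be sketched as an extension method. The version below is an assumption-laden illustration (TakeRight and RichString are hypothetical names): it scans from the right with lastIndexWhere to find the last character that fails the predicate and then takes a single substring, so the whole string is never reversed.

```scala
object TakeRight {
  // Hypothetical extension; takeRightWhile is not a standard library method.
  implicit class RichString(private val s: String) extends AnyVal {
    def takeRightWhile(p: Char => Boolean): String = {
      // Index of the last character that fails the predicate,
      // or -1 when every character satisfies it.
      val cut = s.lastIndexWhere(c => !p(c))
      // Everything after that index is the longest suffix satisfying p.
      s.substring(cut + 1)
    }
  }
}
```

With import TakeRight._ in scope, "this_is_separated_values_5".takeRightWhile(_ != '_').toInt gives 5. The toInt call still throws on a non-numeric suffix, so wrapping it in Try, as in the earlier answer, remains a sensible precaution.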