Merge two strings in Kotlin

I have two strings
val a = "abc"
val b = "xyz"
I want to merge them and need output like below:
axbycz
I added both strings to a list and then flatMapped it:
val c = listOf(a, b)
val d = c.flatMap {
    it.toList()
}
but I am not getting the desired result.

Use the zip function. It creates a list of pairs with "adjacent" letters. You can then use joinToString with a transformer to create your final result.
a.zip(b) // Returns the list [(a, x), (b, y), (c, z)]
.joinToString("") { (a, b) -> "$a$b" } // Joins the list back to a string with no separator

You can always use a simple loop, assuming both strings have the same size. That way you only allocate a StringBuilder and a counter variable, without any lists, arrays, or pairs:
val a = "abc"
val b = "xyz"
val sb = StringBuilder()
for (i in 0 until a.length) {
    sb.append(a[i]).append(b[i])
}
val d = sb.toString()

marstran's answer is really concise and Pawel's answer is really fast. Using buildString you can have the best of both worlds:
buildString {
    a.zip(b).forEach { (a, b) ->
        append(a).append(b)
    }
}
buildString creates a StringBuilder and offers it as the receiver in the lambda. It returns the built string.
Try it out here: Kotlin Playground. Thanks to Pawel for creating the original benchmark.

Related

How to build a map from a string, counting the occurrences of each letter?

The following method is supposed to count the number of occurrences of every char in a given string:
def countLetters(text: String): Map[Char, Int] = ???
For example, the input string "aabaabcab" should be mapped to
Map(a -> 5, b -> 3, c -> 1)
Here is a straightforward iterative approach:
def countLetters(text: String): Map[Char, Int] = {
  val h = collection.mutable.HashMap.empty[Char, Int]
  for (c <- text)
    h(c) = h.getOrElse(c, 0) + 1
  h.toMap
}
Is there any way to implement it without looping and explicitly allocating mutable hash maps?
Today I finally did it! This is what worked for me:
text.foldLeft(Map[Char, Int]() withDefaultValue 0){(h, c) => h.updated(c, h(c)+1)}
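For reference, here is a minimal sketch wrapping that one-liner into the countLetters signature from the question (same logic, just given a name):
def countLetters(text: String): Map[Char, Int] =
  text.foldLeft(Map[Char, Int]() withDefaultValue 0)((h, c) => h.updated(c, h(c) + 1))
countLetters("aabaabcab") // Map(a -> 5, b -> 3, c -> 1)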
Good luck on your next ones if you are reading this! ;)

Scala - string parsing without Regex

I have various types of strings like the following:
sales_data_type
saledatatypes
sales_data.new.metric1
sales_data.type.other.metric2
sales_data.type3.metric3
I'm trying to parse them to get a substring with a word before and after the last dot. For example: new.metric1, other.metric2, type3.metric3. If a word doesn't contain dots, it has to be returned as is: sales_data_type, saledatatypes.
With a Regex it may be done this way:
val infoStr = "sales_data.type.other.metric2"
val pattern = ".*?([^.]+\\.)?([^.]+)$"
println(infoStr.replaceAll(pattern, "$1$2"))
// prints other.metric2
// for saledatatypes just prints nullsaledatatypes ??? but seems to work
I want to find a way to achieve this with Scala, without using Regex in order to expand my understanding of Scala features. Will be grateful for any ideas.
One-liner:
dataStr.split('.').takeRight(2).mkString(".")
takeRight(2) will take the last 2 if there are 2 to take, else it will take the last, and only, 1. mkString(".") will re-insert the dot only if there are 2 elements for the dot to go between, else it will leave the string unaltered.
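As a quick check against the sample strings from the question (just a usage sketch):
val samples = List("sales_data_type", "saledatatypes", "sales_data.new.metric1",
  "sales_data.type.other.metric2", "sales_data.type3.metric3")
samples.map(_.split('.').takeRight(2).mkString("."))
// List(sales_data_type, saledatatypes, new.metric1, other.metric2, type3.metric3)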
Here's one with lots of Scala features for you.
val string = "head.middle.last"
val split = string.split('.') // Array(head, middle, last)
val result = split.toSeq match {
  case Seq(word) ⇒ word
  case _ :+ before :+ after ⇒ s"$before.$after"
}
println(result) // middle.last
First we split the string on your . and get individual parts.
Then we pattern match those parts, first to check if there is only one (in which case we just return it), and second to grab the last two elements in the seq.
Finally we put a . back in between those last two using string interpolation.
One way of doing it:
val value = "sales_data.type.other.metric2"
val elems = value.split("\\.").toList
elems match {
  case _ :+ beforeLast :+ last => s"${beforeLast}.${last}"
  case _ => value // no dot in the string: return it unchanged
}
for (s <- strs) yield {
  val s1 = s.split('.')
  if (s1.size >= 2) s1.takeRight(2).mkString(".") else s
}
or
for (s <- strs) yield {
  val s1 = s.split('.')
  if (s1.size >= 2) s1.init.last + '.' + s1.last else s
}
In Scala REPL:
scala> val strs = Vector("sales_data_type","saledatatypes","sales_data.new.metric1","sales_data.type.other.metric2","sales_data.type3.metric3")
strs: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, sales_data.new.metric1, sales_data.type.other.metric2, sales_data.type3.metric3)
scala> for(s<-strs) yield { val s1 = s.split('.');if(s1.size>=2)s1.takeRight(2).mkString(".") else s }
res62: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, new.metric1, other.metric2, type3.metric3)
scala> for(s<-strs) yield { val s1 = s.split('.');if(s1.size>=2)s1.init.last+'.'+s1.last else s }
res60: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, new.metric1, other.metric2, type3.metric3)
Use Scala match and do it like this:
def getFormattedStr(str: String): String = {
  str.contains(".") match {
    case true =>
      val arr = str.split("\\.")
      val len = arr.length
      len match {
        case 1 => str
        case _ => arr(len - 2) + "." + arr(len - 1)
      }
    case _ => str
  }
}
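For example, applying it to a few of the strings from the question (a quick usage check):
List("sales_data_type", "saledatatypes", "sales_data.type.other.metric2").map(getFormattedStr)
// List(sales_data_type, saledatatypes, other.metric2)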

Document Count of a Word in Spark/Scala

I have a text variable which is an RDD of String in Scala:
val data = sc.parallelize(List("i am a good boy.Are you a good boy.","You are also working here.","I am posting here today.You are good."))
I have another variable, a Scala Map (given below):
// list of words for which the doc count needs to be found; initial doc count is 1
val dictionary = Map("""good""" -> 1, """working""" -> 1, """posting""" -> 1)
I want to do a document count of each of the dictionary terms and get the output in key-value format.
My output should be like below for the above data.
(good,2)
(working,1)
(posting,1)
What I have tried is:
dictionary.map { case(k,v) => k -> k.r.findFirstIn(data.map(line => line.trim()).collect().mkString(",")).size}
I am getting a count of 1 for all the words.
Please help me fix the above line.
Thanks in advance.
Why not use flatMap to create the dictionary, which you can then query?
val dictionary = data.flatMap {case line => line.split(" ")}.map {case word => (word, 1)}.reduceByKey(_+_)
If I collect this in the REPL I get the following result:
res9: Array[(String, Int)] = Array((here,1), (good.,1), (good,2), (here.,1), (You,1), (working,1), (today.You,1), (boy.Are,1), (are,2), (a,2), (posting,1), (i,1), (boy.,1), (also,1), (I,1), (am,2), (you,1))
Obviously you would need to do a better split than in my simple example.
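For instance, here is a rough sketch (not from the original answer) of a cleaner tokenization; lowercasing and splitting on non-letter characters are my assumptions, and each dictionary term is counted at most once per document so the result is a document count rather than a word count:
val docCounts = data
  .map(_.toLowerCase.split("[^a-z]+").toSet)   // distinct tokens per document
  .flatMap(_.filter(dictionary.contains))      // keep only dictionary terms
  .map(term => (term, 1))
  .reduceByKey(_ + _)
docCounts.collect().foreach(println)
// (good,2)
// (working,1)
// (posting,1)   (printed order may vary)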
First of all, your dictionary should be a Set, because in a general sense you need to map the set of terms to the number of documents which contain them.
So your data should look like:
scala> val docs = List("i am a good boy.Are you a good boy.","You are also working here.","I am posting here today.You are good.")
docs: List[String] = List(i am a good boy.Are you a good boy., You are also working here., I am posting here today.You are good.)
Your dictionary should look like:
scala> val dictionary = Set("good", "working", "posting")
dictionary: scala.collection.immutable.Set[String] = Set(good, working, posting)
Then you have to implement your transformation; with the simplest logic, based on the contains function, it might look like this:
scala> dictionary.map(k => k -> docs.count(_.contains(k))) toMap
res4: scala.collection.immutable.Map[String,Int] = Map(good -> 2, working -> 1, posting -> 1)
For a better solution, I'd recommend implementing a function specific to your requirements
(String, String) => Boolean
to determine the presence of the term in the document:
scala> def foo(doc: String, term: String): Boolean = doc.contains(term)
foo: (doc: String, term: String)Boolean
Then final solution will look like:
scala> dictionary.map(k => k -> docs.count(d => foo(d, k))) toMap
res3: scala.collection.immutable.Map[String,Int] = Map(good -> 2, working -> 1, posting -> 1)
The last thing you have to do is calculate the result map using the SparkContext. First of all, you have to define what data you want to parallelize. Let's assume we want to parallelize the collection of documents; then the solution might look like the following:
val docsRDD = sc.parallelize(List(
  "i am a good boy.Are you a good boy.",
  "You are also working here.",
  "I am posting here today.You are good."
))
// merge two per-document maps, summing the counts of keys that appear in both
def merge(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] =
  m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }
docsRDD.mapPartitions(_.map(doc => dictionary.collect {
  case term if doc.contains(term) => term -> 1
})).map(_.toMap).reduce { case (m1, m2) => merge(m1, m2) }
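As a quick check of the merge helper on two hand-written per-document maps (the inputs here are just for illustration):
merge(Map("good" -> 1, "working" -> 1), Map("good" -> 1, "posting" -> 1))
// Map(good -> 2, working -> 1, posting -> 1)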

Scala - modify strings in a list based on their number of occurrences

Another Scala newbie question, since I am not getting how to achieve this in a functional way (I mostly come from a scripting language background):
I have a list of strings:
val foodList = List("banana-name", "orange-name", "orange-num", "orange-name", "orange-num", "grape-name")
and where they are duplicated, I'd like to add an incrementing number into the string and get that in a list similar to the input list, like so:
List("banana-name", "orange1-name", "orange1-num", "orange2-name", "orange2-num", "grape-name")
I've grouped them up to get counts for them with:
val freqs = foodList.groupBy(identity).mapValues(v => List.range(1, v.length + 1))
Which gives me:
Map(orange-num -> List(1, 2), banana-name -> List(1), grape-name -> List(1), orange-name -> List(1, 2))
The order of the list is important (it should be in the original order of foodList), so I know it's problematic for me to use a Map at this point. The closest I feel I have gotten to a solution is:
foodList.map { l =>
  if (freqs(l).length > 1) {
    freqs(l).map(n =>
      l.split("-")(0) + n.toString + "-" + l.split("-")(1))
  } else {
    l
  }
}
This of course gives me wonky output, since I am mapping over the list of frequencies from the word's value in freqs:
List(banana-name, List(orange1-name, orange2-name), List(orange1-num, orange2-num), List(orange1-name, orange2-name), List(orange1-num, orange2-num), grape-name)
How is this done in a Scala fp way without resorting to clumsy for loops and counters?
If the indices are important, sometimes it's best to keep track of them explicitly using zipWithIndex (very similar to Python's enumerate):
foodList.zipWithIndex.groupBy(_._1).values.toList.flatMap {
  //if only one entry in this group, don't change the values
  //x is actually a tuple, could write case (str, idx) :: Nil => (str, idx) :: Nil
  case x :: Nil => x :: Nil
  //case where there are duplicate strings
  case xs => xs.zipWithIndex.map {
    //idx is the index in the original list, n is the index in the new list, i.e. the count
    case ((str, idx), n) =>
      //destructuring assignment, like python's (fruit, suffix) = ...
      val Array(fruit, suffix) = str.split("-")
      //string interpolation, returning a tuple
      (s"$fruit${n + 1}-$suffix", idx)
  }
//We now have our list of (string, index) pairs;
//sort them and map to a list of just strings
}.sortBy(_._2).map(_._1)
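For reference, here is what the zipWithIndex step alone produces on the question's list (a small illustrative sketch of the first step):
val foodList = List("banana-name", "orange-name", "orange-num",
  "orange-name", "orange-num", "grape-name")
foodList.zipWithIndex
// List((banana-name,0), (orange-name,1), (orange-num,2), (orange-name,3), (orange-num,4), (grape-name,5))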
Efficient and simple:
val food = List("banana-name", "orange-name", "orange-num",
  "orange-name", "orange-num", "grape-name")
def replaceName(s: String, n: Int) = {
  val tokens = s.split("-")
  tokens(0) + n + "-" + tokens(1)
}
val indicesMap = scala.collection.mutable.HashMap.empty[String, Int]
val res = food.map { name =>
  val n = indicesMap.getOrElse(name, 1)
  indicesMap += (name -> (n + 1))
  replaceName(name, n)
}
Here is an attempt to provide what you expected with foldLeft:
foodList.foldLeft((List[String](), Map[String, Int]())) { (a, v) => // a: accumulator (list, map), v: value from the list
  if (a._2.isDefinedAt(v)) // already seen
    (s"$v+${a._2(v)}" :: a._1, a._2.updated(v, a._2(v) + 1))
  else
    (v :: a._1, a._2.updated(v, 1))
}._1.reverse // select the list; reverse because it was built in the opposite order

How to get a Map from a CSV string

I'm fairly new to Scala, but I'm doing my exercises now.
I have a string like "A>Augsburg;B>Berlin". What I want at the end is a map
val mymap = Map("A"->"Augsburg", "B"->"Berlin")
What I did is:
val st = locations.split(";").map(dynamicListExtract _)
with the function
private def dynamicListExtract(input: String) = {
  if (input contains ">") {
    val split = input split ">"
    Some((split(0), split(1))) // return (key, value)
  } else {
    None
  }
}
Now I have an Array[Option[(String, String)]].
How do I elegantly convert this into a Map[String, String]?
Can anybody help?
Thanks
Just change your map call to flatMap:
scala> sPairs.split(";").flatMap(dynamicListExtract _)
res1: Array[(java.lang.String, java.lang.String)] = Array((A,Augsburg), (B,Berlin))
scala> Map(sPairs.split(";").flatMap(dynamicListExtract _): _*)
res2: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))
For comparison:
scala> Map("A" -> "Augsburg", "B" -> "Berlin")
res3: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))
In 2.8, you can do this:
val locations = "A>Augsburg;B>Berlin"
val result = locations.split(";").map(_ split ">") collect { case Array(k, v) => (k, v) } toMap
collect is like map but also filters values that aren't defined in the partial function. toMap will create a Map from a Traversable as long as it's a Traversable[(K, V)].
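For instance, an entry without ">" splits into a one-element Array, so collect simply drops it (the dangling "C" below is an invented example to show the filtering):
val locations = "A>Augsburg;B>Berlin;C"
locations.split(";").map(_ split ">").collect { case Array(k, v) => (k, v) }.toMap
// Map(A -> Augsburg, B -> Berlin)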
It's also worth seeing Randall's solution in for-comprehension form, which might be clearer, or at least give you a better idea of what flatMap is doing.
Map.empty ++ (for (possiblePair <- sPairs.split(";"); pair <- dynamicListExtract(possiblePair)) yield pair)
A simple solution (not handling error cases):
val str = "A>Aus;B>Ber"
var map = Map[String,String]()
str.split(";").map(_.split(">")).foreach(a=>map += a(0) -> a(1))
but Ben Lings' is better.
val str= "A>Augsburg;B>Berlin"
Map(str.split(";").map(_ split ">").map(s => (s(0),s(1))):_*)
--or--
str.split(";").map(_ split ">").foldLeft(Map[String,String]())((m,s) => m + (s(0) -> s(1)))
