HowTo get a Map from a csv string - string

I'm fairly new to Scala, but I'm doing my exercises now.
I have a string like "A>Augsburg;B>Berlin". What I want at the end is a map
val mymap = Map("A"->"Augsburg", "B"->"Berlin")
What I did is:
val st = locations.split(";").map(dynamicListExtract _)
with the function
private def dynamicListExtract(input: String) = {
if (input contains ">") {
val split = input split ">"
Some(split(0), split(1)) // return key , value
} else {
None
}
}
Now I have an Array[Option[(String, String)
How do I elegantly convert this into a Map[String, String]
Can anybody help?
Thanks

Just change your map call to flatMap:
scala> sPairs.split(";").flatMap(dynamicListExtract _)
res1: Array[(java.lang.String, java.lang.String)] = Array((A,Augsburg), (B,Berlin))
scala> Map(sPairs.split(";").flatMap(dynamicListExtract _): _*)
res2: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))
For comparison:
scala> Map("A" -> "Augsburg", "B" -> "Berlin")
res3: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))

In 2.8, you can do this:
val locations = "A>Augsburg;B>Berlin"
val result = locations.split(";").map(_ split ">") collect { case Array(k, v) => (k, v) } toMap
collect is like map but also filters values that aren't defined in the partial function. toMap will create a Map from a Traversable as long as it's a Traversable[(K, V)].

It's also worth seeing Randall's solution in for-comprehension form, which might be clearer, or at least give you a better idea of what flatMap is doing.
Map.empty ++ (for(possiblePair<-sPairs.split(";"); pair<-dynamicListExtract(possiblePair)) yield pair)

A simple solution (not handling error cases):
val str = "A>Aus;B>Ber"
var map = Map[String,String]()
str.split(";").map(_.split(">")).foreach(a=>map += a(0) -> a(1))
but Ben Lings' is better.

val str= "A>Augsburg;B>Berlin"
Map(str.split(";").map(_ split ">").map(s => (s(0),s(1))):_*)
--or--
str.split(";").map(_ split ">").foldLeft(Map[String,String]())((m,s) => m + (s(0) -> s(1)))

Related

Merge two strings in kotlin

I have two strings
val a = "abc"
val b = "xyz"
I want to merge it and need output like below
axbycz
I added both strings to arraylist and then flatmap it
val c = listOf(a, b)
val d = c.flatMap {
it.toList()
}
but not getting the desired result
Use the zip function. It creates a list of pairs with "adjacent" letters. You can then use joinToString with a transformer to create your final result.
a.zip(b) // Returns the list [(a, x), (b, y), (c, z)]
.joinToString("") { (a, b) -> "$a$b" } // Joins the list back to a string with no separator
You can always use a simple loop, assuming both strings have the same size. That way You only allocate a StringBuilder and counter variable, without any lists, arrays or pairs:
val a = "abc"
val b = "xyz"
val sb = StringBuilder()
for(i in 0 until a.length){
sb.append(a[i]).append(b[i])
}
val d = sb.toString()
marstran's answer is really concise and Pawels answer is really fast. Using buildString you can have to best of both worlds:
buildString {
a.zip(b).forEach { (a, b) ->
append(a).append(b)
}
}
buildString creates a StringBuilder and offers it as receiver in the lambda. It returns the built string.
Try it out here: Kotlin Playground. Thanks to Pawel for creating the original benchmark.

Creating Pair RDDs

From the below RDD , I would like to create a pair RDD.
val line = sc.parallelize(Array("2,SMITH,AARON"))
I used the below code:
val pair = line.map(x => (x.split(",")(0).toInt, x))
The output generated is Array[(Int, String)] = Array((2,2,SMITH,AARON))
But I would like the desired output to be Array[(Int, String)] = Array((2,SMITH,AARON))
Pls help me out.
I am a newbie.
Just take the rest:
val pair = line.map(x => x.split(",") match {
case Array(x, xs # _ *) => (x.toInt, xs.join(",")}
})
Easy way to do this is to split and get the array in each position
line.map(r => {
val split = r.split(",")
(split(0).toInt, (split.tail.mkString(",")))
})

Document Count of a Word in Spark/Scala

I have a text variable which is an RDD of String in scala
val data = sc.parallelize(List("i am a good boy.Are you a good boy.","You are also working here.","I am posting here today.You are good."))
I have another variable in Scala Map(given below)
//list of words for which doc count needs to be found,initial doc count is 1
val dictionary = Map( """good""" -> 1,"""working""" -> 1,"""posting""" -> 1 ).
I want to do a document count of each of the dictionary terms and get the output in key value format
My output should be like below for the above data.
(good,2)
(working,1)
(posting,1)
What i have tried is
dictionary.map { case(k,v) => k -> k.r.findFirstIn(data.map(line => line.trim()).collect().mkString(",")).size}
I am getting counts as 1 for all the words.
Please help me in fixing the above line
Thanks in advance.
Why not use flatMap to create the dictionary and then you can query that.
val dictionary = data.flatMap {case line => line.split(" ")}.map {case word => (word, 1)}.reduceByKey(_+_)
If I collect this in the REPL I get the following result:
res9: Array[(String, Int)] = Array((here,1), (good.,1), (good,2), (here.,1), (You,1), (working,1), (today.You,1), (boy.Are,1), (are,2), (a,2), (posting,1), (i,1), (boy.,1), (also,1), (I,1), (am,2), (you,1))
Obviously you would need to do a better split than in my simple example.
First of all your dictionary should be a Set, because in general sense you need to map the Set of terms to the number of documents which contain them.
So your data should look like:
scala> val docs = List("i am a good boy.Are you a good boy.","You are also working here.","I am posting here today.You are good.")
docs: List[String] = List(i am a good boy.Are you a good boy., You are also working here., I am posting here today.You are good.)
Your dictionary should look like:
scala> val dictionary = Set("good", "working", "posting")
dictionary: scala.collection.immutable.Set[String] = Set(good, working, posting)
Then you have to implement your transformation, for the simplest logic of the contains function it might look like:
scala> dictionary.map(k => k -> docs.count(_.contains(k))) toMap
res4: scala.collection.immutable.Map[String,Int] = Map(good -> 2, working -> 1, posting -> 1)
For better solution I'd recommend you to implement specific function for your requirements
(String, String) => Boolean
to determine the presence of the term in the document:
scala> def foo(doc: String, term: String): Boolean = doc.contains(term)
foo: (doc: String, term: String)Boolean
Then final solution will look like:
scala> dictionary.map(k => k -> docs.count(d => foo(d, k))) toMap
res3: scala.collection.immutable.Map[String,Int] = Map(good -> 2, working -> 1, posting -> 1)
The last thing you have to do is to calculate the result map using SparkContext. First of all you have to define what data you want to have parallelised. Let's assume we want to parallelize the collection of the documents, then solution might be like following:
val docsRDD = sc.parallelize(List(
"i am a good boy.Are you a good boy.",
"You are also working here.",
"I am posting here today.You are good."
))
docsRDD.mapPartitions(_.map(doc => dictionary.collect {
case term if doc.contains(term) => term -> 1
})).map(_.toMap) reduce { case (m1, m2) => merge(m1, m2) }
def merge(m1: Map[String, Int], m2: Map[String, Int]) =
m1 ++ m2 map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }

Unable to access String value

Why unable to access the String value ?
I would expect the s1 to be "a" but instead its Ljava.lang.String;#d70d7a ?
val it = Iterator("(a,((a,b),1.0))") //> it : Iterator[String] = non-empty iterator
val s1 = it.next.replace("(" , "").replace(")" , "").split(",").toString.split(",")
//> s1 : Array[String] = Array([Ljava.lang.String;#d70d7a)
println("s1 is "+s1(0)) //> s1 is [Ljava.lang.String;#d70d7a
Let's go command-by-command:
val it = Iterator("(a,((a,b),1.0))")
// here we got iterator on one String
val s1 = it.next // "(a,((a,b),1.0))"
.replace("(" , "") // "a,a,b),1.0))"
.replace(")" , "") // "a,a,b,1.0"
.split(",") // multiple lines in array: "a", "a","b","1.0"
.toString // Array[String].toString returns what you got: Ljava.lang.String;#d70d7a
.split(",") // one String (because there's no "," signs)
Maybe you should run toList before toString, because toString is defined in a way you expect it to be defined in this implementation of List:
val s1 = ...
.split(",")
.toList
.toString
...
Maybe you should look at Java: split() returns [Ljava.lang.String;#186d4c1], why? for clarification.
.split(",") makes Array and .toString doesn't work on Array and after that you are again spliting by .split(",")
which i don't think helpful.
and you can also use replaceAll in place of multiple replace
scala> val it = Iterator("(a,((a,b),1.0))")
it: Iterator[String] = non-empty iterator
scala> val s1 = it.next.replaceAll("[()]" , "").split(",")
s1: Array[String] = Array(a, a, b, 1.0)
scala> println("s1 is "+s1(0))
s1 is a

Is there a combination of takeWhile,dropWhile in Scala?

In Scala, I want to split a string at a specific character like so:
scala> val s = "abba.aadd"
s: String = abba.aadd
scala> val (beforeDot,afterDot) = (s takeWhile (_!='.'), s dropWhile (_!='.'))
beforeDot: String = abba
afterDot: String = .aadd
This solution is slightly inefficient (maybe not asymptotically), but I have the feeling something like this might exist in the standard library already. Any ideas?
There is a span method:
scala> val (beforeDot, afterDot) = s.span{ _ != '.' }
beforeDot: String = abba
afterDot: String = .aadd
From the Scala documentation:
c span p is equivalent to (but possibly more efficient than) (c takeWhile p, c dropWhile p), provided the evaluation of the predicate p does not cause any side-effects.
You can use splitAt for what you want:
val s = "abba.aadd"
val (before, after) = s.splitAt(s.indexOf('.'))
Output:
before: String = abba
after: String = .aadd

Resources