How to split a string given a list of positions in Scala

How would you write a functional implementation of split(positions: List[Int], str: String): List[String], which is similar to splitAt but splits a given string into a list of strings at a given list of positions?
For example:
split(List(1, 2), "abc") returns List("a", "b", "c")
split(List(1), "abc") returns List("a", "bc")
split(List(), "abc") returns List("abc")

def lsplit(pos: List[Int], str: String): List[String] = {
  val (rest, result) = pos.foldRight((str, List[String]())) {
    case (curr, (s, res)) =>
      val (rest, split) = s.splitAt(curr)
      (rest, split :: res)
  }
  rest :: result
}

Something like this:
def lsplit(pos: List[Int], s: String): List[String] = pos match {
  case x :: rest => s.substring(0, x) :: lsplit(rest.map(_ - x), s.substring(x))
  case Nil => List(s)
}
(Fair warning: not tail recursive so will blow the stack for large lists; not efficient due to repeated remapping of indices and chains of substrings. You can solve these things by adding additional arguments and/or an internal method that does the recursion.)
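The internal-method approach mentioned in the warning could look roughly like the sketch below. The names splitAtPositions and loop are made up for illustration, and the code assumes the positions are sorted in ascending order and lie within the string; it does not validate them.

```scala
import scala.annotation.tailrec

// Hypothetical helper: splits str at the given absolute positions.
// Assumes positions are ascending and within 0..str.length.
def splitAtPositions(positions: List[Int], str: String): List[String] = {
  @tailrec
  def loop(remaining: List[Int], start: Int, acc: List[String]): List[String] =
    remaining match {
      // No positions left: the rest of the string is the last piece.
      case Nil => (str.substring(start) :: acc).reverse
      // Cut from the previous position up to p, then continue from p.
      case p :: tail => loop(tail, p, str.substring(start, p) :: acc)
    }
  loop(positions, 0, Nil)
}
```

Carrying the start offset as an argument avoids remapping the remaining indices on every step, and the accumulator keeps the recursion in tail position.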

How about this, assuming the index list includes both the start (0) and the end (the string length):
def lSplit(indices: List[Int], s: String) = (indices zip indices.tail) map { case (a, b) => s.substring(a, b) }
scala> lSplit( List(0,4,6,8), "20131103")
List[String] = List(2013, 11, 03)
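To get the behaviour asked for in the question from the zip approach, one option is to add the two outer boundaries before zipping. This is a sketch only; the name lSplitWithBounds is hypothetical, and the positions are assumed to be ascending and within the string.

```scala
// Hypothetical variant of lSplit: the caller passes only the interior cut points.
def lSplitWithBounds(positions: List[Int], s: String): List[String] = {
  // Prepend 0 and append the string length so every piece has a start and an end.
  val bounds = (0 :: positions) :+ s.length
  bounds.zip(bounds.tail).map { case (a, b) => s.substring(a, b) }
}
```

Because bounds always has at least two elements, bounds.tail is safe even when positions is empty.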

Related

Comparing Lists of Strings in Scala

I know lists are immutable but I'm still confused on how I would go about this. I have two lists of strings - For example:
var list1: List[String] = List("M", "XW1", "HJ", "K")
var list2: List[String] = List("M", "XW4", "K", "YN")
I want to loop through these lists and see if the elements match. If they don't, the program should immediately return false. If they match, it should continue to iterate until it finds an element that begins with X. If it is indeed an X, I want to return true regardless of whether the number is the same or not.
The problem I'm having is that currently I have a conditional stating that if the two elements do not match, return false immediately. This is a problem because obviously XW1 and XW4 are not the same and it will return false. How can I bypass this and treat them as a match regardless of the number?
I also have a counter and two length variables to account for the fact that the lists may be of differing length. My counter goes up to the length of the shortest list: for (x <- 0 to (c-1)) (c being the counter).
You want to use zipAll & forall.
def compareLists(l1: List[String], l2: List[String]): Boolean =
  l1.zipAll(l2, "", "").forall {
    case (x, y) =>
      (x == y) || (x.startsWith("X") && y.startsWith("X"))
  }
Note that I am assuming an empty string will always be different than any other element.
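As a quick illustration of how zipAll pads the shorter list, and of what this version returns for the lists in the question, here is a small sketch. It repeats the function so the snippet is self-contained; the value name padded is made up for the example.

```scala
def compareLists(l1: List[String], l2: List[String]): Boolean =
  l1.zipAll(l2, "", "").forall {
    case (x, y) => x == y || (x.startsWith("X") && y.startsWith("X"))
  }

// zipAll fills the missing positions of the shorter list with the given default.
val padded = List("M", "XW1").zipAll(List("M"), "", "")
// padded == List(("M", "M"), ("XW1", ""))

val list1 = List("M", "XW1", "HJ", "K")
val list2 = List("M", "XW4", "K", "YN")
// forall checks every pair, so the ("HJ", "K") pair after the X elements makes this false.
val result = compareLists(list1, list2)
```

Note that this version does not stop at the first pair of X elements; it requires every position to match.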
If I understand your requirement correctly, to be considered a match, 1) each element in the same position of the two lists being simultaneously iterated must be the same except when both start with X (in which case it should return true without comparing any further), and 2) both lists must be of the same size.
If that's correct, I would recommend using a simple recursive function like below:
def compareLists(ls1: List[String], ls2: List[String]): Boolean = (ls1, ls2) match {
  case (Nil, Nil) =>
    true
  case (h1 :: t1, h2 :: t2) =>
    if (h1.startsWith("X") && h2.startsWith("X"))
      true // short-circuiting
    else
      if (h1 != h2)
        false
      else
        compareLists(t1, t2)
  case _ =>
    false
}
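The same short-circuiting rule can also be expressed without explicit recursion. The sketch below is one possible formulation, not the answer's own code; the name compareUntilX is hypothetical. It looks for the first "decisive" pair (both start with X, or the elements differ) and falls back to a length check if no such pair exists.

```scala
// Hypothetical alternative to the recursive version above.
def compareUntilX(l1: List[String], l2: List[String]): Boolean = {
  val firstDecisive = l1.iterator.zip(l2.iterator).collectFirst {
    // Both elements start with X: treat as a match and stop.
    case (a, b) if a.startsWith("X") && b.startsWith("X") => true
    // Any other mismatch: fail immediately.
    case (a, b) if a != b => false
  }
  // No decisive pair: the lists match only if they have the same length.
  firstDecisive.getOrElse(l1.length == l2.length)
}
```

For the lists in the question this yields true, because the XW1/XW4 pair is reached before the HJ/K mismatch.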
Based on your comment that the result should be true for the lists given in the question, you could do something like this:
val list1: List[String] = List("M", "XW1", "HJ", "K")
val list2: List[String] = List("M", "XW4", "K", "YN")
val (matched, unmatched) = list1.zipAll(list2, "", "").partition { case (x, y) => x == y }
val result = unmatched match {
  case Nil => true
  case (x, y) :: _ => (x.startsWith("X") && y.startsWith("X"))
}
You could also use cats foldM to iterate through the lists and terminate early if there is either (a) a mismatch, or (b) two elements that begin with 'X':
import cats.implicits._
val list1: List[String] = List("M", "XW1", "HJ", "K")
val list2: List[String] = List("M", "XW4", "K", "YN")
list1.zip(list2).foldM(()) {
  case (_, (s1, s2)) if s1 == s2 => ().asRight
  case (_, (s1, s2)) if s1.startsWith("X") && s2.startsWith("X") => true.asLeft
  case _ => false.asLeft
}.left.getOrElse(false)

Longest common suffix

I would like to find the longest common suffix of two strings in Scala.
def longestSuffix(s1: String, s2: String) = {
  val it = (s1.reverseIterator zip s2.reverseIterator) takeWhile { case (x, y) => x == y }
  it.map(_._1).toList.reverse.mkString
}
This code is clumsy and probably inefficient (e.g. because of the reversing). How would you find the longest common suffix functionally, i.e. without mutable variables?
One way to improve it would be to combine reverse and map in the last operation:
str1.reverseIterator.zip(str2.reverseIterator).takeWhile( c => c._1 == c._2)
  .toList.reverseMap(c => c._1) mkString ""
That is, first build a list, then call reverseMap on that list.
We can iterate over substrings, without reverse:
def longestSuffix(s1: String, s2: String) = {
  s1.substring(s1.length to 0 by -1 takeWhile { n => s2.endsWith(s1.substring(n)) } last)
}
Let tails produce the sub-strings and then return the first that fits.
def longestSuffix(s1: String, s2: String) =
s1.tails.dropWhile(!s2.endsWith(_)).next
Some efficiency might be gained by calling tails on the shorter of the two inputs.
I came up with a solution like this:
def commonSuffix(s1: String, s2: String): String = {
  val n = (s1.reverseIterator zip s2.reverseIterator) // mutable !
    .takeWhile { case (a, b) => a == b }
    .size
  s1.substring(s1.length - n) // is it efficient ?
}
Note that I am using substring for efficiency (not sure if that is correct).
This solution is also not completely "functional", since I am using reverseIterator even though it is mutable; I did not find another way to iterate over a string in reverse order. How would you suggest fixing or improving it?
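One way to avoid reverseIterator altogether is to count matching characters by index from the end of both strings. This is a sketch under that idea, with hypothetical names (commonSuffixLength, commonSuffixByIndex); the iterator it uses is created and consumed locally, so no mutable state is visible to the caller.

```scala
// Counts how many trailing characters the two strings share.
def commonSuffixLength(s1: String, s2: String): Int = {
  val maxLen = math.min(s1.length, s2.length)
  (0 until maxLen).iterator
    .takeWhile(i => s1.charAt(s1.length - 1 - i) == s2.charAt(s2.length - 1 - i))
    .size
}

// Takes that many characters from the end of s1.
def commonSuffixByIndex(s1: String, s2: String): String =
  s1.substring(s1.length - commonSuffixLength(s1, s2))
```

Only one substring is created at the end, and no intermediate list or reversed copy is built.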

Scala - modify strings in a list based on their number of occurrences

Another Scala newbie question since I am not getting how to achieve this in a functional way (mostly coming from a scripting language background):
I have a list of strings:
val foodList = List("banana-name", "orange-name", "orange-num", "orange-name", "orange-num", "grape-name")
and where they are duplicated, I'd like to add an incrementing number into the string and get that in a list similar to the input list, like so:
List("banana-name", "orange1-name", "orange1-num", "orange2-name", "orange2-num", "grape-name")
I've grouped them up to get counts for them with:
val freqs = foodList.groupBy(identity).mapValues(v => List.range(1, v.length + 1))
Which gives me:
Map(orange-num -> List(1, 2), banana-name -> List(1), grape-name -> List(1), orange-name -> List(1, 2))
The order of the list is important (it should be in the original order of foodList), so I know it's problematic for me to use a Map at this point. The closest I feel I have gotten to a solution is:
foodList.map { l =>
  if (freqs(l).length > 1) {
    freqs(l).map(n =>
      l.split("-")(0) + n.toString + "-" + l.split("-")(1))
  } else {
    l
  }
}
This of course gives me a wonky output, since for each word I am mapping over its whole list of frequencies in freqs:
List(banana-name, List(orange1-name, orange2-name), List(orange1-num, orange2-num), List(orange1-name, orange2-name), List(orange1-num, orange2-num), grape-name)
How is this done in a Scala fp way without resorting to clumsy for loops and counters?
If the indices are important, sometimes it's best to keep track of them explicitly using zipWithIndex (very similar to Python's enumerate):
foodList.zipWithIndex.groupBy(_._1).values.toList.flatMap {
  // if only one entry in this group, don't change the values
  // x is actually a tuple, could write case (str, idx) :: Nil => (str, idx) :: Nil
  case x :: Nil => x :: Nil
  // case where there are duplicate strings
  case xs => xs.zipWithIndex.map {
    // idx is the index in the original list, n is the index within the group, i.e. the count
    case ((str, idx), n) =>
      // destructuring assignment, like Python's (fruit, suffix) = ...
      val Array(fruit, suffix) = str.split("-")
      // string interpolation, returning a tuple
      (s"$fruit${n+1}-$suffix", idx)
  }
  // We now have our list of (string, index) pairs;
  // sort them and map to a list of just strings
}.sortBy(_._2).map(_._1)
Efficient and simple:
val food = List("banana-name", "orange-name", "orange-num",
  "orange-name", "orange-num", "grape-name")
def replaceName(s: String, n: Int) = {
  val tokens = s.split("-")
  tokens(0) + n + "-" + tokens(1)
}
// Mutable map from each name to the number to use at its next occurrence.
val indicesMap = scala.collection.mutable.HashMap.empty[String, Int]
// Note: this numbers every name, including those that occur only once (e.g. "banana1-name").
val res = food.map { name =>
  val n = indicesMap.getOrElse(name, 1)
  indicesMap += (name -> (n + 1))
  replaceName(name, n)
}
Here is an attempt to provide what you expected with foldLeft:
foodList.foldLeft((List[String](), Map[String, Int]())) { // initial value: (result list, seen counts)
  case ((acc, seen), v) => // acc and seen form the accumulator, v is the value from the list
    if (seen.isDefinedAt(v)) // already seen
      (s"$v+${seen(v)}" :: acc, seen.updated(v, seen(v) + 1))
    else
      (v :: acc, seen.updated(v, 1))
}._1.reverse // select the list; reverse it because it was built in the opposite order
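For comparison, here is a sketch of an immutable fold that aims to produce exactly the output requested in the question: names that occur once are left alone, and repeated names get an incrementing number before the first hyphen. The function name numberDuplicates and the fallback for strings without a hyphen are assumptions made for this example.

```scala
// Hypothetical helper: numbers only those strings that occur more than once.
def numberDuplicates(items: List[String]): List[String] = {
  // Total number of occurrences of each string.
  val totals: Map[String, Int] =
    items.groupBy(identity).map { case (k, v) => k -> v.size }

  val (_, renamedReversed) =
    items.foldLeft((Map.empty[String, Int], List.empty[String])) {
      case ((seen, acc), item) =>
        // n is the 1-based occurrence number of this item so far.
        val n = seen.getOrElse(item, 0) + 1
        val renamed =
          if (totals(item) > 1)
            item.split("-", 2) match {
              case Array(fruit, suffix) => s"$fruit$n-$suffix"
              case _ => s"$item$n" // assumption: no hyphen, append the number
            }
          else item
        (seen.updated(item, n), renamed :: acc)
    }
  // The list was built back to front, so reverse it to restore the input order.
  renamedReversed.reverse
}
```

The first pass computes the totals so the fold can tell unique names from repeated ones; the fold itself preserves the original order.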

Scala split string to tuple

I would like to split a string on whitespace that has 4 elements:
1 1 4.57 0.83
and I am trying to convert it into a List[(String, String, Point)], such that the first two tokens are the first two elements of each tuple and the last two form the Point. I am doing the following, but it doesn't seem to work:
Source.fromFile(filename).getLines.map(string => {
  val split = string.split(" ")
  (split(0), split(1), split(2))
}).map { t => List(t._1, t._2, t._3) }.toIterator
How about this:
scala> case class Point(x: Double, y: Double)
defined class Point
scala> s43.split("\\s+") match { case Array(i, j, x, y) => (i.toInt, j.toInt, Point(x.toDouble, y.toDouble)) }
res00: (Int, Int, Point) = (1,1,Point(4.57,0.83))
You could use pattern matching to extract what you need from the array:
case class Point(pts: Seq[Double])
val lines = List("1 1 4.34 2.34")
val coords = lines.collect(_.split("\\s+") match {
  case Array(s1, s2, points @ _*) => (s1, s2, Point(points.map(_.toDouble)))
})
You are not converting the third and fourth tokens into a Point, nor are you converting the lines into a List. Also, you are not rendering each element as a Tuple3, but as a List.
The following should be more in line with what you are looking for.
import scala.io.Source
case class Point(x: Double, y: Double) // Simple point class
Source.fromFile(filename).getLines.map(line => {
  val tokens = line.split("""\s+""") // Use a regex to avoid empty tokens
  (tokens(0), tokens(1), Point(tokens(2).toDouble, tokens(3).toDouble))
}).toList // Convert from an Iterator to List
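If the file may contain malformed lines, the per-line conversion can be made total by returning an Option. The sketch below is one way to do that; parseLine is a hypothetical name, and the choice to drop bad lines rather than report them is an assumption.

```scala
import scala.util.Try

final case class Point(x: Double, y: Double)

// Hypothetical helper: Some(tuple) for a well-formed line, None otherwise.
def parseLine(line: String): Option[(String, String, Point)] =
  line.trim.split("\\s+") match {
    // Exactly four tokens: keep the first two as strings, parse the last two.
    case Array(first, second, x, y) =>
      Try(Point(x.toDouble, y.toDouble)).toOption.map(p => (first, second, p))
    // Wrong number of tokens.
    case _ => None
  }
```

With this in place, something like lines.flatMap(parseLine) would keep only the lines that parse.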
case class Point(pts: Seq[Double])
val lines = "1 1 4.34 2.34"
val splitLines = lines.split("\\s+") match {
  case Array(s1, s2, points @ _*) => (s1, s2, Point(points.map(_.toDouble)))
}
And for the curious, the @ in pattern matching binds a variable to the pattern, so points @ _* binds the variable points to the pattern _*. And _* matches the rest of the array, so points ends up being a Seq[String].
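A stripped-down illustration of the @ _* binding, separate from the Point conversion, might look like this. The function name headTwoAndRest is made up, and the snippet uses the Scala 2 spelling of the vararg pattern.

```scala
// Hypothetical helper: splits a line into its first two tokens and everything after them.
def headTwoAndRest(line: String): Option[(String, String, List[String])] =
  line.split("\\s+") match {
    // rest is bound to all remaining elements (possibly none).
    case Array(a, b, rest @ _*) => Some((a, b, rest.toList))
    // Fewer than two tokens.
    case _ => None
  }
```

The pattern matches any array with at least two elements, so rest can be empty.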
There are ways to convert a tuple to a List or Seq. One way is:
scala> (1,2,3).productIterator.toList
res12: List[Any] = List(1, 2, 3)
But as you can see, the element type is Any and not Int.
To keep the individual element types, you can use an HList from shapeless:
https://github.com/milessabin/shapeless

HowTo get a Map from a csv string

I'm fairly new to Scala, but I'm doing my exercises now.
I have a string like "A>Augsburg;B>Berlin". What I want at the end is a map
val mymap = Map("A"->"Augsburg", "B"->"Berlin")
What I did is:
val st = locations.split(";").map(dynamicListExtract _)
with the function
private def dynamicListExtract(input: String) = {
  if (input contains ">") {
    val split = input split ">"
    Some((split(0), split(1))) // return key, value
  } else {
    None
  }
}
Now I have an Array[Option[(String, String)]].
How do I elegantly convert this into a Map[String, String]?
Can anybody help?
Thanks
Just change your map call to flatMap:
scala> sPairs.split(";").flatMap(dynamicListExtract _)
res1: Array[(java.lang.String, java.lang.String)] = Array((A,Augsburg), (B,Berlin))
scala> Map(sPairs.split(";").flatMap(dynamicListExtract _): _*)
res2: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))
For comparison:
scala> Map("A" -> "Augsburg", "B" -> "Berlin")
res3: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))
In 2.8, you can do this:
val locations = "A>Augsburg;B>Berlin"
val result = locations.split(";").map(_ split ">") collect { case Array(k, v) => (k, v) } toMap
collect is like map but also filters values that aren't defined in the partial function. toMap will create a Map from a Traversable as long as it's a Traversable[(K, V)].
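To make the filtering behaviour of collect concrete, here is a small sketch that wraps the same pipeline in a function and feeds it one malformed entry. The name parsePairs is hypothetical.

```scala
// Hypothetical wrapper around the split/collect/toMap pipeline above.
def parsePairs(input: String): Map[String, String] =
  input
    .split(";")
    .map(_.split(">"))
    // Only arrays with exactly two elements match; everything else is dropped.
    .collect { case Array(k, v) => k -> v }
    .toMap
```

An entry without a ">" separator splits into a one-element array, so the partial function is not defined for it and collect skips it.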
It's also worth seeing Randall's solution in for-comprehension form, which might be clearer, or at least give you a better idea of what flatMap is doing.
Map.empty ++ (for(possiblePair<-sPairs.split(";"); pair<-dynamicListExtract(possiblePair)) yield pair)
A simple solution (not handling error cases):
val str = "A>Aus;B>Ber"
var map = Map[String,String]()
str.split(";").map(_.split(">")).foreach(a=>map += a(0) -> a(1))
but Ben Lings' is better.
val str= "A>Augsburg;B>Berlin"
Map(str.split(";").map(_ split ">").map(s => (s(0),s(1))):_*)
--or--
str.split(";").map(_ split ">").foldLeft(Map[String,String]())((m,s) => m + (s(0) -> s(1)))
