Understanding String in Scala and the map method - string

I wrote the following simple example to understand how the map method works:
object Main{
def main (args : Array[String]) = {
val test = "abc"
val t = Vector(97, 98, 99)
println(test.map(c => (c + 1))) //1 Vector(98, 99, 100)
println(test.map(c => (c + 1).toChar)) //2 bcd
println(t.map(i => (i + 1))) //3 Vector(98, 99, 100)
println(t.map(i => (i + 1).toChar)) //4 Vector(b, c, d)
};
}
I don't quite understand why bcd is printed at //2. Since Scala treats every String as a Seq, I thought that test.map(c => (c + 1).toChar) should have produced another Seq: //1 returns a Vector, so I expected Vector(b, c, d), as at //4. But as you can see, it didn't. Why? How does it actually work?

This is a feature of Scala collections (String is treated here as a collection of characters). The full explanation is fairly complex and requires an understanding of typeclasses (I guess this is why Haskell was mentioned in the comment), but the simple explanation is not that hard.
The point is that the authors of the Scala collections library tried very hard to avoid code duplication. For example, the map function on String is actually defined in scala.collection.TraversableLike#map. With a naive approach, map would return TraversableLike rather than the type it was called on (here, String). That's why they came up with an approach that avoids code duplication as well as unnecessary type casts or an overly general return type.
Basically, Scala collection methods like map produce a type that is as close as possible to the type they were called on. This is achieved using a typeclass called CanBuildFrom (part of the collections library up to Scala 2.12; the redesigned 2.13 collections dropped it). The full signature of map looks as follows:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That
There are many explanations of typeclasses and CanBuildFrom around. I'd suggest looking here first: http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html#factoring-out-common-operations. Another good explanation is here: Scala 2.8 CanBuildFrom
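To see the effect without reading the signature, here is a minimal sketch (the object and value names are made up for illustration). It relies only on the standard library: when the function passed to map returns a Char, the result can be typed as String; when it returns an Int, a String cannot hold the elements, so a general indexed sequence is built instead. The explicit type annotations are there so that the compiler checks the claim.

```scala
object MapResultTypes {
  val test = "abc"

  // Char => Char: the result is a String again
  val shifted: String = test.map(c => (c + 1).toChar)

  // Char => Int: a String cannot contain Ints, so a general sequence is built
  val codes: IndexedSeq[Int] = test.map(c => c + 1)

  def main(args: Array[String]): Unit = {
    println(shifted)      // bcd
    println(codes.toList) // List(98, 99, 100)
  }
}
```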

When you use map, this is what happens: [List|Seq|etc].map([eachElement] => [do something])
map applies an operation to each element of the collection on the left-hand side: "abc".map(letter => letter + 1) adds 1 to each element of the String "abc". Each element is named letter here and is of type Char.
"abc" is a String and, as in C++, it is treated as an array of characters. But since test is of type String, the map function gives a String as well, as long as the function passed to it returns a Char.
I tried the following:
val test2 : Seq[Char] = "abc"
but I still get a result of type String. I guess Scala converts automatically from a Seq[Char] to a String.
I hope it helped!
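One possible reading of that last observation, offered as an assumption rather than a confirmed diagnosis: assigning a String to a Seq[Char] goes through an implicit wrapper whose toString prints the underlying characters, so the value looks like a String when printed even though it is not one. A small check (the names are illustrative):

```scala
object SeqCharCheck {
  val test2: Seq[Char] = "abc"

  // The static type is Seq[Char]; the runtime value is a wrapper, not a java.lang.String
  val isString: Boolean = (test2: Any).isInstanceOf[String]

  // Printing shows the characters, which is easy to mistake for a String
  val printed: String = test2.toString

  def main(args: Array[String]): Unit = {
    println(isString) // false
    println(printed)  // abc
  }
}
```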

Related

takeRightWhile() method in scala

I might be missing something, but recently I came across a task to get the last symbols of a string according to some condition. For example, I have a string: "this_is_separated_values_5". Now I want to extract 5 as Int.
Note: the number of parts separated by _ is not defined.
If I had a method takeRightWhile(f: Char => Boolean) on String, it would be trivial: takeRightWhile(ch => ch != '_'). Moreover, it would be efficient: a straightforward implementation involves finding the last index of _ and then taking a substring, whereas this method would skip the first step and offer better average time complexity.
UPDATE: Guys, all the variations of str.reverse.takeWhile(_!='_').reverse are quite inefficient, as they use additional O(n) space. To implement takeRightWhile efficiently, you could iterate from the right, accumulate the result in a string builder or whatever else, and return it. I am asking about this kind of method, not about the implementation that was already described and rejected in the question itself.
Question: Does this kind of method exist in the Scala standard library? If not, is there a combination of standard library methods that achieves the same in a minimal number of lines?
Thanks in advance.
Possible solution:
str.reverse.takeWhile(_!='_').reverse
Update
You can go from right to left with the following expression using foldRight:
str.toList.foldRight(List.empty[Char]) {
case (item, acc) => item::acc
}
Here you need to check the condition and stop adding items once it is met. For this you can pass a flag along with the accumulated value:
val (_, list) = str.toList.foldRight((false, List.empty[Char])) {
case (item, (false, list)) if item!='_' => (false, item::list)
case (_, (_, list)) => (true, list)
}
val res = list.mkString.toInt
This solution is even less efficient than the one with the double reverse, for two reasons:
The implementation of foldRight uses a combination of List reverse and foldLeft.
You cannot break out of foldRight early, so you need a flag to skip all remaining items once the condition is met.
I'd go with this:
val s = "string_with_following_number_42"
s.split("_").reverse.head
// res:String = 42
This is a naive attempt and by no means optimized. It splits the String into an Array of Strings, reverses it and takes the first element. Note that, because the reversing happens after the splitting, the order of the characters within each part is preserved.
I am not exactly sure about the problem you are facing. My understanding is that you have a string of the format xxx_xxx_xx_...._xxx_123 and you want to extract the part at the end as an Int.
import scala.util.Try
val yourStr = "xxx_xxx_xxx_xx...x_xxxxx_123"
val yourInt = yourStr.split('_').last.toInt
// But remember that the above is unsafe so you may want to take it as Option
val yourIntOpt = Try(yourStr.split('_').last.toInt).toOption
Or... let's say your requirement is to collect the right-hand suffix for as long as some boolean condition holds.
import scala.util.Try
val yourStr = "xxx_xxx_xxx_xx...x_xxxxx_123"
val rightSuffix = yourStr.reverse.takeWhile(c => c != '_').reverse
val yourInt = rightSuffix.toInt
// but above is unsafe so
val yourIntOpt = Try(rightSuffix.toInt).toOption
Comment if your requirement is different from this.
You can use StringBuilder and lastIndexWhere.
val str = "this_is_separated_values_5"
val sb = new StringBuilder(str)
val lastIdx = sb.lastIndexWhere(ch => ch == '_') // index of the last '_', or -1 if there is none
val suffix = str.substring(lastIdx + 1) // everything after the last '_': "5"
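I'm not aware of a takeRightWhile in the standard library, but the right-to-left scan asked for in the question could be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: takeRightWhile here is a hypothetical helper defined in the snippet, built on lastIndexWhere and substring so that no reversed copy of the string is created.

```scala
object TakeRightWhileSketch {
  // Hypothetical helper (not a standard library method): returns the longest
  // suffix of s whose characters all satisfy p.
  def takeRightWhile(s: String)(p: Char => Boolean): String = {
    // Index of the last character that breaks the condition, or -1 if none does
    val cut = s.lastIndexWhere(c => !p(c))
    s.substring(cut + 1)
  }

  def main(args: Array[String]): Unit = {
    val suffix = takeRightWhile("this_is_separated_values_5")(_ != '_')
    println(suffix.toInt) // 5
  }
}
```

When no character breaks the condition, lastIndexWhere returns -1 and the whole string is returned; when the last character breaks it, the result is empty.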

Groovy inject without a similar Java 8 reduce

Normally I think of Groovy's inject method as equivalent to Java 8's reduce, but I seem to have hit an unusual situation.
Say I have a POJO (or POGO) called Book
class Book {
int id
String name
}
If I have a collection of books and want to convert them to a map where the keys are the ids and the values are the books, then in Groovy it's easy enough to write:
Map bookMap = books.inject([:]) { map, b ->
map[b.id] = b
map
}
i.e., for each book, add it to the map under the book's id and return the map.
In Java 8, the same operation would take a completely different approach. Either this:
Map<Integer, Book> bookMap = books.stream()
.collect(Collectors.toMap(Book::getId, b -> b));
or, equivalently,
bookMap = books.stream()
.collect(Collectors.toMap(Book::getId, Function.identity()));
the difference being a matter of style.
What I'm wondering, however, is if there is a reduce operation in Java 8 that would be similar to the inject from Groovy. I can't just mimic what I did in Groovy, because in Java 8 the signature for reduce is:
T reduce(T identity, BinaryOperator<T> accumulator)
The BinaryOperator means that both elements of the lambda expression must be of the same type. If it was a BiFunction, I could make the lambda's first argument a HashMap<Integer, Book> and the second argument a Book, but I can't do that with a BinaryOperator. I know there's a three-argument version of reduce, but that doesn't seem to help either.
Am I missing something obvious? Is it just that inject is more general than reduce? Since I already have an idiomatic way of solving the problem in Java, this isn't critical, but I was struck by the differences here.
Yo Ken! :-D
You need the 3 parameter form of reduce, so given:
List<Book> books = Arrays.asList(
new Book(1, "Book One"),
new Book(2, "Tim's memoirs"),
new Book(3, "Harry Potter and the sarcastic cat")
);
You can do:
Map<Integer, Book> reduce = books.stream().reduce(
new HashMap<Integer, Book>(),
(map, value) -> {
map.put(value.id, value);
return map;
},
(a, b) -> {
a.putAll(b);
return a;
}
);
To give:
{
1=Book{id=1, name='Book One'},
2=Book{id=2, name='Tim's memoirs'},
3=Book{id=3, name='Harry Potter and the sarcastic cat'}
}
The first parameter is the thing to collect into:
new HashMap<Integer, Book>(),
The second parameter is a BiFunction that takes the current accumulator, and the next element in the stream, and combines them somehow:
(map, value) -> {
map.put(value.id, value);
return map;
},
The third parameter is a BinaryOperator, the combiner:
(a, b) -> {
a.putAll(b);
return a;
}
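says how to join the partial maps back together when the stream runs in parallel. (Strictly speaking, reduce expects the accumulator not to mutate the identity value, so this version is only safe on a sequential stream; collect is the operation designed for mutable containers such as a HashMap.)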
put and putAll returning void make it a fugly mess :-( But I guess chaining wasn't a popular thing back in the late 90s...
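The three roles described above (identity, accumulator, combiner) can be illustrated independently of the Java API. The following is a sketch in Scala, the language used on most of this page, and it is not a translation of Stream.reduce: the object name and the add and merge helpers are made up for illustration, and the "parallel" run is only simulated by folding two chunks separately. Immutable maps are used so that no step mutates a shared value.

```scala
object ThreeRoles {
  final case class Book(id: Int, name: String)

  val books = List(
    Book(1, "Book One"),
    Book(2, "Tim's memoirs"),
    Book(3, "Harry Potter and the sarcastic cat")
  )

  // 1. identity: the empty result
  val empty: Map[Int, Book] = Map.empty

  // 2. accumulator: (partial result, next element) => partial result
  def add(m: Map[Int, Book], b: Book): Map[Int, Book] = m + (b.id -> b)

  // 3. combiner: (partial result, partial result) => partial result
  def merge(a: Map[Int, Book], b: Map[Int, Book]): Map[Int, Book] = a ++ b

  // Sequential run: only the identity and the accumulator are needed
  val sequential: Map[Int, Book] = books.foldLeft(empty)(add)

  // Simulated parallel run: fold two chunks independently, then combine them
  val (left, right) = books.splitAt(books.size / 2)
  val chunked: Map[Int, Book] =
    merge(left.foldLeft(empty)(add), right.foldLeft(empty)(add))
}
```

Both runs produce the same map, which is the property the combiner has to guarantee.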

How to find maximum overlap between two strings in Scala?

Suppose I have two strings, s and t. I need to write a function f that finds the longest prefix of t that is also a suffix of s. For example:
s = "abcxyz", t = "xyz123", f(s, t) = "xyz"
s = "abcxxx", t = "xx1234", f(s, t) = "xx"
How would you write it in Scala ?
This first solution is easily the most concise. It is also more efficient than a recursive version, as it uses lazily evaluated iteration:
s.tails.find(t.startsWith).get
There has been some discussion about whether tails ends up copying the whole string over and over. If that is a concern, you could call toList on s and then mkString the result:
s.toList.tails.find(t.startsWith(_: List[Char])).get.mkString
For some reason the type annotation is required to get it to compile. I haven't actually tested which one is faster.
UPDATE - OPTIMIZATION
As som-snytt pointed out, t cannot start with any string that is longer than it, and therefore we could make the following optimization:
s.drop(s.length - t.length).tails.find(t.startsWith).get
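Wrapped in a function, the one-liner could look like the sketch below. The object and function names are illustrative, and an explicit lambda is passed to find instead of t.startsWith to sidestep the overloaded method. Since tails always ends with the empty string, find always succeeds; getOrElse is used only to avoid a bare get.

```scala
object OverlapSketch {
  // Longest suffix of s that is also a prefix of t; "" when there is no overlap
  def overlap(s: String, t: String): String =
    s.drop(s.length - t.length) // a suffix longer than t can never match
      .tails                    // yields successively shorter suffixes, ending with ""
      .find(suffix => t.startsWith(suffix))
      .getOrElse("")

  def main(args: Array[String]): Unit = {
    println(overlap("abcxyz", "xyz123")) // xyz
    println(overlap("abcxxx", "xx1234")) // xx
  }
}
```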
Efficient, this is not, but it is a neat (IMO) one-liner.
val s = "abcxyz"
val t = "xyz123"
(s.tails.toSet intersect t.inits.toSet).maxBy(_.size)
//res8: String = xyz
(take all the suffixes of s that are also prefixes of t, and pick the longest)
If we only need to find the common overlapping part, we can recursively take the tail of the first string until the remaining part is something the second string begins with. This also covers the case where the strings have no overlap, because then the empty string is returned.
scala> def findOverlap(s:String, t:String):String = {
if (s == t.take(s.size)) s else findOverlap (s.tail, t)
}
findOverlap: (s: String, t: String)String
scala> findOverlap("abcxyz", "xyz123")
res3: String = xyz
scala> findOverlap("one","two")
res1: String = ""
UPDATE: It was pointed out that tail might not be implemented in the most efficient way (i.e. it creates a new string each time it is called). If that becomes an issue, using substring(1) instead of tail (or converting both Strings to Lists, where tail and head are O(1)) might give better performance. By the same token, we can replace t.take(s.size) with t.substring(0, math.min(s.size, t.length)); the min is needed because, unlike take, substring throws when s is longer than t.
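If the intermediate strings are the concern, one option is to recurse over a start index and compare in place. The sketch below is one way to do that, not the original author's code: it uses java.lang.String.regionMatches, which compares a region of s with a region of t without allocating, and only the final result is created with substring. The object name and the inner loop function are illustrative.

```scala
import scala.annotation.tailrec

object OverlapByIndex {
  def findOverlap(s: String, t: String): String = {
    @tailrec
    def loop(start: Int): String = {
      val len = s.length - start
      // regionMatches returns false when t has fewer than len characters,
      // so suffixes longer than t are rejected without an exception
      if (s.regionMatches(start, t, 0, len)) s.substring(start)
      else loop(start + 1)
    }
    // Terminates at start == s.length, where the empty region always matches
    loop(0)
  }

  def main(args: Array[String]): Unit = {
    println(findOverlap("abcxyz", "xyz123")) // xyz
  }
}
```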

String method to change particular element in Scala

I need to write a method in Scala that overrides the toString method. I wrote it, but I also have to check whether there is an element that is '1' and, if so, change it to 'a'; otherwise the list should be written as it is by the toString method. Any suggestions how this can be done?
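What error are you getting? It seems to work for me:
// Test is a hypothetical wrapper class so that the snippet runs on its own
class Test {
  val l = List(1, 2, 3)
  override def toString(): String = {
    // replace every 1 with "a" and leave the other elements unchanged
    val t = l.map({
      case 1 => "a"
      case x => x
    })
    t.toString
  }
}
println(new Test)
This prints List(a, 2, 3).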
I see from the comments on your question that list is a List[List[Int]].
Look at the beginning of your code:
list.map { case 1 => 'a'; case x => x}
map expects a function that takes an element of list as a parameter - a List[Int], in your case. But your code works directly on Int.
With this information, it appears that the error you get is entirely correct: you declared a method that expects an Int, but you pass a List[Int] to it, which is indeed a type mismatch.
Try this:
list.map {_.map { case 1 => 'a'; case x => x}}
This way, the function you defined to transform 1 to a and leave everything else alone is applied to list's sublists, and this type-checks: you're applying a function that expects an Int to an Int.
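A self-contained version of the nested map, as a minimal sketch: the sample data and the names are made up, and the element type of the result widens because each sublist now mixes Char and Int values.

```scala
object NestedMapSketch {
  val list: List[List[Int]] = List(List(1, 2), List(3, 1))

  // The outer map visits each sublist, the inner map visits each Int
  val replaced = list.map(_.map { case 1 => 'a'; case x => x })

  def main(args: Array[String]): Unit = {
    println(replaced) // List(List(a, 2), List(3, a))
  }
}
```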

Avoiding implicit def ambiguity in Scala

I am trying to create an implicit conversion from any type (say, Int) to a String...
An implicit conversion to String means RichString methods (like reverse) are not available.
implicit def intToString(i: Int) = String.valueOf(i)
100.toCharArray // => Array[Char] = Array(1, 0, 0)
100.reverse // => error: value reverse is not a member of Int
100.length // => 3
An implicit conversion to RichString means String methods (like toCharArray) are not available
implicit def intToRichString(i: Int) = new RichString(String.valueOf(i))
100.reverse // => "001"
100.toCharArray // => error: value toCharArray is not a member of Int
100.length // => 3
Using both implicit conversions means duplicated methods (like length) are ambiguous.
implicit def intToString(i: Int) = String.valueOf(i)
implicit def intToRichString(i: Int) = new RichString(String.valueOf(i))
100.toCharArray // => Array[Char] = Array(1, 0, 0)
100.reverse // => "001"
100.length // => both method intToString in object $iw of type
// (Int)java.lang.String and method intToRichString in object
// $iw of type (Int)scala.runtime.RichString are possible
// conversion functions from Int to ?{val length: ?}
So, is it possible to implicitly convert to String and still support all String and RichString methods?
I don't have a solution, but will comment that the reason RichString methods are not available after your intToString implicit is that Scala does not chain implicit calls (see 21.2 "Rules for implicits" in Programming in Scala).
If you introduce an intermediate String, Scala will make the implicit conversion to a RichString (that implicit is defined in Predef.scala).
E.g.,
$ scala
Welcome to Scala version 2.7.5.final [...].
Type in expressions to have them evaluated.
Type :help for more information.
scala> implicit def intToString(i: Int) = String.valueOf(i)
intToString: (Int)java.lang.String
scala> val i = 100
i: Int = 100
scala> val s: String = i
s: String = 100
scala> s.reverse
res1: scala.runtime.RichString = 001
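The same idea as a compilable sketch for more recent Scala versions, under a few assumptions: the language import is added to silence the feature warning, the conversion body uses i.toString instead of String.valueOf, and the object name is illustrative. Each line triggers at most one user-visible conversion step, which is why it compiles even though implicits are not chained.

```scala
import scala.language.implicitConversions

object NoChaining {
  implicit def intToString(i: Int): String = i.toString

  // Step 1: Int => String, via intToString
  val s: String = 100

  // Step 2: String => string wrapper, via the conversion from Predef
  val reversed: String = s.reverse

  // A type ascription plays the same role as the intermediate val
  val viaAscription: String = (100: String).reverse

  def main(args: Array[String]): Unit = {
    println(reversed) // 001
  }
}
```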
As of Scala 2.8, this has been improved. As per this paper (§ Avoiding Ambiguities) :
Previously, the most specific overloaded method or implicit conversion
would be chosen based solely on the method’s argument types. There was an
additional clause which said that the most specific method could not be
defined in a proper superclass of any of the other alternatives. This
scheme has been replaced in Scala 2.8 by the following, more liberal one:
When comparing two different applicable alternatives of an overloaded
method or of an implicit, each method gets one point for having more
specific arguments, and another point for being defined in a proper
subclass. An alternative “wins” over another if it gets a greater number
of points in these two comparisons. This means in particular that if
alternatives have identical argument types, the one which is defined in a
subclass wins.
See that other paper (§6.5) for an example.
Either make a huge proxy class, or suck it up and require the client to disambiguate it:
100.asInstanceOf[String].length
The only option I see is to create a new String Wrapper class MyString and let that call whatever method you want to be called in the ambiguous case. Then you could define implicit conversions to MyString and two implicit conversions from MyString to String and RichString, just in case you need to pass it to a library function.
The accepted solution (posted by Mitch Blevins) will never work: downcasting Int to String using asInstanceOf will always fail.
One solution to your problem is to add a conversion from any String-convertible type to RichString (or rather, to StringOps as it is now named):
implicit def stringLikeToRichString[T](x: T)(implicit conv: T => String) = new collection.immutable.StringOps(conv(x))
Then define your conversion(s) to string as before:
scala> implicit def intToString(i: Int) = String.valueOf(i)
warning: there was one feature warning; re-run with -feature for details
intToString: (i: Int)String
scala> 100.toCharArray
res0: Array[Char] = Array(1, 0, 0)
scala> 100.reverse
res1: String = 001
scala> 100.length
res2: Int = 3
I'm confused: can't you use .toString on any type anyway thus avoiding the need for implicit conversions?
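For comparison, this is what the explicit route suggested in the last comment could look like. It is a minimal sketch with illustrative names and no implicit conversions of its own: both the java.lang.String methods and the enriched ones are available once toString has been called.

```scala
object ExplicitToString {
  val n = 100

  val chars: Array[Char] = n.toString.toCharArray // plain String method
  val reversed: String = n.toString.reverse       // enriched String method
  val len: Int = n.toString.length                // no ambiguity: the receiver is a String

  def main(args: Array[String]): Unit = {
    println(reversed) // 001
    println(len)      // 3
  }
}
```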
