How to use match on String value with Scala?

I'm trying to iterate over a String value and replace each character in it.
For example, I want "1" to become "one", "2" to become "two", etc.
I've done this:
override def toString = {
val mapXX = init.map(_.clone);
var returnVALUE = mapXX.map(_.mkString).mkString("\n")
for(c <- returnVALUE){
c match {
case 1 => "one";
case 2 => "two";
...
case _ => "";
}
}
returnVALUE
}
}
It didn't change anything: the list is displayed exactly as before.
Does anyone know how to iterate over each character of a String value in order to replace each character with something else?
Thanks

It's not completely clear what you're doing. Try
returnVALUE.map {
case '1' => "one"
case '2' => "two"
case '3' => "three"
// ...
case _ => " "
}.mkString
and this should be the last line of toString.
String#map accepts a function from Char to something (e.g. to String).
If returnVALUE is "1 2 3" then this produces "one two three".
When the last line is returnVALUE this means you return the original value of returnVALUE, not the modified value.
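To make the point about the returned value concrete, here is a minimal sketch. The names digitName and spellDigits are hypothetical helpers, not part of the original code, and unlike the snippet above this version passes non-digit characters through unchanged instead of replacing them with a space.

```scala
// Hypothetical helper: spell out a single digit character,
// pass every other character through unchanged.
def digitName(c: Char): String = c match {
  case '1'   => "one"
  case '2'   => "two"
  case '3'   => "three"
  case other => other.toString
}

// map builds a new sequence of strings; mkString joins them into one String.
def spellDigits(text: String): String =
  text.map(c => digitName(c)).mkString

val original = "1 2 3"
val spelled  = spellDigits(original)
// Strings are immutable: original is untouched, only spelled holds the new text.
```

Returning original here would reproduce the behaviour from the question; the rewritten text only exists in the value produced by map and mkString.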

A for comprehension without the yield clause doesn't create any results. It can only be used for side effects, which good Scala programmers try to avoid.
Maybe something like this.
val numberNames = Map(0 -> "zero", 1 -> "one", 2 -> "two").withDefaultValue("too big")
val result = List(2,0,1,4).map(numberNames)
//result: List[String] = List(two, zero, one, too big)
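If you prefer to keep the for syntax from the question, adding yield turns the loop into an expression that produces a result. This is only a sketch: digitNames and spellWithYield are names made up for illustration, and characters without an entry in the map are kept as they are.

```scala
// Assumed lookup table; extend it with the remaining digits as needed.
val digitNames = Map('1' -> "one", '2' -> "two", '3' -> "three")

def spellWithYield(text: String): String = {
  // With yield, the comprehension collects one String per character.
  val pieces = for (c <- text) yield digitNames.getOrElse(c, c.toString)
  pieces.mkString
}
```

For example, spellWithYield("2-1") gives "two-one".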

Related

What is an idiomatic Scala way to join strings and remove a specific duplicate element keeping first

I have some strings that have a specific duplicate word between them. I need to join all these strings, keeping the first "duplicated" word and removing the others.
Ex:
val p1 = "Hello John Doe"
val p2 = "Hello Jane Doe"
val p3 = "Hello Mary"
For the output, I'd pass the string to be removed, in this case "Hello" and would like to get a string (or a seq that I can later transform with mkString(" ")) with the contents Hello John Doe Jane Doe Mary.
The tricky part is that .distinct can't be used because it would remove the second "Doe" and it's not desired.
Just scan the words, counting occurrences. Boring.
import scala.annotation.tailrec
@tailrec
def keepN(words: List[String], toDedup: String, toKeep: Int = 1, acc: List[String] = Nil): List[String] = words match {
case Nil => acc.reverse
case `toDedup` :: tail if toKeep > 0 => keepN(tail, toDedup, toKeep - 1, toDedup::acc)
case `toDedup` :: tail => keepN(tail, toDedup, 0, acc)
case head :: tail => keepN(tail, toDedup, toKeep, head :: acc)
}
You mention a "duplicate word BETWEEN them", which implies that the duplicate is always the first word of each String. So why not just drop the first few characters?
val pList = p1 +: List(p2, p3).map(_.drop(5).trim)
Or if you want to refer to a specific word...
val myWord = "Hello"
val pList = p1 +: List(p2, p3).map(_.drop(myWord.length).trim)
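If the duplicate word is not guaranteed to be at the start of each string, another option is to split everything into words and keep only the first occurrence of that word by index. The function name joinKeepingFirst is hypothetical; this is a sketch that assumes words are separated by whitespace.

```scala
def joinKeepingFirst(strings: Seq[String], word: String): String = {
  // Flatten all strings into one sequence of words.
  val words = strings.flatMap(_.split("\\s+")).filter(_.nonEmpty)
  // Position of the first occurrence of the word (-1 if it never occurs).
  val firstIdx = words.indexOf(word)
  // Keep every other word, plus the target word only at its first position.
  words.zipWithIndex
    .collect { case (w, i) if w != word || i == firstIdx => w }
    .mkString(" ")
}

val p1 = "Hello John Doe"
val p2 = "Hello Jane Doe"
val p3 = "Hello Mary"
val joined = joinKeepingFirst(Seq(p1, p2, p3), "Hello")
```

Because only the chosen word is filtered, the second "Doe" survives, which is what .distinct could not do.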

How to get the index of a character in a String in Scala?

If I have a String and I am looping through that String looking at each character, how do I get the index of that character in that String?
I have seen people use "indexOf()" however when I see them use this it only returns the index of the first occurrence of that character. But what if there are multiple occurrences of that same character? How do I get the index of the character I am currently looking at?
I began using:
for(c <- str)
to look at each character individually.
It's not quite clear why you need to get the index of the character you are currently iterating over, since because you are iterating, you already know what the index is (all you have to do is to keep count). For example, something like this:
val str = "Hello World"
for ((c, i) <- str.zipWithIndex) println(s"$c is at $i")
// H is at 0
// e is at 1
// l is at 2
// l is at 3
// o is at 4
// is at 5
// W is at 6
// o is at 7
// r is at 8
// l is at 9
// d is at 10
You can use zipWithIndex() together with filter() to search for an index.
val str = "12334563"
str.toList.zipWithIndex.filter((x) => x._1 == '3')
res9: List[(Char, Int)] = List(('3', 2), ('3', 3), ('3', 7))
If required you can also remove the toList() call.
Well... there are methods which can get this done for you, but let's say there were no such method... even then you can do this with entry-level programming.
val string: String = "my awesome string"
val char: Char = 'e'
Now, the most basic solution,
import scala.collection.mutable.ListBuffer
var index = 0
val indexListBuffer: ListBuffer[Int] = ListBuffer()
for (c <- string) {
if (c == char) {
indexListBuffer.append(index)
}
index = index + 1
}
println(indexListBuffer)
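The zipWithIndex idea from the answers above can also be written without any mutable state by using collect, which filters and maps in one step. indicesOf is a hypothetical helper name used only for this sketch.

```scala
// Returns every position at which target occurs in str, in ascending order.
def indicesOf(str: String, target: Char): List[Int] =
  str.zipWithIndex.collect { case (c, i) if c == target => i }.toList
```

For instance, indicesOf("12334563", '3') yields List(2, 3, 7), matching the positions found by the filter version above.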

Scala/Spark efficient partial string match

I am writing a small program in Spark using Scala, and came across a problem. I have a List/RDD of single word strings and a List/RDD of sentences which might or might not contain words from the list of single words. i.e.
val singles = Array("this", "is")
val sentence = Array("this Date", "is there something", "where are something", "this is a string")
and I want to select the sentences that contain one or more of the words from singles, so that the result is something like:
output[(this, Array(this Date, this is a String)),(is, Array(is there something, this is a string))]
I thought about two approaches: one is to split the sentence and filter using .contains; the other is to split and format the sentences into an RDD and use .join for RDD intersection. I am looking at around 50 single words and 5 million sentences. Which method would be faster? Are there any other solutions? Could you also help me with the coding? I seem to get no results with my code (although it compiles and runs without error).
You can create a set of required keys, look up the keys in sentences and group by keys.
val singles = Array("this", "is")
val sentences = Array("this Date",
"is there something",
"where are something",
"this is a string")
val rdd = sc.parallelize(sentences) // create RDD
val keys = singles.toSet // words required as keys.
val result = rdd.flatMap{ sen =>
val words = sen.split(" ").toSet;
val common = keys & words; // intersect
common.map(x => (x, sen)) // map as key -> sen
}
.groupByKey.mapValues(_.toArray) // group values for a key
.collect // get rdd contents as array
// result:
// Array((this, Array(this Date, this is a string)),
// (is, Array(is there something, this is a string)))
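To try the intersect-then-group logic without a SparkContext, the same steps can be sketched on plain Scala collections. This is only a local illustration of the technique, not a replacement for the RDD version; the name grouped is made up here.

```scala
val singles = Array("this", "is")
val sentences = Array(
  "this Date",
  "is there something",
  "where are something",
  "this is a string")

val keys = singles.toSet // words required as keys

val grouped: Map[String, List[String]] =
  sentences.toList
    .flatMap { sen =>
      // Intersect the key set with the words of this sentence,
      // then emit one (key, sentence) pair per common word.
      (keys & sen.split(" ").toSet).toList.map(k => k -> sen)
    }
    .groupBy(_._1)                                  // group pairs by key
    .map { case (k, pairs) => k -> pairs.map(_._2) } // keep only the sentences
```

With about 50 keys, the set of keys is tiny, so each sentence is processed once and no join or cartesian product is needed.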
I've just tried to solve your problem and I've ended up with this code:
// true if the word s occurs in the array of words l
def check(s: String, l: Array[String]): Boolean = l.contains(s)
val singles = sc.parallelize(Array("this", "is"))
val sentence = sc.parallelize(Array("this Date", "is there something", "where are something", "this is a string"))
val result = singles.cartesian(sentence)
.filter(x => check(x._1,x._2.split(" ")) == true )
.groupByKey()
.map(x => (x._1,x._2.mkString(", ") )) // pay attention here(*)
result.foreach(println)
The last map line (*) is there just because without it I get something with CompactBuffer, like this:
(is,CompactBuffer(is there something, this is a string))
(this,CompactBuffer(this Date, this is a string))
With that map line (with a mkString command) I get a more readable output like this:
(is,is there something, this is a string)
(this,this Date, this is a string)
Hope it could help in some way.
FF

Find substring of string w/o knowing the length of string

I have a string x: x = "{abc}{def}{ghi}"
And I need to print the string between the second { and the second }, in this case def. How can I do this without knowing the length of the string? For example, the string x could also be "{abcde}{fghij}{klmno}".
This is where pattern matching is useful:
local x = "{abc}{def}{ghi}"
local result = x:match(".-{.-}.-{(.-)}")
print(result)
.- matches zero or more characters, non-greedy. The whole pattern .-{.-}.-{(.-)} captures what's between the second { and the second }.
Try also x:match(".-}{(.-)}"), which is simpler.
I would go about it in a different manner:
local i, x, result = 1, "{abc}{def}{ghi}"
for w in x:gmatch '{(.-)}' do
if i == 2 then
result = w
break
else
i = i + 1
end
end
print( result )
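For readers following the Scala threads on this page, the same non-greedy capture idea can be sketched in Scala with a regular expression. The helper nthBraced and the value bracePattern are hypothetical names; the answers above are Lua and remain the direct solution to this question.

```scala
// Non-greedy capture of whatever sits between one { and the next }.
val bracePattern = """\{(.*?)\}""".r

// Returns the contents of the n-th {...} group (1-based), if there is one.
def nthBraced(s: String, n: Int): Option[String] =
  bracePattern.findAllMatchIn(s).map(_.group(1)).toList.lift(n - 1)
```

nthBraced("{abc}{def}{ghi}", 2) gives Some("def"), and the result is None when the string has fewer than n groups.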

Trimming strings in Scala

How do I trim the starting and ending character of a string in Scala
For inputs such as ",hello" or "hello,", I need the output as "hello".
Is there any built-in method to do this in Scala?
Try
val str = " foo "
str.trim
and have a look at the documentation. If you need to get rid of the , character, too, you could try something like:
str.stripPrefix(",").stripSuffix(",").trim
Another way to clean up the front-end of the string would be
val ignorable = ", \t\r\n"
str.dropWhile(c => ignorable.indexOf(c) >= 0)
which would also take care of strings like ",,, ,,hello"
And for good measure, here's a tiny function, which does it all in one sweep from left to right through the string:
def stripAll(s: String, bad: String): String = {
@scala.annotation.tailrec def start(n: Int): String =
if (n == s.length) ""
else if (bad.indexOf(s.charAt(n)) < 0) end(n, s.length)
else start(1 + n)
@scala.annotation.tailrec def end(a: Int, n: Int): String =
if (n <= a) s.substring(a, n)
else if (bad.indexOf(s.charAt(n - 1)) < 0) s.substring(a, n)
else end(a, n - 1)
start(0)
}
Use like
stripAll(stringToCleanUp, charactersToRemove)
e.g.,
stripAll(" , , , hello , ,,,, ", " ,") => "hello"
To trim the start and ending character in a string, use a mix of drop and dropRight:
scala> " hello,".drop(1).dropRight(1)
res4: String = hello
The drop call removes the first character, dropRight removes the last. Note that this isn't "smart" like trim is. If you don't have any extra character at the start of "hello,", you will trim it to "ello". If you need something more complicated, regex replacement is probably the answer.
If you want to trim only commas and might have more than one on either end, you could do this:
str.dropWhile(_ == ',').reverse.dropWhile(_ == ',').reverse
The use of reverse here is because there is no dropRightWhile.
If you're looking at a single possible comma, stripPrefix and stripSuffix are the way to go, as indicated by Dirk.
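The double reverse can be avoided by locating the first and last characters worth keeping and taking the substring between them. This is a sketch under the assumption that a predicate decides which characters to strip; trimChars is a hypothetical name.

```scala
def trimChars(s: String, isBad: Char => Boolean): String = {
  // Index of the first character that should be kept (-1 if none).
  val first = s.indexWhere(c => !isBad(c))
  if (first < 0) "" // every character is unwanted, or the string is empty
  else s.substring(first, s.lastIndexWhere(c => !isBad(c)) + 1)
}
```

For example, trimChars(",,hello,,", _ == ',') returns "hello", and commas in the middle of the string are left alone.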
Given you only want to trim off invalid characters from the prefix and the suffix of a given string (not scan through the entire string), here's a tiny trimPrefixSuffixChars function to quickly perform the desired effect:
def trimPrefixSuffixChars(
string: String
, invalidCharsFunction: (Char) => Boolean = (c) => c == ' '
): String =
if (string.nonEmpty)
string
.dropWhile(char => invalidCharsFunction(char)) //trim prefix
.reverse
.dropWhile(char => invalidCharsFunction(char)) //trim suffix
.reverse
else
string
This function provides a default for the invalidCharsFunction defining only the space (" ") character as invalid. Here's what the conversion would look like for the following input strings:
trimPrefixSuffixChars(" Tx ") //returns "Tx"
trimPrefixSuffixChars(" . Tx . ") //returns ". Tx ."
trimPrefixSuffixChars(" T x ") //returns "T x"
trimPrefixSuffixChars(" . T x . ") //returns ". T x ."
If you would prefer to specify your own invalidCharsFunction, pass it in the call like so:
trimPrefixSuffixChars(",Tx. ", (c) => !c.isLetterOrDigit) //returns "Tx"
trimPrefixSuffixChars(" ! Tx # ", (c) => !c.isLetterOrDigit) //returns "Tx"
trimPrefixSuffixChars(",T x. ", (c) => !c.isLetterOrDigit) //returns "T x"
trimPrefixSuffixChars(" ! T x # ", (c) => !c.isLetterOrDigit) //returns "T x"
This attempts to simplify a number of the example solutions provided in other answers.
Someone requested a regex-version, which would be something like this:
val result = " , ,, hello, ,,".replaceAll("""[,\s]+(|.*[^,\s])[,\s]+""", "'$1'")
Result is: result: String = 'hello' (the single quotes come from the replacement string "'$1'"; use "$1" to get a bare hello)
The drawback of regexes (not just in this case, but always) is that they are quite hard to read for someone who is not already intimately familiar with the syntax. The code is nice and concise, though.
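A somewhat easier-to-read regex variant anchors on the two ends of the string and deletes whatever unwanted run it finds there. This is a sketch that assumes commas and whitespace are the characters to strip; trimCommasAndSpaces is a made-up name.

```scala
// ^[,\s]+ matches a leading run, [,\s]+$ a trailing run; both are removed.
def trimCommasAndSpaces(s: String): String =
  s.replaceAll("""^[,\s]+|[,\s]+$""", "")
```

Unlike the capture-group version, this one also works when only one end of the string carries unwanted characters.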
Another tailrec function:
def trim(s: String, char: Char): String = {
if (s.stripSuffix(char.toString).stripPrefix(char.toString) == s)
{
s
} else
{
trim(s.stripSuffix(char.toString).stripPrefix(char.toString), char)
}
}
scala> trim(",hello",',')
res12: String = hello
scala> trim(",hello,,,,",',')
res13: String = hello
