How to split strings into characters in Scala

For example, there is a string val s = "Test". How do you separate it into t, e, s, t?

Do you need characters?
"Test".toList // Makes a list of characters
"Test".toArray // Makes an array of characters
Do you need bytes?
"Test".getBytes // Java provides this
Do you need strings?
"Test".map(_.toString) // Vector of strings
"Test".sliding(1).toList // List of strings
"Test".sliding(1).toArray // Array of strings
Do you need UTF-32 code points? Okay, that's a tougher one.
def UTF32point(s: String, idx: Int = 0, found: List[Int] = Nil): List[Int] = {
  if (idx >= s.length) found.reverse
  else {
    val point = s.codePointAt(idx)
    UTF32point(s, idx + java.lang.Character.charCount(point), point :: found)
  }
}
UTF32point("Test")

You can use toList as follows:
scala> s.toList
res1: List[Char] = List(T, e, s, t)
If you want an array, you can use toArray
scala> s.toArray
res2: Array[Char] = Array(T, e, s, t)

Actually, you don't need to do anything special. Predef already provides an implicit conversion to WrappedString, and WrappedString extends IndexedSeq[Char], so you get all the collection methods available on it, like:
"Test" foreach println
"Test" map (_ + "!")
Edit
Predef has the augmentString conversion, which has higher priority than wrapString in LowPriorityImplicits. So a String ends up being treated as StringLike[String], which also behaves as a sequence of chars.

Additionally, if what you actually want isn't a list object but simply to do something with each character, note that Strings can be used as iterable collections of characters in Scala:
for(ch<-"Test") println("_" + ch + "_") //prints each letter on a different line, surrounded by underscores

Related

Kotlin - Way to get a substring starting from a specified index until another specified index or end of string?

Example
val string = "Large mountain"
I would like to get a substring starting from the index of the "t" character until index of "t" + 7 with the 7 being arbitrary or end of string.
val substring = "tain"
Assuming that the string is larger
val string2 = "Large mountain and lake"
I would like to return
val substring2 = "tain and l"
Suppose I were to try substring(indexOf("t"), indexOf("t") + 7).
With the second string that works, but right now if I use "Large mountain" I get an index out of bounds exception.

I don't think there's an especially elegant way to do this.
One fairly short and readable way is:
val substring = string.drop(string.indexOf('t')).take(7)
This uses indexOf() to locate the first 't' in the string, and then drop() to drop all the previous characters, and take() to take (up to) 7 characters from there.
However, it creates a couple of temporary strings, and will give an IllegalArgumentException if there's no 't' in the string.
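To make that trade-off concrete, here is a minimal sketch of the drop()/take() approach wrapped in helpers. The names takeFrom and takeFromOrNull are hypothetical, introduced only for illustration; they are not part of the Kotlin standard library.

```kotlin
// Hypothetical helper: the same drop()/take() chain as above, parameterised
// so it can be tried on several inputs.
// Throws IllegalArgumentException when ch is absent, because indexOf()
// returns -1 and drop() rejects negative counts.
fun takeFrom(text: String, ch: Char, maxLen: Int): String =
    text.drop(text.indexOf(ch)).take(maxLen)

// Hypothetical variant: checks the index first and returns null when ch is absent.
fun takeFromOrNull(text: String, ch: Char, maxLen: Int): String? {
    val start = text.indexOf(ch)
    return if (start < 0) null else text.drop(start).take(maxLen)
}
```

With these definitions, takeFrom("Large mountain", 't', 7) gives "tain", takeFrom("Large mountain and lake", 't', 7) gives "tain an", and takeFromOrNull("Large", 't', 7) gives null where takeFrom would throw.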
Improving robustness and efficiency takes more code, e.g.:
import kotlin.math.min // needed for min()

val substring = string.indexOf('t').let {
    if (it >= 0)
        string.substring(it, min(it + 7, string.length))
    else
        string
}
That version lets you control the result when there's no 't' (in the else branch); it also avoids creating any temporary objects. As before, it uses indexOf() to locate the first 't', but then min() to work out how long the substring can be, and substring() to generate it in one go.
If you were doing this a lot, you could of course put it into your own function, e.g.:
fun String.substringFrom(char: Char, maxLen: Int)
    = indexOf(char).let {
        if (it >= 0)
            substring(it, min(it + maxLen, length))
        else
            this
    }
which you could then call with e.g. "Large mountain".substringFrom('t', 7)

convert string to list of int in kotlin

I have a string "1337" and I want to convert it to a list of Int. I tried taking each element of the string and converting it with string[0].toInt(), but instead of the digit I get its ASCII value. Character.getNumericValue(number) works, but how can I do it without using a built-in function, and with good complexity?

What do you mean "without using a built in function"?
string[0].toInt gives you the ASCII value of the character because the fun get(index: Int) on String has a return type of Char, and a Char behaves closer to a Number than a String. "0".toInt() == 0 will yield true, but '0'.toInt() == 0 will yield false. The difference being the first one is a string and the second is a character.
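The difference can be checked with a small sketch. The function names below are hypothetical, and Char.code assumes Kotlin 1.5 or later (it is the newer spelling of Char.toInt()).

```kotlin
// Hypothetical helper: the character's UTF-16 code unit,
// which is the ASCII value for '0'..'9'.
fun codeOf(c: Char): Int = c.code

// Hypothetical helper: turn the Char into a one-character String first,
// then parse it, which yields the digit itself.
fun parsedDigit(c: Char): Int = c.toString().toInt()
```

Here codeOf('0') is 48 while parsedDigit('0') is 0, which is why indexing into the string and converting the Char directly does not give the digit.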
A oneliner
string.split("").filterNot { it.isBlank() }.map { it.toInt() }
Explanation: split("") takes the string and gives you a list of every character as a string. However, it also gives you an empty string at the beginning and at the end, which is why we have filterNot { it.isBlank() }. We can then use map to transform every string in the list to an Int.
If you want something less functional and more imperative that doesn't use conversion functions, there is this:
val ints = mutableListOf<Int>() //make a list to store the values in
for (c: Char in "1234") { //go through all of the characters in the string
    val numericValue = c - '0' //subtract the character '0' from the character we are looking at
    ints.add(numericValue) //add the Int to the list
}
The reason why c - '0' works is because the ASCII values for the digits are all in numerical order starting with 0, and when we subtract one character from another, we get the difference between their ASCII values.
This will give you some funky results if you give it a string that doesn't have only digits in it, but it will not throw any exceptions.
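As a rough illustration of those funky results, and one way to guard against them, here is a minimal sketch. Both function names are hypothetical, and returning null for invalid input is just one possible design choice.

```kotlin
// Hypothetical helper: the unchecked subtraction from the loop above.
// Non-digit characters produce offsets outside 0..9.
fun rawOffsets(text: String): List<Int> = text.map { it - '0' }

// Hypothetical helper: same subtraction, but rejects anything outside '0'..'9'.
fun digitsOrNull(text: String): List<Int>? {
    val digits = ArrayList<Int>(text.length)
    for (c in text) {
        if (c !in '0'..'9') return null // not an ASCII digit
        digits.add(c - '0')             // Char - Char gives the Int difference
    }
    return digits
}
```

For example, rawOffsets("1a") gives [1, 49] because 'a' is 97 and '0' is 48, while digitsOrNull("1a") gives null and digitsOrNull("1337") gives [1, 3, 3, 7]. Both run in a single pass over the string.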

As in Java, converting a Char to an Int gives you its ASCII value.
Instead, you can convert each character to a String first:
val values = "1337".map { it.toString().toInt() }
println(values[0]) // 1
println(values[1]) // 3
// ...

Maybe like this? Non-digits are filtered out. The digits are then converted into integers:
val string = "1337"
val xs = string.filter{ it.isDigit() }.map{ it.digitToInt() }
Requires Kotlin 1.4.30 or higher and this option:
@OptIn(ExperimentalStdlibApi::class)

Removing accents and diacritics in kotlin

Is there any way to convert a string like 'Dziękuję' to 'Dziekuje' or 'šećer' to 'secer' in Kotlin? I have tried using java.text.Normalizer, but it doesn't seem to work the desired way.

Normalizer only does half the work. Here's how you could use it:
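import java.text.Normalizer

private val REGEX_UNACCENT = "\\p{InCombiningDiacriticalMarks}+".toRegex()
fun CharSequence.unaccent(): String {
    // NFD splits each accented letter into a base letter plus combining marks
    val temp = Normalizer.normalize(this, Normalizer.Form.NFD)
    // drop the combining marks, keeping the base letters
    return REGEX_UNACCENT.replace(temp, "")
}
assert("áéíóů".unaccent() == "aeiou")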
And here's how it works:
First we call normalize(). If we pass à, the method returns a followed by the combining grave accent. Then a regular expression strips the combining marks, so only the base characters remain.
Source: http://www.rgagnon.com/javadetails/java-0456.html
Note that Normalizer is a Java class; this is not pure Kotlin and it will only work on JVM.

TL;DR:
Use Normalizer to canonically decompose the Unicode text.
Remove non-spacing combining characters (\p{Mn}).
fun String.removeNonSpacingMarks() =
    Normalizer.normalize(this, Normalizer.Form.NFD)
        .replace("\\p{Mn}+".toRegex(), "")
Long answer:
Using Normalizer you can transform the original text into an equivalent composed or decomposed form.
NFD: Canonical decomposition.
NFC: Canonical decomposition, followed by canonical composition.
(more info about normalization can be found in the Unicode® Standard Annex #15)
In our case, we are interested in the NFD normalization form because it allows us to separate all the combining characters from the base character.
After decomposing the text, we have to run a regex to remove all the new characters resulting from the decomposition that are combining characters.
Combining characters are special characters intended to be positioned relative to an associated base character. The Unicode Standard distinguishes two types of combining characters: spacing and nonspacing.
We are only interested in non-spacing combining characters. Diacritics are the principal class (but not the only one) of this group used with Latin, Greek, and Cyrillic scripts and their relatives.
To remove non-spacing characters with a regex we have to use \p{Mn}. This group includes all the 1,826 non-spacing characters.
Other answers use \p{InCombiningDiacriticalMarks}; this block only includes combining diacritical marks. It is a subset of \p{Mn} that includes only 112 characters.
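The difference between the two patterns can be sketched side by side. The function and value names below are hypothetical, and this is JVM-only because it relies on java.text.Normalizer.

```kotlin
import java.text.Normalizer

// Hypothetical names, for illustration only.
val NON_SPACING_MARKS = Regex("\\p{Mn}+")
val DIACRITICAL_BLOCK = Regex("\\p{InCombiningDiacriticalMarks}+")

// Decompose with NFD, then remove every non-spacing mark (category Mn).
fun stripMarks(text: String): String =
    NON_SPACING_MARKS.replace(Normalizer.normalize(text, Normalizer.Form.NFD), "")

// Decompose with NFD, then remove only marks from the
// Combining Diacritical Marks block (U+0300..U+036F).
fun stripDiacriticalBlock(text: String): String =
    DIACRITICAL_BLOCK.replace(Normalizer.normalize(text, Normalizer.Form.NFD), "")
```

Both functions turn "Dziękuję" into "Dziekuje" and "šećer" into "secer". They differ on marks outside the block: for "x\u20D7" (x followed by COMBINING RIGHT ARROW ABOVE), stripMarks returns "x" while stripDiacriticalBlock leaves the mark in place. Neither touches letters that have no canonical decomposition, such as Ł, so "Łódź" becomes "Łodz".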

This is an extension function you can use and extend further:
fun String.normalize(): String {
    val original = arrayOf("ę", "š")
    val normalized = arrayOf("e", "s")
    return this.map { it ->
        val index = original.indexOf(it.toString())
        if (index >= 0) normalized[index] else it
    }.joinToString("")
}
Use it like this:
val originalText = "aębšc"
val normalizedText = originalText.normalize()
println(normalizedText)
will print
aebsc
Extend the arrays original and normalized with as many elements as you need.
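If keeping two parallel arrays in sync becomes awkward, the same lookup idea can be sketched with a single map. This is an alternative arrangement rather than the answer's code; the names accentReplacements and replaceMapped are hypothetical, and the two entries are just the sample characters from above.

```kotlin
// Assumed sample mappings; extend with whatever characters you need.
val accentReplacements: Map<Char, Char> = mapOf('ę' to 'e', 'š' to 's')

// Replace each character that has an entry in the map; keep the rest unchanged.
fun replaceMapped(text: String): String =
    text.map { accentReplacements[it] ?: it }.joinToString("")
```

Calling replaceMapped("aębšc") gives "aebsc", the same result as the array version, and each key sits next to its replacement.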

In case anyone is struggling to do this in Kotlin, this code works like a charm.
To avoid inconsistencies I also use toUpperCase() and trim(). Then I call this function:
fun stripAccents(s: String): String {
    val chars: CharArray = s.toCharArray()
    val sb = StringBuilder(s)
    var cont = 0
    while (chars.size > cont) {
        var c: Char = chars[cont]
        var c2: String = c.toString()
        // these are my needs; to convert other accents, just add new entries here
        c2 = c2.replace("Ã", "A")
        c2 = c2.replace("Õ", "O")
        c2 = c2.replace("Ç", "C")
        c2 = c2.replace("Á", "A")
        c2 = c2.replace("Ó", "O")
        c2 = c2.replace("Ê", "E")
        c2 = c2.replace("É", "E")
        c2 = c2.replace("Ú", "U")
        c = c2.single()
        sb.setCharAt(cont, c)
        cont++
    }
    return sb.toString()
}
To use this function, call it like this:
var str: String
str = editText.text.toString() //get the text from EditText
str = str.toUpperCase().trim()
str = stripAccents(str) //call the function

Scala FlatMap returning a vector instead of a String

I am following Martin Odersky's course. There is an example where he applies flatMap to a String and gets a String in return, but I am getting a Vector. Here is the code that I am using:
val str = "Hello"
println(str flatMap (x => List("." , x)))
output: Vector(., H, ., e, ., l, ., l, ., o)
outputExpected: .H.e.l.l.o

"." is a String while '.' is a Char.
List('.', x) is a List[Char] (if x is a Char) which can be flattened to a String.
List(".", x) is a List[Any] (if x is not a String) which cannot be flattened to a String.
UPDATE -- This behavior has changed as of Scala 2.13.0.
"abc".flatMap(c => List('.', c))
//Scala 2.12.x returns String
//Scala 2.13.x returns IndexedSeq[Char] (REPL interprets as Vector)
This might be to ensure a more consistent translation:
"abc".map(c => List('.', c)).flatten
//always returns IndexedSeq[Char]

A string is a collection of characters, not a collection of strings. So when you use flatMap to create a collection of characters, it'll choose String as the type of collection, but when you create a collection of strings, it can't use String, so it has to use Vector instead.

Is it possible to compare two characters in Processing?

I am a novice programmer and I am trying to compare two characters from different strings, such that I can give an arbitrary index from each string and check to see if they match. From the processing website it seems that you can compare two strings, which I have done, but when I try to do so with characters it seems that the arguments (char,char) are not applicable. Can someone tell me where I am going wrong? Thanks.

You can use String's charAt() method/function to get character from each string at the desired index, then simply compare:
String s1 = ":)";
String s2 = ";)";
void setup(){
  println(CompareCharAt(s1,s2,0));
  println(CompareCharAt(s1,s2,1));
}
// compares the characters of the two strings passed in, at the given index
boolean CompareCharAt(String str1,String str2,int index){
  return str1.charAt(index) == str2.charAt(index);
}
Note that when you're comparing strings, == doesn't help; you need to use String's equals():
String s1 = ":)";
String s2 = ";)";
println(s1.equals(s2));
println(s1.equals(":)"));
Also, if the data comes from external sources, it's usually a good idea to compare both strings using the same case:
println("MyString".equals("myString"));
println("MyString".toLowerCase().equals("myString".toLowerCase()));

Maybe you can pass the arguments after converting (typecasting) each char to a string:
(string(char),string(char))

Yep. Just use == as it gets interpreted as a char datatype.
This is assuming you've split the char from the String...
char a = 'a';
char b = 'a';
if(a == b) {
// etc
}
As mentioned above, use .equals() for String comparison.
String a = "a";
String b = "a";
if(a.equals(b)) {
// etc
}
Also, the proper way to cast a char as a String is str(), not string().
