scala fixed length string format not working for Chinese

I'm trying to pad strings to the same length using string formatting, which works well for English strings:
def main(args: Array[String]): Unit = {
  val a = "this is first"
  val b = "second"
  val c = "third word"
  println(f"$a%-30s" + "|")
  println(f"$b%-30s" + "|")
  println(f"$c%-30s" + "|")
}
// gives:
this is first                 |
second                        |
third word                    |
However, when I test it on Chinese strings, the columns no longer line up:
def main(args: Array[String]): Unit = {
  val a = "小伙_6"
  val b = "女网友_6"
  val c = "有期徒刑_6"
  println(f"$a%-30s" + "|")
  println(f"$b%-30s" + "|")
  println(f"$c%-30s" + "|")
}
// gives:
小伙_6                          |
女网友_6                         |
有期徒刑_6                        |
How can I get this to work for Chinese strings?
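
One likely cause is that the %-30s width counts characters, while most monospaced terminals draw each Chinese character two columns wide, so every Chinese character shifts the closing bar one column to the right. A minimal sketch of a workaround follows, assuming that treating Han, kana, Hangul and fullwidth-form characters as two columns is a good enough approximation for your terminal and font. The names displayWidth and padRight are hypothetical helpers, not part of the Scala or Java standard library.

```scala
// Approximate number of terminal columns a string occupies.
// Assumption: Han, Hiragana, Katakana, Hangul and fullwidth forms
// (U+FF01..U+FF60) take two columns; everything else takes one.
def displayWidth(s: String): Int =
  s.codePoints().toArray.map { cp =>
    val script = Character.UnicodeScript.of(cp)
    val wide =
      script == Character.UnicodeScript.HAN ||
      script == Character.UnicodeScript.HIRAGANA ||
      script == Character.UnicodeScript.KATAKANA ||
      script == Character.UnicodeScript.HANGUL ||
      (cp >= 0xFF01 && cp <= 0xFF60)
    if (wide) 2 else 1
  }.sum

// Left-justify s in a field of the given number of columns.
def padRight(s: String, columns: Int): String =
  s + " " * math.max(0, columns - displayWidth(s))
```

With these helpers, println(padRight(a, 30) + "|") would replace println(f"$a%-30s" + "|"). The heuristic is deliberately crude (for example, halfwidth katakana are also counted as wide), so a library that implements the Unicode East Asian Width property may be a better fit if exact alignment matters.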

Related

wxPython ListCtrl string size

I have a wx.ListCtrl with 2 columns; the second is editable so that it can receive a string. My string is 64 KB in size, and it looks like the editable field can only hold 32738 characters. How can I pass all 64000 bytes?
self.sndList[g] = EditableListCtrl(self.panel, size=(largListE, hautList), style=wx.LC_REPORT | wx.BORDER_SUNKEN, pos=(X,Y+25))
self.sndList[g].InsertColumn(0, '<<< ' + self.vmc.prod[g]['name'] + ' >>>', width=largeurN )
self.sndList[g].InsertColumn(1, u'<<< value >>>', wx.LIST_FORMAT_RIGHT, width=largeurV)
for var in self.vmc.snd_varlist_PDProd[g]:
    index = self.sndList[g].InsertStringItem(10000, str(var))
    val = str(self.vmc.snd_assem_PDProd[g][var])
    if self.vmc.snd_assem_PDProd_type[g][var] == RPSDK_TYPE_REAL:
        res = val.find('.')
        if res > 0:
            val = val[0:res+7]
    self.sndList_PrevVal[g].append(val)
    self.sndList_LongRefresh[g].append(0)
    if val == '':
        val = '<EMPTY>'
    self.sndList[g].SetStringItem(index, 1, val)

how to extract an integer range from a string

I have a string that contains several ranges, and I need to extract their values:
var str = "some text x = 1..14, y = 2..4 some text"
I used the substringBefore() and substringAfter() methods to get x and y, but I can't find a way to get the values, because the numbers could have one or two digits or even be negative.

One approach is to use a regex, e.g.:
val str = "some text x = 1..14, y = 2..4 some text"
val match = Regex("x = (-?\\d+[.][.]-?\\d+).* y = (-?\\d+[.][.]-?\\d+)")
    .find(str)
if (match != null)
    println("x=${match.groupValues[1]}, y=${match.groupValues[2]}")
// prints: x=1..14, y=2..4
\\d matches a single digit, so \\d+ matches one or more digits; -? matches an optional minus sign; [.] matches a dot; and (…) marks a group that you can then retrieve from the groupValues property. (groupValues[0] is the whole match, so the individual values start from index 1.)
You could easily add extra parens to pull out each number separately, instead of whole ranges.
(You may or may not find this as readable or maintainable as string-manipulation approaches…)
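
To illustrate the "extra parens" idea, here is a minimal sketch that captures each bound separately. It is written in Scala, like most other snippets on this page, rather than Kotlin, but the pattern itself carries over unchanged. The function name parseRanges is hypothetical, and the sketch assumes every range is written as name = a..b.

```scala
// Collect every "name = a..b" range in the text, keyed by name.
def parseRanges(text: String): Map[String, Range.Inclusive] = {
  // group 1: variable name, group 2: lower bound, group 3: upper bound
  val rangePattern = """(\w+) = (-?\d+)\.\.(-?\d+)""".r
  rangePattern.findAllMatchIn(text).map { m =>
    m.group(1) -> (m.group(2).toInt to m.group(3).toInt)
  }.toMap
}

val ranges = parseRanges("some text x = 1..14, y = 2..4 some text")
```

Here ranges("x") is the range 1 to 14 and ranges("y") is 2 to 4. Negative bounds such as x = -3..-1 are handled by the optional minus sign in each group.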

Does this solution work for you?
val str = "some text x = 1..14, y = 2..4 some text"
val result = str.replace(",", "").split(" ")
var x = ""; var y = ""
for (i in result.indices) {
    if (result[i] == "x") {
        x = result[i+2]
    } else if (result[i] == "y") {
        y = result[i+2]
    }
}
println(x)
println(y)

Using the KotlinSpirit library:
val rangeParser = object : Grammar<IntRange>() {
    private var first: Int = -1
    private var last: Int = -1
    override val result: IntRange
        get() = first..last
    override fun defineRule(): Rule<*> {
        return int {
            first = it
        } + ".." + int {
            last = it
        }
    }
}.toRule().compile()
val str = "some text x = 1..14, y = 2..4 some text"
val ranges = rangeParser.findAll(str)
https://github.com/tiksem/KotlinSpirit

Merging overlapping strings

Suppose I need to merge two overlapping strings like this:
def mergeOverlap(s1: String, s2: String): String = ???
mergeOverlap("", "") // ""
mergeOverlap("", "abc") // "abc"
mergeOverlap("xyz", "abc") // "xyzabc"
mergeOverlap("xab", "abc") // "xabc"
I can write this function using the answer to one of my previous questions:
def mergeOverlap(s1: String, s2: String): String = {
  val n = s1.tails.find(tail => s2.startsWith(tail)).map(_.size).getOrElse(0)
  s1 ++ s2.drop(n)
}
Could you suggest a simpler or more efficient implementation of mergeOverlap?

You can find the overlap between two strings in time proportional to their total length, O(n + k), using the algorithm that computes the prefix function. The prefix function of a string at index i is the length of the longest proper suffix of the substring ending at index i that is also a prefix of the whole string ("proper" excludes the trivial case of the whole substring).
See these links for a fuller explanation of the definition and the algorithm to compute it:
https://cp-algorithms.com/string/prefix-function.html
https://hyperskill.org/learn/step/6413#a-definition-of-the-prefix-function
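
For reference, the standard prefix function described above can be sketched as follows. This is the textbook version rather than the modified one used in this answer, and overlapViaPrefixFunction is a hypothetical helper that assumes the character '\u0000' never occurs in either input, since it is used as a separator.

```scala
// pi(i) = length of the longest proper suffix of s.take(i + 1)
// that is also a prefix of s.
def prefixFunction(s: String): Array[Int] = {
  val pi = new Array[Int](s.length)
  for (i <- 1 until s.length) {
    var k = pi(i - 1)
    // fall back to shorter borders until the next character matches
    while (k > 0 && s(i) != s(k)) k = pi(k - 1)
    if (s(i) == s(k)) k += 1
    pi(i) = k
  }
  pi
}

// Longest prefix of hasPrefix that is also a suffix of hasSuffix.
// Assumption: neither string contains the separator '\u0000'.
def overlapViaPrefixFunction(hasSuffix: String, hasPrefix: String): Int =
  prefixFunction(hasPrefix + "\u0000" + hasSuffix).last
```

The separator keeps any match from extending past the end of hasPrefix, so the last value of the prefix function is exactly the overlap length.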
Here is an implementation of a modified algorithm that calculates the longest prefix of the second argument, equal to the suffix of the first argument:
import scala.collection.mutable.ArrayBuffer
def overlap(hasSuffix: String, hasPrefix: String): Int = {
  val overlaps = ArrayBuffer(0)
  for (suffixIndex <- hasSuffix.indices) {
    val currentCharacter = hasSuffix(suffixIndex)
    val currentOverlap = Iterator.iterate(overlaps.last)(overlap => overlaps(overlap - 1))
      .find(overlap =>
        overlap == 0 ||
        hasPrefix.lift(overlap).contains(currentCharacter))
      .getOrElse(0)
    val updatedOverlap = currentOverlap +
      (if (hasPrefix.lift(currentOverlap).contains(currentCharacter)) 1 else 0)
    overlaps += updatedOverlap
  }
  overlaps.last
}
And with that mergeOverlap is just
def mergeOverlap(s1: String, s2: String) =
  s1 ++ s2.drop(overlap(s1, s2))
And some tests of this implementation:
scala> mergeOverlap("", "")
res0: String = ""
scala> mergeOverlap("abc", "")
res1: String = abc
scala> mergeOverlap("", "abc")
res2: String = abc
scala> mergeOverlap("xyz", "abc")
res3: String = xyzabc
scala> mergeOverlap("xab", "abc")
res4: String = xabc
scala> mergeOverlap("aabaaab", "aab")
res5: String = aabaaab
scala> mergeOverlap("aabaaab", "aabc")
res6: String = aabaaabc
scala> mergeOverlap("aabaaab", "bc")
res7: String = aabaaabc
scala> mergeOverlap("aabaaab", "bbc")
res8: String = aabaaabbc
scala> mergeOverlap("ababab", "ababc")
res9: String = abababc
scala> mergeOverlap("ababab", "babc")
res10: String = abababc
scala> mergeOverlap("abab", "aab")
res11: String = ababaab

It's not tail recursive, but it is a very simple algorithm:
def mergeOverlap(s1: String, s2: String): String =
  if (s2 startsWith s1) s2
  else s1.head +: mergeOverlap(s1.tail, s2)
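
If stack depth is a concern for long inputs, the same idea can be written with a tail-recursive helper. The following is a minimal sketch, not a tuned implementation: the name mergeOverlapTailrec is hypothetical, and it still takes time proportional to the product of the two lengths in the worst case.

```scala
import scala.annotation.tailrec

def mergeOverlapTailrec(s1: String, s2: String): String = {
  // Try successively shorter suffixes of s1 until one is a prefix of s2.
  // The empty suffix (start == s1.length) always matches, so this terminates.
  @tailrec
  def loop(start: Int): String =
    if (s2.startsWith(s1.substring(start))) s1.substring(0, start) + s2
    else loop(start + 1)
  loop(0)
}
```

The @tailrec annotation makes the compiler reject the helper if it is ever changed in a way that breaks tail recursion.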

How to reverse words in a string and keep punctuation marks and uppercase letters

private def reverseHelper(word: String): String = {
  val result = new StringBuilder(word)
  if (word.head.isUpper) {
    result.setCharAt(0, word.head.toLower)
    result.setCharAt(word.length - 1, word.last.toUpper)
  }
  result.reverse.result()
}
val formatString = str
  .split("[.,!?: ]+")
  .map(result => str.replaceFirst(result, reverseHelper(result)))
  .foreach(println)
Example:
Input: What is a sentence?
Output: Tahw si a ecnetnes?
But what I get is an Array[String] with only one word reversed in each element: Tahw is a sentence?, What si a sentence?, What is a sentence?, What is a ecnetnes?
How can I write this so that the output has the right format?

Restoring the original capitalization is a bit tricky.
def reverser(s: Seq[Char], idx: Int = 0): String = {
  val strt = s.indexWhere(_.isLetter, idx)
  if (strt < 0) s.mkString
  else {
    // a word that runs to the end of the input has no trailing non-letter
    val end = s.indexWhere(!_.isLetter, strt) match {
      case -1 => s.length
      case e  => e
    }
    val len = end - strt
    // reverse the letters, but keep the upper/lower pattern of each position
    val rev = Range(0, len).map { x =>
      if (s(strt + x).isUpper) s(end - 1 - x).toUpper
      else s(end - 1 - x).toLower
    }
    reverser(s.patch(strt, rev, len), end)
  }
}
testing:
reverser( "What, is A sEntence?")
//res0: String = Tahw, si A eCnetnes?

You can first split the string at a set of special characters, reverse each individual word, and concatenate the results into a temporary string. Then traverse the original string: keep every special character as it is, and replace every other character with the next character from the temporary string.
private def reverseHelper(word: String): String = {
  val result = new StringBuilder(word)
  if (word.head.isUpper) {
    result.setCharAt(0, word.head.toLower)
    result.setCharAt(word.length - 1, word.last.toUpper)
  }
  result.reverse.result()
}
val tempStr = str
  .split("[.,!?: ]+")
  .map(result => reverseHelper(result))
  .mkString("")
// the separator characters themselves, without the regex syntax used in split
val sList = ".,!?: ".toList
var curr = 0
val formatString = str.map(c => {
  if (!sList.contains(c)) {
    curr = curr + 1
    tempStr(curr - 1)
  }
  else c
})

Here's one approach that uses a Regex pattern to generate a list of paired strings of Seq(word, nonWord), followed by reversal and positional uppercasing of the word strings:
def reverseWords(s: String): String = {
  val pattern = """(\w+)(\W*)""".r
  pattern.findAllMatchIn(s).flatMap(_.subgroups).grouped(2).
    map{ case Seq(word, nonWord) =>
      val caseList = word.map(_.isUpper)
      val newWord = (word.reverse zip caseList).map{
        case (c, true) => c.toUpper
        case (c, false) => c.toLower
      }.mkString
      newWord + nonWord
    }.
    mkString
}
reverseWords("He likes McDonald's burgers. I prefer In-and-Out's.")
//res1: String = "Eh sekil DlAnodcm's sregrub. I referp Ni-dna-Tuo's."

A version using split on word boundaries:
def reverseWords(string: String): String = {
  def revCap(s: String): String =
    s.headOption match {
      case Some(c) if c.isUpper =>
        (c.toLower +: s.drop(1)).reverse.capitalize
      case Some(c) if c.isLower =>
        s.reverse
      case _ => s
    }
  string
    .split("\\b")
    .map(revCap)
    .mkString("")
}

Scala: Remove the last occurrence of a character

I am trying to remove the last occurrence of a character in a string. I can get its index:
str.lastIndexOf(',')
I have already tried to use split and the replace function on the string.

You could use patch.
scala> val s = "s;dfkj;w;erw"
s: String = s;dfkj;w;erw
scala> s.patch(s.lastIndexOf(';'), "", 1)
res6: String = s;dfkj;werw

I'm curious why Scala doesn't have a .replaceLast, but there must be a reason...
Reverse the String, use str.replaceFirst, then reverse again.
I doubt this is terribly efficient, but it is effective :)
Note that replaceFirst interprets its first argument as a regex, so the "." in the pattern below matches any character, not only a literal dot.
scala> "abc.xyz.abc.xyz".reverse.replaceFirst("zyx.", "").reverse
res5: String = abc.xyz.abc
As a def it would look like this:
def replaceLast(input: String, matchOn: String, replaceWith: String) = {
  input.reverse.replaceFirst(matchOn.reverse, replaceWith.reverse).reverse
}
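
Because replaceFirst treats matchOn as a regex, reversing it only behaves as expected for patterns that read the same way backwards. A minimal sketch of a literal (non-regex) alternative built on lastIndexOf follows. The name replaceLastLiteral is hypothetical, and the sketch assumes plain substring matching is what you want.

```scala
// Replace the last literal occurrence of target, or return input unchanged
// when target does not occur at all.
def replaceLastLiteral(input: String, target: String, replacement: String): String = {
  val i = input.lastIndexOf(target)
  if (i < 0) input
  else input.substring(0, i) + replacement + input.substring(i + target.length)
}
```

Passing an empty replacement removes the last occurrence, for example replaceLastLiteral(str, ",", "").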

scala> def removeLast(x: Char, xs: String): String = {
     |
     |   val accumulator: (Option[Char], String) = (None, "")
     |
     |   val (_, applied) = xs.foldRight(accumulator){(e: Char, acc: (Option[Char], String)) =>
     |     val (alreadyReplaced, runningAcc) = acc
     |     alreadyReplaced match {
     |       case some @ Some(_) => (some, e + runningAcc)
     |       case None => if (e == x) (Some(e), runningAcc) else (None, e + runningAcc)
     |     }
     |   }
     |
     |   applied
     | }
removeLast: (x: Char, xs: String)String
scala> removeLast('f', "foobarf")
res7: String = foobar
scala> removeLast('f', "foobarfff")
res8: String = foobarff

You could try the following:
val input = "The quick brown fox jumps over the lazy dog"
val lastIndexOfU = input.lastIndexOf("u")
val splits = input.splitAt(lastIndexOfU)
// lastIndexOf returns -1 when "u" is absent; keep the input unchanged in that case
val inputWithoutLastU =
  if (lastIndexOfU < 0) input
  else splits._1 + splits._2.drop(1) // "The quick brown fox jmps over the lazy dog"
