Speeding up function that compares strings


I have a function that takes two strings, s and obj. It checks whether obj can be formed from s by removing one character. The implementation works, but it becomes awfully slow when the strings are large. I am trying to find a way to make this code much faster. Could anyone suggest a faster implementation?
def check_extra_char(s: String, obj: String): Boolean = {
if(s.length != obj.length+1) return false // Automatically false if obj is not one char smaller than s
for (i <- 0 until s.length)
if (s.take(i) + s.substring(1+i) == obj) return true
return false
}

Compare the characters of the two strings one by one. When a mismatch occurs, skip the mismatched character in the longer string and increment a mismatch counter. If more than one mismatch occurs, the check returns false. The worst-case time complexity of the check is O(n).
def check(a: String, b: String): Boolean = {
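// Order the inputs by length (with equal lengths the original picked b for both roles)
val (smallerStr, largerStr) = if (a.length <= b.length) (a, b) else (b, a)
// Removing one char only makes sense when the lengths differ by exactly one
if (largerStr.length - smallerStr.length != 1) false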
else {
def countMismatches(aIndex: Int, bIndex: Int, mismatchCount: Int): Int = {
if (bIndex < largerStr.length && aIndex < smallerStr.length) {
if (smallerStr(aIndex) != largerStr(bIndex)) {
if (mismatchCount > 1) mismatchCount
else countMismatches(aIndex, bIndex + 1, mismatchCount + 1)
}
else countMismatches(aIndex + 1, bIndex + 1, mismatchCount)
} else mismatchCount
}
countMismatches(0, 0, 0) <= 1
}
}
REPL
res12: Boolean = true
@ check("zapple", "apple")
res13: Boolean = true
@ check("apple", "apzple")
res14: Boolean = true
@ check("apple", "apzzple")
res15: Boolean = false
@ check("apple", "applez")
res16: Boolean = true
@ check("apple", "applzz")
res17: Boolean = false

You can speed it up by not building s.take(i) + s.substring(i+1) on every iteration. Walk through s and compare each character with its counterpart in obj. Only when you find the first difference do you need s.take(i) + s.substring(i+1).
def check_extra_char(s: String, obj: String): Boolean = {
if(s.length != obj.length+1) return false // Automatically false if obj is not one char smaller than s
if(s.dropRight(1) == obj) return true // so we don't go outOfIndex later
for (i <- 0 until s.length)
if (s(i) != obj(i))
return s.take(i) + s.substring(1+i) == obj // the first mismatch decides the result
false // not reached: the dropRight check above guarantees a mismatch inside the loop
}

The problem with your code is the s.take(i) + s.substring(1+i) == obj part. Both String.take(i: Int) and String.substring(start: Int, end: Int) have O(n) time complexity, and so does the concatenation, so the whole loop is O(n²).
There are numerous ways to avoid that; here is one of them in idiomatic Scala with tail recursion:
import scala.annotation.tailrec
def checkExtraChar(source: String, target: String): Boolean = {
val sourceLength = source.length
val targetLength = target.length
// Assumption :: source.length == target.length + 1
@tailrec
def _check(srcIndex: Int, tgtIndex: Int, mismatchFound: Boolean): Boolean = srcIndex match {
case index if index == (sourceLength - 1) && !mismatchFound => true
case index if index == (sourceLength - 1) => source(srcIndex) == target(tgtIndex)
case _ => (source(srcIndex) == target(tgtIndex), mismatchFound) match {
case (true, _) => _check(srcIndex + 1, tgtIndex + 1, mismatchFound)
case (false, false) => _check(srcIndex + 1, tgtIndex, true)
case (false, true) => false
}
}
(sourceLength == targetLength + 1) match {
case false => false
case true => _check(0, 0, false)
}
}
checkExtraChar("qwerty", "werty") // true
checkExtraChar("wqerty", "werty") // true
checkExtraChar("qqwerty", "werty") // false
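
The same linear-time idea can also be written as a plain loop without any substring allocation. The following is a minimal sketch, not taken from any of the answers; the name isOneCharRemoved is made up for illustration. It skips the common prefix, drops one character of s, and then requires the rest of both strings to match.

```scala
// Hypothetical helper: true when obj equals s with exactly one character removed.
def isOneCharRemoved(s: String, obj: String): Boolean =
  if (s.length != obj.length + 1) false
  else {
    // Length of the common prefix of s and obj.
    var prefix = 0
    while (prefix < obj.length && s.charAt(prefix) == obj.charAt(prefix)) prefix += 1
    // Skip s(prefix); the remaining tails must then be identical.
    var i = prefix
    while (i < obj.length && s.charAt(i + 1) == obj.charAt(i)) i += 1
    i == obj.length
  }
```

Each character is inspected at most once, so the running time is O(n) and no intermediate strings are created.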

Here is something that should yield some improvement and is more idiomatic.
def check(s: String, obj: String): Boolean = {
if(s.length == obj.length + 1)
streamFromString(s, 0).exists { case(substring) => substring == obj }
else false
}
def streamFromString(s: String, withoutIndex: Int): Stream[String] = {
lazy val next: Stream[String] = if(withoutIndex > s.length - 1) Stream.empty else streamFromString(s, withoutIndex + 1)
s.patch(withoutIndex, Nil, 1) #:: next
}
First of all, you can use patch to create a copy of a collection without the given element; it's defined on Seq, so feel free to read up on that.
Secondly, instead of looping mindlessly and reinventing the wheel, I decided to create a Stream[String] of the patched strings. Since Stream is a lazy collection, subsequent patches are created only as we progress through the stream.
You could replace the exists call with collectFirst, which takes a PartialFunction for more control over the body, but that would be overkill in this case, imo.
streamFromString(s, 0)
.collectFirst { case (substring) if substring == obj => "" } //could be anything here really, it's all about getting Option.Some
.fold(false)(_ => true)

Related

How reverse words in string and keep punctuation marks and upper case symbol

private def reverseHelper(word: String): String = {
var result = new StringBuilder(word)
if (word.head.isUpper) {
result.setCharAt(0, word.head.toLower)
result.setCharAt(word.length - 1, word.last.toUpper)
}
result.reverse.result()
}
val formatString = str
.split("[.,!?: ]+")
.map(result => str.replaceFirst(result, reverseHelper(result)))
.foreach(println)
Example:
Input: What is a sentence?
Output: Tahw si a ecnetnes?
But I get an Array[String] instead: Tahw is a sentence?, What si a sentence?, What is a sentence?, What is a ecnetnes?
How can I produce the output in the right format?

Restoring the original capitalization is a bit tricky.
def reverser(s:Seq[Char], idx:Int = 0) :String = {
val strt = s.indexWhere(_.isLetter, idx)
if (strt < 0) s.mkString
else {
val found = s.indexWhere(!_.isLetter, strt)
val end = if (found < 0) s.length else found // the last word may run to the end of the input
val len = end - strt
val rev = Range(0,len).map{ x =>
if (s(strt+x).isUpper) s(end-1-x).toUpper
else s(end-1-x).toLower
}
reverser(s.patch(strt,rev,len), end)
}
}
testing:
reverser( "What, is A sEntence?")
//res0: String = Tahw, si A eCnetnes?

You can first split your string at a list of special characters, reverse each individual word, and concatenate the results into a temporary string. Then traverse the original string and replace every character that is not a special character with the next character of the temporary string.
private def reverseHelper(word: String): String = {
var result = new StringBuilder(word)
if (word.head.isUpper) {
result.setCharAt(0, word.head.toLower)
result.setCharAt(word.length - 1, word.last.toUpper)
}
result.reverse.result()
}
val tempStr = str
.split("[.,!?: ]+")
.map(result => reverseHelper(result))
.mkString("")
val sList = ".,!?: ".toList // only the separator characters, without the regex syntax [ ] +
var curr = 0
val formatString = str.map(c => {
if(!sList.contains(c)) {
curr = curr + 1
tempStr(curr-1)
}
else c
})

Here's one approach that uses a Regex pattern to generate a list of paired strings of Seq(word, nonWord), followed by reversal and positional uppercasing of the word strings:
def reverseWords(s: String): String = {
val pattern = """(\w+)(\W*)""".r
pattern.findAllMatchIn(s).flatMap(_.subgroups).grouped(2).
map{ case Seq(word, nonWord) =>
val caseList = word.map(_.isUpper)
val newWord = (word.reverse zip caseList).map{
case (c, true) => c.toUpper
case (c, false) => c.toLower
}.mkString
newWord + nonWord
}.
mkString
}
reverseWords("He likes McDonald's burgers. I prefer In-and-Out's.")
//res1: String = "Eh sekil DlAnodcm's sregrub. I referp Ni-dna-Tuo's."

A version using split on word boundaries:
def reverseWords(string: String): String = {
def revCap(s: String): String =
s.headOption match {
case Some(c) if c.isUpper =>
(c.toLower +: s.drop(1)).reverse.capitalize
case Some(c) if c.isLower =>
s.reverse
case _ => s
}
string
.split("\\b")
.map(revCap)
.mkString("")
}

little problem on code for finding substring within string scala

I am currently working on a small code that should allow to tell if a given substring is within a string. I checked all the other similar questions but everybody is using predefined functions. I need to build it from scratch… could you please tell me what I did wrong?
def substring(s: String, t: String): Boolean ={
var i = 0 // position on substring
var j = 0 // position on string
var result = false
var isSim = true
var n = s.length // small string size
var m = t.length // BIG string size
// m must always be bigger than n + j
while (m>n+j && isSim == true){
// j grows with i
// stopping the loop at i<n
while (i<n && isSim == true){
// if characters are similar
if (s(i)==t(j)){
// add 1 to i. So j will increase by one as well
// this will run the loop looking for similarities. If not, exit the loop.
i += 1
j = i+1
// exiting the loop if no similarity is found
}
// answer given if no similarity is found
isSim = false
}
}
// printing the output
isSim
}
substring("moth", "ramathaaaaaaa")

The problem consists of two subproblems of same kind. You have to check whether
there exists a start index j such that
for all i <- 0 until n it holds that substring(i) == string(j + i)
Whenever you have to check whether some predicate holds for some / for all elements of a sequence, it can be quite handy if you can short-circuit and exit early by using the return keyword. Therefore, I'd suggest to eliminate all variables and while-loops, and use a nested method instead:
def substring(s: String, t: String): Boolean ={
val n = s.length // small string size
val m = t.length // BIG string size
def substringStartingAt(startIndex: Int): Boolean = {
for (i <- 0 until n) {
if (s(i) != t(startIndex + i)) return false
}
true
}
for (possibleStartIndex <- 0 to m - n) {
if (substringStartingAt(possibleStartIndex)) return true
}
false
}
The inner method checks whether s(i) == t(j + i) holds for all i, for a given start index j. The outer for-loop checks whether there exists a suitable offset j.
Example:
for (
(sub, str) <- List(
("moth", "ramathaaaaaaa"),
("moth", "ramothaaaaaaa"),
("moth", "mothraaaaaaaa"),
("moth", "raaaaaaaamoth"),
("moth", "mmoth"),
("moth", "moth"),
)
) {
println(sub + " " + " " + str + ": " + substring(sub, str))
}
output:
moth ramathaaaaaaa: false
moth ramothaaaaaaa: true
moth mothraaaaaaaa: true
moth raaaaaaaamoth: true
moth mmoth: true
moth moth: true
If you were allowed to use built-in methods, you could of course also write
def substring(s: String, t: String): Boolean = {
val n = s.size
val m = t.size
(0 to m-n).exists(j => (0 until n).forall(i => s(i) == t(j + i)))
}

I offer the following slightly more idiomatic Scala code, not because I think it will perform better than Andrey's code--I don't--but simply because it uses recursion and is, perhaps, slightly easier to read:
import scala.annotation.tailrec
/**
* Method to determine if "sub" is a substring of "string".
*
* @param sub the candidate substring.
* @param string the full string.
* @return true if "sub" is a substring of "string".
*/
def substring(sub: String, string: String): Boolean = {
val p = sub.toList
/**
* Tail-recursive method to determine if "p" is a subsequence of "s"
*
* @param s the super-sequence to be tested (part of the original "string").
* @return as follows:
* (1) "p" longer than "s" => false;
* (2) "p" elements match the corresponding "s" elements (starting at the start of "s") => true
* (3) recursively invoke substring on "p" and the tail of "s".
*/
@tailrec def substring(s: Seq[Char]): Boolean = p.length <= s.length && (
s.startsWith(p) || (
s match {
case Nil => false
case _ :: z => substring(z)
}
)
)
p.isEmpty || substring(string.toList)
}
If you object to using the built-in method startsWith then we could use something like:
(p zip s forall (t => t._1==t._2))
But we have to draw the line somewhere between creating everything from scratch and using built-in functions.
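
To make the zip/forall variant concrete, here is a minimal self-contained sketch of the same recursive search without startsWith. The names prefixMatches and substringNoStartsWith are made up for illustration and are not part of the answer above.

```scala
import scala.annotation.tailrec

// Hypothetical helper: true when every element of p equals the element of s at the same position.
// Callers must ensure p is not longer than s, because zip truncates to the shorter list.
def prefixMatches(p: List[Char], s: List[Char]): Boolean =
  p.zip(s).forall { case (x, y) => x == y }

def substringNoStartsWith(sub: String, string: String): Boolean = {
  val p = sub.toList
  @tailrec def loop(s: List[Char]): Boolean =
    if (p.length > s.length) false        // not enough characters left to match
    else if (prefixMatches(p, s)) true    // p matches at the current position
    else s match {
      case Nil       => false
      case _ :: rest => loop(rest)        // try the next start position
    }
  loop(string.toList)
}
```

The length check before prefixMatches matters: without it, zip would silently compare only the characters that remain and report a match for a truncated prefix.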

Parallel Merge Sort in Scala

I have been trying to implement parallel merge sort in Scala. But with 8 cores, using .sorted is still about twice as fast.
edit:
I rewrote most of the code to minimize object creation. Now it runs about as fast as the .sorted
Input file with 1.2M integers:
1.333580 seconds (my implementation)
1.439293 seconds (.sorted)
How should I parallelize this?
New implementation
import scala.reflect.ClassTag
object Mergesort extends App
{
//=====================================================================================================================
// UTILITY
implicit object comp extends Ordering[Any] {
def compare(a: Any, b: Any) = {
(a, b) match {
case (a: Int, b: Int) => a compare b
case (a: String, b: String) => a compare b
case _ => 0
}
}
}
//=====================================================================================================================
// MERGESORT
val THRESHOLD = 30
def inssort[A](a: Array[A], left: Int, right: Int): Array[A] = {
for (i <- (left+1) until right) {
var j = i
val item = a(j)
while (j > left && comp.lt(item,a(j-1))) {
a(j) = a(j-1)
j -= 1
}
a(j) = item
}
a
}
def mergesort_merge[A](a: Array[A], temp: Array[A], left: Int, right: Int, mid: Int) : Array[A] = {
var i = left
var j = right
while (i < mid) { temp(i) = a(i); i+=1; }
while (j > mid) { temp(i) = a(j-1); i+=1; j-=1; }
i = left
j = right-1
var k = left
while (k < right) {
if (comp.lt(temp(i), temp(j))) { a(k) = temp(i); i+=1; k+=1; }
else { a(k) = temp(j); j-=1; k+=1; }
}
a
}
def mergesort_split[A](a: Array[A], temp: Array[A], left: Int, right: Int): Array[A] = {
if (right-left <= 1) a // zero or one element: already sorted
else if ((right-left) > THRESHOLD) {
val mid = (left+right)/2
mergesort_split(a, temp, left, mid)
mergesort_split(a, temp, mid, right)
mergesort_merge(a, temp, left, right, mid)
}
else
inssort(a, left, right)
}
def mergesort[A: ClassTag](a: Array[A]): Array[A] = {
val temp = new Array[A](a.size)
mergesort_split(a, temp, 0, a.size)
}
} // closes object Mergesort
Previous implementation
Input file with 1.2M integers:
4.269937 seconds (my implementation)
1.831767 seconds (.sorted)
What sort of tricks are there to make it faster and cleaner?
object Mergesort extends App
{
//=====================================================================================================================
// UTILITY
val StartNano = System.nanoTime
def dbg(msg: String) = println("%05d DBG ".format(((System.nanoTime - StartNano)/1e6).toInt) + msg)
def time[T](work: =>T) = {
val start = System.nanoTime
val res = work
println("%f seconds".format((System.nanoTime - start)/1e9))
res
}
implicit object comp extends Ordering[Any] {
def compare(a: Any, b: Any) = {
(a, b) match {
case (a: Int, b: Int) => a compare b
case (a: String, b: String) => a compare b
case _ => 0
}
}
}
//=====================================================================================================================
// MERGESORT
def merge[A](left: List[A], right: List[A]): Stream[A] = (left, right) match {
case (x :: xs, y :: ys) if comp.lteq(x, y) => x #:: merge(xs, right)
case (x :: xs, y :: ys) => y #:: merge(left, ys)
case _ => if (left.isEmpty) right.toStream else left.toStream
}
def sort[A](input: List[A], length: Int): List[A] = {
if (length < 100) return input.sortWith(comp.lt)
input match {
case Nil | List(_) => input
case _ =>
val middle = length / 2
val (left, right) = input splitAt middle
merge(sort(left, middle), sort(right, middle + length%2)).toList
}
}
def msort[A](input: List[A]): List[A] = sort(input, input.length)
//=====================================================================================================================
// PARALLELIZATION
//val cores = Runtime.getRuntime.availableProcessors
//dbg("Detected %d cores.".format(cores))
//lazy implicit val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(cores))
def futuremerge[A](fa: Future[List[A]], fb: Future[List[A]])(implicit order: Ordering[A], ec: ExecutionContext) =
{
for {
a <- fa
b <- fb
} yield merge(a, b).toList
}
def parallel_msort[A](input: List[A], length: Int)(implicit order: Ordering[A]): Future[List[A]] = {
val middle = length / 2
val (left, right) = input splitAt middle
if(length > 500) {
val fl = parallel_msort(left, middle)
val fr = parallel_msort(right, middle + length%2)
futuremerge(fl, fr)
}
else {
Future(msort(input))
}
}
//=====================================================================================================================
// MAIN
val results = time({
val src = Source.fromFile("in.txt").getLines
val header = src.next.split(" ").toVector
val lines = if (header(0) == "i") src.map(_.toInt).toList else src.toList
val f = parallel_msort(lines, lines.length)
Await.result(f, concurrent.duration.Duration.Inf)
})
println("Sorted as comparison...")
val sorted_src = Source.fromFile(input_folder+"in.txt").getLines
sorted_src.next
time(sorted_src.toList.sorted)
val writer = new PrintWriter("out.txt", "UTF-8")
try writer.print(results.mkString("\n"))
finally writer.close
}

My answer is probably going to be a bit long, but I hope that it will be useful for both you and me.
So, the first question is: "How does Scala sort a List?" Let's have a look at the code from the Scala repo!
def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
val len = this.length
val b = newBuilder
if (len == 1) b ++= this
else if (len > 1) {
b.sizeHint(len)
val arr = new Array[AnyRef](len) // Previously used ArraySeq for more compact but slower code
var i = 0
for (x <- this) {
arr(i) = x.asInstanceOf[AnyRef]
i += 1
}
java.util.Arrays.sort(arr, ord.asInstanceOf[Ordering[Object]])
i = 0
while (i < arr.length) {
b += arr(i).asInstanceOf[A]
i += 1
}
}
b.result()
}
So what the hell is going on here? Long story short: the sorting is done by Java. Everything else is just size handling and casting. Basically, this is the line that does the work:
java.util.Arrays.sort(arr, ord.asInstanceOf[Ordering[Object]])
Let's go one level deeper into JDK sources:
public static <T> void sort(T[] a, Comparator<? super T> c) {
if (c == null) {
sort(a);
} else {
if (LegacyMergeSort.userRequested)
legacyMergeSort(a, c);
else
TimSort.sort(a, 0, a.length, c, null, 0, 0);
}
}
legacyMergeSort is nothing but a single-threaded implementation of the merge sort algorithm.
The next question is: "what is TimSort.sort and when do we use it?"
To the best of my knowledge, the default value of this property (LegacyMergeSort.userRequested) is false, which leads us to the TimSort.sort algorithm. Description can be found here. Why is it better? It needs fewer comparisons than merge sort, according to the comments in the JDK sources.
Moreover, you should be aware that it is all single-threaded, so there is no parallelization here.
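
For comparison, the JDK (Java 8 and later) also provides java.util.Arrays.parallelSort, which may split the work across several threads. The snippet below is only a small illustration on a primitive array; the sample values are made up, and whether it beats a single-threaded sort depends on input size and hardware.

```scala
import java.util.Arrays

val data: Array[Int] = Array(5, 3, 9, 1, 7, 2) // made-up sample input
val sortedCopy = data.clone()                  // keep the original untouched
Arrays.parallelSort(sortedCopy)                // sorts the copy in place
```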
Third question, "your code":
You create too many objects. When it comes to performance, mutation (sadly) is your friend.
"Premature optimization is the root of all evil" -- Donald Knuth. Before making any optimizations (like parallelism), implement a single-threaded version and compare the results.
Use something like JMH to measure the performance of your code.
You probably should not use the Stream class if you want the best performance, as it caches (memoizes) its elements.
I intentionally did not give you an answer like "a super-fast merge sort in Scala can be found here", but just some tips to apply to your code and coding practices.
I hope it helps.
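
To illustrate how these tips could fit together, here is a minimal sketch of an array-based merge sort that runs the top levels of the recursion in Futures. It is not the asker's code and has not been benchmarked; the names mergeRanges, sortRange and parallelMergeSort, the restriction to Array[Int], and the default depth of 3 are all assumptions made for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Merge the sorted ranges a[left, mid) and a[mid, right) using tmp as scratch space.
def mergeRanges(a: Array[Int], tmp: Array[Int], left: Int, mid: Int, right: Int): Unit = {
  System.arraycopy(a, left, tmp, left, right - left)
  var i = left
  var j = mid
  var k = left
  while (k < right) {
    // Take from the left run while it has elements and its head is not larger.
    if (i < mid && (j >= right || tmp(i) <= tmp(j))) { a(k) = tmp(i); i += 1 }
    else { a(k) = tmp(j); j += 1 }
    k += 1
  }
}

// Sort a[left, right) in place; the first `depth` recursion levels sort their left half in a Future.
def sortRange(a: Array[Int], tmp: Array[Int], left: Int, right: Int, depth: Int): Unit =
  if (right - left > 1) {
    val mid = (left + right) >>> 1
    if (depth > 0) {
      val leftHalf = Future(sortRange(a, tmp, left, mid, depth - 1))
      sortRange(a, tmp, mid, right, depth - 1)
      Await.result(leftHalf, Duration.Inf) // both halves must be sorted before merging
    } else {
      sortRange(a, tmp, left, mid, 0)
      sortRange(a, tmp, mid, right, 0)
    }
    mergeRanges(a, tmp, left, mid, right)
  }

// Returns a sorted copy; depth = 3 allows up to 8 ranges to be sorted concurrently.
def parallelMergeSort(input: Array[Int], depth: Int = 3): Array[Int] = {
  val a = input.clone()
  sortRange(a, new Array[Int](a.length), 0, a.length, depth)
  a
}
```

The two halves touch disjoint index ranges of both arrays, so no locking is needed, and only one scratch array is allocated for the whole sort. Limiting the parallel depth keeps the number of Futures small; below that depth the code is an ordinary single-threaded merge sort.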

How doing String-Programming in Swift

I miss usable String functions that are easy to use without typing lines of strange identifiers. So I decided to build up a library of useful and recognizable String functions.

I first tried to use the Cocoa String functions to solve this problem. So I tried this in the playground:
import Cocoa
func PartOfString(s: String, start: Int, length: Int) -> String
{
return s.substringFromIndex(advance(s.startIndex, start - 1)).substringToIndex(advance(s.startIndex, length))
}
PartOfString("HelloaouAOUs.World", 1, 5) --> "Hello"
PartOfString("HelloäöüÄÖÜß…World", 1, 5) --> "Hello"
PartOfString("HelloaouAOUs.World", 1, 18) --> "HelloaouAOUs.World"
PartOfString("HelloäöüÄÖÜß…World", 1, 18) --> "HelloäöüÄÖÜß…World"
PartOfString("HelloaouAOUs.World", 6, 7) --> "aouAOUs"
PartOfString("HelloäöüÄÖÜß…World", 6, 7) --> "äöüÄO"
The result is wrong if Unicode characters are in the String and the index passed to "substringFromIndex" is not the start index. Even worse, the Swift program sometimes crashes at run time in that case. So I decided to create a set of new functions that take care of this problem and work with Unicode characters. Please note that filenames can contain Unicode characters as well, so if you think you do not need Unicode characters, you are wrong.
If you want to reproduce this, you need the same String I used, because copying it from this web page does not reproduce the problem.
var s: String = "HelloäöüÄÖÜß…World"
var t: String = s.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)!
var u: String = "Helloa%CC%88o%CC%88u%CC%88A%CC%88O%CC%88U%CC%88%C3%9F%E2%80%A6World".stringByRemovingPercentEncoding!
var b: Bool = (s == u) --> true
PartOfString(s, 6, 7) --> "äöüÄO"
Now you might get the idea to convert the troublesome canonically mapped Unicode characters to compatibility-mapped ones with the following function:
func percentescapesremove (s: String) -> String
{
return (s.stringByRemovingPercentEncoding!.precomposedStringWithCompatibilityMapping)
}
And the result you will get is:
var v: String = percentescapesremove(t) --> "HelloäöüÄÖÜß...World"
PartOfString(v, 6, 7) --> "äöüÄÖÜß"
var a: Bool = (s == v) --> false
When you do so, the "äöüÄÖÜß" looks good and you think everything is OK. But look at the "...": it has been permanently converted from the single Unicode character "…" to three separate dots "...", so the result is not identical to the original string. If you have Unicode filenames, converting them will result in not finding the file on a volume. So it is a good idea to convert only for screen output and keep the original String in a safe place.
The problem with the PartOfString function above is that it generates a new String in the first part of the expression and then indexes this new String with an index of the old one. That does not work, because the Unicode characters have a different length than the normal letters. So I improved the function (thanks to Martin R for his help):
func NewPartOfString(s: String, start: Int, length: Int) -> String
{
let t: String = s.substringFromIndex(advance(s.startIndex, start - 1))
return t.substringToIndex(advance(t.startIndex, length))
}
And the result is correct:
NewPartOfString("HelloaouAOUs.World", 1, 5) --> "Hello"
NewPartOfString("HelloäöüÄÖÜß…World", 1, 5) --> "Hello"
NewPartOfString("HelloaouAOUs.World", 1, 18) --> "HelloaouAOUs.World"
NewPartOfString("HelloäöüÄÖÜß…World", 1, 18) --> "HelloäöüÄÖÜß…World"
NewPartOfString("HelloaouAOUs.World", 6, 7) --> "aouAOUs"
NewPartOfString("HelloäöüÄÖÜß…World", 6, 7) --> "äöüÄÖÜß"
In the next step I will show a few functions that can be used and work well. All of them are based on integer index values that start at 1 for the first character and end with the index of the last character, which is identical to the length of the String.
This function returns the length of a string:
func len (s: String) -> Int
{
return (countElements(s)) // This works not really fast, because of UniCode
}
This function returns the UniCode-Number of the first UniCode-Character in the String:
func asc (s: String) -> Int
{
if (s == "")
{
return 0
}
else
{
return (Int(s.unicodeScalars[s.unicodeScalars.startIndex].value))
}
}
This function returns the UniCode-Character of the given UniCode-Number:
func char (c: Int) -> String
{
var s: String = String(UnicodeScalar(c))
return (s)
}
This function returns the Upper-Case representation of a String:
func ucase (s: String) -> String
{
return (s.uppercaseString)
}
This function returns the Lower-Case representation of a String:
func lcase (s: String) -> String
{
return (s.lowercaseString)
}
The next Function gives the left part of a String with a given length:
func left (s: String, length: Int) -> String
{
if (length < 1)
{
return ("")
}
else
{
if (length > len(s))
{
return (s)
}
else
{
return (s.substringToIndex(advance(s.startIndex, length)))
}
}
}
The next Function gives the right part of a String with a given length:
func right (s: String, laenge: Int) -> String
{
var L: Int = len(s)
if (L <= laenge)
{
return(s)
}
else
{
if (laenge < 1)
{
return ("")
}
else
{
let t: String = s.substringFromIndex(advance(s.startIndex, L - laenge))
return t.substringToIndex(advance(t.startIndex, laenge))
}
}
}
The next Function gives the part of a String with a given length:
func mid (s: String, start: Int, laenge: Int) -> String
{
if (start <= 1)
{
return (left(s, laenge))
}
else
{
var L: Int = len(s)
if ((start > L) || (laenge < 1))
{
return ("")
}
else
{
if (start + laenge > L)
{
let t: String = s.substringFromIndex(advance(s.startIndex, start - 1))
return t.substringToIndex(advance(t.startIndex, L - start + 1))
}
else
{
let t: String = s.substringFromIndex(advance(s.startIndex, start - 1))
return t.substringToIndex(advance(t.startIndex, laenge))
}
}
}
}
It is a little more difficult to get a character at a given position, because we cannot use "substringFromIndex" and "substringToIndex" when the index passed to "substringFromIndex" is not the start index. So the idea is to step through the string, character by character, and build the needed substring.
func CharacterOfString(s: String, index: Int, length: Int) -> String
{
var c: String = ""
var i: Int = 0
for UniCodeChar in s.unicodeScalars
{
i = i + 1
if ((i >= index) && (i < index + length))
{
c = c + String(UniCodeChar)
}
}
return (c)
}
But this does not work correctly for Strings that contain Unicode characters. The following examples show what happens:
CharacterOfString("Swift Example Text aouAOUs.", 16, 8) --> "ext aouA"
len(CharacterOfString("Swift Example Text aouAOUs.", 16, 8)) --> 8
CharacterOfString("Swift Example Text äöüÄÖÜß…", 16, 8) --> "ext äö"
len(CharacterOfString("Swift Example Text äöüÄÖÜß…", 16, 8)) --> 6
So we see that the resulting String is too short, because a single visible character can consist of more than one Unicode scalar. For example, "ä" can be one Unicode character, but it can also be written as two: "a" followed by the combining diaeresis "¨". So we need another way to get a valid substring.
The solution is to convert the String to an array of Characters and to use the index of the array to get a valid character. This works in all cases to get a single Character of a Unicode String at a given index:
func indchar (s: String, i: Int) -> String
{
if ((i < 1) || (i > len(s)))
{
return ("")
}
else
{
return String(Array(s)[i - 1])
}
}
And with this knowledge, I have built a function that returns a valid Unicode substring for a given start index and a given length:
func substring(s: String, Start: Int, Length: Int) -> String
{
var L: Int = len(s)
var UniCode = Array(s)
var result: String = ""
var TheEnd: Int = Start + Length - 1
if ((Start < 1) || (Start > L))
{
return ("")
}
else
{
if ((Length < 0) || (TheEnd > L))
{
TheEnd = L
}
for var i: Int = Start; i <= TheEnd; ++i
{
result = result + String(UniCode[i - 1])
}
return (result)
}
}
The next Function searches for the position of a given String in another String:
func position (original: String, search: String, start: Int) -> Int
{
var index = part(original, start).rangeOfString(search)
if (index != nil)
{
var pos: Int = distance(original.startIndex, index!.startIndex)
return (pos + start)
}
else
{
return (0)
}
}
This function checks whether a given character code is a digit (0-9):
func number (n: Int) -> Bool
{
return ((n >= 48) & (n <= 57)) // "0" to "9"
}
Now the basic String operations have been shown, but what about numbers? How are numbers converted to Strings and vice versa? Let's have a look at converting Strings to numbers. Please note the "!" in the last line, which is used to get an Int and not an optional Int.
var s: String = "123" --> "123"
var v: Int = s.toInt() --> (123)
var v: Int = s.toInt()! --> 123
But this does not work if the String contains non-digit characters:
var s: String = "123." --> "123."
var v: Int = s.toInt()! --> Will result in a Runtime Error, because s.toInt() = nil
So I decided to build a smarter function to get the value of a String:
func val (s: String) -> Int
{
var p: Int = 0
var sign: Int = 0
if (indchar(s, 1) == "-")
{
sign = 1
p = 1
}
while(number(asc(indchar(s, p + 1))))
{
p = p + 1
}
if (p > sign)
{
return (left(s, p).toInt()!)
}
else
{
return (0)
}
}
Now the result is correct and does not produce a Runtime-Error:
var s: String = "123." --> "123."
var v: Int = val(s) --> 123
And now the same for Floating-Point Numbers:
func realval (s: String) -> Double
{
var r: Double = 0
var p: Int = 1
var a: Int = asc(indchar(s, p))
if (indchar(s, 1) == "-")
{
p = 2
}
while ((a != 44) && (a != 46) && ((a >= 48) & (a <= 57)))
{
p = p + 1
a = asc(indchar(s, p))
}
if (p >= len(s)) // Integer Number
{
r = Double(val(s))
}
else // Number with fractional part
{
var mantissa: Int = val(substring(s, p + 1, -1))
var fract: Double = 0
while (mantissa != 0)
{
fract = (fract / 10) + (Double(mantissa % 10) / 10)
mantissa = mantissa / 10
p = p + 1
}
r = Double(val(s)) + fract
p = p + 1
}
a = asc(indchar(s, p))
if ((a == 69) || (a == 101)) // Exponent
{
var exp: Int = val(substring(s, p + 1, -1))
if (exp != 0)
{
for var i: Int = 1; i <= abs(exp); ++i
{
if (exp > 0)
{
r = r * 10
}
else
{
r = r / 10
}
}
}
}
return (r)
}
This works for floating-point numbers with exponents:
var s: String = "123.456e3"
var t: String = "123.456e-3"
var v: Double = realval(s) --> 123456
var w: Double = realval(t) --> 0.123456
Generating a String from an Integer is much simpler:
func str (n: Int) -> String
{
return (String(n))
}
A String of a floating point variable does not work with String(n) but can be done with:
func strreal (n: Double) -> String
{
return ("\(n)")
}

Return all the indexes of a particular substring

Is there a Scala library API method (and if not, an idiomatic way) to obtain a list of all the indexes for a substring (target) within a larger string (source)? I have tried to look through the ScalaDoc, but was not able to find anything obvious. There are SO many methods doing so many useful things, I am guessing I am just not submitting the right search terms.
For example, if I have a source string of "name:Yo,name:Jim,name:name,name:bozo" and I use a target string of "name:", I would like to get back a List[Int] of List(0, 8, 17, 27).
Here's my quick hack to resolve the problem:
def indexesOf(source: String, target: String, index: Int = 0, withinOverlaps: Boolean = false): List[Int] = {
def recursive(index: Int, accumulator: List[Int]): List[Int] = {
if (!(index < source.size)) accumulator
else {
val position = source.indexOf(target, index)
if (position == -1) accumulator
else {
recursive(position + (if (withinOverlaps) 1 else target.size), position :: accumulator)
}
}
}
if (target.size <= source.size) {
if (!source.equals(target)) {
recursive(0, Nil).reverse
}
else List(0)
}
else Nil
}
Any guidance you can give me replacing this with a proper standard library entry point would be greatly appreciated.
UPDATE 2019/Jun/16:
Further code tightening:
def indexesOf(source: String, target: String, index: Int = 0, withinOverlaps: Boolean = false): List[Int] = {
def recursive(indexTarget: Int = index, accumulator: List[Int] = Nil): List[Int] = {
val position = source.indexOf(target, indexTarget)
if (position == -1)
accumulator
else
recursive(position + (if (withinOverlaps) 1 else target.size), position :: accumulator)
}
recursive().reverse
}
UPDATE 2014/Jul/22:
Inspired by Siddhartha Dutta's answer, I tightened up my code. It now looks like this:
def indexesOf(source: String, target: String, index: Int = 0, withinOverlaps: Boolean = false): List[Int] = {
@tailrec def recursive(indexTarget: Int, accumulator: List[Int]): List[Int] = {
val position = source.indexOf(target, indexTarget)
if (position == -1) accumulator
else
recursive(position + (if (withinOverlaps) 1 else target.size), position :: accumulator)
}
recursive(index, Nil).reverse
}
Additionally, if I have a source string of "aaaaaaaa" and I use a target string of "aa", I would like by default to get back a List[Int] of List(0, 2, 4, 6) which skips a search starting inside of a found substring. The default can be overridden by passing "true" for the withinOverlaps parameter which in the "aaaaaaaa"/"aa" case would return List(0, 1, 2, 3, 4, 5, 6).
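
The two behaviours described above can be checked with a small self-contained sketch. It is an alternative formulation built on Iterator.iterate, not the code from the question; the name allIndexes and the overlapping parameter are made up for illustration, and an empty target is rejected because it would match at every position.

```scala
// Hypothetical helper: `step` is how far to move past a match before searching again
// (1 allows overlapping matches, target.length skips past the whole match).
def allIndexes(source: String, target: String, overlapping: Boolean = false): List[Int] = {
  require(target.nonEmpty, "an empty target would match at every position")
  val step = if (overlapping) 1 else target.length
  Iterator
    .iterate(source.indexOf(target))(pos => source.indexOf(target, pos + step))
    .takeWhile(_ != -1) // indexOf returns -1 once there are no more matches
    .toList
}

val nameIndexes  = allIndexes("name:Yo,name:Jim,name:name,name:bozo", "name:")
val skipOverlaps = allIndexes("aaaaaaaa", "aa")
val withOverlaps = allIndexes("aaaaaaaa", "aa", overlapping = true)
```

Here nameIndexes is List(0, 8, 17, 27), skipOverlaps is List(0, 2, 4, 6), and withOverlaps is List(0, 1, 2, 3, 4, 5, 6), matching the expectations stated in the question.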

I am always inclined to reach into the bag of regex tricks with problems like this one. I wouldn't say it is proper, but it's a hell of a lot less code. :)
val r = "\\Qname\\E".r
val ex = "name:Yo,name:Jim,name:name,name:bozo"
val is = r.findAllMatchIn(ex).map(_.start).toList
The quotes \\Q and \\E aren't necessary for this case, but if the string you're looking for contains any special regex characters, then they will be.
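
A short example of why the quoting matters, using java.util.regex.Pattern.quote, which wraps a string in \Q...\E for you. The sample strings are made up for illustration.

```scala
import java.util.regex.Pattern

val target = "a.b"          // "." is a regex metacharacter that matches any character
val text   = "a.b axb a.b"  // made-up sample input

// Unquoted: the "." also matches the "x" in "axb".
val unquoted = target.r.findAllMatchIn(text).map(_.start).toList

// Quoted: only the literal text "a.b" matches.
val quoted = Pattern.quote(target).r.findAllMatchIn(text).map(_.start).toList
```

With this input, unquoted is List(0, 4, 8) while quoted is List(0, 8).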

A small piece of code to get all the indexes.
Call the method below as getAllIndexes(source, target):
def getAllIndexes(source: String, target: String, index: Int = 0): List[Int] = {
val targetIndex = source.indexOf(target, index)
if(targetIndex != -1)
List(targetIndex) ++ getAllIndexes(source, target, targetIndex+1)
else
List()
}
