Merging overlapping strings - string

Suppose I need to merge two overlapping strings like that:
def mergeOverlap(s1: String, s2: String): String = ???
mergeOverlap("", "") // ""
mergeOverlap("", "abc") // abc
mergeOverlap("xyz", "abc") // xyzabc
mergeOverlap("xab", "abc") // xabc
I can write this function using the answer to one of my previous questions:
def mergeOverlap(s1: String, s2: String): String = {
val n = s1.tails.find(tail => s2.startsWith(tail)).map(_.size).getOrElse(0)
s1 ++ s2.drop(n)
}
Could you suggest either a simpler or maybe more efficient implementation of mergeOverlap?

You can find the overlap between two strings in time proportional to the total length of the strings O(n + k) using the algorithm to calculate the prefix function. Prefix function of a string at index i is defined as the size of the longest suffix at index i that is equal to the prefix of the whole string (excluding the trivial case).
See those links for more explanation of the definition and the algorithm to compute it:
https://cp-algorithms.com/string/prefix-function.html
https://hyperskill.org/learn/step/6413#a-definition-of-the-prefix-function
Here is an implementation of a modified algorithm that calculates the longest prefix of the second argument, equal to the suffix of the first argument:
import scala.collection.mutable.ArrayBuffer
def overlap(hasSuffix: String, hasPrefix: String): Int = {
val overlaps = ArrayBuffer(0)
for (suffixIndex <- hasSuffix.indices) {
val currentCharacter = hasSuffix(suffixIndex)
val currentOverlap = Iterator.iterate(overlaps.last)(overlap => overlaps(overlap - 1))
.find(overlap =>
overlap == 0 ||
hasPrefix.lift(overlap).contains(currentCharacter))
.getOrElse(0)
val updatedOverlap = currentOverlap +
(if (hasPrefix.lift(currentOverlap).contains(currentCharacter)) 1 else 0)
overlaps += updatedOverlap
}
overlaps.last
}
And with that mergeOverlap is just
def mergeOverlap(s1: String, s2: String) =
s1 ++ s2.drop(overlap(s1, s2))
And some tests of this implementation:
scala> mergeOverlap("", "")
res0: String = ""
scala> mergeOverlap("abc", "")
res1: String = abc
scala> mergeOverlap("", "abc")
res2: String = abc
scala> mergeOverlap("xyz", "abc")
res3: String = xyzabc
scala> mergeOverlap("xab", "abc")
res4: String = xabc
scala> mergeOverlap("aabaaab", "aab")
res5: String = aabaaab
scala> mergeOverlap("aabaaab", "aabc")
res6: String = aabaaabc
scala> mergeOverlap("aabaaab", "bc")
res7: String = aabaaabc
scala> mergeOverlap("aabaaab", "bbc")
res8: String = aabaaabbc
scala> mergeOverlap("ababab", "ababc")
res9: String = abababc
scala> mergeOverlap("ababab", "babc")
res10: String = abababc
scala> mergeOverlap("abab", "aab")
res11: String = ababaab

It's not tail recursive but it is a very simple algorithm.
def mergeOverlap(s1: String, s2: String): String =
if (s2 startsWith s1) s2
else s1.head +: mergeOverlap(s1.tail, s2)

Related

Print characters at even and odd indices from a String

Using scala, how to print string in even and odd indices of a given string? I am aware of the imperative approach using var. I am looking for an approach that uses immutability, avoids side-effects (of course, until need to print result) and concise.
Here is a tail-recursive solution returning even and odd chars (List[Char], List[Char]) in one go
def f(in: String): (List[Char], List[Char]) = {
#tailrec def run(s: String, idx: Int, accEven: List[Char], accOdd: List[Char]): (List[Char], List[Char]) = {
if (idx < 0) (accEven, accOdd)
else if (idx % 2 == 0) run(s, idx - 1, s.charAt(idx) :: accEven, accOdd)
else run(s, idx - 1, accEven, s.charAt(idx) :: accOdd)
}
run(in, in.length - 1, Nil, Nil)
}
which could be printed like so
val (even, odd) = f("abcdefg")
println(even.mkString)
Another way to explore is using zipWithIndex
def printer(evenOdd: Int) {
val str = "1234"
str.zipWithIndex.foreach { i =>
i._2 % 2 match {
case x if x == evenOdd => print(i._1)
case _ =>
}
}
}
In this case you can check the results by using the printer function
scala> printer(1)
24
scala> printer(0)
13
.zipWithIndex takes a List and returns tuples of the elements coupled with their index. Knowing that a String is a list of Char
Looking at str
scala> val str = "1234"
str: String = 1234
str.zipWithIndex
res: scala.collection.immutable.IndexedSeq[(Char, Int)] = Vector((1,0), (2,1), (3,2), (4,3))
Lastly, as you only need to print, using foreach instead of map is more ideal as you aren't expecting values to be returned
You can use the sliding function, which is quite simple:
scala> "abcdefgh".sliding(1,2).mkString("")
res16: String = aceg
scala> "abcdefgh".tail.sliding(1,2).mkString("")
res17: String = bdfh
val s = "abcd"
// ac
(0 until s.length by 2).map(i => s(i))
// bd
(1 until s.length by 2).map(i => s(i))
just pure functions with map operator

How reverse words in string and keep punctuation marks and upper case symbol

private def reverseHelper(word: String): String = {
var result = new StringBuilder(word)
if (word.head.isUpper) {
result.setCharAt(0, word.head.toLower)
result.setCharAt(word.length - 1, word.last.toUpper)
}
result.reverse.result()
}
val formatString = str
.split("[.,!?: ]+")
.map(result => str.replaceFirst(result, reverseHelper(result)))
.foreach(println)
Example:
Input: What is a sentence?
Ouput: Tahw si a ecnetnes?
but i have Array[String]: Tahw is a sentence?, What si a sentence?, What is a sentence?, What is a ecnetnes?
How i can write this in right format?
Restoring the original capitalization is a bit tricky.
def reverser(s:Seq[Char], idx:Int = 0) :String = {
val strt = s.indexWhere(_.isLetter, idx)
if (strt < 0) s.mkString
else {
val end = s.indexWhere(!_.isLetter, strt)
val len = end - strt
val rev = Range(0,len).map{ x =>
if (s(strt+x).isUpper) s(end-1-x).toUpper
else s(end-1-x).toLower
}
reverser(s.patch(strt,rev,len), end)
}
}
testing:
reverser( "What, is A sEntence?")
//res0: String = Tahw, si A eCnetnes?
You can first split your string at a list of special characters and then reverse each individual word and store it in a temporary string. After that traverse the original string and temporary string and replace word matching any special characters with current character in temporary string.
private def reverseHelper(word: String): String = {
var result = new StringBuilder(word)
if (word.head.isUpper) {
result.setCharAt(0, word.head.toLower)
result.setCharAt(word.length - 1, word.last.toUpper)
}
result.reverse.result()
}
val tempStr = str
.split("[.,!?: ]+")
.map(result => reverseHelper(result))
.mkString("")
val sList = "[.,!?: ]+".toList
var curr = 0
val formatString = str.map(c => {
if(!sList.contains(c)) {
curr = curr + 1
tempStr(curr-1)
}
else c
})
Here's one approach that uses a Regex pattern to generate a list of paired strings of Seq(word, nonWord), followed by reversal and positional uppercasing of the word strings:
def reverseWords(s: String): String = {
val pattern = """(\w+)(\W*)""".r
pattern.findAllMatchIn(s).flatMap(_.subgroups).grouped(2).
map{ case Seq(word, nonWord) =>
val caseList = word.map(_.isUpper)
val newWord = (word.reverse zip caseList).map{
case (c, true) => c.toUpper
case (c, false) => c.toLower
}.mkString
newWord + nonWord
}.
mkString
}
reverseWords("He likes McDonald's burgers. I prefer In-and-Out's.")
//res1: String = "Eh sekil DlAnodcm's sregrub. I referp Ni-dna-Tuo's."
A version using split on word boundaries:
def reverseWords(string: String): String = {
def revCap(s: String): String =
s.headOption match {
case Some(c) if c.isUpper =>
(c.toLower +: s.drop(1)).reverse.capitalize
case Some(c) if c.isLower =>
s.reverse
case _ => s
}
string
.split("\\b")
.map(revCap)
.mkString("")
}

Scala: Remove the last occurrence of a character

I am trying to remove the last occurrence of a character in a string. I can get its index:
str.lastIndexOf(',')
I have already tried to use split and the replace function on the string.
You could use patch.
scala> val s = "s;dfkj;w;erw"
s: String = s;dfkj;w;erw
scala> s.patch(s.lastIndexOf(';'), "", 1)
res6: String = s;dfkj;werw
Curious why Scala doesn't have a .replaceLast but there must be a reason...
Reverse the String and use str.replaceFirst then reverse again
I doubt this is terrible efficient but it is effective :)
scala> "abc.xyz.abc.xyz".reverse.replaceFirst("zyx.", "").reverse
res5: String = abc.xyz.abc
As a def it would look like this:
def replaceLast(input: String, matchOn: String, replaceWith: String) = {
input.reverse.replaceFirst(matchOn.reverse, replaceWith.reverse).reverse
}
scala> def removeLast(x: Char, xs: String): String = {
|
| val accumulator: (Option[Char], String) = (None, "")
|
| val (_, applied) = xs.foldRight(accumulator){(e: Char, acc: (Option[Char], String)) =>
| val (alreadyReplaced, runningAcc) = acc
| alreadyReplaced match {
| case some # Some(_) => (some, e + runningAcc)
| case None => if (e == x) (Some(e), runningAcc) else (None, e + runningAcc)
| }
| }
|
| applied
| }
removeLast: (x: Char, xs: String)String
scala> removeLast('f', "foobarf")
res7: String = foobar
scala> removeLast('f', "foobarfff")
res8: String = foobarff
You could try the following:
val input = "The quick brown fox jumps over the lazy dog"
val lastIndexOfU = input.lastIndexOf("u")
val splits = input.splitAt(lastIndexOfU)
val inputWithoutLastU = splits._1 + splits._2.drop(1) // "The quick brown fox jmps over the lazy dog"

Scala , Spark code : Iterating over array and evaluating expression using an element in the array

I am coding in scala-spark and trying to segregate all strings and column datatypes .
I am getting the output for columns(2)_2 albeit with a warning but when i apply the same thing in the if statement i get an error . Any idea why. This part got solved by adding columns(2)._2 : David Griffin
var df = some dataframe
var columns = df.dtypes
var colnames = df.columns.size
var stringColumns:Array[(String,String)] = null;
var doubleColumns:Array[(String,String)] = null;
var otherColumns:Array [(String,String)] = null;
columns(2)._2
columns(2)._1
for (x<-1 to colnames)
{
if (columns(x)._2 == "StringType")
{stringColumns = stringColumns ++ Seq((columns(x)))}
if (columns(x)._2 == "DoubleType")
{doubleColumns = doubleColumns ++ Seq((columns(x)))}
else
{otherColumns = otherColumns ++ Seq((columns(x)))}
}
Previous Output:
stringColumns: Array[(String, String)] = null
doubleColumns: Array[(String, String)] = null
otherColumns: Array[(String, String)] = null
res158: String = DoubleType
<console>:127: error: type mismatch;
found : (String, String)
required: scala.collection.GenTraversableOnce[?]
{stringColumns = stringColumns ++ columns(x)}
Current Output:
stringColumns: Array[(String, String)] = null
doubleColumns: Array[(String, String)] = null
otherColumns: Array[(String, String)] = null
res382: String = DoubleType
res383: String = CVB
java.lang.NullPointerException
^
I believe you are missing a .. Change this:
columns(2)_2
to
columns(2)._2
If nothing else, it will get rid of the warning.
And then, you need to do:
++ Seq(columns(x))
Here's a cleaner example:
scala> val arr = Array[(String,String)]()
arr: Array[(String, String)] = Array()
scala> arr ++ (("foo", "bar"))
<console>:9: error: type mismatch;
found : (String, String)
required: scala.collection.GenTraversableOnce[?]
arr ++ (("foo", "bar"))
scala> arr ++ Seq(("foo", "bar"))
res2: Array[(String, String)] = Array((foo,bar))
This is the answer modified from David Griffins answer so please up-vote him too . Just altered ++ to +:=
var columns = df.dtypes
var colnames = df.columns.size
var stringColumns= Array[(String,String)]();
var doubleColumns= Array[(String,String)]();
var otherColumns= Array[(String,String)]();
for (x<-0 to colnames-1)
{
if (columns(x)._2 == "StringType"){
stringColumns +:= columns(x)
}else if (columns(x)._2 == "DoubleType") {
doubleColumns +:= columns(x)
}else {
otherColumns +:= columns(x)
}
}
println(stringColumns)
println(doubleColumns)
println(otherColumns)

Scala split string and sort data

Hi I am new in scala and I achieved following things in scala, my string contain following data
CLASS: Win32_PerfFormattedData_PerfProc_Process$$(null)|CreatingProcessID|Description|ElapsedTime|Frequency_Object|Frequency_PerfTime|Frequency_Sys100NS|HandleCount|IDProcess|IODataBytesPersec|IODataOperationsPersec|IOOtherBytesPersec|IOOtherOperationsPersec|IOReadBytesPersec|IOReadOperationsPersec|IOWriteBytesPersec|IOWriteOperationsPersec|Name|PageFaultsPersec|PageFileBytes|PageFileBytesPeak|PercentPrivilegedTime|PercentProcessorTime|PercentUserTime|PoolNonpagedBytes|PoolPagedBytes|PriorityBase|PrivateBytes|ThreadCount|Timestamp_Object|Timestamp_PerfTime|Timestamp_Sys100NS|VirtualBytes|VirtualBytesPeak|WorkingSet|WorkingSetPeak|WorkingSetPrivate$$(null)|0|(null)|8300717|0|0|0|0|0|0|0|0|0|0|0|0|0|Idle|0|0|0|100|100|0|0|0|0|0|8|0|0|0|0|0|24576|24576|24576$$(null)|0|(null)|8300717|0|0|0|578|4|0|0|0|0|0|0|0|0|System|0|114688|274432|17|0|0|0|0|8|114688|124|0|0|0|3469312|8908800|311296|5693440|61440$$(null)|4|(null)|8300717|0|0|0|42|280|0|0|0|0|0|0|0|0|smss|0|782336|884736|110|0|0|1864|10664|11|782336|3|0|0|0|5701632|19357696|1388544|1417216|700416$$(null)|372|(null)|8300715|0|0|0|1438|380|0|0|0|0|0|0|0|0|csrss|0|3624960|3747840|0|0|0|15008|157544|13|3624960|10|0|0|0|54886400|55345152|5586944|5648384|2838528$$(null)|424|(null)|8300714|0|0|0|71|432|0|0|0|0|0|0|0|0|csrss#1|0|8605696|8728576|0|0|0|8720|96384|13|8605696|9|0|0|0|50515968|50909184|7438336|9342976|4972544
now I want to find data who's value is PercentProcessorTime, ElapsedTime,.. so for this I first split above string $$ and then again split string using | and this new split string I searched string where PercentProcessorTime' presents and get Index of that string when I get string then skipped first two arrays which split from$$and get data ofPercentProcessorTime` using index , it's looks like complicated but I think following code should helps
// First split string as below
val processData = winProcessData.split("\\$\\$")
// get index here
val getIndex: Int = processData.find(part => part.contains("PercentProcessorTime"))
.map {
case getData =>
getData
} match {
case Some(s) => s.split("\\|").indexOf("PercentProcessorTime")
case None => -1
}
val getIndexOfElapsedTime: Int = processData.find(part => part.contains("ElapsedTime"))
.map {
case getData =>
getData
} match {
case Some(s) => s.split("\\|").indexOf("ElapsedTime")
case None => -1
}
// now fetch data of above index as below
for (i <- 2 to (processData.length - 1)) {
val getValues = processData(i).split("\\|")
val getPercentProcessTime = getValues(getIndex).toFloat
val getElapsedTime = getValues(getIndexOfElapsedTime).toFloat
Logger.info("("+getPercentProcessTime+","+getElapsedTime+"),")
}
Now Problem is that using above code I was getting data of given key in index, so my output was (8300717,100),(8300717,17)(8300717,110)... Now I want sort this data using getPercentProcessTime so my output should be (8300717,110),(8300717,100)(8300717,17)...
and that data should be in lists so I will pass list to case class.
Are you find PercentProcessorTime or PercentPrivilegedTime ?
Here it is
val str = "your very long string"
val heads = Seq("PercentPrivilegedTime", "ElapsedTime")
val Array(elap, perc) = str.split("\\$\\$").tail.map(_.split("\\|"))
.transpose.filter(x => heads.contains(x.head))
//elap: Array[String] = Array(ElapsedTime, 8300717, 8300717, 8300717, 8300715, 8300714)
//perc: Array[String] = Array(PercentPrivilegedTime, 100, 17, 110, 0, 0)
val res = (elap.tail, perc.tail).zipped.toList.sortBy(-_._2.toInt)
//List[(String, String)] = List((8300717,110), (8300717,100), (8300717,17), (8300715,0), (8300714,0))

Resources