Scala split string and sort data - string

Hi I am new in scala and I achieved following things in scala, my string contain following data
CLASS: Win32_PerfFormattedData_PerfProc_Process$$(null)|CreatingProcessID|Description|ElapsedTime|Frequency_Object|Frequency_PerfTime|Frequency_Sys100NS|HandleCount|IDProcess|IODataBytesPersec|IODataOperationsPersec|IOOtherBytesPersec|IOOtherOperationsPersec|IOReadBytesPersec|IOReadOperationsPersec|IOWriteBytesPersec|IOWriteOperationsPersec|Name|PageFaultsPersec|PageFileBytes|PageFileBytesPeak|PercentPrivilegedTime|PercentProcessorTime|PercentUserTime|PoolNonpagedBytes|PoolPagedBytes|PriorityBase|PrivateBytes|ThreadCount|Timestamp_Object|Timestamp_PerfTime|Timestamp_Sys100NS|VirtualBytes|VirtualBytesPeak|WorkingSet|WorkingSetPeak|WorkingSetPrivate$$(null)|0|(null)|8300717|0|0|0|0|0|0|0|0|0|0|0|0|0|Idle|0|0|0|100|100|0|0|0|0|0|8|0|0|0|0|0|24576|24576|24576$$(null)|0|(null)|8300717|0|0|0|578|4|0|0|0|0|0|0|0|0|System|0|114688|274432|17|0|0|0|0|8|114688|124|0|0|0|3469312|8908800|311296|5693440|61440$$(null)|4|(null)|8300717|0|0|0|42|280|0|0|0|0|0|0|0|0|smss|0|782336|884736|110|0|0|1864|10664|11|782336|3|0|0|0|5701632|19357696|1388544|1417216|700416$$(null)|372|(null)|8300715|0|0|0|1438|380|0|0|0|0|0|0|0|0|csrss|0|3624960|3747840|0|0|0|15008|157544|13|3624960|10|0|0|0|54886400|55345152|5586944|5648384|2838528$$(null)|424|(null)|8300714|0|0|0|71|432|0|0|0|0|0|0|0|0|csrss#1|0|8605696|8728576|0|0|0|8720|96384|13|8605696|9|0|0|0|50515968|50909184|7438336|9342976|4972544
now I want to find data who's value is PercentProcessorTime, ElapsedTime,.. so for this I first split above string $$ and then again split string using | and this new split string I searched string where PercentProcessorTime' presents and get Index of that string when I get string then skipped first two arrays which split from$$and get data ofPercentProcessorTime` using index , it's looks like complicated but I think following code should helps
// First split string as below
val processData = winProcessData.split("\\$\\$")
// get index here
val getIndex: Int = processData.find(part => part.contains("PercentProcessorTime"))
.map {
case getData =>
getData
} match {
case Some(s) => s.split("\\|").indexOf("PercentProcessorTime")
case None => -1
}
val getIndexOfElapsedTime: Int = processData.find(part => part.contains("ElapsedTime"))
.map {
case getData =>
getData
} match {
case Some(s) => s.split("\\|").indexOf("ElapsedTime")
case None => -1
}
// now fetch data of above index as below
for (i <- 2 to (processData.length - 1)) {
val getValues = processData(i).split("\\|")
val getPercentProcessTime = getValues(getIndex).toFloat
val getElapsedTime = getValues(getIndexOfElapsedTime).toFloat
Logger.info("("+getPercentProcessTime+","+getElapsedTime+"),")
}
Now Problem is that using above code I was getting data of given key in index, so my output was (8300717,100),(8300717,17)(8300717,110)... Now I want sort this data using getPercentProcessTime so my output should be (8300717,110),(8300717,100)(8300717,17)...
and that data should be in lists so I will pass list to case class.

Are you find PercentProcessorTime or PercentPrivilegedTime ?
Here it is
val str = "your very long string"
val heads = Seq("PercentPrivilegedTime", "ElapsedTime")
val Array(elap, perc) = str.split("\\$\\$").tail.map(_.split("\\|"))
.transpose.filter(x => heads.contains(x.head))
//elap: Array[String] = Array(ElapsedTime, 8300717, 8300717, 8300717, 8300715, 8300714)
//perc: Array[String] = Array(PercentPrivilegedTime, 100, 17, 110, 0, 0)
val res = (elap.tail, perc.tail).zipped.toList.sortBy(-_._2.toInt)
//List[(String, String)] = List((8300717,110), (8300717,100), (8300717,17), (8300715,0), (8300714,0))

Related

Print characters at even and odd indices from a String

Using scala, how to print string in even and odd indices of a given string? I am aware of the imperative approach using var. I am looking for an approach that uses immutability, avoids side-effects (of course, until need to print result) and concise.
Here is a tail-recursive solution returning even and odd chars (List[Char], List[Char]) in one go
def f(in: String): (List[Char], List[Char]) = {
#tailrec def run(s: String, idx: Int, accEven: List[Char], accOdd: List[Char]): (List[Char], List[Char]) = {
if (idx < 0) (accEven, accOdd)
else if (idx % 2 == 0) run(s, idx - 1, s.charAt(idx) :: accEven, accOdd)
else run(s, idx - 1, accEven, s.charAt(idx) :: accOdd)
}
run(in, in.length - 1, Nil, Nil)
}
which could be printed like so
val (even, odd) = f("abcdefg")
println(even.mkString)
Another way to explore is using zipWithIndex
def printer(evenOdd: Int) {
val str = "1234"
str.zipWithIndex.foreach { i =>
i._2 % 2 match {
case x if x == evenOdd => print(i._1)
case _ =>
}
}
}
In this case you can check the results by using the printer function
scala> printer(1)
24
scala> printer(0)
13
.zipWithIndex takes a List and returns tuples of the elements coupled with their index. Knowing that a String is a list of Char
Looking at str
scala> val str = "1234"
str: String = 1234
str.zipWithIndex
res: scala.collection.immutable.IndexedSeq[(Char, Int)] = Vector((1,0), (2,1), (3,2), (4,3))
Lastly, as you only need to print, using foreach instead of map is more ideal as you aren't expecting values to be returned
You can use the sliding function, which is quite simple:
scala> "abcdefgh".sliding(1,2).mkString("")
res16: String = aceg
scala> "abcdefgh".tail.sliding(1,2).mkString("")
res17: String = bdfh
val s = "abcd"
// ac
(0 until s.length by 2).map(i => s(i))
// bd
(1 until s.length by 2).map(i => s(i))
just pure functions with map operator

How reverse words in string and keep punctuation marks and upper case symbol

private def reverseHelper(word: String): String = {
var result = new StringBuilder(word)
if (word.head.isUpper) {
result.setCharAt(0, word.head.toLower)
result.setCharAt(word.length - 1, word.last.toUpper)
}
result.reverse.result()
}
val formatString = str
.split("[.,!?: ]+")
.map(result => str.replaceFirst(result, reverseHelper(result)))
.foreach(println)
Example:
Input: What is a sentence?
Ouput: Tahw si a ecnetnes?
but i have Array[String]: Tahw is a sentence?, What si a sentence?, What is a sentence?, What is a ecnetnes?
How i can write this in right format?
Restoring the original capitalization is a bit tricky.
def reverser(s:Seq[Char], idx:Int = 0) :String = {
val strt = s.indexWhere(_.isLetter, idx)
if (strt < 0) s.mkString
else {
val end = s.indexWhere(!_.isLetter, strt)
val len = end - strt
val rev = Range(0,len).map{ x =>
if (s(strt+x).isUpper) s(end-1-x).toUpper
else s(end-1-x).toLower
}
reverser(s.patch(strt,rev,len), end)
}
}
testing:
reverser( "What, is A sEntence?")
//res0: String = Tahw, si A eCnetnes?
You can first split your string at a list of special characters and then reverse each individual word and store it in a temporary string. After that traverse the original string and temporary string and replace word matching any special characters with current character in temporary string.
private def reverseHelper(word: String): String = {
var result = new StringBuilder(word)
if (word.head.isUpper) {
result.setCharAt(0, word.head.toLower)
result.setCharAt(word.length - 1, word.last.toUpper)
}
result.reverse.result()
}
val tempStr = str
.split("[.,!?: ]+")
.map(result => reverseHelper(result))
.mkString("")
val sList = "[.,!?: ]+".toList
var curr = 0
val formatString = str.map(c => {
if(!sList.contains(c)) {
curr = curr + 1
tempStr(curr-1)
}
else c
})
Here's one approach that uses a Regex pattern to generate a list of paired strings of Seq(word, nonWord), followed by reversal and positional uppercasing of the word strings:
def reverseWords(s: String): String = {
val pattern = """(\w+)(\W*)""".r
pattern.findAllMatchIn(s).flatMap(_.subgroups).grouped(2).
map{ case Seq(word, nonWord) =>
val caseList = word.map(_.isUpper)
val newWord = (word.reverse zip caseList).map{
case (c, true) => c.toUpper
case (c, false) => c.toLower
}.mkString
newWord + nonWord
}.
mkString
}
reverseWords("He likes McDonald's burgers. I prefer In-and-Out's.")
//res1: String = "Eh sekil DlAnodcm's sregrub. I referp Ni-dna-Tuo's."
A version using split on word boundaries:
def reverseWords(string: String): String = {
def revCap(s: String): String =
s.headOption match {
case Some(c) if c.isUpper =>
(c.toLower +: s.drop(1)).reverse.capitalize
case Some(c) if c.isLower =>
s.reverse
case _ => s
}
string
.split("\\b")
.map(revCap)
.mkString("")
}

Trying to create a Map[String, String] from lines in a text file, keep getting errors [duplicate]

This question already has an answer here:
Most efficient way to create a Scala Map from a file of strings?
(1 answer)
Closed 4 years ago.
Hi so I'm trying to create a Map[String, String] based on a text file, in the textfile there are arbritrary lines that begin with ";;;" that I ignore with the function and the lines that i dont ignore are the key-> values. they are separated by 2 spaces.
whenever i run my code i get an error saying the expected type Map[String,String] isn't the required type, even though my conversions seem correct.
def createMap(filename: String): Map[String,String] = {
for (line <- Source.fromFile(filename).getLines) {
if (line.nonEmpty && !line.startsWith(";;;")) {
val string: String = line.toString
val splits: Array[String] = string.split(" ")
splits.map(arr => arr(0) -> arr(1)).toMap
}
}
}
I expect it to return a (String -> String) map but instead i get a bunch of errors. how would i fix this?
Since your if statement is not an expression in the for-loop. You should use the if as a filter when yielding your results. To return a result, you must make it a for-comprehension. After the for-comprehension filters the results. You can map this structure to a Map.
import scala.io.Source
def createMap(filename: String): Map[String,String] = {
val keyValuePairs = for (line <- Source.fromFile(filename).getLines; if line.nonEmpty && !line.startsWith(";;;")) yield {
val string = line.toString
val splits: Array[String] = string.split(" ")
splits(0) -> splits(1)
}
keyValuePairs.toMap
}
Okay, so I took a second look. It looks like the file has some corrupt encodings. You can try this as a solution. It worked in my Scala REPL:
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}
def createMap(filename: String): Map[String,String] = {
val decoder = Codec.UTF8.decoder.onMalformedInput(CodingErrorAction.IGNORE)
Source.fromFile(filename)(decoder).getLines()
.filter(line => line.nonEmpty && !line.startsWith(";;;"))
.flatMap(line => {
val arr = line.split("\\s+")
arr match {
case Array(key, value) => Some(key -> value)
case Array(key, values#_*) => Some(key -> values.mkString(" "))
case _ => None
}
}).toMap
}

Spark reduce by

I have streaming data coming as follows
id, date, value
i1, 12-01-2016, 10
i2, 12-02-2016, 20
i1, 12-01-2016, 30
i2, 12-05-2016, 40
Want to reduce by id to get aggregate value info by date like
output required from rdd is for a given id and list(days 365)
I have to put the value in the list position based on day of year like 12-01-2016 is 336 and as there are two instances for device i1 with same date they should be aggregated
id, List [0|1|2|3|... |336| 337| |340| |365]
i1, |10+30| - this goes to 336 position
i2, 20 40 -- this goes to 337 and 340 position
Please guide the reduce or group by transformation to do this.
I'll provide you the basic code snippet with few assumptions as you haven't specified about the language, data source or data format.
JavaDStream<String> lineStream = //Your data source for stream
JavaPairDStream<String, Long> firstReduce = lineStream.mapToPair(line -> {
String[] fields = line.split(",");
String idDate = fields[0] + fields[1];
Long value = Long.valueOf(fields[2]);
return new Tuple2<String, Long>(idDate, value);
}).reduceByKey((v1, v2) -> {
return (v1+v2);
});
firstReduce.map(idDateValueTuple -> {
String idDate = idDateValueTuple._1();
Long valueSum = idDateValueTuple._2();
String id = idDate.split(",")[0];
String date = idDate.split(",")[];
//TODO parse date and put the sumValue in array as you wish
}
Can only reach this far. Am not sure how to add each element of an array in the final step. Hope this helps!!!If you get the last step or any alternate way,appreciate if you post it here!!
def getDateDifference(dateStr:String):Int = {
val startDate = "01-01-2016"
val formatter = DateTimeFormatter.ofPattern("MM-dd-yyyy")
val oldDate = LocalDate.parse(startDate, formatter)
val currentDate = dateStr
val newDate = LocalDate.parse(currentDate, formatter)
return newDate.toEpochDay().toInt - oldDate.toEpochDay().toInt
}
def getArray(numberofDays:Int,data:Int):Iterable[Int] = {
val daysArray = new Array[Int](366)
daysArray(numberofDays) = data
return daysArray
}
val idRDD = <read from stream>
val idRDDMap = idRDD.map { rec => ((rec.split(",")(0),rec.split(",")(1)),
(getDateDifference(rec.split(",")(1)),rec.split(",")(2).toInt))}
val idRDDconsiceMap = idRDDMap.map { rec => (rec._1._1,getArray(rec._2._1, rec._2._2)) }
val finalRDD = idRDDconsiceMap.reduceByKey((acc,value)=>(???add each element of the arrays????))

Reading Multiple Inputs from the Same Line the Scala Way

I tried to use readInt() to read two integers from the same line but that is not how it works.
val x = readInt()
val y = readInt()
With an input of 1 727 I get the following exception at runtime:
Exception in thread "main" java.lang.NumberFormatException: For input string: "1 727"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:231)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at scala.Console$.readInt(Console.scala:356)
at scala.Predef$.readInt(Predef.scala:201)
at Main$$anonfun$main$1.apply$mcVI$sp(Main.scala:11)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:75)
at Main$.main(Main.scala:10)
at Main.main(Main.scala)
I got the program to work by using readf but it seems pretty awkward and ugly to me:
val (x,y) = readf2("{0,number} {1,number}")
val a = x.asInstanceOf[Int]
val b = y.asInstanceOf[Int]
println(function(a,b))
Someone suggested that I just use Java's Scanner class, (Scanner.nextInt()) but is there a nice idiomatic way to do it in Scala?
Edit:
My solution following paradigmatic's example:
val Array(a,b) = readLine().split(" ").map(_.toInt)
Followup Question: If there were a mix of types in the String how would you extract it? (Say a word, an int and a percentage as a Double)
If you mean how would you convert val s = "Hello 69 13.5%" into a (String, Int, Double) then the most obvious way is
val tokens = s.split(" ")
(tokens(0).toString,
tokens(1).toInt,
tokens(2).init.toDouble / 100)
// (java.lang.String, Int, Double) = (Hello,69,0.135)
Or as mentioned you could match using a regex:
val R = """(.*) (\d+) (\d*\.?\d*)%""".r
s match {
case R(str, int, dbl) => (str, int.toInt, dbl.toDouble / 100)
}
If you don't actually know what data is going to be in the String, then there probably isn't much reason to convert it from a String to the type it represents, since how can you use something that might be a String and might be in Int? Still, you could do something like this:
val int = """(\d+)""".r
val pct = """(\d*\.?\d*)%""".r
val res = s.split(" ").map {
case int(x) => x.toInt
case pct(x) => x.toDouble / 100
case str => str
} // Array[Any] = Array(Hello, 69, 0.135)
now to do anything useful you'll need to match on your values by type:
res.map {
case x: Int => println("It's an Int!")
case x: Double => println("It's a Double!")
case x: String => println("It's a String!")
case _ => println("It's a Fail!")
}
Or if you wanted to take things a bit further, you could define some extractors which will do the conversion for you:
abstract class StringExtractor[A] {
def conversion(s: String): A
def unapply(s: String): Option[A] = try { Some(conversion(s)) }
catch { case _ => None }
}
val intEx = new StringExtractor[Int] {
def conversion(s: String) = s.toInt
}
val pctEx = new StringExtractor[Double] {
val pct = """(\d*\.?\d*)%""".r
def conversion(s: String) = s match { case pct(x) => x.toDouble / 100 }
}
and use:
"Hello 69 13.5%".split(" ").map {
case intEx(x) => println(x + " is Int: " + x.isInstanceOf[Int])
case pctEx(x) => println(x + " is Double: " + x.isInstanceOf[Double])
case str => println(str)
}
prints
Hello
69 is Int: true
0.135 is Double: true
Of course, you can make the extrators match on anything you want (currency mnemonic, name begging with 'J', URL) and return whatever type you want. You're not limited to matching Strings either, if instead of StringExtractor[A] you make it Extractor[A, B].
You can read the line as a whole, split it using spaces and then convert each element (or the one you want) to ints:
scala> "1 727".split(" ").map( _.toInt )
res1: Array[Int] = Array(1, 727)
For most complex inputs, you can have a look at parser combinators.
The input you are describing is not two Ints but a String which just happens to be two Ints. Hence you need to read the String, split by the space and convert the individual Strings into Ints as suggested by #paradigmatic.
One way would be splitting and mapping:
// Assuming whatever is being read is assigned to "input"
val input = "1 727"
val Array(x, y) = input split " " map (_.toInt)
Or, if you have things a bit more complicated than that, a regular expression is usually good enough.
val twoInts = """^\s*(\d+)\s*(\d+)""".r
val Some((x, y)) = for (twoInts(a, b) <- twoInts findFirstIn input) yield (a, b)
There are other ways to use regex. See the Scala API docs about them.
Anyway, if regex patterns are becoming too complicated, then you should appeal to Scala Parser Combinators. Since you can combine both, you don't loose any of regex's power.
import scala.util.parsing.combinator._
object MyParser extends JavaTokenParsers {
def twoInts = wholeNumber ~ wholeNumber ^^ { case a ~ b => (a.toInt, b.toInt) }
}
val MyParser.Success((x, y), _) = MyParser.parse(MyParser.twoInts, input)
The first example was more simple, but harder to adapt to more complex patterns, and more vulnerable to invalid input.
I find that extractors provide some machinery that makes this type of processing nicer. And I think it works up to a certain point nicely.
object Tokens {
def unapplySeq(line: String): Option[Seq[String]] =
Some(line.split("\\s+").toSeq)
}
class RegexToken[T](pattern: String, convert: (String) => T) {
val pat = pattern.r
def unapply(token: String): Option[T] = token match {
case pat(s) => Some(convert(s))
case _ => None
}
}
object IntInput extends RegexToken[Int]("^([0-9]+)$", _.toInt)
object Word extends RegexToken[String]("^([A-Za-z]+)$", identity)
object Percent extends RegexToken[Double](
"""^([0-9]+\.?[0-9]*)%$""", _.toDouble / 100)
Now how to use:
List("1 727", "uptime 365 99.999%") collect {
case Tokens(IntInput(x), IntInput(y)) => "sum " + (x + y)
case Tokens(Word(w), IntInput(i), Percent(p)) => w + " " + (i * p)
}
// List[java.lang.String] = List(sum 728, uptime 364.99634999999995)
To use for reading lines at the console:
Iterator.continually(readLine("prompt> ")).collect{
case Tokens(IntInput(x), IntInput(y)) => "sum " + (x + y)
case Tokens(Word(w), IntInput(i), Percent(p)) => w + " " + (i * p)
case Tokens(Word("done")) => "done"
}.takeWhile(_ != "done").foreach(println)
// type any input and enter, type "done" and enter to finish
The nice thing about extractors and pattern matching is that you can add case clauses as necessary, you can use Tokens(a, b, _*) to ignore some tokens. I think they combine together nicely (for instance with literals as I did with done).

Resources