Count how many times a string repeats itself (Scala)

Hi, I'm trying to count how many times an artist name appears within a range of years. For this I have:
var artists=Array.ofDim[String](994,2)//artist,year
var artists2=Array.ofDim[String](250)//artist name
var artists3 = Array.ofDim[Int](250)//number of times
The user enters ano1 and ano2, which define the range of years we want:
import scala.util.control.Breaks
val loop = new Breaks
for(i <- 0 to 993){// copy
loop.breakable{
for(j<- 0 to 249){
if(artists2(j).contentEquals("NULL") && artists(i)(1).toInt>=ano1 && artists(i)(1).toInt<=ano2){
artists2(j)=artists(i)(0)
artists3(j)= 1
loop.break;
}else if(artists(i)(0).contentEquals(artists2(j)) && artists(i)(1).toInt>=ano1 && artists(i)(1).toInt<=ano2){
artists3(j)= artists3(j)+1
loop.break;
}
}
}
}
println(artists2.mkString("\n"))
println(artists3.mkString(","))
For some reason my if doesn't work, or j gets incremented after entering the if, because every time it creates a new element in artists2 instead of incrementing the count in artists3.
The output I get is artists3 filled with 1s, because for some reason it never reaches the else if branch.

I'm sorry if this sounds a bit harsh, but there are so many things wrong with your code that it's hard to know where to begin.
The main problems are 1) your code isn't very Scala-like, and 2) you're using data structures and variable names designed to make things as difficult as possible to understand.
Here's a brief attempt to redesign things. It may not meet all your requirements but perhaps it will start you in a better direction.
val mockdata = List( ("Tom", 2001)
, ("Sue", 2002)
, ("Joe", 2002)
, ("Sue", 2005)
, ("Sue", 2004)
, ("Jil", 2001)
, ("Tom", 2005)
, ("Sue", 2002)
, ("Jil", 2012)
)
def countArtists( dataSet: List[(String,Int)]
, anoStart: Int , anoEnd: Int): Map[String,Int] = {
val artists = for {
(artist, year) <- dataSet
if year >= anoStart && year <= anoEnd
} yield artist
artists.distinct.map(name => name -> artists.count(_ == name)).toMap
}
val count2002to2011 = countArtists(mockdata, 2002, 2011)
At this point you can use the result to get interesting information.
scala> count2002to2011.keys // all artists within the time period
res0: Iterable[String] = Set(Sue, Joe, Tom)
scala> count2002to2011.values.sum // total count within the time period
res1: Int = 6
scala> count2002to2011("Sue") // count for just this artist
res2: Int = 4
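The count step above calls artists.count once for each distinct name, so it rescans the whole list per artist. Here is a minimal sketch of the same filter followed by a single groupBy. The object and method names (ArtistCount, countArtistsGrouped) are hypothetical, and the sketch assumes the same inclusive year bounds as countArtists:

```scala
object ArtistCount {
  val mockdata: List[(String, Int)] = List(("Tom", 2001), ("Sue", 2002), ("Joe", 2002),
    ("Sue", 2005), ("Sue", 2004), ("Jil", 2001), ("Tom", 2005), ("Sue", 2002), ("Jil", 2012))

  // Keep only the names whose year is inside [anoStart, anoEnd], then group
  // identical names together and count each group once.
  def countArtistsGrouped(dataSet: List[(String, Int)],
                          anoStart: Int, anoEnd: Int): Map[String, Int] =
    dataSet
      .collect { case (artist, year) if year >= anoStart && year <= anoEnd => artist }
      .groupBy(identity)
      .map { case (artist, occurrences) => artist -> occurrences.size }
}
```

On the mock data, ArtistCount.countArtistsGrouped(ArtistCount.mockdata, 2002, 2011) should give the same map as count2002to2011: Sue -> 4, Joe -> 1, Tom -> 1.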

Your implementation seems a little confused. If I understood what you need, I'd like to share how I would implement your search in a more functional, Scala-idiomatic and clear way.
import scala.annotation.tailrec
case class Artist(name: String, year: Int)
case class ArtistNames(name: String)
case class ArtistResult(name: String, nbOccurence: Int)
val anno1 = 1950
val anno2 = 1960
def checkArtist(artists: Seq[Artist], artistNames: Seq[ArtistNames]): Seq[ArtistResult] = {
artists.map{ artist =>
// counts the names that match this artist, provided the artist's year lies in [anno1, anno2]
@tailrec
def countOccurence(artist: Artist, artistNames: Seq[ArtistNames], occurence: Int): Int = {
artistNames match {
// +: deconstructs any Seq (List, Vector, ...), whereas :: only matches a List
case Seq() => occurence
case head +: tail =>
if (head.name == artist.name && artist.year >= anno1 && artist.year <= anno2) countOccurence(artist, tail, occurence + 1)
else countOccurence(artist, tail, occurence)
}
}
val occurrence = countOccurence(artist, artistNames, 0)
ArtistResult(artist.name, occurrence)
}
}
val artistResultList: Seq[ArtistResult] = checkArtist(/* insert your data here */)
I hope this answers your question, at least in part.
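The recursive countOccurence above threads an accumulator through the list by hand. A foldLeft can carry that accumulator for you, and it can build all the per-name counts in a single pass over the rows. Below is a minimal sketch under that framing. The names ArtistFold and countInRange, and the inclusive [from, to] range, are assumptions and not part of the original answer:

```scala
object ArtistFold {
  final case class Artist(name: String, year: Int)

  // Single pass: fold over the rows, bumping a per-name counter in an
  // immutable Map whenever the year falls inside [from, to].
  def countInRange(artists: Seq[Artist], from: Int, to: Int): Map[String, Int] =
    artists.foldLeft(Map.empty[String, Int]) { (acc, a) =>
      if (a.year >= from && a.year <= to) acc.updated(a.name, acc.getOrElse(a.name, 0) + 1)
      else acc
    }
}
```

Unlike checkArtist, this returns one entry per artist name, so duplicated rows do not produce duplicated results.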

Related

Kotlin conditional formatting string

I have three variables:
val months: Long
val days: Long
val hours: Long
and I want to return something like this:
3 months, 2 days and 5 hours
This would simply translate to:
val str = "$months months, $days days and $hours hours"
But if months were 0, days 1 and hours 0, it would come out as '0 months, 1 days and 0 hours'.
What I am looking for is just "1 days" instead. How can I get that?
I could certainly use some sort of conditional StringBuilder to do this, but is there something better and more elegant?
How does this look to you?
import kotlin.math.abs
fun formatTime(months: Long, days: Long, hours: Long): String =
listOf(
months to "months",
days to "days",
hours to "hours"
)
.filter { (length,_) -> length > 0L }
// Optional step: use the singular label when the amount is 1 (replace("s", "") removes every "s", which is fine for these three labels)
.map { (amount, label) -> amount to if(abs(amount)==1L) label.replace("s", "") else label }
.joinToString(separator=", ") { (length, label) -> "$length $label" }
.addressLastItem()
fun String.addressLastItem() =
if(this.count { it == ','} >= 1)
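// Dirty hack to get it working quickly: the last ", " becomes the first " ," in the reversed string, and " dna " reads " and " once reversed back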
this.reversed().replaceFirst(" ,", " dna ").reversed()
else
this
Another variant without replacing, counting or reversing the list:
fun formatTime(months: Long, days: Long, hours: Long): String {
val list = listOfNotNull(
months.formatOrNull("month", "months"),
days.formatOrNull("day", "days"),
hours.formatOrNull("hour", "hours"),
)
return if (list.isEmpty()) "all values <= 0"
else
listOfNotNull(
list.take(list.lastIndex).joinToString().takeIf(String::isNotEmpty),
list.lastOrNull()
).joinToString(" and ")
}
fun Long.formatOrNull(singular: String, plural: String = "${singular}s") = when {
this == 1L -> "$this $singular"
this > 1L -> "$this $plural"
else -> null
}
It also has a fallback if all values are <= 0... you could also just use an empty string or whatever you prefer.
If you don't like that intermediate lists are created to concatenate the string, you could instead use something like the following in the else branch:
list.iterator().run {
buildString {
while (hasNext()) {
val part = next()
if (length > 0)
if (hasNext())
append(", ")
else
append(" and ")
append(part)
}
}
}

Print characters at even and odd indices from a String

Using Scala, how do I print the characters at the even and odd indices of a given string? I am aware of the imperative approach using var. I am looking for an approach that uses immutability, avoids side effects (until the result needs to be printed, of course) and is concise.
Here is a tail-recursive solution returning the even and odd chars as (List[Char], List[Char]) in one go:
import scala.annotation.tailrec
def f(in: String): (List[Char], List[Char]) = {
@tailrec def run(s: String, idx: Int, accEven: List[Char], accOdd: List[Char]): (List[Char], List[Char]) = {
if (idx < 0) (accEven, accOdd)
else if (idx % 2 == 0) run(s, idx - 1, s.charAt(idx) :: accEven, accOdd)
else run(s, idx - 1, accEven, s.charAt(idx) :: accOdd)
}
run(in, in.length - 1, Nil, Nil)
}
which could be printed like so:
val (even, odd) = f("abcdefg")
println(even.mkString)
println(odd.mkString)
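If you'd rather not write the recursion yourself, the same pair of results can come from the standard library. Pair each character with its index, then split the pairs with partition. Below is a minimal sketch of that swapped-in technique; EvenOddPartition and split are hypothetical names:

```scala
object EvenOddPartition {
  // Pair each char with its index, split the pairs by index parity with a
  // single partition call, then drop the indices again.
  def split(in: String): (String, String) = {
    val (evens, odds) = in.zipWithIndex.partition { case (_, idx) => idx % 2 == 0 }
    (evens.map(_._1).mkString, odds.map(_._1).mkString)
  }
}
```

For example, EvenOddPartition.split("abcdefg") returns ("aceg", "bdf"), and nothing is printed until you choose to.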
Another way to explore is using zipWithIndex
def printer(evenOdd: Int): Unit = {
val str = "1234"
str.zipWithIndex.foreach { i =>
i._2 % 2 match {
case x if x == evenOdd => print(i._1)
case _ =>
}
}
}
In this case you can check the results by using the printer function
scala> printer(1)
24
scala> printer(0)
13
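printer prints as a side effect. The question asks to avoid side effects until the final print, so a variant could return the selected characters instead. Here is a minimal sketch using zipWithIndex with collect. The names EvenOddCollect and charsAt, and the parity parameter (0 for even indices, 1 for odd), are hypothetical:

```scala
object EvenOddCollect {
  // Side-effect free: return the selected characters instead of printing them.
  // parity = 0 keeps even indices, parity = 1 keeps odd indices.
  def charsAt(s: String, parity: Int): String =
    s.zipWithIndex.collect { case (c, i) if i % 2 == parity => c }.mkString
}
```

Printing then becomes a separate last step, e.g. println(EvenOddCollect.charsAt("1234", 1)).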
.zipWithIndex takes a collection and returns tuples of its elements paired with their indices. A String can be treated as a sequence of Chars (through an implicit conversion), so it works on strings as well.
Looking at str:
scala> val str = "1234"
str: String = 1234
scala> str.zipWithIndex
res: scala.collection.immutable.IndexedSeq[(Char, Int)] = Vector((1,0), (2,1), (3,2), (4,3))
Lastly, since you only need to print, foreach is a better fit than map, as you aren't expecting any values to be returned.
You can use the sliding function, which is quite simple:
scala> "abcdefgh".sliding(1,2).mkString("")
res16: String = aceg
scala> "abcdefgh".tail.sliding(1,2).mkString("")
res17: String = bdfh
val s = "abcd"
// ac
(0 until s.length by 2).map(i => s(i)).mkString
// bd
(1 until s.length by 2).map(i => s(i)).mkString
Just pure functions with the map operator.

Scala: convert each digit in a string to an integer

I want to convert each digit in a number to an Int. Here is my code:
for (in <- lines) {
for (c <- in) {
val ci = c.toInt
if (ci == 0) {
// do stuff
}
}
}
The result I get is the ASCII code, i.e. the character '1' gives 49. I'm looking for the value 1.
The answer is trivial, I know. I'm trying to pull myself up by my own bootstraps until my Scala course begins in two weeks. Any assistance gratefully accepted.
One possible solution is:
for(in <- lines) {
in.toString.map(_.asDigit).foreach { i =>
if(i == 1) {
//do stuff
}
}
}
And a more compact version with output:
lines.foreach(in => in.toString.map(_.asDigit).filter(_ == 1).foreach(i => println(s"found $i in $in.")))
If lines is already a collection of Strings, omit the .toString on in.toString.
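Here is a small sketch of why c.toInt gave 49 in the question while _.asDigit gives 1. It also shows one caveat: asDigit works in radix 36, so it does not reject non-decimal characters. The object name CharDigits is hypothetical:

```scala
object CharDigits {
  // '1'.toInt is the character's code (49), not its numeric value.
  val code: Int = '1'.toInt
  // asDigit converts a digit character to its value.
  val viaAsDigit: Int = '1'.asDigit
  // Subtracting '0' also works for the decimal digits '0' to '9'.
  val viaSubtraction: Int = '1' - '0'
  // asDigit uses radix 36, so letters and signs are not rejected:
  // 'a' maps to 10 and '-' maps to -1.
  val letter: Int = 'a'.asDigit
  val minus: Int = '-'.asDigit
}
```

As a result, a negative number such as -12 turns into the digits -1, 1, 2 with toString.map(_.asDigit), so you may want to strip the sign first.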
You can do this:
val number = 123456
//convert Int to String and do transformation for each character to Digit(Int)
val digitsAsList = number.toString.map(_.asDigit)
This splits the number into its digits. With that collection you can then do anything from filtering to mapping or zipping; see the collections API (for example List) on this page: http://www.scala-lang.org/api/2.11.8/#scala.collection.immutable.List
Hope that helps.
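A few operations on the digit collection, as a sketch. DigitOps and its member names are hypothetical, and the expected values in the comments are for 123456:

```scala
object DigitOps {
  val number = 123456
  // Same conversion as above: one Int per decimal digit.
  val digits: IndexedSeq[Int] = number.toString.map(_.asDigit)

  val digitSum: Int = digits.sum                              // 21
  val evenDigits: IndexedSeq[Int] = digits.filter(_ % 2 == 0) // 2, 4, 6
  // Pair each digit with its position from the left (0-based).
  val withPosition: IndexedSeq[(Int, Int)] = digits.zipWithIndex
}
```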

Accessing rows outside of window while aggregating in Spark dataframe

In short, in the example below I want to pin 'b to be the value in the row that the result will appear in.
Given:
a,b
1,2
4,6
3,7 ==> 'special would be: (1-7 + 4-7 + 3-7) == -13 in this row
val baseWin = Window.partitionBy("something_I_forgot").orderBy("whatever")
val win = baseWin.rowsBetween(-2, 0)
frame.withColumn("special",sum( 'a - 'b ).over(win) )
Another way to think of it: I want to close over the current row when I calculate the sum, so that I can pass in its value of 'b (in this case 7).
Update
Here is what I want to accomplish as a UDF. In short, I used a foldLeft.
def mad(field : Column, numPeriods : Integer) : Column = {
val baseWin = Window.partitionBy("exchange","symbol").orderBy("datetime")
val win = baseWin.rowsBetween(numPeriods + 1, 0)
val subFunc: (Seq[Double],Int) => Double = { (input: Seq[Double], numPeriods : Int) => {
val agg = grizzled.math.stats.mean(input: _*)
val fooBar = (1.0 / -numPeriods)*input.foldLeft(0.0)( (a,b) => a + Math.abs((b-agg)) )
fooBar
} }
val myUdf = udf( subFunc )
myUdf(collect_list(field.cast(DoubleType)).over(win),lit(numPeriods))
}
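Below, the per-window computation inside subFunc is pulled out as plain Scala so it can be checked without Spark. It is only a sketch under two assumptions. First, grizzled.math.stats.mean is replaced by sum / size. Second, it divides by the number of values actually in the window rather than by -numPeriods, so results differ near the start of a partition, where the window holds fewer rows. MeanAbsDev is a hypothetical name:

```scala
object MeanAbsDev {
  // Mean absolute deviation of one window's values: the average distance
  // of each value from the window mean. NaN for an empty window.
  def mad(input: Seq[Double]): Double =
    if (input.isEmpty) Double.NaN
    else {
      val mean = input.sum / input.size
      input.foldLeft(0.0)((acc, x) => acc + math.abs(x - mean)) / input.size
    }
}
```

For Seq(1.0, 2.0, 3.0, 4.0), the mean is 2.5 and the absolute deviations are 1.5, 0.5, 0.5 and 1.5, giving 4.0 / 4 = 1.0.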
If I understood correctly what you're trying to do, I think you can refactor your logic a bit to achieve it. The way you have it right now, you're probably getting "-7" instead of -13.
For the "special" column, (1-7 + 4-7 + 3-7), you can calculate it like (sum(a) - count(*) * b):
dfA.withColumn("special",sum('a).over(win) - count("*").over(win) * 'b)
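The refactoring relies on the identity Σ(a_j - b) = Σa_j - n·b, where b is pinned to the current row and n is the number of rows in the frame. This plain-Scala sketch (no Spark; SpecialCheck and its methods are hypothetical) imitates rowsBetween(-2, 0) over the question's three rows and computes both forms:

```scala
object SpecialCheck {
  // Rows of (a, b) in window order, as in the question's example.
  val rows: Vector[(Int, Int)] = Vector((1, 2), (4, 6), (3, 7))

  // For row i, a rowsBetween(-2, 0) frame covers rows max(0, i - 2) to i.
  private def frame(i: Int): Vector[(Int, Int)] = rows.slice(math.max(0, i - 2), i + 1)

  // Direct definition: sum of (a_j - b_i) over the frame, b pinned to row i.
  def direct(i: Int): Int = frame(i).map { case (a, _) => a - rows(i)._2 }.sum

  // Refactored form from the answer: sum(a) - count(*) * b.
  def refactored(i: Int): Int = {
    val f = frame(i)
    f.map(_._1).sum - f.size * rows(i)._2
  }
}
```

For the last row, both give (1 - 7) + (4 - 7) + (3 - 7) = 8 - 3 * 7 = -13.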

Performance difference in toString.map and toString.toArray.map

While coding Euler problems, I ran across what I think is bizarre:
The method toString.map is slower than toString.toArray.map.
Here's an example:
def main(args: Array[String]): Unit = {
def toDigit(num : Int) = num.toString.map(_ - 48) //2137 ms
def toDigitFast(num : Int) = num.toString.toArray.map(_ - 48) //592 ms
val startTime = System.currentTimeMillis;
(1 to 1200000).map(toDigit)
println(System.currentTimeMillis - startTime)
}
Shouldn't the method map on String fall back to a map over the array? Why is there such a noticeable difference? (Note that increasing the number even causes a stack overflow in the non-array case.)
Original
This could be because toString.map uses the WrappedString implicit, while toString.toArray.map uses the WrappedArray implicit to resolve map.
Let's see map, as defined in TraversableLike:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
b.sizeHint(this)
for (x <- this) b += f(x)
b.result
}
WrappedString uses a StringBuilder as builder:
def +=(x: Char): this.type = { append(x); this }
def append(x: Any): StringBuilder = {
underlying append String.valueOf(x)
this
}
The String.valueOf call for Any uses Java Object.toString on the Char instances, possibly getting boxed first. These extra ops might be the cause of speed difference, versus the supposedly shorter code paths of the Array builder.
This is a guess, though; it would have to be measured.
Edit
After revising, the general point still stands, but I referred to the wrong implicits: the toDigit methods return an Int sequence (or similar), not a translated string as I misread.
toDigit uses LowPriorityImplicits.fallbackStringCanBuildFrom[T]: CanBuildFrom[String, T, immutable.IndexedSeq[T]], with T = Int, which just defers to a general IndexedSeq builder.
toDigitFast uses a direct Array implicit of type CanBuildFrom[Array[_], T, Array[T]], which is unarguably faster.
Passing the following CBF for toDigit explicitly makes the two methods on par:
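import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.ArrayBuilder
import scala.reflect.ClassTag
// Scala 2.10 to 2.12 (CanBuildFrom was removed in 2.13); ClassTag replaces the deprecated ClassManifest
object FastStringToArrayBuild {
def canBuildFrom[T : ClassTag] = new CanBuildFrom[String, T, Array[T]] {
private def newBuilder = ArrayBuilder.make[T]()
def apply(from: String) = newBuilder
def apply() = newBuilder
}
}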
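To make the type difference discussed above concrete, here is a small sketch that states the two result types explicitly. It should compile on both Scala 2.12 and 2.13; DigitTypes is a hypothetical name:

```scala
object DigitTypes {
  // String.map with a Char => Int function cannot build a String,
  // so the result is a general IndexedSeq[Int].
  def toDigit(num: Int): IndexedSeq[Int] = num.toString.map(_ - 48)
  // Mapping over the Array[Char] builds an Array[Int] directly.
  def toDigitFast(num: Int): Array[Int] = num.toString.toArray.map(_ - 48)
}
```

Arrays compare by reference, so checking that the two agree needs sameElements rather than ==.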
You're being fooled by running out of memory. The toDigit version does create more intermediate objects, but if you have plenty of memory then the GC won't be heavily impacted (and it'll all run faster). For example, if instead of creating 1.2 million numbers, I create 12k 100x in a row, I get approximately equal times for the two methods. If I create 1.2k 5-digit numbers 1000x in a row, I find that toDigit is about 5% faster.
Given that the toDigit method produces an immutable collection, which is better when all else is equal since it is easier to reason about, and given that all else is equal for all but highly demanding tasks, I think the library is as it should be.
When trying to improve performance, of course one needs to keep all sorts of tricks in mind; one of these is that arrays have better memory characteristics for collections of known length than do the fancy collections in the Scala library. Also, one needs to know that map isn't the fastest way to get things done; if you really wanted this to be fast, you would write something like:
final def toDigitReallyFast(num: Int, accum: Long = 0L, iter: Int = 0): Array[Byte] = {
if (num==0) {
val ans = new Array[Byte](math.max(1,iter))
var i = 0
var ac = accum
while (i < ans.length) {
ans(i) = (ac & 0xF).toByte // accum's lowest nibble holds the leading digit, so fill from the front
ac >>= 4
i += 1
}
ans
}
else {
val next = num/10
toDigitReallyFast(next, (accum << 4) | (num-10*next), iter+1)
}
}
which on my machine is roughly 4x faster than either of the others. And you can get almost 3x faster yet again if you leave everything in a Long and pack the results into an array instead of using a (1 to N) map:
final def toDigitExtremelyFast(num: Int, accum: Long = 0L, iter: Int = 0): Long = {
if (num==0) accum | (iter.toLong << 48)
else {
val next = num/10
toDigitExtremelyFast(next, accum | ((num-10*next).toLong<<(4*iter)), iter+1)
}
}
// loop, instead of 1 to N map, for the 1.2k number case
{
var i = 10000
val a = new Array[Long](1201)
while (i<=11200) {
a(i-10000) = toDigitExtremelyFast(i)
i += 1
}
a
}
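toDigitExtremelyFast returns the digits packed into a Long. Digit k, counting from the least significant, sits in bits 4k to 4k+3, and the digit count starts at bit 48. The answer does not show how to read them back, so here is a sketch of a decoder. unpack is a hypothetical helper, the function is copied from above so the block is self-contained, and non-negative input is assumed:

```scala
object PackedDigits {
  // Copy of toDigitExtremelyFast from the answer above.
  final def toDigitExtremelyFast(num: Int, accum: Long = 0L, iter: Int = 0): Long =
    if (num == 0) accum | (iter.toLong << 48)
    else {
      val next = num / 10
      toDigitExtremelyFast(next, accum | ((num - 10 * next).toLong << (4 * iter)), iter + 1)
    }

  // Hypothetical helper: unpack into digits, most significant first.
  // A count of 0 (how num = 0 is encoded) decodes as a single 0 digit.
  def unpack(packed: Long): Vector[Int] = {
    val count = math.max(1, (packed >>> 48).toInt)
    (count - 1 to 0 by -1).map(k => ((packed >>> (4 * k)) & 0xF).toInt).toVector
  }
}
```

For instance, PackedDigits.unpack(PackedDigits.toDigitExtremelyFast(123)) gives Vector(1, 2, 3), matching what toDigit returns.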
As with many things, performance tuning is highly dependent on exactly what you want to do. In contrast, library design has to balance many different concerns. I do think it's worth noticing where the library is sub-optimal with respect to performance, but this isn't really one of those cases IMO; the flexibility is worth it for the common use cases.
