Why am I unable to access the String value?
I would expect s1(0) to be "a", but instead it is [Ljava.lang.String;@d70d7a.
val it = Iterator("(a,((a,b),1.0))") //> it : Iterator[String] = non-empty iterator
val s1 = it.next.replace("(" , "").replace(")" , "").split(",").toString.split(",")
//> s1 : Array[String] = Array([Ljava.lang.String;@d70d7a)
println("s1 is "+s1(0)) //> s1 is [Ljava.lang.String;@d70d7a
Let's go command-by-command:
val it = Iterator("(a,((a,b),1.0))")
// here we have an iterator over a single String
val s1 = it.next // "(a,((a,b),1.0))"
.replace("(" , "") // "a,a,b),1.0))"
.replace(")" , "") // "a,a,b,1.0"
.split(",") // an Array of four Strings: "a", "a", "b", "1.0"
.toString // Array[String].toString does not list the elements; it returns the JVM type name and hash code you saw: [Ljava.lang.String;@d70d7a
.split(",") // an Array with a single String (that text contains no "," to split on)
Maybe you should call toList before toString, because List overrides toString to print its elements, which is the behavior you expected:
val s1 = ...
.split(",")
.toList
.toString
...
For clarification, see the question "Java: split() returns [Ljava.lang.String;@186d4c1], why?".
.split(",") produces an Array, and toString on an Array does not print its contents; splitting that result again with .split(",") doesn't help.
You can also use replaceAll with a regex character class in place of multiple replace calls:
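A minimal sketch of why the original pipeline misbehaves: an Array[String] is a plain JVM String[] and inherits Object.toString, which prints the type name plus a hash code (the hash differs between runs). mkString or converting to a List first shows the elements instead. The variable names here are illustrative.

```scala
// The original pipeline: the second split runs on Array.toString, not on the elements.
val s1 = "(a,((a,b),1.0))".replace("(", "").replace(")", "").split(",").toString.split(",")

// An Array[String] inherits Object.toString (type name + "@" + hash code).
val parts: Array[String] = "a,a,b,1.0".split(",")
val raw = parts.toString           // something like [Ljava.lang.String;@d70d7a (hash varies)

// Two ways to actually see the elements:
val joined = parts.mkString(",")   // a,a,b,1.0
val asList = parts.toList.toString // List(a, a, b, 1.0)

println(s1.length) // 1
```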
scala> val it = Iterator("(a,((a,b),1.0))")
it: Iterator[String] = non-empty iterator
scala> val s1 = it.next.replaceAll("[()]" , "").split(",")
s1: Array[String] = Array(a, a, b, 1.0)
scala> println("s1 is "+s1(0))
s1 is a
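The pattern "[()]" is a regex character class matching either parenthesis. If you would rather avoid regexes altogether, a sketch using filterNot (our alternative, not part of the answer) produces the same array:

```scala
val input = "(a,((a,b),1.0))"

// Regex version from the answer: "[()]" matches '(' or ')'.
val viaRegex = input.replaceAll("[()]", "").split(",")

// Regex-free alternative: keep every character that is not a parenthesis.
val viaFilter = input.filterNot(c => c == '(' || c == ')').split(",")

println(viaFilter.mkString(", ")) // a, a, b, 1.0
```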
Is there a built-in function or simple way to replace multiple strings with multiple strings in a reference String?
I have seen "Replace multiple strings with multiple other strings", but it uses known lists instead of variable ones.
For example:
I have val str = "THE GOAT IS RED", and I want to replace all the characters with other characters or digits, something like:
str.replace("THEGOAISRD".toList(), "0123456789".toList())
which would result in:
"012 3450 67 829"
val list1 = listOf('a', 'b', 'c')
val list2 = listOf('0', '1', '2')
val str = "abacada"
val transform = list1.withIndex().associate { it.value to list2[it.index] }
val result = str.map { transform[it] ?: it }.joinToString(separator = "")
println(result)
prints 01020d0
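For comparison, a rough Scala translation of the same idea (this sketch is ours, not part of the Kotlin answer): zipping the two lists yields the same character-to-character table that withIndex().associate builds, and each character with no entry is kept as is.

```scala
val list1 = List('a', 'b', 'c')
val list2 = List('0', '1', '2')

// Pair each source character with its replacement: Map(a -> 0, b -> 1, c -> 2).
val transform: Map[Char, Char] = list1.zip(list2).toMap

// String.map with a Char => Char function returns a String.
val result = "abacada".map(c => transform.getOrElse(c, c))
println(result) // 01020d0
```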
You could do that by first building a dictionary (Map<Char, Char>) using zip and then transforming each character of the string with joinToString, like this:
val str = "THE GOAT IS RED"
val dictionary = "THEGOAISRD".zip("0123475689").toMap()
val result = str.toCharArray().joinToString("") {
dictionary.getOrDefault(it, it).toString()
}
println(result)
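The same zip-into-a-dictionary approach also works in Scala. A minimal sketch using the question's own example strings; the .toList calls mirror the question's pseudocode and make the zip operate on two List[Char] values:

```scala
val str = "THE GOAT IS RED"

// Build the lookup table: T -> 0, H -> 1, E -> 2, ..., D -> 9.
val dictionary: Map[Char, Char] = "THEGOAISRD".toList.zip("0123456789".toList).toMap

// Characters without an entry (here, the spaces) are kept unchanged.
val result = str.map(c => dictionary.getOrElse(c, c))
println(result) // 012 3450 67 829
```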
I have various types of strings like the following:
sales_data_type
saledatatypes
sales_data.new.metric1
sales_data.type.other.metric2
sales_data.type3.metric3
I'm trying to parse them to get a substring consisting of the word before and the word after the last dot, for example: new.metric1, other.metric2, type3.metric3. If a string doesn't contain any dots, it should be returned as is: sales_data_type, saledatatypes.
With a Regex it may be done this way:
val infoStr = "sales_data.type.other.metric2"
val pattern = ".*?([^.]+\\.)?([^.]+)$"
println(infoStr.replaceAll(pattern, "$1$2"))
// prints other.metric2
// for saledatatypes just prints nullsaledatatypes ??? but seems to work
I want to find a way to achieve this in Scala without using a regex, in order to expand my understanding of Scala features. I'd be grateful for any ideas.
One-liner:
dataStr.split('.').takeRight(2).mkString(".")
takeRight(2) will take the last 2 if there are 2 to take, else it will take the last, and only, 1. mkString(".") will re-insert the dot only if there are 2 elements for the dot to go between, else it will leave the string unaltered.
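Applying the one-liner to all of the sample strings from the question shows both behaviors described above. In this sketch, lastTwoSegments is just an illustrative wrapper name around the one-liner:

```scala
val samples = List(
  "sales_data_type",
  "saledatatypes",
  "sales_data.new.metric1",
  "sales_data.type.other.metric2",
  "sales_data.type3.metric3"
)

// split('.') takes a Char, so the dot is matched literally rather than as a regex.
def lastTwoSegments(dataStr: String): String =
  dataStr.split('.').takeRight(2).mkString(".")

val results = samples.map(lastTwoSegments)
results.foreach(println)
```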
Here's one that uses several Scala features:
val string = "head.middle.last"
val split = string.split('.') // Array(head, middle, last)
val result = split.toSeq match {
case Seq(word) => word
case _ :+ before :+ after => s"$before.$after"
}
println(result) // middle.last
First we split the string on "." and get the individual parts.
Then we pattern match those parts, first to check if there is only one (in which case we just return it), and second to grab the last two elements in the seq.
Finally we put a . back in between those last two using string interpolation.
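The `:+` extractor in the second case splits a Seq into everything but the last element plus the last element. Because `:+` ends in `+` rather than `:`, the pattern `_ :+ before :+ after` groups to the left as `(_ :+ before) :+ after`. A small sketch; the helper name lastPair is illustrative:

```scala
// :+ as a pattern splits a Seq into (init, last).
val (init, last) = Seq("sales_data", "type", "other", "metric2") match {
  case i :+ l => (i, l)
  case _      => (Seq.empty[String], "")
}

// Nesting the extractor peels off the last two elements.
def lastPair(parts: Seq[String]): Option[(String, String)] = parts match {
  case _ :+ before :+ after => Some((before, after))
  case _                    => None // zero or one element
}

println(lastPair(Seq("head", "middle", "last"))) // Some((middle,last))
```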
One way of doing it:
val value = "sales_data.type.other.metric2"
val elems = value.split("\\.").toList
elems match {
case _ :+ beforeLast :+ last => s"${beforeLast}.${last}"
case _ => value // fewer than two parts (no dot): return the string unchanged
}
for(s<-strs) yield {val s1 = s.split('.');
if(s1.size>=2)s1.takeRight(2).mkString(".") else s }
or
for(s<-strs) yield { val s1 = s.split('.');
if(s1.size>=2)s1.init.last+'.'+s1.last else s }
In Scala REPL:
scala> val strs = Vector("sales_data_type","saledatatypes","sales_data.new.metric1","sales_data.type.other.metric2","sales_data.type3.metric3")
strs: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, sales_data.new.metric1, sales_data.type.other.metric2, sales_data.type3.metric3)
scala> for(s<-strs) yield { val s1 = s.split('.');if(s1.size>=2)s1.takeRight(2).mkString(".") else s }
res62: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, new.metric1, other.metric2, type3.metric3)
scala> for(s<-strs) yield { val s1 = s.split('.');if(s1.size>=2)s1.init.last+'.'+s1.last else s }
res60: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, new.metric1, other.metric2, type3.metric3)
Use Scala's match expression like this:
def getFormattedStr(str:String):String={
str.contains(".") match{
case true=>{
val arr=str.split("\\.")
val len=arr.length
len match{
case n if n<2=>str // 0 or 1 parts, e.g. "." splits into an empty array
case _=>arr(len-2)+"."+arr(len-1)
}
}
case _=>str
}
}
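Another regex-free option, offered as an illustrative sketch rather than taken from the answers, avoids building an array at all: find the last dot with lastIndexOf, then search backwards from just before it for the previous dot. The function name lastTwoByIndex is hypothetical.

```scala
def lastTwoByIndex(s: String): String = {
  val last = s.lastIndexOf('.')
  if (last < 0) s // no dot: return the string unchanged
  else {
    // Previous dot strictly before `last`; -1 if there is none (also when last == 0).
    val prev = s.lastIndexOf('.', last - 1)
    s.substring(prev + 1)
  }
}

println(lastTwoByIndex("sales_data.type.other.metric2")) // other.metric2
```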
Is there any way to evaluate a Column expression that uses only literals (no DataFrame columns)?
For example, something like:
val result: Int = someFunction(lit(3) * lit(5))
//result: Int = 15
or
import org.apache.spark.sql.functions.sha1
val result: String = someFunction(sha1(lit("5")))
//result: String = ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
I am able to evaluate it using a DataFrame:
val result = Seq(1).toDF.select(sha1(lit("5"))).as[String].first
//result: String = ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
But is there any way to get the same results without using dataframe?
To evaluate a literal column, you can convert it to an Expression and call eval without providing an input row:
scala> sha1(lit("1").cast("binary")).expr.eval()
res1: Any = 356a192b7913b04c54574d18c28d46e6395428ab
It works the same way when the function is a UserDefinedFunction:
scala> val f = udf((x: Int) => x)
f: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,Some(List(IntegerType)))
scala> f(lit(3) * lit(5)).expr.eval()
res3: Any = 15
The following code flags each row whose myCol value is a UUID and keeps only those rows:
val isUuid = udf((uuid: String) => uuid != null && uuid.matches("[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"))
df.withColumn("myCol_is_uuid",isUuid(col("myCol")))
.filter("myCol_is_uuid = true")
.show(10, false)
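To try the pattern without a Spark session, here is a plain-Scala sketch of the same check. String.matches requires the whole string to match, so no ^ or $ anchors are needed; the null guard is a defensive addition, since calling matches on a null String throws a NullPointerException. The names uuidRegex and isUuid are illustrative.

```scala
val uuidRegex = "[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"

// Same test as the UDF body, guarded against null input.
def isUuid(s: String): Boolean = s != null && s.matches(uuidRegex)

println(isUuid("123e4567-e89b-12d3-a456-426614174000")) // true
println(isUuid("not-a-uuid"))                           // false
```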
In Scala, I want to split a string at a specific character like so:
scala> val s = "abba.aadd"
s: String = abba.aadd
scala> val (beforeDot,afterDot) = (s takeWhile (_!='.'), s dropWhile (_!='.'))
beforeDot: String = abba
afterDot: String = .aadd
This solution is slightly inefficient (maybe not asymptotically), but I have the feeling something like this might exist in the standard library already. Any ideas?
There is a span method:
scala> val (beforeDot, afterDot) = s.span{ _ != '.' }
beforeDot: String = abba
afterDot: String = .aadd
From the Scala documentation:
c span p is equivalent to (but possibly more efficient than) (c takeWhile p, c dropWhile p), provided the evaluation of the predicate p does not cause any side-effects.
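A quick sketch checking that equivalence on the question's string. Note that the separator stays at the start of the second part, so drop(1) removes it if you don't want it:

```scala
val s = "abba.aadd"
val p: Char => Boolean = _ != '.'

// span returns the same pair as takeWhile + dropWhile, in a single call.
val viaSpan = s.span(p)
val viaTwoCalls = (s.takeWhile(p), s.dropWhile(p))

// The '.' is kept at the start of the right-hand part.
val (beforeDot, afterDot) = viaSpan
val afterWithoutDot = afterDot.drop(1)

println(viaSpan) // (abba,.aadd)
```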
You can use splitAt for what you want:
val s = "abba.aadd"
val (before, after) = s.splitAt(s.indexOf('.'))
Output:
before: String = abba
after: String = .aadd
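One difference between the splitAt and span answers shows up when the string has no '.': indexOf returns -1, and splitAt with a negative index puts the whole string on the right, whereas span puts it on the left. A quick sketch:

```scala
val noDot = "abbaaadd"

// indexOf returns -1 when the character is absent.
val bySplitAt = noDot.splitAt(noDot.indexOf('.'))
val bySpan = noDot.span(_ != '.')

println(bySplitAt) // (,abbaaadd)
println(bySpan)    // (abbaaadd,)
```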
I'm fairly new to Scala, but I'm doing my exercises now.
I have a string like "A>Augsburg;B>Berlin". What I want in the end is a map:
val mymap = Map("A"->"Augsburg", "B"->"Berlin")
What I did is:
val st = locations.split(";").map(dynamicListExtract _)
with the function
private def dynamicListExtract(input: String) = {
if (input contains ">") {
val split = input split ">"
Some((split(0), split(1))) // return (key, value)
} else {
None
}
}
Now I have an Array[Option[(String, String)]].
How do I elegantly convert this into a Map[String, String]?
Can anybody help?
Thanks
Just change your map call to flatMap:
scala> sPairs.split(";").flatMap(dynamicListExtract _)
res1: Array[(java.lang.String, java.lang.String)] = Array((A,Augsburg), (B,Berlin))
scala> Map(sPairs.split(";").flatMap(dynamicListExtract _): _*)
res2: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))
For comparison:
scala> Map("A" -> "Augsburg", "B" -> "Berlin")
res3: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))
In Scala 2.8 and later, you can do this:
val locations = "A>Augsburg;B>Berlin"
val result = locations.split(";").map(_ split ">").collect { case Array(k, v) => (k, v) }.toMap
collect is like map, but it also filters out values for which the partial function is not defined. toMap creates a Map from a Traversable, as long as it is a Traversable[(K, V)].
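Because collect skips elements the partial function doesn't match, malformed entries are dropped silently instead of causing an exception. A small sketch with a deliberately broken entry (the input string is made up for illustration):

```scala
val locations = "A>Augsburg;bad-entry;B>Berlin"

// "bad-entry" splits into a one-element array, so case Array(k, v) does not match it.
val result: Map[String, String] =
  locations.split(";").map(_.split(">")).collect { case Array(k, v) => (k, v) }.toMap

println(result) // Map(A -> Augsburg, B -> Berlin)
```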
It's also worth seeing Randall's solution in for-comprehension form, which might be clearer, or at least give you a better idea of what flatMap is doing.
Map.empty[String, String] ++ (for (possiblePair <- sPairs.split(";"); pair <- dynamicListExtract(possiblePair)) yield pair)
A simple solution (not handling error cases):
val str = "A>Aus;B>Ber"
var map = Map[String,String]()
str.split(";").map(_.split(">")).foreach(a=>map += a(0) -> a(1))
but Ben Lings' is better.
val str= "A>Augsburg;B>Berlin"
Map(str.split(";").map(_ split ">").map(s => (s(0),s(1))):_*)
--or--
str.split(";").map(_ split ">").foldLeft(Map[String,String]())((m,s) => m + (s(0) -> s(1)))