Scala runtime string interpolation/formatting


Are there any standard library facilities to do string interpolation/formatting at runtime? I'd like the formatting to behave exactly the same as the macro-based s"scala ${implementation}", except that my string format is loaded at runtime from a config file.
val format = config.getString("my.key")
val stringContext = parseFormat(format)
val res = stringContext.f("arg1", "arg2")
with parseFormat returning a StringContext.
I imagine, worst case, I could just split the string on "{}" sequences and use the parts to construct the StringContext.
// untested
def parseFormat(format: String): StringContext =
new StringContext("""{}""".r.split(format): _*)
Is there an obvious solution that I'm missing or would the above hack do the trick?

There are no silly questions. Only Sunday mornings.
First, don't call String.format directly; use the format method on the string itself.
scala> val s = "Count to %d"
s: String = Count to %d
scala> String format (s, 42)
<console>:9: error: overloaded method value format with alternatives:
(x$1: java.util.Locale,x$2: String,x$3: Object*)String <and>
(x$1: String,x$2: Object*)String
cannot be applied to (String, Int)
String format (s, 42)
^
scala> s format 42
res1: String = Count to 42
But formatting can be expensive. So with your choice of escape handling:
scala> StringContext("Hello, {}. Today is {}." split "\\{}" : _*).s("Bob", "Tuesday")
res2: String = Hello, Bob. Today is Tuesday.
scala> StringContext("""Hello, \"{}.\" Today is {}.""" split "\\{}" : _*).s("Bob", "Tuesday")
res3: String = Hello, "Bob." Today is Tuesday.
scala> StringContext("""Hello, \"{}.\" Today is {}.""" split "\\{}" : _*).raw("Bob", "Tuesday")
res4: String = Hello, \"Bob.\" Today is Tuesday.
It turns out that split doesn't quite hack it.
scala> StringContext("Count to {}" split "\\{}" : _*) s 42
java.lang.IllegalArgumentException: wrong number of arguments (1) for interpolated string with 1 parts
at scala.StringContext.checkLengths(StringContext.scala:65)
at scala.StringContext.standardInterpolator(StringContext.scala:121)
at scala.StringContext.s(StringContext.scala:94)
... 33 elided
So given
scala> val r = "\\{}".r
r: scala.util.matching.Regex = \{}
scala> def parts(s: String) = r split s
parts: (s: String)Array[String]
Maybe
scala> def f(parts: Seq[String], args: Any*) = (parts zip args map (p => p._1 + p._2)).mkString
f: (parts: Seq[String], args: Any*)String
So
scala> val count = parts("Count to {}")
count: Array[String] = Array("Count to ")
scala> f(count, 42)
res7: String = Count to 42
scala> f(parts("Hello, {}. Today is {}."), "Bob", "Tuesday")
res8: String = Hello, Bob. Today is Tuesday
Hey, wait!
scala> def f(parts: Seq[String], args: Any*) = (parts.zipAll(args, "", "") map (p => p._1 + p._2)).mkString
f: (parts: Seq[String], args: Any*)String
scala> f(parts("Hello, {}. Today is {}."), "Bob", "Tuesday")
res9: String = Hello, Bob. Today is Tuesday.
or
scala> def f(parts: Seq[String], args: Any*) = (for (i <- 0 until (parts.size max args.size)) yield (parts.applyOrElse(i, (_: Int) => "") + args.applyOrElse(i, (_: Int) => ""))).mkString
f: (parts: Seq[String], args: Any*)String
or
scala> def f(parts: Seq[String], args: Any*) = { val sb = new StringBuilder ; for (i <- 0 until (parts.size max args.size) ; ss <- List(parts, args)) { sb append ss.applyOrElse(i, (_: Int) => "") } ; sb.toString }
f: (parts: Seq[String], args: Any*)String
scala> f(parts("Hello, {}. Today is {}. {}"), "Bob", "Tuesday", "Bye!")
res16: String = Hello, Bob. Today is Tuesday. Bye!
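The failure above comes from split dropping trailing empty strings, so a template ending in {} yields one part too few. A minimal sketch of one way around it, assuming the "{}" placeholder syntax from the question; RuntimeFormat and fill are hypothetical names, not standard library members:

```scala
object RuntimeFormat {
  // Hypothetical helper (not part of the standard library): fills each "{}"
  // placeholder in `template` with the corresponding argument, in order.
  def fill(template: String, args: Any*): String = {
    // A limit of -1 keeps trailing empty strings, so "Count to {}" yields
    // two parts ("Count to " and "") instead of one.
    val parts = template.split("\\{\\}", -1)
    require(parts.length == args.length + 1,
      s"expected ${parts.length - 1} arguments but got ${args.length}")
    val sb = new StringBuilder(parts(0))
    // Append each argument followed by the literal part that comes after it.
    for (i <- args.indices) sb.append(args(i)).append(parts(i + 1))
    sb.toString
  }
}
```

With this, RuntimeFormat.fill("Count to {}", 42) no longer trips over the missing trailing part. Unlike the s interpolator, this sketch does no escape processing on the template.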

A. As of Scala 2.10.3, you can't use StringContext.f unless you know the number of arguments at compile time since the .f method is a macro.
B. Use String.format, just like you would in the good ol' days of Java.
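As a rough illustration of option B, here is a sketch that formats a template whose argument list is only known at runtime. The template value stands in for a string read from a config file and is an assumption, as is the FormatAtRuntime name:

```scala
object FormatAtRuntime {
  // Stand-in for a format string loaded from a config file (assumed value).
  val template: String = "%s has %d new messages"

  // The number of arguments need not be known at compile time.
  val args: Seq[Any] = Seq("Bob", 3)

  // The format method on a Scala string accepts Scala values directly
  // and takes the arguments as varargs.
  val rendered: String = template.format(args: _*)
}
```

Note that this uses java.util.Formatter syntax (%s, %d), so it only behaves like the f interpolator, not like s.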

I had a similar requirement: I was loading a Seq[String] from a config file that would become a command to be executed (using scala.sys.process). To simplify the format and sidestep any potential escaping problems, I also made the variable names configurable.
The config looked something like this:
command = ["""C:\Program Files (x86)\PuTTY\pscp.exe""", "-P", "2222", "-i",
".vagrant/machines/default/virtualbox/private_key", "$source", "~/$target"]
source = "$source"
target = "$target"
I couldn't find a nice (or efficient) way of using StringContext or "string".format, so I rolled my own VariableCommand. It is quite similar to StringContext, except that a single variable can appear zero or more times, in any order, and in any of the items.
The basic idea was to create a function that takes the variable values and then repeatedly takes either a literal part of the string (e.g. "~/") or a variable value (e.g. "test.conf") to build up the result (e.g. "~/test.conf"). This function is created once, which is where all the complexity lives; at substitution time it is really simple (and hopefully fast, although I haven't done any performance testing, or much testing at all for that matter).
For those who might wonder why I was doing this: it was for running automation tests cross-platform, using Ansible (which doesn't support Windows control machines) for provisioning. This allowed me to copy the files to the target machine and run Ansible locally.
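The answer does not include the VariableCommand code, so the following is only a sketch of the idea it describes, not the author's implementation. It assumes variables are written as $name (the original made the names configurable), and every identifier here is hypothetical:

```scala
object VariableCommandSketch {
  // Each command item is parsed once into literal and variable segments.
  sealed trait Segment
  final case class Literal(text: String) extends Segment
  final case class Variable(name: String) extends Segment

  // Assumed variable syntax: a dollar sign followed by word characters.
  private val VarPattern = """\$(\w+)""".r

  def parse(item: String): List[Segment] = {
    val segments = List.newBuilder[Segment]
    var last = 0
    for (m <- VarPattern.findAllMatchIn(item)) {
      // Keep any literal text between the previous match and this one.
      if (m.start > last) segments += Literal(item.substring(last, m.start))
      segments += Variable(m.group(1))
      last = m.end
    }
    if (last < item.length) segments += Literal(item.substring(last))
    segments.result()
  }

  // All parsing happens here, once; the returned function only walks the
  // pre-parsed segments. It throws NoSuchElementException for a missing variable.
  def compile(items: Seq[String]): Map[String, String] => Seq[String] = {
    val parsed = items.map(parse)
    values => parsed.map(_.map {
      case Literal(text)  => text
      case Variable(name) => values(name)
    }.mkString)
  }
}
```

Calling compile once and then applying the result to different value maps keeps substitution cheap, which matches the "created once" design described above.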

Related

scala - string parsing without Regex

I have various types of strings like the following:
sales_data_type
saledatatypes
sales_data.new.metric1
sales_data.type.other.metric2
sales_data.type3.metric3
I'm trying to parse them to get a substring with a word before and after the last dot. For example: new.metric1, other.metric2, type3.metric3. If a word doesn't contain dots, it has to be returned as is: sales_data_type, saledatatypes.
With a Regex it may be done this way:
val infoStr = "sales_data.type.other.metric2"
val pattern = ".*?([^.]+\\.)?([^.]+)$"
println(infoStr.replaceAll(pattern, "$1$2"))
// prints other.metric2
// for saledatatypes just prints nullsaledatatypes ??? but seems to work
I want to find a way to achieve this with Scala, without using Regex in order to expand my understanding of Scala features. Will be grateful for any ideas.

One-liner:
infoStr.split('.').takeRight(2).mkString(".")
takeRight(2) will take the last 2 if there are 2 to take, else it will take the last, and only, 1. mkString(".") will re-insert the dot only if there are 2 elements for the dot to go between, else it will leave the string unaltered.
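To see both behaviours side by side, here is a small sketch that applies the one-liner to the sample strings from the question. The LastTwoSegments and lastTwo names are made up for illustration:

```scala
object LastTwoSegments {
  // The one-liner wrapped in a function so it can be mapped over a list.
  def lastTwo(s: String): String = s.split('.').takeRight(2).mkString(".")

  val samples: List[String] = List(
    "sales_data_type",
    "saledatatypes",
    "sales_data.new.metric1",
    "sales_data.type.other.metric2",
    "sales_data.type3.metric3"
  )

  // Strings without a dot come back unchanged; the rest keep their last two segments.
  val results: List[String] = samples.map(lastTwo)
}
```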

Here's one with lots of Scala features for you.
val string = "head.middle.last"
val split = string.split('.') // Array(head, middle, last)
val result = split.toSeq match {
case Seq(word) ⇒ word
case _ :+ before :+ after ⇒ s"$before.$after"
}
println(result) // middle.last
First we split the string on your . and get individual parts.
Then we pattern match those parts, first to check if there is only one (in which case we just return it), and second to grab the last two elements in the seq.
Finally we put a . back in between those last two using string interpolation.

One way of doing it:
val value = "sales_data.type.other.metric2"
val elems = value.split("\\.").toList
elems match {
case _:+beforeLast:+last => s"${beforeLast}.${last}"
case _ => value // no dot: return the string unchanged, as the question requires
}

for (s <- strs) yield {
  val s1 = s.split('.')
  if (s1.size >= 2) s1.takeRight(2).mkString(".") else s
}
or
for (s <- strs) yield {
  val s1 = s.split('.')
  if (s1.size >= 2) s1.init.last + '.' + s1.last else s
}
In Scala REPL:
scala> val strs = Vector("sales_data_type","saledatatypes","sales_data.new.metric1","sales_data.type.other.metric2","sales_data.type3.metric3")
strs: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, sales_data.new.metric1, sales_data.type.other.metric2, sales_data.type3.metric3)
scala> for(s<-strs) yield { val s1 = s.split('.');if(s1.size>=2)s1.takeRight(2).mkString(".") else s }
res62: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, new.metric1, other.metric2, type3.metric3)
scala> for(s<-strs) yield { val s1 = s.split('.');if(s1.size>=2)s1.init.last+'.'+s1.last else s }
res60: scala.collection.immutable.Vector[String] = Vector(sales_data_type, saledatatypes, new.metric1, other.metric2, type3.metric3)

Use Scala's match, like this:
def getFormattedStr(str: String): String = {
  str.contains(".") match {
    case true => {
      val arr = str.split("\\.")
      val len = arr.length
      len match {
        case 1 => str
        case _ => arr(len - 2) + "." + arr(len - 1)
      }
    }
    case _ => str
  }
}

How to execute Column expression in spark without dataframe

Is there any way to evaluate a Column expression if I am only using literals (no DataFrame columns)?
For example, something like:
val result: Int = someFunction(lit(3) * lit(5))
//result: Int = 15
or
import org.apache.spark.sql.functions.sha1
val result: String = someFunction(sha1(lit("5")))
//result: String = ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
I am able to evaluate it using a DataFrame:
val result = Seq(1).toDF.select(sha1(lit("5"))).as[String].first
//result: String = ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
But is there any way to get the same results without using dataframe?

To evaluate a literal column, you can convert it to an Expression and call eval without providing an input row:
scala> sha1(lit("1").cast("binary")).expr.eval()
res1: Any = 356a192b7913b04c54574d18c28d46e6395428ab
As long as the function is a UserDefinedFunction, it will work the same way:
scala> val f = udf((x: Int) => x)
f: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,Some(List(IntegerType)))
scala> f(lit(3) * lit(5)).expr.eval()
res3: Any = 15

The following code can help:
val isUuid = udf((uuid: String) => uuid.matches("[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"))
df.withColumn("myCol_is_uuid",isUuid(col("myCol")))
.filter("myCol_is_uuid = true")
.show(10, false)

Scala-String filter operation from a new learner

There are some Strings like:
"A,C,D" "A,C" "A,B" "B,C" "D,F" "G,D,H"
I want to filter those Strings by the keys A and C: if a String contains A or C, I keep it. For example, with this rule I will get:
"A,C,D" "A,C" "A,B" "B,C"
How should I code this function?

You should try this for yourself as this is a good learning exercise for beginners in Scala like you mentioned!
So here is an idea on how you could do it! There are a million other ways, but this is just for you to get started!
scala> val l = List("A,C,D", "A,C", "A,B", "B,C", "D,F", "G,D,H")
l: List[String] = List(A,C,D, A,C, A,B, B,C, D,F, G,D,H)
scala> l.filter(elem => elem.contains("A") || elem.contains("C"))
res1: List[String] = List(A,C,D, A,C, A,B, B,C)

val samples = List ("A,C,D", "A,C", "A,B", "B,C", "D,F", "G,D,H")
samples.filter (s => s.contains ('A') || s.contains ('C'))
> List(A,C,D, A,C, A,B, B,C)
Note how you already raised the important keywords, contains and filter (though you will not always be that lucky).
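Both versions above hard-code the two keys and match substrings. As a sketch of a slightly more general variant, the keys can be held in a Set and compared against the comma-separated tokens; the KeyFilter name and the choice to match whole tokens are assumptions, not something the question asked for:

```scala
object KeyFilter {
  // The keys to look for; add or remove entries without touching the filter.
  val keys: Set[String] = Set("A", "C")

  val samples: List[String] = List("A,C,D", "A,C", "A,B", "B,C", "D,F", "G,D,H")

  // Split each string into its comma-separated tokens and keep the string
  // if at least one token is one of the keys.
  val kept: List[String] = samples.filter(s => s.split(",").exists(keys.contains))
}
```

Matching whole tokens means a hypothetical entry such as "AB,D" would not be kept, whereas contains("A") would keep it.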

Assuming your input is a single string, as below,
scala> val input = """"A,C,D" "A,C" "A,B" "B,C" "D,F" "G,D,H""""
input: String = "A,C,D" "A,C" "A,B" "B,C" "D,F" "G,D,H"
you can split by " and then filter out empty strings. Then keep those that contain A or C:
scala> input.split("\"").map(_.trim).filter(_.nonEmpty).filter(e => e.contains("A") || e.contains("C"))
res1: Array[String] = Array(A,C,D, A,C, A,B, B,C)
Or you can apply a regex pattern, something like .*A.*|.*C.*:
scala> input.split("\"").filter(_.nonEmpty).filter(_.matches(".*(A|C).*"))
res2: Array[String] = Array(A,C,D, A,C, A,B, B,C)
Also see:
filter a List according to multiple contains

Document Count of a Word in Spark/Scala

I have a text variable which is an RDD of String in scala
val data = sc.parallelize(List("i am a good boy.Are you a good boy.","You are also working here.","I am posting here today.You are good."))
I have another variable, a Scala Map (given below):
//list of words for which doc count needs to be found,initial doc count is 1
val dictionary = Map( """good""" -> 1,"""working""" -> 1,"""posting""" -> 1 )
I want to compute the document count of each dictionary term and get the output in key-value format.
For the data above, my output should look like this:
(good,2)
(working,1)
(posting,1)
What I have tried is:
dictionary.map { case(k,v) => k -> k.r.findFirstIn(data.map(line => line.trim()).collect().mkString(",")).size}
I am getting a count of 1 for every word.
Please help me fix the line above.
Thanks in advance.

Why not use flatMap to build the dictionary, which you can then query?
val dictionary = data.flatMap {case line => line.split(" ")}.map {case word => (word, 1)}.reduceByKey(_+_)
If I collect this in the REPL I get the following result:
res9: Array[(String, Int)] = Array((here,1), (good.,1), (good,2), (here.,1), (You,1), (working,1), (today.You,1), (boy.Are,1), (are,2), (a,2), (posting,1), (i,1), (boy.,1), (also,1), (I,1), (am,2), (you,1))
Obviously you would need to do a better split than in my simple example.
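One possible "better split" is sketched below on plain Scala collections rather than an RDD, so that it runs without a SparkContext. It assumes that lower-casing and splitting on non-word characters is acceptable tokenization, and it counts each term at most once per document; DocCountSketch and tokens are hypothetical names:

```scala
object DocCountSketch {
  val docs: List[String] = List(
    "i am a good boy.Are you a good boy.",
    "You are also working here.",
    "I am posting here today.You are good."
  )

  val terms: Set[String] = Set("good", "working", "posting")

  // Lower-case the document and split on runs of non-word characters, so that
  // "good." and "today.You" no longer hide the words inside them.
  def tokens(doc: String): Set[String] =
    doc.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

  // Document count: the number of documents whose token set contains the term.
  val docCounts: Map[String, Int] =
    terms.map(term => term -> docs.count(doc => tokens(doc).contains(term))).toMap
}
```

Because tokens returns a Set, "good" appearing twice in the first document still contributes only one to its count.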

First of all, your dictionary should be a Set, because in general you need to map a Set of terms to the number of documents that contain them.
So your data should look like:
scala> val docs = List("i am a good boy.Are you a good boy.","You are also working here.","I am posting here today.You are good.")
docs: List[String] = List(i am a good boy.Are you a good boy., You are also working here., I am posting here today.You are good.)
Your dictionary should look like:
scala> val dictionary = Set("good", "working", "posting")
dictionary: scala.collection.immutable.Set[String] = Set(good, working, posting)
Then you have to implement your transformation. With the simplest logic, based on contains, it might look like:
scala> dictionary.map(k => k -> docs.count(_.contains(k))) toMap
res4: scala.collection.immutable.Map[String,Int] = Map(good -> 2, working -> 1, posting -> 1)
For a better solution, I'd recommend implementing a specific function for your requirements,
(String, String) => Boolean
to determine the presence of the term in the document:
scala> def foo(doc: String, term: String): Boolean = doc.contains(term)
foo: (doc: String, term: String)Boolean
Then the final solution will look like:
scala> dictionary.map(k => k -> docs.count(d => foo(d, k))) toMap
res3: scala.collection.immutable.Map[String,Int] = Map(good -> 2, working -> 1, posting -> 1)
The last thing you have to do is compute the result map using SparkContext. First, you have to decide which data to parallelize. Assuming we parallelize the collection of documents, the solution might look like the following:
val docsRDD = sc.parallelize(List(
"i am a good boy.Are you a good boy.",
"You are also working here.",
"I am posting here today.You are good."
))
// Merge two count maps by summing the counts of keys present in both.
def merge(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] =
  m2.foldLeft(m1) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }
docsRDD.mapPartitions(_.map(doc => dictionary.collect {
  case term if doc.contains(term) => term -> 1
})).map(_.toMap).reduce(merge)

HowTo get a Map from a csv string

I'm fairly new to Scala, but I'm doing my exercises now.
I have a string like "A>Augsburg;B>Berlin". What I want at the end is a map
val mymap = Map("A"->"Augsburg", "B"->"Berlin")
What I did is:
val st = locations.split(";").map(dynamicListExtract _)
with the function
private def dynamicListExtract(input: String) = {
  if (input contains ">") {
    val split = input split ">"
    Some((split(0), split(1))) // return (key, value)
  } else {
    None
  }
}
Now I have an Array[Option[(String, String)]].
How do I elegantly convert this into a Map[String, String]?
Can anybody help?
Thanks

Just change your map call to flatMap:
scala> sPairs.split(";").flatMap(dynamicListExtract _)
res1: Array[(java.lang.String, java.lang.String)] = Array((A,Augsburg), (B,Berlin))
scala> Map(sPairs.split(";").flatMap(dynamicListExtract _): _*)
res2: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))
For comparison:
scala> Map("A" -> "Augsburg", "B" -> "Berlin")
res3: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((A,Augsburg), (B,Berlin))

In 2.8, you can do this:
val locations = "A>Augsburg;B>Berlin"
val result = locations.split(";").map(_ split ">") collect { case Array(k, v) => (k, v) } toMap
collect is like map but also filters values that aren't defined in the partial function. toMap will create a Map from a Traversable as long as it's a Traversable[(K, V)].
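To make the filtering behaviour of collect concrete, here is a small sketch with two deliberately malformed entries added to the input. The extra entries and the CollectDemo name are assumptions for illustration:

```scala
object CollectDemo {
  // "broken" has no separator and "C>" has no value; both split into
  // one-element arrays, so the partial function is not defined for them.
  val raw: String = "A>Augsburg;B>Berlin;broken;C>"

  val result: Map[String, String] =
    raw.split(";")
      .map(_.split(">"))
      .collect { case Array(k, v) => k -> v } // keeps only two-element arrays
      .toMap
}
```

Entries that do not split into exactly two parts are silently dropped, which may or may not be what you want for invalid input.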

It's also worth seeing Randall's solution in for-comprehension form, which might be clearer, or at least give you a better idea of what flatMap is doing.
Map.empty ++ (for(possiblePair<-sPairs.split(";"); pair<-dynamicListExtract(possiblePair)) yield pair)

A simple solution (not handling error cases):
val str = "A>Aus;B>Ber"
var map = Map[String,String]()
str.split(";").map(_.split(">")).foreach(a=>map += a(0) -> a(1))
but Ben Lings' is better.

val str= "A>Augsburg;B>Berlin"
Map(str.split(";").map(_ split ">").map(s => (s(0),s(1))):_*)
--or--
str.split(";").map(_ split ">").foldLeft(Map[String,String]())((m,s) => m + (s(0) -> s(1)))
