Use of implicit parameter in spark - apache-spark

The code below is used to find the average of values. I am not sure why the implicit num: Numeric[T] parameter is used in the average function.
Code:
val data = List(("32540b03",-0.00699), ("a93dec11",0.00624),
("32cc6532",0.02337) , ("32540b03",0.256023),
("32cc6532",-0.03591),("32cc6532",-0.03591))
val rdd = sc.parallelize(data.toSeq).groupByKey().sortByKey()
def average[T](ts: Iterable[T])(implicit num: Numeric[T]) = {
  num.toDouble(ts.sum) / ts.size
}
val avgs = rdd.map(x => (x._1, average(x._2)))
Please help me understand the reason for using the (implicit num: Numeric[T]) parameter.

Scala does not have a common superclass for its numeric types. This means you can't constrain T <: Number to make the average well-defined (you can't really average generic objects). The implicit Numeric[T] parameter guarantees that T has numeric operations, including the toDouble method, which converts the sum to a Double.
You could always pass such a conversion function explicitly, but that would mean an extra parameter at every call site, so Numeric is used instead. If you tried something like average(List("bla")), the compiler would complain that it can't find a num.
See also https://twitter.github.io/scala_school/advanced-types.html#otherbounds
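To illustrate, here is a minimal sketch of the same average function outside Spark (a hypothetical AvgDemo object, not from the question), showing what the implicit Numeric[T] buys: both sum and toDouble come from it, and the compiler supplies the right instance per element type.

```scala
// Sketch: the Numeric typeclass supplies both `sum` and `toDouble`.
object AvgDemo {
  def average[T](ts: Iterable[T])(implicit num: Numeric[T]): Double =
    num.toDouble(ts.sum) / ts.size

  def main(args: Array[String]): Unit = {
    println(average(List(1, 2, 3)))       // Numeric[Int] supplied by the compiler
    println(average(List(0.5, 1.5, 2.5))) // Numeric[Double]
    // average(List("bla"))  // does not compile: no Numeric[String] in scope
  }
}
```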

Related

How to convert data type if Variant.Type is known?

How do I convert the data type if I know the Variant.Type from typeof()?
for example:
var a=5;
var b=6.9;
type_cast(b,typeof(a)); # this makes b an int type value
You can't. GDScript does not have generics/type templates, so beyond simple type inference, there is no way to specify a type without knowing the type.
Thus, any workaround to cast the value to a type only known at runtime would have to be declared to return Variant, because there is no way to specify the type.
Furthermore, to store the result on a variable, how do you declare the variable if you don't know the type?
Let us have a look at variable declarations. If you do not specify a type, you get a Variant.
For example in this code, a is a Variant that happens to have an int value:
var a = 5
In this other example a is an int:
var a:int = 5
This is also an int:
var a := 5
In this case the variable is typed according to what you use to initialize it; that is, the type is inferred.
You may think you can use that like this:
var a = 5
var b := a
Well, no. That is an error. "The variable type can't be inferred". As far as Godot is concerned a does not have a type in this example.
I'm storing data in a JSON file: { variable: [typeof(variable), variable_value] }. I added typeof() because, for example, when I store an int and then reassign it from the file, it gets converted to a float (one of many such examples).
It is true that JSON is not good at storing Godot types. Which is why many authors do not recommend using JSON to save state.
Now, be aware that we can't get a variable with the right type as explained above. Instead we should try to get a Variant of the right type.
If you cannot change the serialization format, then you are going to need one big match statement. Something like this:
match type:
    TYPE_NIL:
        return null
    TYPE_BOOL:
        return bool(value)
    TYPE_INT:
        return int(value)
    TYPE_REAL:
        return float(value)
    TYPE_STRING:
        return str(value)
Those are not all the types that a Variant can hold, but I think it would do for JSON.
Now, if you can change the serialization format, then I suggest using str2var and var2str.
For example:
var2str(Vector2(1, 10))
Will return a String value "Vector2( 1, 10 )". And if you do:
str2var("Vector2( 1, 10 )")
You get a Variant with a Vector2 with 1 for the x, and 10 for the y.
This way you can always store Strings, in a human readable format, that Godot can parse. And if you want to do that for whole objects, or you want to put them in a JSON structure, that is up to you.
By the way, you might also be interested in ResourceFormatSaver and ResourceFormatLoader.

RDD Sliding error not understood

Given that this works:
(1 to 5).iterator.sliding(3).toList
Then why does this not work?
val rdd1 = sc.parallelize(List(1,2,3,4,5,6,7,8,9,10), 3)
val z = rdd1.iterator.sliding(3).toList
I get the following error and try to apply the fix but that does not work either!
notebook:3: error: missing argument list for method iterator in class RDD
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `iterator _` or
`iterator(_,_)` instead of `iterator`.
val z = rdd1.iterator.sliding(3).toList
^
I am just trying examples and this I cannot really follow.
It doesn't work because an RDD is not a collection, and its iterator method has a different signature:
final def iterator(split: Partition, context: TaskContext): Iterator[T]
Internal method to this RDD; will read from cache if applicable, or otherwise compute it. This should not be called by users directly, but is available for implementors of custom subclasses of RDD.
If you want to convert RDD to local Iterator use toLocalIterator:
def toLocalIterator: Iterator[T]
Return an iterator that contains all of the elements in this RDD.
rdd1.toLocalIterator
but what you probably want is RDDFunctions.sliding (see: Operate on neighbor elements in RDD in Spark).
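For contrast, the local version from the question works because plain Scala collections expose a zero-argument iterator, so sliding composes directly (a small demo object, not from the original post):

```scala
// Plain collections have a zero-argument `iterator`, so `sliding` works;
// RDD.iterator instead requires a Partition and a TaskContext.
object SlidingDemo {
  val windows: List[List[Int]] =
    (1 to 5).iterator.sliding(3).toList.map(_.toList)

  def main(args: Array[String]): Unit =
    println(windows) // three overlapping windows of size 3
}
```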

Understanding String in Scala and the map method

I wrote the following simple example to understand how the map method works:
object Main {
  def main(args: Array[String]) = {
    val test = "abc"
    val t = Vector(97, 98, 99)
    println(test.map(c => (c + 1)))        //1 Vector(98, 99, 100)
    println(test.map(c => (c + 1).toChar)) //2 bcd
    println(t.map(i => (i + 1)))           //3 Vector(98, 99, 100)
    println(t.map(i => (i + 1).toChar))    //4 Vector(b, c, d)
  }
}
I didn't quite understand why bcd is printed at //2. Since every String is treated by Scala as a Seq, I thought test.map(c => (c + 1).toChar) should have produced another Seq, something like Vector(b, c, d) by analogy with //1. But as you can see, it didn't. Why? How does it actually work?
This is a feature of Scala collections (String in this case is treated as a collection of characters). The real explanation is quite complex, and involves understanding of typeclasses (I guess, this is why Haskell was mentioned in the comment), but the simple explanation is, well, not quite hard.
The point is that the Scala collections library authors tried very hard to avoid code duplication. For example, the map function used on a String is actually defined in scala.collection.TraversableLike#map. A naive approach to sharing that code would make map return a TraversableLike rather than the original type it was called on (here, String). That's why they came up with an approach that avoids both code duplication and unnecessarily general return types or casts.
Basically, Scala collection methods like map produce a result type as close as possible to the type they were called on. This is achieved using a typeclass called CanBuildFrom. The full signature of map looks as follows:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That
There are many explanations around of what a typeclass is and how CanBuildFrom works. I'd suggest looking here first: http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html#factoring-out-common-operations. Another good explanation is here: Scala 2.8 CanBuildFrom
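A small, runnable illustration of this "closest possible result type" behavior (the MapDemo object is my own, not from the answer): mapping Char => Char on a String gives back a String, while Char => Int falls back to an IndexedSeq, since a String cannot hold Ints.

```scala
// Mapping over a String: the result type depends on what the function returns.
object MapDemo {
  val test = "abc"
  val asString: String = test.map(c => (c + 1).toChar) // Char => Char stays a String
  val asSeq: IndexedSeq[Int] = test.map(c => c + 1)    // Char => Int falls back to a Seq

  def main(args: Array[String]): Unit = {
    println(asString) // bcd
    println(asSeq)
  }
}
```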
When you use map, this is what is happening: [List|Seq|etc].map([eachElement] => [do something])
map applies some operation to each element of the collection on the left-hand side: "abc".map(letter => letter + 1) will add 1 to each element of the String "abc". Each element of the String "abc" is here called "letter" (which is of type Char).
"abc" is a String, and as in C++, it is treated as an array of characters. And since test is of type String, mapping a Char to a Char gives back a String as well.
I tried the following:
val test2: Seq[Char] = "abc"
but I still got a result of type String; I guess Scala converts automatically from Seq[Char] to String.
I hope it helped!

Scala String vs java.lang.String - type inference

In the REPL, I define a function. Note the return type.
scala> def next(i: List[String]) = i.map {"0" + _} ::: i.reverse.map {"1" + _}
next: (i: List[String])List[java.lang.String]
And if I specify the return type as String
scala> def next(i: List[String]): List[String] = i.map {"0" + _} ::: i.reverse.map {"1" + _}
next: (i: List[String])List[String]
Why the difference? I can also specify the return type as List[Any], so I guess String is just a wrapper supertype to java.lang.String. Will this have any practical implications or can I safely not specify the return type?
This is a very good question! First, let me assure you that you can safely specify the return type.
Now, let's look into it... yes, when left to inference, Scala infers java.lang.String, instead of just String. So, if you look up "String" in the ScalaDoc, you won't find anything, which seems to indicate it is not a Scala class either. Well, it has to come from someplace, though.
Let's consider what Scala imports by default. You can find it by yourself on the REPL:
scala> :imports
1) import java.lang._ (155 types, 160 terms)
2) import scala._ (801 types, 809 terms)
3) import scala.Predef._ (16 types, 167 terms, 96 are implicit)
The first two are packages -- and, indeed, String can be found on java.lang! Is that it, then? Let's check by instantiating something else from that package:
scala> val s: StringBuffer = new StringBuffer
s: java.lang.StringBuffer =
scala> val s: String = new String
s: String = ""
So, that doesn't seem to be it. Now, it can't be inside the scala package, or it would have been found when looking up on the ScalaDoc. So let's look inside scala.Predef, and there it is!
type String = String
That means String is an alias for java.lang.String (which was imported previously). That looks like a cyclic reference, but if you check the source, you'll see it is defined with the full path:
type String = java.lang.String
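Because it is a plain type alias, String and java.lang.String are one and the same type; values flow between the two annotations with no conversion at all (a small demo object of my own):

```scala
// Predef.String is a type alias: both annotations name the same class.
object AliasDemo {
  val s: String = "hello"
  val j: java.lang.String = s // compiles with no conversion: identical types

  def main(args: Array[String]): Unit =
    println(s.getClass == j.getClass) // same runtime class
}
```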
Next, you might want to ask why? Well, I don't have any idea, but I suspect it is to make such an important class a little less dependent on the JVM.

Avoiding implicit def ambiguity in Scala

I am trying to create an implicit conversion from any type (say, Int) to a String...
An implicit conversion to String means RichString methods (like reverse) are not available.
implicit def intToString(i: Int) = String.valueOf(i)
100.toCharArray // => Array[Char] = Array(1, 0, 0)
100.reverse // => error: value reverse is not a member of Int
100.length // => 3
An implicit conversion to RichString means String methods (like toCharArray) are not available
implicit def intToRichString(i: Int) = new RichString(String.valueOf(i))
100.reverse // => "001"
100.toCharArray // => error: value toCharArray is not a member of Int
100.length // => 3
Using both implicit conversions means duplicated methods (like length) are ambiguous.
implicit def intToString(i: Int) = String.valueOf(i)
implicit def intToRichString(i: Int) = new RichString(String.valueOf(i))
100.toCharArray // => Array[Char] = Array(1, 0, 0)
100.reverse // => "001"
100.length // => both method intToString in object $iw of type
// (Int)java.lang.String and method intToRichString in object
// $iw of type (Int)scala.runtime.RichString are possible
// conversion functions from Int to ?{val length: ?}
So, is it possible to implicitly convert to String and still support all String and RichString methods?
I don't have a solution, but I will comment that the reason RichString methods are not available after your intToString implicit is that Scala does not chain implicit calls (see 21.2 "Rules for implicits" in Programming in Scala).
If you introduce an intermediate String, Scala will make the implicit conversion to a RichString (that implicit is defined in Predef.scala).
E.g.,
$ scala
Welcome to Scala version 2.7.5.final [...].
Type in expressions to have them evaluated.
Type :help for more information.
scala> implicit def intToString(i: Int) = String.valueOf(i)
intToString: (Int)java.lang.String
scala> val i = 100
i: Int = 100
scala> val s: String = i
s: String = 100
scala> s.reverse
res1: scala.runtime.RichString = 001
As of Scala 2.8, this has been improved. As per this paper (§ Avoiding Ambiguities):
Previously, the most specific overloaded method or implicit conversion
would be chosen based solely on the method’s argument types. There was an
additional clause which said that the most specific method could not be
defined in a proper superclass of any of the other alternatives. This
scheme has been replaced in Scala 2.8 by the following, more liberal one:
When comparing two different applicable alternatives of an overloaded
method or of an implicit, each method gets one point for having more
specific arguments, and another point for being defined in a proper
subclass. An alternative “wins” over another if it gets a greater number
of points in these two comparisons. This means in particular that if
alternatives have identical argument types, the one which is defined in a
subclass wins.
See that other paper (§6.5) for an example.
Either make a huge proxy class, or suck it up and require the client to disambiguate it:
100.asInstanceOf[String].length
The only option I see is to create a new String Wrapper class MyString and let that call whatever method you want to be called in the ambiguous case. Then you could define implicit conversions to MyString and two implicit conversions from MyString to String and RichString, just in case you need to pass it to a library function.
The accepted solution (posted by Mitch Blevins) will never work: downcasting Int to String using asInstanceOf will always fail.
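This is easy to verify: asInstanceOf compiles for essentially any pair of types, but the Int gets boxed and the runtime checkcast to String fails. A quick check (the CastDemo object is my own, for illustration):

```scala
// asInstanceOf always compiles, but Int is not a String, so the cast
// throws a ClassCastException at runtime rather than converting.
object CastDemo {
  def intCastFails: Boolean =
    try {
      100.asInstanceOf[String].length // boxes the Int, then checkcast fails
      false
    } catch {
      case _: ClassCastException => true
    }

  def main(args: Array[String]): Unit =
    println(intCastFails) // true
}
```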
One solution to your problem is to add a conversion from any String-convertible type to RichString (or rather, to StringOps as it is now named):
implicit def stringLikeToRichString[T](x: T)(implicit conv: T => String) = new collection.immutable.StringOps(conv(x))
Then define your conversion(s) to string as before:
scala> implicit def intToString(i: Int) = String.valueOf(i)
warning: there was one feature warning; re-run with -feature for details
intToString: (i: Int)String
scala> 100.toCharArray
res0: Array[Char] = Array(1, 0, 0)
scala> 100.reverse
res1: String = 001
scala> 100.length
res2: Int = 3
I'm confused: can't you use .toString on any type anyway, thus avoiding the need for implicit conversions?