Equals function doesn't work with custom object in spark - apache-spark

I'm having trouble in a Spark REPL, I presume with serialization.
I have the following code snippet:
class Foobar (val a: Int, val b: Int) extends Serializable {
override def equals (other: Any): Boolean =
other match {
case that: Foobar =>
println("Comparison of similar objects")
this.a == that.a && this.b == that.b
case _ =>
println("Comparison of disparate objects")
false
}
override def toString = s"[$a:$b]"
}
If I create two instances, one (foo) on a worker, and one (bar) on the driver:
val foo = sc.parallelize(Seq(1)).map(n => new Foobar(n, n)).collect.apply(0)
val bar = new Foobar(1, 1)
then foo != bar (and spouts "Comparison of disparate objects") - yet
foo.getClass == bar.getClass
foo.a == bar.a, and
foo.b == bar.b
Can anyone explain why this happens?

Related

Spark SQL doesn't call my UDT equals/hashcode methods

I want to implement my own comparison operators (equals, hashCode, ordering) for a data type that I defined in Spark SQL. Although Spark SQL's UDT API is still private, I followed some examples like this one to work around that.
I have a class called MyPoint:
@SQLUserDefinedType(udt = classOf[MyPointUDT])
case class MyPoint(x: Double, y: Double) extends Serializable {
override def hashCode(): Int = {
println("hash code")
31 * (31 * x.hashCode()) + y.hashCode()
}
override def equals(other: Any): Boolean = {
println("equals")
other match {
case that: MyPoint => this.x == that.x && this.y == that.y
case _ => false
}
}
}
Then, I have the UDT class:
private class MyPointUDT extends UserDefinedType[MyPoint] {
override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)
override def serialize(obj: MyPoint): ArrayData = {
obj match {
case features: MyPoint =>
new GenericArrayData2(Array(features.x, features.y))
}
}
override def deserialize(datum: Any): MyPoint = {
datum match {
case data: ArrayData if data.numElements() == 2 => {
val arr = data.toDoubleArray()
new MyPoint(arr(0), arr(1))
}
}
}
override def userClass: Class[MyPoint] = classOf[MyPoint]
override def asNullable: MyPointUDT = this
}
Then I create a simple DataFrame:
val p1 = new MyPoint(1.0, 2.0)
val p2 = new MyPoint(1.0, 2.0)
val p3 = new MyPoint(10.0, 20.0)
val p4 = new MyPoint(11.0, 22.0)
val points = Seq(
("P1", p1),
("P2", p2),
("P3", p3),
("P4", p4)
).toDF("label", "point")
points.registerTempTable("points")
spark.sql("SELECT Distinct(point) FROM points").show()
The problem is: why doesn't the SQL query call the equals method of the MyPoint class? How are the comparisons being made? How can I implement my own comparison operators in this example?

Idiomatic means of choosing class constructor in groovy

I have a class like this:
in foo.groovy
class Foo {
String thing
Integer other
Foo(String thing) {
this.thing = thing
}
Foo(Integer other) {
this.other = other
}
}
return Foo.class
Now I would like to invoke these constructors. What I am doing is:
Other.groovy
def foo = evaluate(new File(ClassLoader.getSystemResource('foo.groovy').file)).newInstance(10)
def foo2 = evaluate(new File(ClassLoader.getSystemResource('foo.groovy').file)).newInstance("thing")
But this doesn't seem like the correct way of doing it. Ideally I would like to name the file Foo.groovy, but then I get an error because Groovy automatically declares the class for me. Basically, I want it to work like a classic Java class.
Maybe I'm missing something here, but:
class Foo {
String thing
Integer other
Foo(String thing) {
this.thing = thing
}
Foo(Integer other) {
this.other = other
}
}
def x = new Foo(10)
assert x.other == 10 // true
def y = new Foo("foo")
assert y.thing == "foo" // true
What are you trying to accomplish here other than that?
Edit: Try it here.

How to use propertyMissing on a class that implements java.util.Map in groovy

I understand that we cannot access the properties of a Map the same way we do on other classes, because Groovy lets us read map keys with dot notation.
Now, is there a way for a class that implements java.util.Map to still benefit from the ExpandoMetaClass by using propertyMissing?
Here is what I'm trying:
LinkedHashMap.metaClass.methodMissing = { method, args ->
println "Invoking ${method}"
"Invoking ${method}"
}
LinkedHashMap.metaClass.propertyMissing = { method, args ->
println "Accessing ${method}"
"Accessing ${method}"
}
def foo = [:]
assert "Invoking bar" == foo.bar() // this works fine
assert "Accessing bar" == foo.bar // this doesn't work, for obvious reasons, but I'd like to be able to do that...
I've been trying through custom DelegatingMetaClasses but didn't succeed...
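The dot-notation behaviour mentioned at the start of the question can be seen in a minimal sketch (the variable name m and the keys are only illustrative). On a plain Map, property syntax is treated as a key lookup, which appears to be why a missing name yields null instead of reaching propertyMissing:

```groovy
def m = [bar: 1]

// Property syntax on a Map is a key lookup.
assert m.bar == 1

// An absent key gives null rather than a MissingPropertyException.
assert m.missing == null

// Even 'class' is treated as a key, so getClass() is needed to get the type.
assert m.class == null
assert m.getClass() == LinkedHashMap
```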
Not sure it fits your use-case, but you could use Guava and the withDefault method on Maps...
@Grab( 'com.google.guava:guava:16.0.1' )
import static com.google.common.base.CaseFormat.*
def map
map = [:].withDefault { key ->
LOWER_UNDERSCORE.to(LOWER_CAMEL, key).with { alternate ->
map.containsKey(alternate) ? map[alternate] : null
}
}
map.possibleSolution = 'maybe'
assert map.possible_solution == 'maybe'
One side-effect of this is that after the assert, the map contains two key:value pairs:
assert map == [possibleSolution:'maybe', possible_solution:'maybe']
If I understood well you can provide a custom map:
class CustomMap extends LinkedHashMap {
def getAt(name) {
println "getAt($name)"
def r = super.getAt(name)
r ? r : this.propertyMissing(name)
}
def get(name) {
println "get($name)"
def r = super.get(name)
r ? r : this.propertyMissing(name)
}
def methodMissing(method, args) {
println "methodMissing($method, $args)"
"Invoking ${method}"
}
def propertyMissing(method) {
println "propertyMissing($method)"
"Accessing ${method}"
}
}
def foo = [bar:1] as CustomMap
assert foo.bar == 1
assert foo['bar'] == 1
assert foo.lol == 'Accessing lol'
assert foo['lol'] == 'Accessing lol'
assert foo.bar() == 'Invoking bar'
I reread the Groovy Map javadocs and noticed that there are two versions of the get method: one that takes a single argument, and one that takes two.
The two-argument version does almost what I describe here: it returns a default value if it doesn't find your key.
I get the desired effect, but not in dot notation, so I'm just posting this as an alternative solution in case anyone comes across this post:
Map.metaClass.customGet = { key ->
def alternate = key.replaceAll(/_\w/){ it[1].toUpperCase() }
return delegate.get(key, delegate.get(alternate, 'Sorry...'))
}
def m = [myKey : 'Found your key']
assert 'Found your key' == m.customGet('myKey')
assert 'Found your key' == m.customGet('my_key')
assert 'Sorry...' == m.customGet('another_key')
println m
-Result-
m = [myKey:Found your key, my_key:Found your key, anotherKey:Sorry..., another_key:Sorry...]
As in Tim's solution, this leaves m containing both keys after the second assert, plus two keys with the default value (Sorry...) every time we ask for a value that is not present in the initial map. That could be solved by removing the keys that hold the default value, e.g.:
Map.metaClass.customGet = { key ->
def alternate = key.replaceAll(/_\w/){ it[1].toUpperCase() }
def ret = delegate.get(key, delegate.get(alternate, 'Sorry...'))
if (ret == 'Sorry...') {
delegate.remove(key)
delegate.remove(alternate)
}
ret
}
Feel free to comment/correct any mistakes this could lead to... just thinking out loud here...
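The side effect discussed above comes from Groovy's two-argument get, which stores the default when the key is absent. A minimal sketch of that behaviour follows. The key names and the 'fallback' value are made up for illustration, and the last two lines assume Java 8 or later, where java.util.Map provides getOrDefault:

```groovy
def m = [myKey: 'Found your key']

// Key present: the stored value is returned and the map is unchanged.
assert m.get('myKey', 'fallback') == 'Found your key'
assert m.size() == 1

// Key absent: the default is returned and is also stored under that key.
assert m.get('other', 'fallback') == 'fallback'
assert m == [myKey: 'Found your key', other: 'fallback']

// getOrDefault (java.util.Map, Java 8+) returns the default without storing it.
assert m.getOrDefault('third', 'fallback') == 'fallback'
assert !m.containsKey('third')
```

If Java 8 is available, getOrDefault might remove the need for the cleanup step in customGet, though that has not been tried against the code above.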

Coercion befuddlement in Groovy

Why does the following
class Test {
@Test
void go() {
def foo1 = new MockFoo1() as Foo
def foo2 = new MockFoo2() as Foo
}
interface Foo {}
class MockFoo1 {}
class MockFoo2 {}
}
Result in a java.lang.IllegalArgumentException: argument type mismatch on the foo2 coercion?
This only happens if I coerce two objects of two different types to the same interface during a single path of execution. The Groovy-approved way of using closures or maps to achieve this kind of duck typing works fine.
Any light shed appreciated.
It's a bug with the ProxyGenerator adapterCache. As a workaround, you can also use some Groovy trickery to make this work:
interface Foo {
static a = {
[MockFoo1, MockFoo2].each {
it.metaClass.asType = { Class klazz ->
try {
DefaultGroovyMethods.asType(delegate, klazz)
} catch (e) {
def cache = ProxyGenerator.INSTANCE.@adapterCache.@cache
cache.each { k, v ->
cache.remove(k)
}
DefaultGroovyMethods.asType(delegate, klazz)
}
}
}
}()
}
class MockFoo1 {}
class MockFoo2 {}
def a = new MockFoo1() as Foo
def b = new MockFoo2() as Foo
assert a instanceof Foo
assert b instanceof Foo
Hope this helps!
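For comparison, the closure and map coercion that the question describes as working fine can be sketched as follows. Foo in the question is an empty interface. The describe() method here is a hypothetical addition so that the coerced objects have something to call:

```groovy
// Hypothetical interface with one method, for illustration only.
interface Foo {
    String describe()
}

// A map of closures coerced to the interface: the keys are method names.
def fromMap = [describe: { -> 'map-backed' }] as Foo

// A closure coerced to the interface: the closure body handles the method call.
def fromClosure = { -> 'closure-backed' } as Foo

assert fromMap instanceof Foo
assert fromClosure instanceof Foo
assert fromMap.describe() == 'map-backed'
assert fromClosure.describe() == 'closure-backed'
```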

Unexpected behavior with overloaded methods

I'm a bit confused about Groovy's method overloading behavior. Given the class and tests below, I am pretty okay with testAStringNull and testBStringNull throwing ambiguous method call exceptions, but why is that not the case for testANull and testBNull?
And, much more importantly: why does testBNull call String foo(A arg)? I guess the object doesn't know the type of the variable it is bound to, but why is that call not ambiguous to Groovy while the others are?
(I hope I explained this well enough; my head hurts from generating this minimal example.)
class Foo {
static class A {}
static class B {}
String foo(A arg) { return 'a' }
String foo(String s, A a) { return 'a' }
String foo(B arg) { return 'b' }
String foo(String s, B b) { return 'b' }
}
Tests:
import org.junit.Test
import Foo.A
import Foo.B
class FooTest {
Foo foo = new Foo()
@Test
void testA() {
A a = new A()
assert foo.foo(a) == 'a'
}
@Test
void testAString() {
A a = new A()
assert foo.foo('foo', a) == 'a'
}
@Test
void testANull() {
A a = null
assert foo.foo(a) == 'a'
}
@Test
void testAStringNull() {
A a = null
assert foo.foo('foo', a) == 'a'
}
@Test
void testB() {
B b = new B()
assert foo.foo(b) == 'b'
}
@Test
void testBString() {
B b = new B()
assert foo.foo('foo', b) == 'b'
}
@Test
void testBNull() {
B b = null
assert foo.foo(b) == 'b'
}
@Test
void testBStringNull() {
B b = null
assert foo.foo('foo', b) == 'b'
}
}
It's a (somewhat little-known) oddity of Groovy's multi-dispatch mechanism, which attempts to invoke the "most appropriate" method, combined with the fact that the declared static type (in your case A or B) is not used as part of the dispatch. When you declare A a = null, what you get is not a null reference of type A, but a reference to NullObject.
Ultimately, to safely handle possibly null parameters to overloaded methods, the caller must cast the argument, as in
A a = null
assert foo.foo('foo', a as A) == 'a'
This discussion on "Groovy Isn't A Superset of Java" may shed some light on the issue.
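The NullObject point can be checked directly. This is a minimal sketch that assumes ordinary dynamic Groovy (no @CompileStatic) and uses a standalone class A rather than the nested Foo.A from the question:

```groovy
class A {}

A a = null

// The declared type A plays no part at runtime: the receiver is Groovy's NullObject.
assert a.getClass().name == 'org.codehaus.groovy.runtime.NullObject'

// It still compares equal to null.
assert a == null
```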
