How to pass driver variable to executor udf object?


I have a code repository, execute, that contains several UDF classes. Some UDFs need to make external network calls, so I wrote a trait:
trait NetworkRunner {
  def eval(args: Array[AnyRef]): AnyRef
}
case class MyUDF(child: Expression, networkRunner: NetworkRunner) {
  override def eval(input: InternalRow): Any = {
    networkRunner.eval(xxx)
    // do something...
  }
}
Another (main) repository depends on it, implements the trait, registers the UDFs, and runs SQL:
public class ExternalNetworkRunner implements NetworkRunner, Serializable {
    @Override
    public Object eval(Object[] args) {
        // do something...
        return null; // placeholder for the real result
    }
}
When registering the UDF I pass an ExternalNetworkRunner instance to MyUDF, so the runner is serialized along with the UDF and is available on the executors without a NullPointerException.
But now I need to add network-call logic to many UDFs, and adding a NetworkRunner argument to every UDF class is a lot of work.
So I want to add a global NetworkRunner in the execute repository:
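trait NetworkRunner {
  def eval(args: Array[AnyRef]): AnyRef
}
object NetworkRunner {
  @volatile var networkRunner: NetworkRunner = _
  // synchronized so that concurrent tasks on one executor create only one runner
  def checkInit(): Boolean = synchronized {
    if (networkRunner == null) {
      val clazz = Class.forName("com.xxx.xxx.xxx.ExternalNetworkRunner")
      networkRunner = clazz.getDeclaredConstructor().newInstance().asInstanceOf[NetworkRunner]
    }
    networkRunner != null
  }
}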
I initialize the runner through the class loader, but I also need to pass some information to the NetworkRunner on each executor. How can I pass variables from the driver to the UDF evaluator object?
Maybe the problem is the same as this one: How to pass configuration from driver to executors in Spark?
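A minimal Java sketch of one way to avoid touching every UDF, under stated assumptions: keep a single runner per executor JVM, created lazily on first use (like checkInit above), and let it read its settings from a JVM system property instead of from a UDF argument. In Spark such a property can be set on executors at launch through spark.executor.extraJavaOptions (for example -Dnetwork.runner.endpoint=...); the property name, ConfiguredRunner and NetworkRunnerRegistry are hypothetical. If the values are only known at runtime on the driver, a broadcast variable read inside the UDF is the usual alternative.

```java
import java.util.Objects;

// Java stand-in for the Scala NetworkRunner trait in the question.
interface NetworkRunner {
    Object eval(Object[] args);
}

// Hypothetical runner; a real one would perform the network call here.
final class ConfiguredRunner implements NetworkRunner {
    private final String endpoint;

    ConfiguredRunner(String endpoint) {
        this.endpoint = endpoint;
    }

    @Override
    public Object eval(Object[] args) {
        return endpoint + " <- " + args.length + " arg(s)";
    }
}

// Counterpart of the question's NetworkRunner.checkInit(): one runner per JVM
// (so one per executor), created on first use from a system property.
final class NetworkRunnerRegistry {
    // Hypothetical property name; on executors it could be supplied via
    // spark.executor.extraJavaOptions=-Dnetwork.runner.endpoint=...
    static final String ENDPOINT_PROPERTY = "network.runner.endpoint";

    private static NetworkRunner runner; // guarded by NetworkRunnerRegistry.class

    private NetworkRunnerRegistry() {}

    // synchronized: concurrent tasks on one executor must not create two runners.
    static synchronized NetworkRunner get() {
        if (runner == null) {
            String endpoint = Objects.requireNonNull(
                System.getProperty(ENDPOINT_PROPERTY),
                ENDPOINT_PROPERTY + " is not set on this JVM");
            runner = new ConfiguredRunner(endpoint);
        }
        return runner;
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        // Stands in for the executor JVM having been launched with the -D option.
        System.setProperty(NetworkRunnerRegistry.ENDPOINT_PROPERTY, "http://example.invalid/api");
        System.out.println(NetworkRunnerRegistry.get().eval(new Object[] {"a", "b"}));
        // prints: http://example.invalid/api <- 2 arg(s)
    }
}
```

Any UDF running on an executor can then call NetworkRunnerRegistry.get() without a constructor argument. The synchronized accessor is the simplest thread-safe choice, at the cost of taking a lock on every call.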

Related

Kotlin thread safe native lazy singleton with parameter

In Java we can write thread-safe singletons using double-checked locking and volatile:
public class Singleton {
    private static volatile Singleton instance;
    private Singleton(String arg) {
        // initialize using arg
    }
    public static Singleton getInstance(String arg) {
        Singleton localInstance = instance;
        if (localInstance == null) {
            synchronized (Singleton.class) {
                localInstance = instance;
                if (localInstance == null) {
                    instance = localInstance = new Singleton(arg);
                }
            }
        }
        return localInstance;
    }
}
How can we write this in Kotlin?
About object
object A {
    object B {}
    object C {}
    init {
        C.hashCode()
    }
}
Using the Kotlin decompiler, I got this:
public final class A {
    public static final A INSTANCE;
    private A() {
        INSTANCE = (A)this;
        A.C.INSTANCE.hashCode();
    }
    static {
        new A();
    }
    public static final class B {
        public static final A.B INSTANCE;
        private B() {
            INSTANCE = (A.B)this;
        }
        static {
            new A.B();
        }
    }
    public static final class C {
        public static final A.C INSTANCE;
        private C() {
            INSTANCE = (A.C)this;
        }
        static {
            new A.C();
        }
    }
}
Every object has its constructor invoked in a static block. Based on this, we might think it is not lazy.
Close to the right answer:
class Singleton {
    companion object {
        val instance: Singleton by lazy(LazyThreadSafetyMode.PUBLICATION) { Singleton() }
    }
}
Decompiled:
public static final class Companion {
    // $FF: synthetic field
    private static final KProperty[] $$delegatedProperties = new KProperty[]{(KProperty)Reflection.property1(new PropertyReference1Impl(Reflection.getOrCreateKotlinClass(Singleton.Companion.class), "instance", "getInstance()Lru/example/project/tech/Singleton;"))};
    @NotNull
    public final Singleton getInstance() {
        Lazy var1 = Singleton.instance$delegate;
        KProperty var3 = $$delegatedProperties[0];
        return (Singleton)var1.getValue();
    }
    private Companion() {
    }
    // $FF: synthetic method
    public Companion(DefaultConstructorMarker $constructor_marker) {
        this();
    }
}
I hope the Kotlin developers will provide a non-reflection implementation in the future...

Kotlin has an equivalent of your Java code, but safer. Your double-checked locking is not recommended even in Java; there you should use a static inner holder class, as explained in the Initialization-on-demand holder idiom.
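For reference, a minimal Java sketch of the holder idiom mentioned here, applied to the Singleton from the question. It cannot take a runtime argument such as getInstance(String arg), which is the gap the SingletonHolder answer further down tries to fill; the created counter is only there to make the laziness visible.

```java
// Initialization-on-demand holder idiom: no explicit locking in user code.
final class Singleton {
    // Counts constructor calls so the laziness is observable.
    static int created = 0;

    private Singleton() {
        created++;
    }

    // Holder is initialized only when getInstance() first reads Holder.INSTANCE;
    // the JVM runs class initialization exactly once and makes it thread-safe.
    private static final class Holder {
        static final Singleton INSTANCE = new Singleton();
    }

    static Singleton getInstance() {
        return Holder.INSTANCE;
    }
}

final class HolderDemo {
    public static void main(String[] args) {
        // Reading Singleton.created initializes Singleton, but not Holder.
        System.out.println("created before first call: " + Singleton.created); // 0
        Singleton a = Singleton.getInstance();
        Singleton b = Singleton.getInstance();
        System.out.println("same instance: " + (a == b) + ", created: " + Singleton.created); // true, 1
    }
}
```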
But that's Java. In Kotlin, simply use an object (and optionally a lazy delegate):
object Singletons {
    val something: OfMyType by lazy() { ... }
    val somethingLazyButLessSo: OtherType = OtherType()
    val moreLazies: FancyType by lazy() { ... }
}
You can then access any member variable:
// Singletons is lazily instantiated now, then something is lazily instantiated after.
val thing = Singletons.something // This is doubly lazy!
// This one is already loaded due to the previous line
val eager = Singletons.somethingLazyButLessSo
// and Singletons.moreLazies isn't loaded until its first access...
Kotlin intentionally avoids the confusion people have with singletons in Java, and avoids the "wrong versions" of this pattern, of which there are many. Instead it provides the simplest and safest form of singleton.
With lazy(), each member is individually lazy. And since each one is initialized in the lambda passed to lazy(), you can do what you were asking about regarding customizing the constructor, separately for each member property.
As a result you get lazy loading of the Singletons object (on first access), then lazier loading of something (on first access of that member), and complete flexibility in object construction.
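Java has no lazy delegate, but a rough sketch of the same per-member laziness can help when comparing the two. The Lazy class below is hypothetical and uses a plain synchronized get(), which is simpler than Kotlin's own implementation; it only approximates the default SYNCHRONIZED semantics of lazy().

```java
import java.util.function.Supplier;

// Rough Java analogue of Kotlin's lazy { ... } in its default SYNCHRONIZED mode.
final class Lazy<T> implements Supplier<T> {
    private Supplier<? extends T> initializer; // dropped after first use
    private T value;
    private boolean initialized;

    Lazy(Supplier<? extends T> initializer) {
        this.initializer = initializer;
    }

    @Override
    public synchronized T get() {
        if (!initialized) {
            value = initializer.get();
            initialized = true;
            initializer = null; // let the lambda be garbage-collected
        }
        return value;
    }
}

// Java counterpart of the Singletons object above (types simplified to String).
final class Singletons {
    static int somethingInits = 0;
    static int moreLaziesInits = 0;

    // Each member is lazy on its own: a lambda runs on the first get(), not at class init.
    static final Lazy<String> SOMETHING = new Lazy<>(() -> {
        somethingInits++;
        return "something";
    });
    static final Lazy<String> MORE_LAZIES = new Lazy<>(() -> {
        moreLaziesInits++;
        return "more lazies";
    });

    private Singletons() {}
}

final class LazyDemo {
    public static void main(String[] args) {
        System.out.println(Singletons.somethingInits);  // 0: initializing Singletons ran no lambda
        System.out.println(Singletons.SOMETHING.get()); // something
        System.out.println(Singletons.SOMETHING.get()); // something (lambda not rerun)
        System.out.println(Singletons.somethingInits + " " + Singletons.moreLaziesInits); // 1 0
    }
}
```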
See also:
lazy() function
Lazy thread safe mode options
Object declarations
As a side note, look at object registry type libraries for Kotlin that are similar to dependency injection, giving you singletons with injection options:
Injekt - I'm the author
Kodein - Very similar and good

Object declaration is exactly for this purpose:
object Singleton {
    // singleton members
}
It is lazy and thread-safe: it is initialized on first access, much like Java's static initializers.
You can declare an object at top level or inside a class or another object.
For more info about working with objects from Java, please refer to this answer.
As to the parameter, if you want to achieve exactly the same semantics (first call to getInstance takes its argument to initialize the singleton, following calls just return the instance, dropping the arguments), I would suggest this construct:
private object SingletonInit { // invisible outside the file
    lateinit var arg0: String
}
object Singleton {
    val arg0: String = SingletonInit.arg0
}
fun Singleton(arg0: String): Singleton { // mimic a constructor, if you want
    synchronized(SingletonInit) {
        SingletonInit.arg0 = arg0
        return Singleton
    }
}
The main flaw of this solution is that it requires the singleton to be defined in a separate file to hide the object SingletonInit, and you cannot reference Singleton directly until it's initialized.
Also, see a similar question about providing arguments to a singleton.

I recently wrote an article on that topic.
TL;DR Here's the solution I came up with:
1) Create a SingletonHolder class. You only have to write it once:
open class SingletonHolder<out T, in A>(creator: (A) -> T) {
    private var creator: ((A) -> T)? = creator
    @Volatile private var instance: T? = null
    fun getInstance(arg: A): T {
        val i = instance
        if (i != null) {
            return i
        }
        return synchronized(this) {
            val i2 = instance
            if (i2 != null) {
                i2
            } else {
                val created = creator!!(arg)
                instance = created
                creator = null
                created
            }
        }
    }
}
2) Use it like this in your singletons:
class MySingleton private constructor(arg: ArgumentType) {
    init {
        // Init using argument
    }
    companion object : SingletonHolder<MySingleton, ArgumentType>(::MySingleton)
}
The singleton initialization will be lazy and thread-safe.
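The same holder translated to Java, as a sketch for readers coming from the Java version in the question: double-checked locking on a volatile field, with the creator dropped after use. MySingleton and its String argument are hypothetical. As with the Kotlin version, the argument of the first call wins and later arguments are ignored.

```java
import java.util.function.Function;

// Java port of the Kotlin SingletonHolder above.
class SingletonHolder<T, A> {
    private Function<? super A, ? extends T> creator; // read and cleared only under the lock
    private volatile T instance;

    SingletonHolder(Function<? super A, ? extends T> creator) {
        this.creator = creator;
    }

    T getInstance(A arg) {
        T i = instance;
        if (i != null) {
            return i; // fast path: no locking once created
        }
        synchronized (this) {
            T i2 = instance;
            if (i2 != null) {
                return i2;
            }
            T created = creator.apply(arg);
            instance = created;
            creator = null; // release anything the creator captured
            return created;
        }
    }
}

// Hypothetical singleton that needs a String argument.
final class MySingleton {
    private static final SingletonHolder<MySingleton, String> HOLDER =
        new SingletonHolder<>(MySingleton::new);

    final String arg;

    private MySingleton(String arg) {
        this.arg = arg;
    }

    static MySingleton getInstance(String arg) {
        return HOLDER.getInstance(arg);
    }
}

final class HolderPortDemo {
    public static void main(String[] args) {
        MySingleton a = MySingleton.getInstance("first");
        MySingleton b = MySingleton.getInstance("second");
        System.out.println((a == b) + " " + a.arg); // true first
    }
}
```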

Groovy Copying / Combining MetaMethods From Multiple Objects

I have two classes. At runtime, I want to "clone" the methods of one object over to another. Is this possible? My failed attempt using leftShift is shown below.
(Note: I also tried currMethod.clone() with the same result.)
import spock.lang.Specification
class SandboxMetaMethod2 {
    String speak() {
        println 'bow wow'
    }
}
class SandboxMetaMethod1 {
    void leftShift(Object sandbox2) {
        sandbox2.metaClass.getMethods().each { currMethod ->
            if (currMethod.name.contains("speak")) {
                this.speak()
                this.metaClass."$currMethod.name" = currMethod
                this.speak()
            }
        }
    }
    String speak() {
        println 'woof'
    }
}
class SandboxMetaMethodSpec extends Specification {
    def "try this"() {
        when:
        def sandbox1 = new SandboxMetaMethod1()
        def sandbox2 = new SandboxMetaMethod2()
        sandbox1 << sandbox2
        then:
        true
    }
}
//Output
woof
speak
woof
Per request, I am adding background on the goal / use case:
It's very much like a standard functional type of use case. In summary, we have a lot of methods on a class that apply to all of our client environments (50-100). We apply those to process data in a certain default order. Each of those methods may be overridden by client-specific methods (if they exist with the same method name), and the idea was to use the approach above to "reconcile" the method set. Based on the client environment name, we need a way to dynamically override methods.
Note: Overriding methods on the metaclass is very standard (or should I say, it's the reason the amazing capability exists). And it works if my method exists as text, like String currMethod = "{x-> x+1}"; then I just say this.metaClass."$currMethodName" = currMethod. My challenge in this case is that my method is compiled and exists on another class, rather than being defined as text somewhere.
The goal of having all the custom methods compiled into client-specific classes at build time was to avoid the expense of compiling these dynamic methods at runtime for each calculation, so all client-specific methods are compiled into a separate client-specific JAR at build time. This also allows us to deploy only the client-specific code to the respective client, without all the other clients' calculations in some master class.
I hope that makes sense.
New Approach, in Response to Jeremie B's suggestion:
Since I need to choose the trait to implement by name at runtime, will something like this work:
String clientName = "client1"
String clientSpeakTrait = "${clientName}Speak"
trait globalSpeak {
    String speak() {
        println 'bow wow'
    }
}
trait client1Speak {
    String speak() {
        println 'woof'
    }
}
def mySpeaker = new Object().withTraits globalSpeak, clientSpeakTrait

A basic example with traits:
trait Speak {
    String speak() {
        println 'bow wow'
    }
}
class MyClass {
}
def instance = new MyClass()
def extended = instance.withTraits Speak
extended.speak()
You can choose which trait to use at runtime:
def clientTrait = Speak
def sb = new Object().withTraits(clientTrait)
sb.speak()
And dynamically load the trait with a ClassLoader:
def clientTrait = this.class.classLoader.loadClass "my.package.${client}Speak"
def sb = new Object().withTraits(clientTrait)
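Groovy's withTraits has no direct Java counterpart. For comparison only, here is a plain-Java sketch of the same idea (shared default behaviour, optionally overridden per client, chosen by name at runtime) using an interface with a default method plus Class.forName. This is a swapped-in technique, not what the answer above does, and the naming convention <ClientName>Speaker and all class names are hypothetical.

```java
// Shared default behaviour, roughly what the globalSpeak trait provides.
interface Speaker {
    default String speak() {
        return "bow wow";
    }
}

// Used when a client has no override.
final class DefaultSpeaker implements Speaker {}

// Hypothetical client-specific override; in the setup described in the
// question it would be compiled into that client's own JAR.
final class Client1Speaker implements Speaker {
    @Override
    public String speak() {
        return "woof";
    }
}

final class SpeakerFactory {
    private SpeakerFactory() {}

    // Hypothetical convention: client "client1" -> class Client1Speaker,
    // located next to this factory (same package / same enclosing class).
    static Speaker forClient(String clientName) {
        String factoryName = SpeakerFactory.class.getName();
        String prefix = factoryName.substring(
            0, factoryName.length() - SpeakerFactory.class.getSimpleName().length());
        String className = prefix + Character.toUpperCase(clientName.charAt(0))
            + clientName.substring(1) + "Speaker";
        try {
            Class<? extends Speaker> type = Class.forName(className).asSubclass(Speaker.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            return new DefaultSpeaker(); // no override for this client
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}

final class SpeakerDemo {
    public static void main(String[] args) {
        System.out.println(SpeakerFactory.forClient("client1").speak()); // woof
        System.out.println(SpeakerFactory.forClient("client2").speak()); // bow wow
    }
}
```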

On closures and groovy builder pattern

I'm starting to grasp closures in general and some Groovy features.
Given the following code:
class Mailer {
    void to(final String to) { println "to $to" }
    void from(final String from) { println "from $from" }
    static void send(Closure configuration) {
        Mailer mailer = new Mailer()
        mailer.with configuration
    }
}
class MailSender {
    static void sendMessage() {
        Mailer.send {
            to 'them'
            from 'me'
        }
    }
}
MailSender.sendMessage()
What happens under the hood when you pass a closure to Mailer.send method?
Are to and from passed as arguments from the Closure's point of view? Which types does the Closure map them to?
And then inside the Mailer.send method, when the Mailer object calls mailer.with with the configuration closure, the object maps them into method calls. Does Groovy do this by reflection?

Groovy can dynamically define the delegate of a closure and even the this object.
with sets the delegate and executes the closure. This is a more verbose way to achieve the same:
def math = {
    given 4
    sum 5
    print
}
class PrintMath {
    def initial
    def given(val) {
        initial = val
    }
    def sum(val) {
        initial += val
    }
    def getPrint() {
        println initial
        return initial
    }
}
math.delegate = new PrintMath()
math.resolveStrategy = Closure.DELEGATE_ONLY
assert math() == 9
What happens under the hood when you pass a closure to Mailer.send method?
It receives a not-yet-executed block of code.
Are to and from passed as arguments from the Closure's point of view?
No. It is better to think of the closure as an anonymous class/lambda in Java, or a function(){} in JavaScript.
Which types does the Closure map them to?
None; they are method calls waiting to be executed. They can be delegated to different objects, though.
And then inside the Mailer.send method, when the Mailer object calls mailer.with with the configuration closure, the object maps them into method calls. Does Groovy do this by reflection?
You can decompile a Groovy class file to see what is going on. IIRC, Groovy currently uses a "reflector" strategy (with an arrayOfCallSite caching) to make calls faster OR it can use invokedynamic.
The closure math in the code above compiles to this class:
// .. a lot of techno-babble
public Object doCall(Object it) {
    CallSite[] arrayOfCallSite = $getCallSiteArray();
    arrayOfCallSite[0].callCurrent(this, Integer.valueOf(4));
    arrayOfCallSite[1].callCurrent(this, Integer.valueOf(5));
    return arrayOfCallSite[2].callGroovyObjectGetProperty(this);
    return null;
}
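To make the "anonymous class/lambda" comparison concrete, here is a rough Java counterpart of Mailer.send using java.util.function.Consumer. It is an analogy, not what Groovy generates: Java lambdas have no delegate, so the Mailer is passed explicitly, and send returns the Mailer only so the sketch can be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class Mailer {
    // Records the calls so the example can be checked; the Groovy version prints them.
    final List<String> log = new ArrayList<>();

    void to(String to) { log.add("to " + to); }

    void from(String from) { log.add("from " + from); }

    // The configuration block arrives unevaluated; it runs only when accept()
    // is called, with the new Mailer passed in as an explicit argument.
    static Mailer send(Consumer<Mailer> configuration) {
        Mailer mailer = new Mailer();
        configuration.accept(mailer);
        return mailer;
    }
}

final class MailSender {
    static Mailer sendMessage() {
        // Without a delegate the receiver must be named (m.to, m.from);
        // Groovy's with() is what lets the closure omit it.
        return Mailer.send(m -> {
            m.to("them");
            m.from("me");
        });
    }

    public static void main(String[] args) {
        sendMessage().log.forEach(System.out::println); // "to them", then "from me"
    }
}
```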

Avoid "Task not serialisable" with nested method in a class

I understand the usual "Task not serializable" issue that arises when accessing a field or a method that is out of scope of a closure.
To fix it, I usually define a local copy of these fields/methods, which avoids the need to serialize the whole class:
class MyClass(val myField: Any) {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
    val myField = this.myField
    println(f.map( _ + myField ).count)
  }
}
Now, if I define a nested function in the run method, it cannot be serialized:
class MyClass(val myField: Any) {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
    def mapFn(line: String) = line.split(";")
    val myField = this.myField
    println(f.map( mapFn( _ ) ).count)
  }
}
I don't understand, since I thought mapFn would be in scope...
Even stranger, if I define mapFn to be a val instead of a def, then it works:
class MyClass() {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
    val mapFn = (line: String) => line.split(";")
    println(f.map( mapFn( _ ) ).count)
  }
}
Is this related to the way Scala represents nested functions?
What's the recommended way to deal with this issue ?
Avoid nested functions?

Doesn't it work like this? In the first case, f.map(mapFn(_)) is equivalent to f.map(new Function() { override def apply(...) = mapFn(...) }), while in the second it is just f.map(mapFn). When you declare a method with def, it is probably just a method in some anonymous class with an implicit $outer reference to the enclosing class. But map requires a Function, so the compiler needs to wrap it. In the wrapper you just refer to some method of that anonymous class, but not to the instance itself. If you use val, you have a direct reference to the function, which you pass to map. I'm not sure about this, just thinking out loud...
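The idea can be checked on the equivalent situation in Java, sketched below as an analogy (Java lambdas, not Scala closures): a lambda that calls an instance method has to capture this, so serializing it drags in the non-serializable enclosing object, while a lambda stored in a local variable that only uses its parameter captures nothing. SerFunction, MyClass and ClosureCheck are hypothetical names. In Scala the usual workarounds are the val from the question or moving the function into a standalone object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// A Function that is also Serializable, standing in for a Spark closure.
interface SerFunction<T, R> extends Function<T, R>, Serializable {}

// Deliberately not Serializable, like MyClass in the question.
final class MyClass {
    // Instance method: any lambda that calls it must capture `this`.
    String[] mapFn(String line) {
        return line.split(";");
    }

    // Like the def version: the lambda captures the MyClass instance.
    SerFunction<String, String[]> viaMethod() {
        return line -> mapFn(line);
    }

    // Like the val version: the lambda only uses its parameter.
    SerFunction<String, String[]> viaLocalFunction() {
        SerFunction<String, String[]> f = line -> line.split(";");
        return f;
    }
}

final class ClosureCheck {
    // Returns false if Java serialization rejects the object graph.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        MyClass c = new MyClass();
        System.out.println(serializes(c.viaMethod()));        // false
        System.out.println(serializes(c.viaLocalFunction())); // true
    }
}
```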

Method aliasing in class with Groovy

I'm going to internationalize the Groovy API a bit.
For a final class (e.g. String):
String.metaClass.вСтроку = {-> delegate.toString() }
However, this creates an additional closure. Isn't there any way to simply alias a method with another method?
Something like this:
String.metaClass.вСтроку = String.metaClass.&toString

You could use the @Category transform like this:
@Category(String) class StringInternationalization {
    String вСтроку() {
        this.toString()
    }
    int длина() {
        this.length()
    }
}
class ApplyMixin {
    static {
        String.mixin(StringInternationalization)
        final helloString = "Привет мир!"
        println helloString.вСтроку()
        assert helloString.длина() == helloString.length()
    }
}
new ApplyMixin()
This creates one Category class per localized class and one class that applies all the mixin transformations (registering all the methods). It should also be faster than individual closures.
More reading here: http://groovy.codehaus.org/Category+and+Mixin+transformations
