MapWithStateRDDRecord with kryo - apache-spark

How can I register MapWithStateRDDRecord in kryo?
When I try this:
`sparkConfiguration.registerKryoClasses(Array(classOf[org.apache.spark.streaming.rdd.MapWithStateRDDRecord]))`
I get this error:
class MapWithStateRDDRecord in package rdd cannot be accessed in package org.apache.spark.streaming.rdd
[error] classOf[org.apache.spark.streaming.rdd.MapWithStateRDDRecord]
I'd like to make sure that all serialization is done via Kryo, so I set SparkConf().set("spark.kryo.registrationRequired", "true"). With this setting enabled, I get exceptions at runtime: java.lang.IllegalArgumentException (Class is not registered: org.apache.spark.streaming.rdd.MapWithStateRDDRecord)
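A commonly used workaround, sketched here under the assumption that the class keeps this fully qualified name in your Spark version: `classOf[...]` is checked by the Scala compiler, which rejects the package-private class, whereas `Class.forName` resolves the class by name at runtime, where that Scala-level restriction does not apply. `KryoConfig` and the app name are hypothetical; `registerKryoClasses` is the `SparkConf` method from the question.

```scala
import org.apache.spark.SparkConf

object KryoConfig {
  // MapWithStateRDDRecord is package-private inside Spark Streaming, so
  // classOf[MapWithStateRDDRecord] does not compile in user code.
  // Class.forName looks the class up by its name at runtime instead.
  val hiddenClasses: Array[Class[_]] = Array(
    Class.forName("org.apache.spark.streaming.rdd.MapWithStateRDDRecord")
  )

  def build(): SparkConf =
    new SparkConf()
      .setAppName("map-with-state-kryo") // hypothetical app name
      // Fail fast whenever Kryo meets a class that was not registered.
      .set("spark.kryo.registrationRequired", "true")
      // Also switches spark.serializer to KryoSerializer.
      .registerKryoClasses(hiddenClasses)
}
```

The conf has to be handed to the streaming context when it is created, e.g. `new StreamingContext(KryoConfig.build(), Seconds(10))`. With registration required, Kryo may then report further internal classes; they can be appended to `hiddenClasses` the same way.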

Related

how to register graphx Edge class with Kryo?

I've been trying to register the Edge class with Kryo, but I always get the following error:
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.graphx.Edge
Note: To register this class use: kryo.register(org.apache.spark.graphx.Edge.class);
What is wrong with the following line?
sc.getConf.registerKryoClasses(Array(Class.forName("org.apache.spark.graphx.Edge")))
How should I do it?
I've had trouble getting GraphX classes registered too. This finally works for me:
import org.apache.spark.graphx.GraphXUtils
val conf = new SparkConf().setAppName("yourAppName")
GraphXUtils.registerKryoClasses(conf)
Here's what's going on behind the scenes...
https://github.com/amplab/graphx/blob/master/graphx/src/main/scala/org/apache/spark/graphx/GraphKryoRegistrator.scala
In your case, I'm not sure why the following wouldn't work, since Edge is public (Edge is generic, so classOf needs a type argument):
conf.registerKryoClasses(Array(classOf[Edge[_]]))
But I think there are private classes in GraphX that aren't exposed through the Spark API; at least, I see them in the graphx repo but not in the spark.graphx repo. In my case, I couldn't get VertexAttributeBlock registered until I used the GraphXUtils method.
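On the line from the question: `SparkContext.getConf` returns a copy of the context's configuration, so registering classes on it after the context exists does not change the running context. A sketch that registers everything on the `SparkConf` before any context is built; `GraphKryoSetup` and the app name are hypothetical, while `GraphXUtils.registerKryoClasses` is the method used in the answer above.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.graphx.{Edge, GraphXUtils}

object GraphKryoSetup {
  def conf(): SparkConf = {
    val sparkConf = new SparkConf()
      .setAppName("graphx-kryo") // hypothetical app name
      .set("spark.kryo.registrationRequired", "true")
    // Registers Edge plus GraphX's internal classes that user code cannot name.
    GraphXUtils.registerKryoClasses(sparkConf)
    // Redundant after the call above; shown because Edge is generic and
    // classOf therefore needs a type argument.
    sparkConf.registerKryoClasses(Array(classOf[Edge[_]]))
  }
}
```

The result is then passed straight to the constructor, e.g. `new SparkContext(GraphKryoSetup.conf())`; after that point the configuration is fixed.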

When are custom TableCatalogs loaded?

I've created a custom Catalog in Spark 3.0.0:
class ExCatalogPlugin extends SupportsNamespaces with TableCatalog
I've provided the configuration asking Spark to load the Catalog:
.config("spark.sql.catalog.ex", "com.test.ExCatalogPlugin")
But Spark never loads the plugin: while debugging, no breakpoints inside the initialize method are ever hit, and none of the namespaces it exposes are recognized. No error messages are logged either, and even if I change the class name to an invalid one, no errors are thrown.
I wrote a small test case, similar to the test cases in the Spark code, and I am able to load the plugin if I call:
package org.apache.spark.sql.connector.catalog
....
class CatalogsTest extends FunSuite {
  test("EX") {
    val conf = new SQLConf()
    conf.setConfString("spark.sql.catalog.ex", "com.test.ExCatalogPlugin")
    val plugin: CatalogPlugin = Catalogs.load("ex", conf)
  }
}
Spark uses its normal lazy-loading approach and doesn't instantiate the custom catalog plugin until it's needed.
In my case, referencing the plugin in either of two ways worked:
USE ex: this explicit USE statement caused Spark to look up the catalog and instantiate it.
I have a companion TableProvider defined as class DefaultSource extends SupportsCatalogOptions. This class has a hard-coded extractCatalog that returns ex. If I create a reader for this source, Spark sees the name of the catalog provider and instantiates it, then uses the catalog provider to create the table.
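Why an invalid class name produces no error at start-up can be illustrated with a simplified, hypothetical registry. This is not Spark's catalog manager; `CatalogLike`, `ExCatalog` and `LazyCatalogRegistry` are made-up names. It only mirrors the idea described above: the `spark.sql.catalog.<name>` setting is kept as a string, and the class is loaded, instantiated and initialized on the first lookup (in Spark, a lookup such as the one triggered by `USE ex`).

```scala
import scala.collection.mutable

// Hypothetical plugin contract, loosely modelled on CatalogPlugin.initialize.
trait CatalogLike {
  def initialize(name: String): Unit
  def name: String
}

// Hypothetical plugin; needs a public no-arg constructor to be created reflectively.
class ExCatalog extends CatalogLike {
  private var catalogName: String = ""
  def initialize(name: String): Unit = catalogName = name
  def name: String = catalogName
}

// Simplified model of lazy, name-keyed catalog loading (not Spark's code).
class LazyCatalogRegistry(conf: Map[String, String]) {
  private val prefix = "spark.sql.catalog."
  private val loaded = mutable.Map.empty[String, CatalogLike]

  // The constructor only stores configuration strings; no class is touched,
  // so a misspelled class name goes unnoticed until the first lookup.
  def catalog(name: String): CatalogLike =
    loaded.getOrElseUpdate(name, {
      val className = conf.getOrElse(prefix + name,
        throw new NoSuchElementException(s"catalog '$name' is not configured"))
      val plugin = Class.forName(className)
        .getDeclaredConstructor()
        .newInstance()
        .asInstanceOf[CatalogLike]
      plugin.initialize(name)
      plugin
    })

  def isLoaded(name: String): Boolean = loaded.contains(name)
}
```

Creating the registry with a nonexistent class name succeeds; only the first `catalog(...)` call for that name throws `ClassNotFoundException`, and a successfully loaded plugin is cached and reused.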

ical4j 2.2.0 using Grape, throws java.lang.NoClassDefFoundError: javax/cache/configuration/Configuration when loading a calendar

Previously I was able to run this script, which reads events from a .ics URL:
import net.fortuna.ical4j.util.Calendars
import net.fortuna.ical4j.model.component.VEvent
@Grapes(
    @Grab(group='org.mnode.ical4j', module='ical4j', version='2.2.0')
)
def url = 'https://calendar.google.com/calendar/ical/xxxx/basic.ics'.toURL()
def cal = Calendars.load(url)
However, I now get this exception: java.lang.NoClassDefFoundError: javax/cache/configuration/Configuration.
I assume some dependency has changed. I have noticed this note:
javax.cache.cache-api [optional*] - Supports caching timezone definitions. * NOTE: when not included you must set a value for the net.fortuna.ical4j.timezone.cache.impl configuration
However, I am now getting this error instead: java.lang.NoClassDefFoundError: Could not initialize class net.fortuna.ical4j.validate.AbstractCalendarValidatorFactory
Any help is appreciated.
ical4j looks for a properties file called ical4j.properties and loads its configuration from it. Create this file in the same folder and add
net.fortuna.ical4j.timezone.cache.impl=net.fortuna.ical4j.util.MapTimeZoneCache
to specify an in-memory cache provider that uses ConcurrentHashMap. When the property net.fortuna.ical4j.timezone.cache.impl is not specified, ical4j falls back to JCacheTimeZoneCache, which uses a cache manager and requires a valid caching library to be present on the classpath.
The alternative to using an ical4j.properties file is to set this property programmatically, e.g.
System.setProperty("net.fortuna.ical4j.timezone.cache.impl", "net.fortuna.ical4j.util.MapTimeZoneCache")
Just remember to set it before calling Calendars.load(url) and it should work.
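The programmatic variant from Scala, as a sketch assuming ical4j 2.2.0 on the classpath; `IcsLoader` is a hypothetical name. It sets the property only when no cache implementation was already chosen with `-D`, and always before the first `Calendars.load` call, which matters because ical4j picks up this setting when its timezone support is first used.

```scala
import java.net.URL
import net.fortuna.ical4j.model.Calendar
import net.fortuna.ical4j.util.Calendars

object IcsLoader {
  val CacheImplKey = "net.fortuna.ical4j.timezone.cache.impl"
  val InMemoryCache = "net.fortuna.ical4j.util.MapTimeZoneCache"

  // Select the ConcurrentHashMap-backed cache unless one was already chosen
  // (for example with -Dnet.fortuna.ical4j.timezone.cache.impl=...).
  def configureCache(): Unit =
    if (System.getProperty(CacheImplKey) == null)
      System.setProperty(CacheImplKey, InMemoryCache)

  // The property is set before Calendars.load, as the answer requires.
  def load(url: URL): Calendar = {
    configureCache()
    Calendars.load(url)
  }

  def main(args: Array[String]): Unit =
    println(load(new URL(args(0))))
}
```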

Hazelcast Supplier and Aggregation gives Concurrent Execution Exception

I am trying to get a set of the distinct values of an object's field stored in a Hazelcast map.
This line of Java code:
instructions.aggregate(Supplier.all(value -> value.getWorkArea()), Aggregations.distinctValues());
produces the following stack trace:
java.util.concurrent.ExecutionException: com.hazelcast.nio.serialization.HazelcastSerializationException: java.lang.ClassNotFoundException: com.example.instruction.repository.HazelcastInstructionRepository$GeneratedEvaluationClass
com.hazelcast.nio.serialization.HazelcastSerializationException: java.lang.ClassNotFoundException: com.example.instruction.repository.HazelcastInstructionRepository$GeneratedEvaluationClass
java.lang.ClassNotFoundException: com.example.instruction.repository.HazelcastInstructionRepository$GeneratedEvaluationClass
If I try this line instead:
instructions.aggregate(Supplier.all(), Aggregations.distinctValues());
or:
instructions.aggregate(Supplier.fromPredicate(Predicates.and(Predicates.equal("type", "someType"), Predicates.equal("groupId", null),
Predicates.equal("workArea", "someWorkArea"))), Aggregations.distinctValues());
it just works. Something seems to go wrong only when I reference the object's field. (I also tried other fields of the object, and the same error is returned.)
This is running in my local environment, and I am sure the objects are being placed correctly in the Hazelcast map, since the other aggregations/predicates work.
Do you have any idea what I am doing wrong?
Many thanks!
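EDIT: The problem is the closure. The class it depends on (HazelcastInstructionRepository$GeneratedEvaluationClass in the stack trace) is available only on the calling node, not on all nodes, so it cannot be deserialized elsewhere.
Also, this feature is deprecated. Please use fast aggregations instead: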
http://docs.hazelcast.org/docs/latest/manual/html-single/#fast-aggregations
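A sketch of the same query with fast aggregations, assuming Hazelcast 3.8 to 3.12 (where `IMap` lives in `com.hazelcast.core`; later versions moved it to `com.hazelcast.map`). `Instruction` is a hypothetical stand-in for the question's value class. Instead of a closure, the aggregator carries only the attribute path `"workArea"`, which each member evaluates against the stored values through the public `getWorkArea` getter; the value class itself normally still has to be on every member's classpath.

```scala
import com.hazelcast.aggregation.{Aggregator, Aggregators}
import com.hazelcast.core.{Hazelcast, IMap}

// Hypothetical value type; Hazelcast resolves "workArea" via getWorkArea.
class Instruction(workArea: String) extends Serializable {
  def getWorkArea: String = workArea
}

object DistinctWorkAreas {
  // Only the attribute name is sent to the members, not a lambda.
  def distinctWorkAreas(instructions: IMap[String, Instruction]): java.util.Set[String] = {
    val distinctByWorkArea: Aggregator[java.util.Map.Entry[String, Instruction], java.util.Set[String]] =
      Aggregators.distinct("workArea")
    instructions.aggregate(distinctByWorkArea)
  }

  def main(args: Array[String]): Unit = {
    val hz = Hazelcast.newHazelcastInstance()
    try {
      val instructions = hz.getMap[String, Instruction]("instructions")
      instructions.put("1", new Instruction("north"))
      instructions.put("2", new Instruction("south"))
      instructions.put("3", new Instruction("north"))
      // Prints the distinct work areas (set order is not guaranteed).
      println(distinctWorkAreas(instructions))
    } finally hz.shutdown()
  }
}
```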

Scala + SBT - How to configure reference.conf for a shaded Akka library

TL;DR
I am trying to shade a version of the akka library and bundle it with my application (to be able to run a spray-can server on the CDH 5.7 version of Spark 1.6). The shading process messes up akka's default configuration, and even after manually providing a separate version of akka's reference.conf for the shaded akka, it still looks like the two versions get mixed up somehow.
Is shading akka versions known to cause problems? What am I doing wrong?
Background
I have a Scala/Spark application currently running on Spark 1.6.1 standalone. The application runs a spray-can http server using spray 1.3.3, which requires akka 2.3.9 (Spark 1.6.1 standalone includes a compatible akka 2.3.11).
I am trying to migrate the application to a new Cloudera-based Spark cluster running the CDH 5.7 version of Spark 1.6. The problem is that Spark 1.6 in CDH 5.7 is bundled with akka 2.2.3 which is not sufficient for spray 1.3.3 to function properly.
Attempted solution
Following the suggestion in this post, I decided to shade akka 2.3.9 and bundle it with my application. This time, though, I stumbled upon a new problem: akka has its default configuration defined in a reference.conf file, which should be located on the application's classpath. Due to a known issue in sbt-assembly's shading feature, it seems that the shaded akka library requires a separate configuration.
So, I ended up shading akka with the following shade rule:
ShadeRule.rename("akka.**" -> "akka_2_3_9_shade.@1")
.inLibrary("com.typesafe.akka" % "akka-actor_2.10" % "2.3.9")
.inAll
and included an additional reference.conf file in my project, which is identical to akka's original reference.conf, but with all occurrences of "akka" replaced with "akka_2_3_9_shade".
Now, though, it seems that the Spark-provided akka gets mixed up somehow with the shaded akka, as I'm getting the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Cannot instantiate MailboxType [akka.dispatch.UnboundedMailbox], defined in [akka.actor.default-mailbox], make sure it has a public constructor with [akka.actor.ActorSystem.Settings, com.typesafe.config.Config] parameters
at akka_2_3_9_shade.dispatch.Mailboxes$$anonfun$1.applyOrElse(Mailboxes.scala:197)
at akka_2_3_9_shade.dispatch.Mailboxes$$anonfun$1.applyOrElse(Mailboxes.scala:195)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at akka_2_3_9_shade.dispatch.Mailboxes.lookupConfiguration(Mailboxes.scala:195)
at akka_2_3_9_shade.dispatch.Mailboxes.lookup(Mailboxes.scala:78)
at akka_2_3_9_shade.actor.LocalActorRefProvider.akka$actor$LocalActorRefProvider$$defaultMailbox$lzycompute(ActorRefProvider.scala:561)
at akka_2_3_9_shade.actor.LocalActorRefProvider.akka$actor$LocalActorRefProvider$$defaultMailbox(ActorRefProvider.scala:561)
at akka_2_3_9_shade.actor.LocalActorRefProvider$$anon$1.<init>(ActorRefProvider.scala:568)
at akka_2_3_9_shade.actor.LocalActorRefProvider.rootGuardian$lzycompute(ActorRefProvider.scala:564)
at akka_2_3_9_shade.actor.LocalActorRefProvider.rootGuardian(ActorRefProvider.scala:563)
at akka_2_3_9_shade.actor.LocalActorRefProvider.init(ActorRefProvider.scala:618)
at akka_2_3_9_shade.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:619)
at akka_2_3_9_shade.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:616)
at akka_2_3_9_shade.actor.ActorSystemImpl._start(ActorSystem.scala:616)
at akka_2_3_9_shade.actor.ActorSystemImpl.start(ActorSystem.scala:633)
at akka_2_3_9_shade.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka_2_3_9_shade.actor.ActorSystem$.apply(ActorSystem.scala:109)
at akka_2_3_9_shade.actor.ActorSystem$.apply(ActorSystem.scala:100)
at MyApp.api.Boot$delayedInit$body.apply(Boot.scala:45)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at MyApp.api.Boot$.main(Boot.scala:28)
at MyApp.api.Boot.main(Boot.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: interface akka_2_3_9_shade.dispatch.MailboxType is not assignable from class akka.dispatch.UnboundedMailbox
at akka_2_3_9_shade.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:69)
at akka_2_3_9_shade.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:66)
at scala.util.Try$.apply(Try.scala:161)
at akka_2_3_9_shade.actor.ReflectiveDynamicAccess.getClassFor(DynamicAccess.scala:66)
at akka_2_3_9_shade.actor.ReflectiveDynamicAccess.CreateInstanceFor(DynamicAccess.scala:84)
... 34 more
The relevant code from my application's Boot.scala file is the following:
[45] implicit val system = ActorSystem()
...
[48] val service = system.actorOf(Props[MyAppApiActor], "MyApp.Api")
...
[52] val port = config.getInt("MyApp.server.port")
[53] IO(Http) ? Http.Bind(service, interface = "0.0.0.0", port = port)
OK, so eventually I managed to solve this.
Turns out akka loads (some of the) configuration settings from the config file using keys that are defined as string literals. You can find a lot of these in akka/actor/ActorSystem.scala, for example.
And it seems that sbt-assembly does not change references to the shaded library/package name in string literals.
Also, some configuration keys are being changed by sbt-assembly's shading. I haven't really taken the time to find where and how exactly they are defined in akka's source, but the following exception, which is being thrown during the ActorSystem init code, proves that this is indeed the case:
ConfigException$Missing: No configuration setting found for key 'akka_2_3_9_shade'
So, the solution is to include a custom config file (call it, for example, akka_spray_shade.conf) and copy the following configuration sections into it:
The contents of akka's original reference.conf, but having the akka prefix in the configuration values changed to akka_2_3_9_shade. (this is required for the hard-coded string literal config keys)
The contents of akka's original reference.conf, but having the akka prefix in the configuration values changed to akka_2_3_9_shade and having the root configuration key changed from akka to akka_2_3_9_shade. (this is required for the config keys which do get modified by sbt-assembly)
The contents of spray's original reference.conf, but having the akka prefix in the configuration values changed to akka_2_3_9_shade. (this is required to make sure that spray always refers to the shaded akka)
Now, this custom config file must be provided explicitly during the initialization of the ActorSystem in application's Boot.scala code:
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory
val akkaShadeConfig = ConfigFactory.load("akka_spray_shade")
implicit val system = ActorSystem("custom-actor-system-name", akkaShadeConfig)
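Because the ClassCastException in the question comes from a config value that still names an unshaded class, a quick check of this custom config file can catch the problem before the ActorSystem starts. This is a hypothetical helper, not part of Akka or Typesafe Config; the two settings checked (`actor.provider` and `actor.default-mailbox.mailbox-type`) are class-valued entries from Akka's reference.conf and are an illustrative subset only. Both roots are checked because the answer requires both the `akka` and the `akka_2_3_9_shade` sections.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object ShadedAkkaConfigCheck {
  val ShadedRoot = "akka_2_3_9_shade"

  // "akka" covers keys read through hard-coded string literals; the shaded
  // root covers keys that sbt-assembly's shading renamed.
  private val roots = Seq("akka", ShadedRoot)

  // Class-valued settings from Akka's reference.conf; illustrative, not exhaustive.
  private val settings = Seq("actor.provider", "actor.default-mailbox.mailbox-type")

  val paths: Seq[String] = for (root <- roots; setting <- settings) yield s"$root.$setting"

  // Describes every path that is missing or still names an unshaded class.
  def problems(config: Config): Seq[String] =
    paths.flatMap { path =>
      if (!config.hasPath(path)) Some(s"missing: $path")
      else {
        val value = config.getString(path)
        if (value.startsWith(ShadedRoot + ".")) None
        else Some(s"unshaded: $path = $value")
      }
    }

  def main(args: Array[String]): Unit = {
    val found = problems(ConfigFactory.load("akka_spray_shade"))
    if (found.nonEmpty) sys.error(found.mkString("\n"))
    println("shaded Akka configuration looks consistent")
  }
}
```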
A small addition to the accepted answer.
It is not necessary to put this configuration in a custom-named file like akka_spray_shade.conf. The configuration can be placed into application.conf, which is loaded by default during ActorSystem creation when no custom configuration is explicitly specified: ActorSystem("custom-actor-system-name") effectively means ActorSystem("custom-actor-system-name", ConfigFactory.load("application")).
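I struggled with this for a long time as well. It turns out that the default merge strategy in sbt-assembly excludes all the reference.conf files. Adding this to build.sbt solved it for me (the final catch-all case keeps the default strategy for every other file):
assemblyMergeStrategy in assembly := {
  case PathList("reference.conf") => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}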
