I use Apache Commons Lang3's SerializationUtils in my code: SerializationUtils.serialize() to store a custom class as files on disk, and SerializationUtils.deserialize(byte[]) to restore them again.
In the local environment (macOS), all serialized files can be deserialized normally and no errors occur. But when I copy these serialized files into HDFS and read them back from HDFS using Spark/Scala, a SerializationException is thrown.
The Apache Commons Lang3 version is:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.9</version>
</dependency>
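For reference, the write path is essentially the mirror image of the read path. The sketch below shows how such files are typically produced; the actual file-writing code is not part of the question, so the path handling here is an assumption:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.lang3.SerializationUtils;

public static void serializeTo(Block block, String path) throws IOException {
    // SerializationUtils.serialize produces standard Java serialization bytes,
    // which are written to disk unchanged.
    byte[] bytes = SerializationUtils.serialize(block);
    Files.write(Paths.get(path), bytes);
}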
The deserialization code looks like this:
public static Block deserializeFrom(byte[] bytes) {
    try {
        Block b = SerializationUtils.deserialize(bytes);
        System.out.println("b=" + b);
        return b;
    } catch (ClassCastException e) {
        System.out.println("ClassCastException");
        e.printStackTrace();
    } catch (IllegalArgumentException e) {
        System.out.println("IllegalArgumentException");
        e.printStackTrace();
    } catch (SerializationException e) {
        System.out.println("SerializationException");
        e.printStackTrace();
    }
    return null;
}
The Spark code is:
val fis = spark.sparkContext.binaryFiles("/folder/abc*.file")
val RDD = fis.map(x => {
  val content = x._2.toArray()
  val b = Block.deserializeFrom(content)
  ...
})
All of the code above runs successfully in Spark local mode, but when I run it in YARN cluster mode, an error occurs. The stack trace is below:
org.apache.commons.lang3.SerializationException: java.lang.ClassNotFoundException: com.XXXX.XXXX
at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:227)
at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:265)
at com.com.XXXX.XXXX.deserializeFrom(XXX.java:81)
at com.XXX.FFFF$$anonfun$3.apply(BXXXX.scala:157)
at com.XXX.FFFF$$anonfun$3.apply(BXXXX.scala:153)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.XXXX.XXXX
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:686)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:223)
I've checked the length of the loaded byte[]: it is the same whether the file is read locally or from HDFS. So why can't it be deserialized when read from HDFS?
This may be a classloader issue. Suppose your application is deployed to a Java server. The server will have loaded its own classes including library code it may need, for example SerializationUtils from Apache commons-lang3. When your application is deployed to it, the server may provide it with a separate classloader which inherits from the server's classloader. Let's call the server's classloader Cl-S and the deployed application's classloader Cl-A.
At some point the application wishes to deserialize an object from a byte[]. So it uses org.apache.commons.lang3.SerializationUtils. Cl-A is asked to provide that class. The first time around Cl-A won't have it so it has to load it in. But a classloader will commonly first ask its parent for a class before trying to load it by itself. Cl-A asks Cl-S if it happens to have SerializationUtils. If it does, it returns the class. Now the application can use it.
Things go wrong when you then perform the deserialization. The deserialize method is generic. This line
Block b = SerializationUtils.deserialize(bytes)
has the type Block inferred. The method will internally try to cast the deserialized Object to Block. But of course, to do so it must know the class Block. While executing the method, Java goes looking for that class, and for this it queries the classloader that loaded SerializationUtils: Cl-S. Cl-S is the server's classloader; it has no knowledge of your application's Block class, so you get a ClassNotFoundException.
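A quick way to confirm this diagnosis (not part of the original post, just a suggested check) is to print which classloader loaded each of the two classes, for example inside deserializeFrom:

System.out.println("SerializationUtils loaded by: " + SerializationUtils.class.getClassLoader());
System.out.println("Block loaded by: " + Block.class.getClassLoader());

If the two lines print different classloaders, the loader that provided SerializationUtils cannot see Block, which matches the stack trace above.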
The classloader assigned to the application has access to your application's classes and to its parent classloader's classes. The server's classloader can't go in the other direction: it can't get classes from your application. Application servers, such as Java EE ones (WildFly, GlassFish, etc.), typically use this to let multiple applications run in the same server while remaining separated, or to implement a module system so that certain modules can be shared across applications to reduce their size and memory footprint.
Serializing and deserializing objects in Java is simple. Just do it yourself, or write a couple of methods for it, rather than dragging in a library that opens you up to opaque issues like this, version conflicts, and bloat.
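For example, a minimal classloader-aware replacement for SerializationUtils.deserialize could look like the sketch below. This is not the poster's code; it simply forces class resolution through a classloader you choose, such as the one that loaded Block:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

public final class ClassLoaderAwareDeserializer {

    // Deserialize using an explicitly chosen classloader instead of the one
    // that happened to load the serialization utility class.
    public static Object deserialize(byte[] bytes, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes)) {
            @Override
            protected Class<?> resolveClass(ObjectStreamClass desc)
                    throws IOException, ClassNotFoundException {
                try {
                    return Class.forName(desc.getName(), false, loader);
                } catch (ClassNotFoundException e) {
                    return super.resolveClass(desc); // fall back, e.g. for primitives
                }
            }
        }) {
            return in.readObject();
        }
    }
}

A call site would then look something like Block b = (Block) ClassLoaderAwareDeserializer.deserialize(bytes, Block.class.getClassLoader());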
Related
Spark version: 2.3
A Spark streaming application reads a stream from an HDFS path:
Dataset<Row> lines = spark
.readStream()
.format("text")
.load("path");
After some transformations of the data, the job for a given file is supposed to end in succeeded status.
A job listener is registered for job end, and it moves the file when it is triggered:
@Override
public void onJobEnd(SparkListenerJobEnd jobEnd) {
    // Move the source file, whose processing is finished, to some other location.
}
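For context, the wiring presumably looks roughly like the sketch below. This is my reconstruction, not code from the question; moveProcessedFile is a hypothetical placeholder, and mapping a job id back to a concrete input file is application-specific:

import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.sql.SparkSession;

class FileMoverListenerSetup {
    static void register(SparkSession spark) {
        spark.sparkContext().addSparkListener(new SparkListener() {
            @Override
            public void onJobEnd(SparkListenerJobEnd jobEnd) {
                // Move the source file once the job that consumed it has ended.
                // moveProcessedFile is a hypothetical helper.
                moveProcessedFile(jobEnd.jobId());
            }
        });
    }

    static void moveProcessedFile(int jobId) {
        // Placeholder: move the file that the finished job was processing.
    }
}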
The files do get moved successfully to the other location, but at the same time (the exact same timestamp) Spark throws the following file-not-found exception. This happens at random and cannot be reliably reproduced, but it happens often.
Even though the particular job has ended, Spark is somehow still referring to the file.
How can I make sure that once a job has ended the file is no longer referenced by Spark, so that this file-not-found issue is avoided?
This is what I could find about SparkListenerJobEnd here:
DAGScheduler does cleanUpAfterSchedulerStop, handleTaskCompletion, failJobAndIndependentStages, and markMapStageJobAsFinished.
Same question with different approach
Exception:
java.io.FileNotFoundException: File does not exist: <filename>
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:559)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteExc
It's a bug in Spark:
https://issues.apache.org/jira/browse/SPARK-24364
Mail thread: http://mail-archives.apache.org/mod_mbox/spark-issues/201805.mbox/%3cJIRA.13161366.1527070555000.13725.1527070560256#Atlassian.JIRA%3e
Fix: https://github.com/apache/spark/pull/21408/commits/c52d972e4ca09e0ede1bb9e60d3c07f80f605f88
Fixed versions: 2.3.1/2.4.0
I'm using Fiji on Linux (inside a virtual machine) from the command line.
Specifically, I have a Python script that calls many Fiji macros.
Each time Fiji loads it takes about a minute, making it the bottleneck of the process and slowing me down considerably.
I wonder if this is normal. Could Fiji perhaps be made to load more quickly?
Fiji also outputs a lengthy error message when it is called. I wonder if that is related to the long load time.
The command I use to call it is, for example:
//shared_directory/projects/Fiji.app/ImageJ-linux64 --headless
-macro //shared_directory/projects/retracted/fiji_scripts/patch_cropper.txt
The error message is as follows:
java.awt.HeadlessException
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
at java.awt.Window.<init>(Window.java:536)
at java.awt.Frame.<init>(Frame.java:420)
at ij.plugin.frame.PlugInFrame.<init>(PlugInFrame.java:13)
at ij.plugin.frame.RoiManager.<init>(RoiManager.java:94)
at ij.macro.Interpreter.getBatchModeRoiManager(Interpreter.java:1875)
at ij.plugin.frame.RoiManager.getInstance(RoiManager.java:1795)
at ij.ImagePlus.deleteRoi(ImagePlus.java:1735)
at ij.ImagePlus.close(ImagePlus.java:391)
at ij.plugin.Commands.closeImage(Commands.java:136)
at ij.plugin.Commands.close(Commands.java:83)
at ij.plugin.Commands.run(Commands.java:29)
at ij.IJ.runPlugIn(IJ.java:187)
at ij.Executer.runCommand(Executer.java:137)
at ij.Executer.run(Executer.java:66)
at ij.IJ.run(IJ.java:297)
at ij.IJ.run(IJ.java:272)
at ij.macro.Functions.doRun(Functions.java:603)
at ij.macro.Functions.doFunction(Functions.java:96)
at ij.macro.Interpreter.doStatement(Interpreter.java:230)
at ij.macro.Interpreter.doStatements(Interpreter.java:218)
at ij.macro.Interpreter.run(Interpreter.java:115)
at ij.macro.Interpreter.run(Interpreter.java:85)
at ij.macro.Interpreter.run(Interpreter.java:96)
at ij.plugin.Macro_Runner.runMacro(Macro_Runner.java:155)
at ij.plugin.Macro_Runner.runMacroFile(Macro_Runner.java:139)
at ij.IJ.runMacroFile(IJ.java:148)
at net.imagej.legacy.IJ1Helper$4.call(IJ1Helper.java:1056)
at net.imagej.legacy.IJ1Helper$4.call(IJ1Helper.java:1052)
at net.imagej.legacy.IJ1Helper.runMacroFriendly(IJ1Helper.java:986)
at net.imagej.legacy.IJ1Helper.runMacroFile(IJ1Helper.java:1052)
at net.imagej.legacy.LegacyCommandline$Macro.handle(LegacyCommandline.java:188)
at org.scijava.console.DefaultConsoleService.processArgs(DefaultConsoleService.java:102)
at net.imagej.legacy.LegacyConsoleService.processArgs(LegacyConsoleService.java:81)
at org.scijava.AbstractGateway.launch(AbstractGateway.java:95)
at net.imagej.Main.launch(Main.java:62)
at net.imagej.Main.main(Main.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at net.imagej.launcher.ClassLauncher.launch(ClassLauncher.java:279)
at net.imagej.launcher.ClassLauncher.run(ClassLauncher.java:186)
at net.imagej.launcher.ClassLauncher.main(ClassLauncher.java:77)
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
java.lang.IllegalArgumentException: Cannot handle app name in ij.gui.YesNoCancelDialog's public <init>(java.awt.Frame parent, java.lang.String title, java.lang.String msg)
at net.imagej.patcher.CodeHacker.replaceAppNameInCall(CodeHacker.java:446)
at net.imagej.patcher.LegacyExtensions.insertAppNameHooks(LegacyExtensions.java:419)
at net.imagej.patcher.LegacyExtensions.injectHooks(LegacyExtensions.java:291)
at net.imagej.patcher.LegacyInjector.inject(LegacyInjector.java:308)
at net.imagej.patcher.LegacyInjector.injectHooks(LegacyInjector.java:109)
at net.imagej.patcher.LegacyEnvironment.initialize(LegacyEnvironment.java:101)
at net.imagej.patcher.LegacyEnvironment.applyPatches(LegacyEnvironment.java:495)
at net.imagej.patcher.LegacyInjector.preinit(LegacyInjector.java:397)
at net.imagej.patcher.LegacyInjector.preinit(LegacyInjector.java:376)
at net.imagej.legacy.LegacyService.<clinit>(LegacyService.java:134)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at org.scijava.service.ServiceHelper.createServiceRecursively(ServiceHelper.java:302)
at org.scijava.service.ServiceHelper.createExactService(ServiceHelper.java:269)
at org.scijava.service.ServiceHelper.loadService(ServiceHelper.java:231)
at org.scijava.service.ServiceHelper.createServiceRecursively(ServiceHelper.java:340)
at org.scijava.service.ServiceHelper.createExactService(ServiceHelper.java:269)
at org.scijava.service.ServiceHelper.loadService(ServiceHelper.java:231)
at org.scijava.service.ServiceHelper.loadService(ServiceHelper.java:194)
at org.scijava.service.ServiceHelper.loadServices(ServiceHelper.java:166)
at org.scijava.Context.<init>(Context.java:278)
at org.scijava.Context.<init>(Context.java:234)
at org.scijava.Context.<init>(Context.java:174)
at org.scijava.Context.<init>(Context.java:160)
at net.imagej.ImageJ.<init>(ImageJ.java:77)
at net.imagej.Main.launch(Main.java:61)
at net.imagej.Main.main(Main.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at net.imagej.launcher.ClassLauncher.launch(ClassLauncher.java:279)
at net.imagej.launcher.ClassLauncher.run(ClassLauncher.java:186)
at net.imagej.launcher.ClassLauncher.main(ClassLauncher.java:77)
Caused by: javassist.CannotCompileException: No code replaced!
at net.imagej.patcher.CodeHacker$EagerExprEditor.instrument(CodeHacker.java:1280)
at net.imagej.patcher.CodeHacker.replaceAppNameInCall(CodeHacker.java:443)
... 36 more
Thanks!
I wonder if this is normal. Could fiji perhaps be made to load more quickly?
I don't think this is intended behavior. As this issue is very specific to ImageJ/Fiji, you would be better off raising it on the ImageJ forum.
java.lang.IllegalArgumentException: Cannot handle app name in ij.gui.YesNoCancelDialog's public <init>(java.awt.Frame parent, java.lang.String title, java.lang.String msg)
This is likely due to ij1-patcher not handling ij.gui.YesNoCancelDialog to make it independent of the Java AWT framework (which interferes with the goal of running things headless).
You are welcome to submit a patch to the ij1-patcher project to make it deal with YesNoCancelDialog in a similar way to how it currently handles GenericDialog; see these lines.
As a workaround, simply use a macro that doesn't rely on YesNoCancelDialog, i.e. one that does not call the getBoolean() macro function.
EDIT: as also noted in the ImageJ forum topic, a patch was submitted here.
Until this gets merged and released, you can work around the issue by downgrading ij.jar using Help > Update ImageJ...
TL;DR
I am trying to shade a version of the akka library and bundle it with my application (to be able to run a spray-can server on the CDH 5.7 version of Spark 1.6). The shading process messes up akka's default configuration, and even after manually providing a separate version of akka's reference.conf for the shaded akka, it still looks like the two versions get mixed up somehow.
Is shading akka versions known to cause problems? What am I doing wrong?
Background
I have a Scala/Spark application currently running on Spark 1.6.1 standalone. The application runs a spray-can http server using spray 1.3.3, which requires akka 2.3.9 (Spark 1.6.1 standalone includes a compatible akka 2.3.11).
I am trying to migrate the application to a new Cloudera-based Spark cluster running the CDH 5.7 version of Spark 1.6. The problem is that Spark 1.6 in CDH 5.7 is bundled with akka 2.2.3 which is not sufficient for spray 1.3.3 to function properly.
Attempted solution
Following the suggestion in this post, I decided to shade akka 2.3.9 and bundle it with my application. This time, though, I stumbled upon a new problem: akka has its default configuration defined in a reference.conf file, which must be on the application's classpath. Due to a known issue in sbt-assembly's shading feature, it seems that the shaded akka library requires a separate configuration.
So, I ended up shading akka with the following shade rule:
ShadeRule.rename("akka.**" -> "akka_2_3_9_shade.@1")
  .inLibrary("com.typesafe.akka" % "akka-actor_2.10" % "2.3.9")
  .inAll
and including an additional reference.conf file in my project, identical to akka's original reference.conf but with all occurrences of "akka" replaced with "akka_2_3_9_shade".
Now, though, it seems that the Spark-provided akka gets mixed up somehow with the shaded akka, as I'm getting the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Cannot instantiate MailboxType [akka.dispatch.UnboundedMailbox], defined in [akka.actor.default-mailbox], make sure it has a public constructor with [akka.actor.ActorSystem.Settings, com.typesafe.config.Config] parameters
at akka_2_3_9_shade.dispatch.Mailboxes$$anonfun$1.applyOrElse(Mailboxes.scala:197)
at akka_2_3_9_shade.dispatch.Mailboxes$$anonfun$1.applyOrElse(Mailboxes.scala:195)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at akka_2_3_9_shade.dispatch.Mailboxes.lookupConfiguration(Mailboxes.scala:195)
at akka_2_3_9_shade.dispatch.Mailboxes.lookup(Mailboxes.scala:78)
at akka_2_3_9_shade.actor.LocalActorRefProvider.akka$actor$LocalActorRefProvider$$defaultMailbox$lzycompute(ActorRefProvider.scala:561)
at akka_2_3_9_shade.actor.LocalActorRefProvider.akka$actor$LocalActorRefProvider$$defaultMailbox(ActorRefProvider.scala:561)
at akka_2_3_9_shade.actor.LocalActorRefProvider$$anon$1.<init>(ActorRefProvider.scala:568)
at akka_2_3_9_shade.actor.LocalActorRefProvider.rootGuardian$lzycompute(ActorRefProvider.scala:564)
at akka_2_3_9_shade.actor.LocalActorRefProvider.rootGuardian(ActorRefProvider.scala:563)
at akka_2_3_9_shade.actor.LocalActorRefProvider.init(ActorRefProvider.scala:618)
at akka_2_3_9_shade.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:619)
at akka_2_3_9_shade.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:616)
at akka_2_3_9_shade.actor.ActorSystemImpl._start(ActorSystem.scala:616)
at akka_2_3_9_shade.actor.ActorSystemImpl.start(ActorSystem.scala:633)
at akka_2_3_9_shade.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka_2_3_9_shade.actor.ActorSystem$.apply(ActorSystem.scala:109)
at akka_2_3_9_shade.actor.ActorSystem$.apply(ActorSystem.scala:100)
at MyApp.api.Boot$delayedInit$body.apply(Boot.scala:45)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at MyApp.api.Boot$.main(Boot.scala:28)
at MyApp.api.Boot.main(Boot.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: interface akka_2_3_9_shade.dispatch.MailboxType is not assignable from class akka.dispatch.UnboundedMailbox
at akka_2_3_9_shade.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:69)
at akka_2_3_9_shade.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:66)
at scala.util.Try$.apply(Try.scala:161)
at akka_2_3_9_shade.actor.ReflectiveDynamicAccess.getClassFor(DynamicAccess.scala:66)
at akka_2_3_9_shade.actor.ReflectiveDynamicAccess.CreateInstanceFor(DynamicAccess.scala:84)
... 34 more
The relevant code from my application's Boot.scala file is the following:
[45] implicit val system = ActorSystem()
...
[48] val service = system.actorOf(Props[MyAppApiActor], "MyApp.Api")
...
[52] val port = config.getInt("MyApp.server.port")
[53] IO(Http) ? Http.Bind(service, interface = "0.0.0.0", port = port)
OK, so eventually I managed to solve this.
It turns out that akka loads (some of the) configuration settings from the config file using keys that are defined as string literals; you can find a lot of these in akka/actor/ActorSystem.scala, for example.
It also seems that sbt-assembly does not rewrite references to the shaded library/package name inside string literals.
At the same time, some configuration keys are changed by sbt-assembly's shading. I haven't taken the time to find where and how exactly they are defined in akka's source, but the following exception, thrown during the ActorSystem initialization code, proves that this is indeed the case:
ConfigException$Missing: No configuration setting found for key 'akka_2_3_9_shade'
So, the solution is to include a custom config file (call it, for example, akka_spray_shade.conf) and copy the following configuration sections into it (an illustrative fragment follows the list):
1. The contents of akka's original reference.conf, but with the akka prefix in the configuration values changed to akka_2_3_9_shade (this is required for the hard-coded string-literal config keys).
2. The contents of akka's original reference.conf, but with the akka prefix in the configuration values changed to akka_2_3_9_shade and with the root configuration key changed from akka to akka_2_3_9_shade (this is required for the config keys that do get rewritten by sbt-assembly).
3. The contents of spray's original reference.conf, but with the akka prefix in the configuration values changed to akka_2_3_9_shade (this is required to make sure that spray always refers to the shaded akka).
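To make the difference between the first two copies concrete, a fragment of such a file might look like this (illustrative only; the real file is generated from the full reference.conf of akka and spray, and these two keys are just examples taken from akka's defaults):

# Copy 1: original root key "akka", shaded class names in the values
akka {
  actor {
    provider = "akka_2_3_9_shade.actor.LocalActorRefProvider"
    default-mailbox {
      mailbox-type = "akka_2_3_9_shade.dispatch.UnboundedMailbox"
    }
  }
}

# Copy 2: shaded root key, shaded class names in the values
akka_2_3_9_shade {
  actor {
    provider = "akka_2_3_9_shade.actor.LocalActorRefProvider"
    default-mailbox {
      mailbox-type = "akka_2_3_9_shade.dispatch.UnboundedMailbox"
    }
  }
}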
Now, this custom config file must be provided explicitly when initializing the ActorSystem in the application's Boot.scala code:
val akkaShadeConfig = ConfigFactory.load("akka_spray_shade")
implicit val system = ActorSystem("custom-actor-system-name", akkaShadeConfig)
A small addition to the accepted answer.
It is not necessary to put this configuration in a custom-named file like akka_spray_shade.conf. The configuration can be placed in application.conf, which is loaded by default during ActorSystem creation when no custom configuration is explicitly specified: ActorSystem("custom-actor-system-name") effectively means ActorSystem("custom-actor-system-name", ConfigFactory.load("application")).
I struggled with this for a long time as well. It turns out that the default merge strategy in sbt-assembly excludes all the reference.conf files. Adding this to build.sbt solved it for me:
assemblyMergeStrategy in assembly := {
  case PathList("reference.conf") => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
I am trying to run my Spark application in local mode. To set it all up, I followed this tutorial (in French), which shows each step of building the local Kafka/ZooKeeper environment: http://blog.d2-si.fr/2015/11/05/apache-kafka-3/
Moreover, I use IntelliJ with the following configuration:
val sparkConf = new SparkConf().setAppName("zumbaApp").setMaster("local[2]")
And my run config, for the consumer:
"127.0.0.1:2181" "zumbaApp-gpId" "D2SI" "1"
And for the producer:
"127.0.0.1:9092" "D2SI" "my\Input\File.csv" 300
Beforehand, I checked whether the consumer received the inputs from the producer using the default console producer and console consumer of kafka_2.10-0.9.0.1; it did.
But I am facing the following error:
java.lang.NoSuchMethodError: org.I0Itec.zkclient.ZkClient.createEphemeral(Ljava/lang/String;Ljava/lang/Object;Ljava/util/List;)V
at kafka.utils.ZkPath$.createEphemeral(ZkUtils.scala:921)
at kafka.utils.ZkUtils.createEphemeralPath(ZkUtils.scala:348)
at kafka.utils.ZkUtils.createEphemeralPathExpectConflict(ZkUtils.scala:363)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$18.apply(ZookeeperConsumerConnector.scala:839)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$18.apply(ZookeeperConsumerConnector.scala:833)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.reflectPartitionOwnershipDecision(ZookeeperConsumerConnector.scala:833)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:721)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:636)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:627)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:627)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:627)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:626)
at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:967)
at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:254)
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:156)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:111)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I did not succeed in solving this. I thought it was a ZooKeeper configuration error, but after comparing against a working version of the application on another machine with the same configuration files, that no longer seems to be the case.
It looks like you have a dependency issue here.
Check the version of the com.101tec zkclient library on your classpath; Kafka needs version 0.7.
In addition, since kafka_2.10-0.9.0.1 the producer and consumer APIs no longer use ZooKeeper. It seems that Spark Streaming is pulling in a Kafka 0.8 version in your case.
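For example, if you manage dependencies with Maven, pinning ZkClient explicitly would look like the snippet below (the com.101tec coordinates are the standard ones for this library; adapt the idea to whatever build tool you actually use):

<dependency>
    <groupId>com.101tec</groupId>
    <artifactId>zkclient</artifactId>
    <version>0.7</version>
</dependency>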
I'm getting this error:
java.lang.ClassCastException: java.net.URL cannot be cast to java.lang.CharSequence
at org.codehaus.groovy.runtime.dgm$948.doMethodInvoke(Unknown Source)
at org.codehaus.groovy.reflection.GeneratedMetaMethod$Proxy.doMethodInvoke(GeneratedMetaMethod.java:70)
at org.codehaus.groovy.runtime.metaclass.MethodMetaProperty$GetBeanMethodMetaProperty.getProperty(MethodMetaProperty.java:73)
at org.codehaus.groovy.runtime.callsite.GetEffectivePojoPropertySite.getProperty(GetEffectivePojoPropertySite.java:61)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGetProperty(AbstractCallSite.java:227)
at ch.qos.logback.classic.gaffer.GafferConfigurator.run(GafferConfigurator.groovy:44)
at ch.qos.logback.classic.gaffer.GafferUtil.runGafferConfiguratorOn(GafferUtil.java:43)
at ch.qos.logback.classic.util.ContextInitializer.configureByResource(ContextInitializer.java:66)
at ch.qos.logback.classic.util.ContextInitializer.autoConfig(ContextInitializer.java:150)
at org.slf4j.impl.StaticLoggerBinder.init(StaticLoggerBinder.java:85)
at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:55)
at org.slf4j.LoggerFactory.bind(LoggerFactory.java:142)
at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:121)
at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:332)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:284)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:655)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:282)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:106)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4728)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5162)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1409)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1399)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
when trying to start Tomcat 8 using Logback 1.1.2 and Groovy 2.3.6/Java 8. The URL for the logback.groovy file is coming from a shared jar and looks correct when debugging:
jar:file:/Users/matt.whipple/.gradle/caches/modules-2/files-2.1/com.example/core/1.0.0-SNAPSHOT/c30ff0f9caae010d728248c66cc55f40bf591232/core-1.0.0-SNAPSHOT.jar!/logback.groovy
The issue seems to be with the call to url.text in the GDK, and it appears to be a problem with this version of Groovy: if I try alternative ways of fetching the contents of the URL in the debugger, I get similar ClassCastExceptions, but updating to Groovy 2.4.1 resolves the issue.