ERROR Instrumentation: java.lang.ClassNotFoundException: ai.h2o.sparkling.ml.models.H2OGBMMOJOModel - apache-spark

I am working on pysparkling in Databricks. I have built a model with pyspark transformers and h2o pysparkling algorithm. When I log the model on to mlflow and deploy in from a job cluster, I get the following error in the job cluster logs. I have mentioned pysparkling as a dependancy in _mlflow_conda_env. Any ideas about the workaround for this?! Thanks in advance!
ERROR Instrumentation: java.lang.ClassNotFoundException: ai.h2o.sparkling.ml.models.H2OGBMMOJOModel
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:227)
at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstanceReader(ReadWrite.scala:630)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:276)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:284)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:284)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:161)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:156)
at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:43)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:284)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:284)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)enter code here

Related

sparkcontext error while running sparknlp.start()

when i import sparknlp and run
sparknlp.start()
I get the following error
pyspark version 3.1.2
sparknlp version is 4.2.2
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.base/java.lang.Thread.run(Thread.java:829)
at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2629)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2626)
at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2716)
at org.apache.spark.SparkContext.(SparkContext.scala:95)
at org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)

Janusgraph connect to remove server cannot instantiate implementation of CQLStoreManager

I am running a cassandara cluster on ec2 and janusgraph node as another ec2 instance as well. I have these properties file for janus graph(janusgraph.properties)
storage.backend = cassandra
storage.hostname = [xxx.xx.xx.xx] #its valid ip and is correct, just hidden
and from gremlin coonsole i do
graph = JanusGraphFactory.open('janusgraph.properties')
It seems like it is not being able to connect with cassandra and error is
Could not instantiate implementation:
org.janusgraph.diskstorage.cql.CQLStoreManager
I am not able to understand it.
Detailed error is:
java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:64) at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:440) at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:411) at org.janusgraph.graphdb.configuration.builder.GraphDatabaseConfigurationBuilder.build(GraphDatabaseConfigurationBuilder.java:50) at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:161) at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:132) at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:79) at org.janusgraph.core.JanusGraphFactory$open.call(Unknown Source) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127) at groovysh_evaluate.run(groovysh_evaluate:3) at groovysh_evaluate$run.call(Unknown Source) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) at groovysh_evaluate$run.call(Unknown Source) at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:77) at org.codehaus.groovy.tools.shell.Evaluator$evaluate.call(Unknown Source) at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:199) at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144) at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72) at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:120) at org.codehaus.groovy.tools.shell.Shell$leftShift$1.call(Unknown Source) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127) at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:93) at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:164) at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:138) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43) at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190) at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:156) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:160) at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:57) at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:164) at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:97) at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234) at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:153) at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234) at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:466) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58) ... 69 more Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: []:9042 (com.datastax.driver.core.exceptions.TransportException: [[]:9042] Cannot connect)) at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:268) at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:107) at com.datastax.driver.core.Cluster$Manager.negotiateProtocolVersionAndConnect(Cluster.java:1652) at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1571) at com.datastax.driver.core.Cluster.init(Cluster.java:208) at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:376) at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:355) at com.datastax.driver.core.Cluster.connect(Cluster.java:305) at org.janusgraph.diskstorage.cql.CQLStoreManager.initializeSession(CQLStoreManager.java:302) at org.janusgraph.diskstorage.cql.CQLStoreManager.<init>(CQLStoreManager.java:174) ... 74 more
So i figured it out.
Janusgraph documentation says to put one of the node in cluster as
storage.hostname = [IP-address-of-one-Cassandra-EC2-instance]
Since it wasnt in form, i thought its an array syntax
Replacing [xx.xx.xx.xx] with xx.xx.xx.xx worked
I faced the similar issue but the EC2 instances were brought down by administrator and hence the issue.
Once I restarted the EC2 instances, the issue was resolved.

Reading Data from HDFS of different machine using Spark

I have to read data stored in HDFS of a different machine and needs to be accessed through Spark for being read.
How to do that? Full HDFS address along with port doesn't seem to work.
Anyone did it before?
Py4JJavaError: An error occurred while calling o244.load.
: java.net.ConnectException: Call From Sandeeps-MacBook-Pro.local/192.168.50.188 to ec2-35-154-242-76.ap-south-1.compute.amazonaws.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:714)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:388)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282
You are trying to reach the host ec2-35-154-242-76.ap-south-1.compute.amazonaws.com:9000, but the ConnectionRefused error indicates that is not available.
Because it seems to be an AWS host, you need to create some firewall rules to enable your IP Sandeeps-MacBook-Pro.local/192.168.50.188 access your cluster. As your IP is private, I'm afraid that you will need to figure out the public one.

Error starting spark-jobserver with Apache Spark 2.0.2

I'm trying to start the spark-jobserver. I can't find any reference to this akka library on the installation steps provided on the GitHub spark_jobserver page.
I'm running spark in standalone mode on a single server which act as a master/node.
But when I execute
./job-server/server_start.sh
it shows the following error:
error while starting up loggers
akka.ConfigurationException: Logger specified in config can't be loaded [akka.event.slf4j.Slf4jLogger] due to [java.lang.ClassNotFoundException: akka.event.slf4j.Slf4jLogger]
at akka.event.LoggingBus$$anonfun$4$$anonfun$apply$1.applyOrElse(Logging.scala:116)
at akka.event.LoggingBus$$anonfun$4$$anonfun$apply$1.applyOrElse(Logging.scala:115)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Failure.recover(Try.scala:216)
at akka.event.LoggingBus$$anonfun$4.apply(Logging.scala:115)
at akka.event.LoggingBus$$anonfun$4.apply(Logging.scala:110)
at scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:683)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:682)
at akka.event.LoggingBus$class.startDefaultLoggers(Logging.scala:110)
at akka.event.EventStream.startDefaultLoggers(EventStream.scala:26)
at akka.actor.LocalActorRefProvider.init(ActorRefProvider.scala:623)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:157)
at akka.cluster.ClusterActorRefProvider.init(ClusterActorRefProvider.scala:58)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
at spark.jobserver.JobServer$.spark$jobserver$JobServer$$makeSupervisorSystem$1(JobServer.scala:154)
at spark.jobserver.JobServer$$anonfun$main$1.apply(JobServer.scala:156)
at spark.jobserver.JobServer$$anonfun$main$1.apply(JobServer.scala:156)
at spark.jobserver.JobServer$.start(JobServer.scala:54)
at spark.jobserver.JobServer$.main(JobServer.scala:156)
at spark.jobserver.JobServer.main(JobServer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: akka.event.slf4j.Slf4jLogger
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:67)
at akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:66)
at scala.util.Try$.apply(Try.scala:192)
at akka.actor.ReflectiveDynamicAccess.getClassFor(DynamicAccess.scala:66)
at akka.event.LoggingBus$$anonfun$4.apply(Logging.scala:113)
... 33 more
Exception in thread "main" akka.ConfigurationException: Could not start logger due to [akka.ConfigurationException: Logger specified in config can't be loaded [akka.event.slf4j.Slf4jLogger] due to [java.lang.ClassNotFoundException: akka.event.slf4j.Slf4jLogger]]
at akka.event.LoggingBus$class.startDefaultLoggers(Logging.scala:144)
at akka.event.EventStream.startDefaultLoggers(EventStream.scala:26)
at akka.actor.LocalActorRefProvider.init(ActorRefProvider.scala:623)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:157)
at akka.cluster.ClusterActorRefProvider.init(ClusterActorRefProvider.scala:58)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
at spark.jobserver.JobServer$.spark$jobserver$JobServer$$makeSupervisorSystem$1(JobServer.scala:154)
at spark.jobserver.JobServer$$anonfun$main$1.apply(JobServer.scala:156)
at spark.jobserver.JobServer$$anonfun$main$1.apply(JobServer.scala:156)
at spark.jobserver.JobServer$.start(JobServer.scala:54)
at spark.jobserver.JobServer$.main(JobServer.scala:156)
at spark.jobserver.JobServer.main(JobServer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Two steps to solve this issue:
The spark-job-server.jar created after execute ./bin/server_package.sh < enviroment> it have to be moved to the jars folder located on $SPARK_HOME
Nevertheless I had to download akka-slf4j_2.11.0-RC3-2.3.0.jar and put it on the $SPARK_HOME/jars folder too.

Oozie > Spark action containing Hive context can not work because of exceptions

In my project, we need to add a Spark action containing Hive context to an Oozie workflow. Previous Spark actions worked properly because previous Spark actions do not contain HiveContext. However, when the Spark action contains HiveContext, it can not work anymore because of error:
2016-05-17T16:39:41.878+0300 WARN [Hive:168] Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:440)
at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:287)
at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:286)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:286)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at ...Job.onStart(InvoiceRevenueInvoiceJob.java:63)
at ...job.spark.SparkJob.execute(SparkJob.java:79)
at ...spark.SparkJobLauncher.launch(SparkJobLauncher.java:73)
at ...job.spark.SparkJobLauncher.main(SparkJobLauncher.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3078)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3097)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
... 26 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 32 more
Caused by: java.lang.NoClassDefFoundError: org/datanucleus/enhancement/Detachable
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hadoop.hive.metastore.ObjectStore.<clinit>(ObjectStore.java:180)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1489)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:63)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
... 37 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.enhancement.Detachable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 63 more
Versions:
Spark: 1.6.1
Hive: 1.2
Oozie: 4.2.0
Anyone has any idea about how to fix this problem? Thanks in advance

Resources