Using SparkSQL and spark-csv with Spark JobServer - apache-spark

I am trying to JAR a simple Scala application that uses spark-csv and Spark SQL to create a DataFrame from a CSV file stored in HDFS, and then run a simple query returning the max and min of a specific column in the CSV file.
I get an error when I use the sbt command to create the JAR, which I will later curl to the jobserver /jars folder and execute from a remote machine.
Code:
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.SparkContext._
import org.apache.spark._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
object sparkSqlCSV extends SparkJob {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[4]").setAppName("sparkSqlCSV")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val config = ConfigFactory.parseString("")
    val results = runJob(sc, config)
    println("Result is " + results)
  }

  override def validate(sc: sqlContext, config: Config): SparkJobValidation = {
    SparkJobValid
  }

  override def runJob(sc: sqlContext, config: Config): Any = {
    val value = "com.databricks.spark.csv"
    val ControlDF = sqlContext.load(value,Map("path"->"hdfs://mycluster/user/Test.csv","header"->"true"))
    ControlDF.registerTempTable("Control")
    val aggDF = sqlContext.sql("select max(DieX) from Control")
    aggDF.collectAsList()
  }
}
Error:
[hduser@ptfhadoop01v spark-jobserver]$ sbt ashesh-jobs/package
[info] Loading project definition from /usr/local/hadoop/spark-jobserver/project
Missing bintray credentials /home/hduser/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/hduser/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/hduser/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/hduser/.bintray/.credentials. Some bintray features depend on this.
[info] Set current project to root (in build file:/usr/local/hadoop/spark-jobserver/)
[info] scalastyle using config /usr/local/hadoop/spark-jobserver/scalastyle-config.xml
[info] Processed 2 file(s)
[info] Found 0 errors
[info] Found 0 warnings
[info] Found 0 infos
[info] Finished in 9 ms
[success] created output: /usr/local/hadoop/spark-jobserver/ashesh-jobs/target
[warn] Credentials file /home/hduser/.bintray/.credentials does not exist
[info] Updating {file:/usr/local/hadoop/spark-jobserver/}ashesh-jobs...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] scalastyle using config /usr/local/hadoop/spark-jobserver/scalastyle-config.xml
[info] Processed 5 file(s)
[info] Found 0 errors
[info] Found 0 warnings
[info] Found 0 infos
[info] Finished in 1 ms
[success] created output: /usr/local/hadoop/spark-jobserver/job-server-api/target
[info] Compiling 2 Scala sources and 1 Java source to /usr/local/hadoop/spark-jobserver/ashesh-jobs/target/scala-2.10/classes...
[error] /usr/local/hadoop/spark-jobserver/ashesh-jobs/src/spark.jobserver/sparkSqlCSV.scala:8: object sql is not a member of package org.apache.spark
[error] import org.apache.spark.sql.SQLContext
[error] ^
[error] /usr/local/hadoop/spark-jobserver/ashesh-jobs/src/spark.jobserver/sparkSqlCSV.scala:14: object sql is not a member of package org.apache.spark
[error] val sqlContext = new org.apache.spark.sql.SQLContext(sc)
[error] ^
[error] /usr/local/hadoop/spark-jobserver/ashesh-jobs/src/spark.jobserver/sparkSqlCSV.scala:25: not found: type sqlContext
[error] override def runJob(sc: sqlContext, config: Config): Any = {
[error] ^
[error] /usr/local/hadoop/spark-jobserver/ashesh-jobs/src/spark.jobserver/sparkSqlCSV.scala:21: not found: type sqlContext
[error] override def validate(sc: sqlContext, config: Config): SparkJobValidation = {
[error] ^
[error] /usr/local/hadoop/spark-jobserver/ashesh-jobs/src/spark.jobserver/sparkSqlCSV.scala:27: not found: value sqlContext
[error] val ControlDF = sqlContext.load(value,Map("path"->"hdfs://mycluster/user/Test.csv","header"->"true"))
[error] ^
[error] /usr/local/hadoop/spark-jobserver/ashesh-jobs/src/spark.jobserver/sparkSqlCSV.scala:29: not found: value sqlContext
[error] val aggDF = sqlContext.sql("select max(DieX) from Control")
[error] ^
[error] 6 errors found
[error] (ashesh-jobs/compile:compileIncremental) Compilation failed
[error] Total time: 10 s, completed May 26, 2016 4:42:52 PM
[hduser@ptfhadoop01v spark-jobserver]$
I guess the main issue is that the dependencies for spark-csv and Spark SQL are missing, but I have no idea where to declare the dependencies before compiling the code with sbt.
I am issuing the following command to package the application; the source code is placed under the "ashesh_jobs" directory:
[hduser@ptfhadoop01v spark-jobserver]$ sbt ashesh-jobs/package
I hope someone can help me resolve this issue. Can you tell me in which file I should declare the dependencies, and in what format?

The following link has more information on creating other contexts: https://github.com/spark-jobserver/spark-jobserver/blob/master/doc/contexts.md
You also need job-server-extras; a sketch of such a job is shown below.
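For illustration, here is a rough sketch of what the job could look like once it builds against job-server-extras, assuming the SparkSqlJob trait that extras provides (the exact trait and method signatures may differ between jobserver versions):
import com.typesafe.config.Config
import org.apache.spark.sql.SQLContext
import spark.jobserver.{SparkJobValid, SparkJobValidation, SparkSqlJob}

object sparkSqlCSV extends SparkSqlJob {
  override def validate(sql: SQLContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sql: SQLContext, config: Config): Any = {
    // read the CSV through spark-csv and query it; the path and column are taken from the question
    val controlDF = sql.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("hdfs://mycluster/user/Test.csv")
    controlDF.registerTempTable("Control")
    sql.sql("select max(DieX), min(DieX) from Control").collect()
  }
}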

Add the library dependency in build.sbt:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.2"

Related

Build Failure in StormCrawler 1.16

I am using StormCrawler 1.16, Apache Storm 1.2.3, Maven 3.6.3 and JDK 1.8.
I created the project using the archetype command below:
mvn archetype:generate -DarchetypeGroupId=com.digitalpebble.stormcrawler -DarchetypeArtifactId=storm-crawler-elasticsearch-archetype -DarchetypeVersion=LATEST
When I run the mvn clean package command I get this error:
/crawler$ mvn clean package
[INFO] Scanning for projects...
[INFO]
[INFO] -------------------------< com.storm:crawler >--------------------------
[INFO] Building crawler 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ crawler ---
[INFO] Deleting /home/ubuntu/crawler/target
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ crawler ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 4 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ crawler ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 1 source file to /home/ubuntu/crawler/target/classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /home/ubuntu/crawler/src/main/java/com/cnf/245/ESCrawlTopology.java:[19,16] ';' expected
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.407 s
[INFO] Finished at: 2020-06-29T20:40:46Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile
(default-compile) on project crawler: Compilation failure
[ERROR] /home/ubuntu/crawler/src/main/java/com/cnf/245/ESCrawlTopology.java:[19,16] ';' expected
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the
following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
I haven't edited the pom.xml file.
Here is the content of the ESCrawlTopology.java file:
package com.cnf.245;
import org.apache.storm.metric.LoggingMetricsConsumer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import com.digitalpebble.stormcrawler.ConfigurableTopology;
import com.digitalpebble.stormcrawler.Constants;
import com.digitalpebble.stormcrawler.bolt.FetcherBolt;
import com.digitalpebble.stormcrawler.bolt.JSoupParserBolt;
import com.digitalpebble.stormcrawler.bolt.SiteMapParserBolt;
import com.digitalpebble.stormcrawler.bolt.URLFilterBolt;
import com.digitalpebble.stormcrawler.bolt.URLPartitionerBolt;
import com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt;
import com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt;
import com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer;
import com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt;
import com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout;
import com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt;
import com.digitalpebble.stormcrawler.spout.FileSpout;
import com.digitalpebble.stormcrawler.util.ConfUtils;
import com.digitalpebble.stormcrawler.util.URLStreamGrouping;
/**
* Dummy topology to play with the spouts and bolts on ElasticSearch
*/
public class ESCrawlTopology extends ConfigurableTopology {

    public static void main(String[] args) throws Exception {
        ConfigurableTopology.start(new ESCrawlTopology(), args);
    }

    @Override
    protected int run(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        int numWorkers = ConfUtils.getInt(getConf(), "topology.workers", 1);
        if (args.length == 0) {
            System.err.println("ESCrawlTopology seed_dir file_filter");
            return -1;
        }
        // set to the real number of shards ONLY if es.status.routing is set to
        // true in the configuration
        int numShards = 1;
        builder.setSpout("filespout", new FileSpout(args[0], args[1], true));
        Fields key = new Fields("url");
        builder.setBolt("filter", new URLFilterBolt())
                .fieldsGrouping("filespout", Constants.StatusStreamName, key);
        builder.setSpout("spout", new AggregationSpout(), numShards);
        builder.setBolt("status_metrics", new StatusMetricsBolt())
                .shuffleGrouping("spout");
        builder.setBolt("partitioner", new URLPartitionerBolt(), numWorkers)
                .shuffleGrouping("spout");
        builder.setBolt("fetch", new FetcherBolt(), numWorkers)
                .fieldsGrouping("partitioner", new Fields("key"));
        builder.setBolt("sitemap", new SiteMapParserBolt(), numWorkers)
                .localOrShuffleGrouping("fetch");
        builder.setBolt("parse", new JSoupParserBolt(), numWorkers)
                .localOrShuffleGrouping("sitemap");
        builder.setBolt("indexer", new IndexerBolt(), numWorkers)
                .localOrShuffleGrouping("parse");
        builder.setBolt("status", new StatusUpdaterBolt(), numWorkers)
                .fieldsGrouping("fetch", Constants.StatusStreamName, key)
                .fieldsGrouping("sitemap", Constants.StatusStreamName, key)
                .fieldsGrouping("parse", Constants.StatusStreamName, key)
                .fieldsGrouping("indexer", Constants.StatusStreamName, key)
                .customGrouping("filter", Constants.StatusStreamName,
                        new URLStreamGrouping());
        builder.setBolt("deleter", new DeletionBolt(), numWorkers)
                .localOrShuffleGrouping("status",
                        Constants.DELETION_STREAM_NAME);
        conf.registerMetricsConsumer(MetricsConsumer.class);
        conf.registerMetricsConsumer(LoggingMetricsConsumer.class);
        return submit("crawl", conf, builder);
    }
}
I put com.cnf.245 as the groupId and crawler as the artifactId.
Can someone please explain what causes this error?
Can you please paste the content of ESCrawlTopology.java? Did you set com.cnf.245 as the package name?
The template class gets rewritten during the execution of the archetype, with the package name substituted; it could be that the value you set broke the template.
EDIT: a package segment cannot start with a digit in Java, so 245 is not a valid package name segment. See Using numbers as package names in java.
Use a different package name and groupId.
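For example, regenerating the project with a groupId whose segments are all valid Java identifiers (the values below are hypothetical) avoids the problem; digits are fine as long as a segment does not start with one:
mvn archetype:generate -DarchetypeGroupId=com.digitalpebble.stormcrawler \
    -DarchetypeArtifactId=storm-crawler-elasticsearch-archetype -DarchetypeVersion=LATEST \
    -DgroupId=com.cnf.crawler245 -DartifactId=crawler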

Derby Metastore directory is created in spark workspace

I have Spark 2.1.0 installed and integrated with Eclipse, and Hive 2 installed with its metastore configured in MySQL; I have also placed the hive-site.xml file in the spark >> conf folder. I am trying to access tables already present in Hive from Eclipse.
When I execute the program, a metastore folder and a derby.log file are created in the Spark workspace, and the Eclipse console shows the INFO below:
Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/06/13 18:26:43 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/06/13 18:26:43 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/06/13 18:26:43 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/06/13 18:26:43 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/06/13 18:26:43 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
17/06/13 18:26:43 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
Spark is not able to locate the configured MySQL metastore database, and it also throws the error:
Exception in thread "main" java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
Code:
import org.apache.spark.SparkContext, org.apache.spark.SparkConf
import com.typesafe.config._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
object hivecore {
  def main(args: Array[String]) {
    val warehouseLocation = "hdfs://HADOOPMASTER:54310/user/hive/warehouse"
    val spark = SparkSession
      .builder().master("local[*]")
      .appName("hivecore")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._
    import spark.sql
    sql("SELECT * FROM sample.source").show()
  }
}
Build.sbt
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
libraryDependencies += "com.typesafe" % "config" % "1.3.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
libraryDependencies += "org.apache.spark" % "spark-hive_2.11" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.42"
NOTE: I am able to access the Hive tables from spark-shell.
Thanks
When you set the master to local, the job may not pick up the Spark configuration that you set up on the cluster, especially when you trigger it from Eclipse.
Build a jar out of it and trigger it from the command line with spark-submit --class <main class package> --master spark://207.184.161.138:7077 --deploy-mode client
The master URL spark://207.184.161.138:7077 should be replaced with your cluster's IP and Spark port.
Also, remember to initialize a HiveContext to run queries against the underlying Hive:
val hc = new HiveContext(sc)
hc.sql("SELECT * FROM ...")

TypeError: jsdom.createVirtualConsole is not a function

I am trying to build upon the basic Scala.js tutorial and am running into a weird error.
The project differs very little from the set-up shown in the tutorial, but just in case, here is my build.sbt:
enablePlugins(ScalaJSPlugin)
scalaVersion := "2.12.1"
name := "algorithms1_4_34"
version := "1.0"
libraryDependencies ++= Seq("org.scalatest" % "scalatest_2.12" % "3.0.1" % "test",
"org.scalacheck" %% "scalacheck" % "1.13.4" % "test",
"org.scala-js" % "scalajs-dom_sjs0.6_2.12" % "0.9.1",
"be.doeraene" %%% "scalajs-jquery" % "0.9.1")
// This is an application with a main method
scalaJSUseMainModuleInitializer := true
skip in packageJSDependencies := false
jsDependencies +=
"org.webjars" % "jquery" % "2.1.4" / "2.1.4/jquery.js"
jsDependencies += RuntimeDOM
...and the JSApp file:
package ca.vgorcinschi.algorithms1_4_34

import scala.scalajs.js.JSApp
import org.scalajs.jquery.jQuery

object HotAndColdJS extends JSApp {
  def main(): Unit = {
    jQuery(() => setupUI())
  }

  def addClickedMessage(): Unit = {
    jQuery("body").append("<p>You clicked the button!</p>")
  }

  def setupUI(): Unit = {
    // click invokes an event handler
    jQuery("#click-me-button").click(() => addClickedMessage())
    jQuery("body").append("<p>Hello World!</p>")
  }
}
I can run the compile, fastOptJS, reload and eclipse (I am using the eclipse plugin) commands without problems. The only issue is the run command. To be fair, I did add something to the flow of the tutorial, but only because running npm install jsdom from the root of the application also led to a failure of run (npm WARN enoent ENOENT). Following the advice given here, I ran:
npm init
npm install
npm install jsdom
> run
[info] Running ca.vgorcinschi.algorithms1_4_34.HotAndColdJS
[error] [stdin]:40
[error] virtualConsole: jsdom.createVirtualConsole().sendTo(console),
[error] ^
[error]
[error] TypeError: jsdom.createVirtualConsole is not a function
[error] at [stdin]:40:27
[error] at [stdin]:61:3
[error] at ContextifyScript.Script.runInThisContext (vm.js:23:33)
[error] at Object.runInThisContext (vm.js:95:38)
[error] at Object.<anonymous> ([stdin]-wrapper:6:22)
[error] at Module._compile (module.js:571:32)
[error] at evalScript (bootstrap_node.js:391:27)
[error] at Socket.<anonymous> (bootstrap_node.js:188:13)
[error] at emitNone (events.js:91:20)
[error] at Socket.emit (events.js:188:7)
org.scalajs.jsenv.ExternalJSEnv$NonZeroExitException: Node.js with JSDOM exited with code 1
at org.scalajs.jsenv.ExternalJSEnv$AbstractExtRunner.waitForVM(ExternalJSEnv.scala:107)
at org.scalajs.jsenv.ExternalJSEnv$ExtRunner.run(ExternalJSEnv.scala:156)
at org.scalajs.sbtplugin.ScalaJSPluginInternal$.org$scalajs$sbtplugin$ScalaJSPluginInternal$$jsRun(ScalaJSPluginInternal.scala:697)
at org.scalajs.sbtplugin.ScalaJSPluginInternal$$anonfun$73$$anonfun$apply$48$$anonfun$apply$49.apply(ScalaJSPluginInternal.scala:814)
at org.scalajs.sbtplugin.ScalaJSPluginInternal$$anonfun$73$$anonfun$apply$48$$anonfun$apply$49.apply(ScalaJSPluginInternal.scala:808)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) org.scalajs.jsenv.ExternalJSEnv$NonZeroExitException: Node.js with JSDOM exited with code 1
[error] Total time: 4 s, completed 23-May-2017 9:24:20 PM
I would appreciate if anyone could give me a hand with this.
jsdom v10 introduced some breaking changes wrt. v9, and Scala.js <= 0.6.15 was not prepared for those. That is what's causing the error you're hitting.
Upgrading to Scala.js 0.6.16 will fix your issue. It supports both jsdom v9 and v10.
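For reference, the Scala.js version is controlled by the sbt plugin declared in project/plugins.sbt, so the upgrade is a one-line change there:
// project/plugins.sbt
addSbtPlugin("org.scala-js" % "sbt-scalajs" % "0.6.16")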

Getting SparkFlumeProtocol and EventBatch not found errors when building Spark 1.6.2 on CentOS7

I was trying to build Spark 1.6.2 on CentOS7 and ran into the error below:
[error] /home/pateln16/spark-1.6.2/external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala:45: not found: type SparkFlumeProtocol
[error] val transactionTimeout: Int, val backOffInterval: Int) extends SparkFlumeProtocol with Logging {
[error] ^
[error] /home/pateln16/spark-1.6.2/external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala:70: not found: type EventBatch
[error] override def getEventBatch(n: Int): EventBatch = {
[error] ^
[error] /home/pateln16/spark-1.6.2/external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/TransactionProcessor.scala:80: not found: type EventBatch
[error] def getEventBatch: EventBatch = {
[error] ^
[error] /home/pateln16/spark-1.6.2/external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkSinkUtils.scala:25: not found: type EventBatch
[error] def isErrorBatch(batch: EventBatch): Boolean = {
[error] ^
[error] /home/pateln16/spark-1.6.2/external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala:85: not found: type EventBatch
[error] new EventBatch("Spark sink has been stopped!", "", java.util.Collections.emptyList())
[error] ^
[warn] Class org.jboss.netty.channel.ChannelFactory not found - continuing with a stub.
[warn] Class org.jboss.netty.channel.ChannelFactory not found - continuing with a stub.
[warn] Class org.jboss.netty.channel.ChannelPipelineFactory not found - continuing with a stub.
[warn] Class org.jboss.netty.handler.execution.ExecutionHandler not found - continuing with a stub.
[warn] Class org.jboss.netty.channel.ChannelFactory not found - continuing with a stub.
[warn] Class org.jboss.netty.handler.execution.ExecutionHandler not found - continuing with a stub.
[warn] Class org.jboss.netty.channel.group.ChannelGroup not found - continuing with a stub.
[error] /home/pateln16/spark-1.6.2/external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkSink.scala:86: not found: type SparkFlumeProtocol
[error] val responder = new SpecificResponder(classOf[SparkFlumeProtocol], handler.get)
I ran into the same problem on Spark 2.0.0. I think the reason is that the file 'external\flume-sink\src\main\avro\sparkflume.avdl' is not compiled properly.
The problem can be resolved by:
Download Apache Avro: http://avro.apache.org/docs/current/gettingstartedjava.html (I downloaded all jar files into the folder 'C:\Downloads\avro').
Go to the folder 'external\flume-sink\src\main\avro'.
Compile sparkflume.avdl to Java files:
java -jar C:\Downloads\avro\avro-tools-1.8.1.jar idl sparkflume.avdl > sparkflume.avpr
java -jar C:\Downloads\avro\avro-tools-1.8.1.jar compile -string protocol sparkflume.avpr ..\scala
Recompile your projects.

Error when using SparkJob with NamedRddSupport

The goal is to create the following on a local instance of Spark JobServer:
object foo extends SparkJob with NamedRddSupport
Question: How can I fix the following error, which happens on every job:
{
"status": "ERROR",
"result": {
"message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/439b2467-spark.jobserver.genderPrediction#884262439]] after [10000 ms]",
"errorClass": "akka.pattern.AskTimeoutException",
"stack: ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)", "akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)", "scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)", "akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)", "akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)", "akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)", "java.lang.Thread.run(Thread.java:745)"]
}
}
A more detailed error description by the Spark JobServer:
job-server[ERROR] Exception in thread "pool-100-thread-1" java.lang.AbstractMethodError: spark.jobserver.genderPrediction$.namedObjectsPrivate()Ljava/util/concurrent/atomic/AtomicReference;
job-server[ERROR] at spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:248)
job-server[ERROR] at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
job-server[ERROR] at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
job-server[ERROR] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
job-server[ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
job-server[ERROR] at java.lang.Thread.run(Thread.java:745)
In case somebody wants to see the code:
package spark.jobserver

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkContext}
import com.typesafe.config.{Config, ConfigFactory}
import collection.JavaConversions._
import scala.io.Source

object genderPrediction extends SparkJob with NamedRddSupport
{
  // Main function
  def main(args: scala.Array[String])
  {
    val sc = new SparkContext()
    sc.hadoopConfiguration.set("fs.tachyon.impl", "tachyon.hadoop.TFS")
    val config = ConfigFactory.parseString("")
    val results = runJob(sc, config)
  }

  def validate(sc: SparkContext, config: Config): SparkJobValidation = { SparkJobValid }

  def runJob(sc: SparkContext, config: Config): Any =
  {
    return "ok";
  }
}
Version information:
Spark is 1.5.0 - SparkJobServer is latest version
Thank you all very much in advance!
Adding more explanation to @noorul's answer:
It seems like you compiled the code with an old version of SJS and you are running it with the latest.
NamedObjects were recently added. You are getting AbstractMethodError because your server expects NamedObjects support and you didn't compile the code with that.
Also: you don't need the main method there since it won't be executed by SJS.
Ensure that your compile-time and runtime library versions of the dependent packages are the same.
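In sbt terms this means compiling the job against the same job-server-api version as the running server; a minimal sketch, with the coordinates and version given only as an example (adjust to your deployment):
// build.sbt of the job project; the version must match the deployed jobserver
libraryDependencies += "spark.jobserver" %% "job-server-api" % "0.6.2" % "provided"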
