Error when running job that queries against Cassandra via Spark SQL through Spark Jobserver - cassandra

So I'm trying to run job that simply runs a query against cassandra using spark-sql, the job is submitted fine and the job starts fine. This code works when it is not being run through spark jobserver (when simply using spark submit). Could someone tell my what is wrong with my job code or configuration files that is causing the error below?
{
"status": "ERROR",
"ERROR": {
"errorClass": "java.util.concurrent.ExecutionException",
"cause": "Failed to open native connection to Cassandra at {127.0.1.1}:9042",
"stack": ["com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSes
sion(CassandraConnector.scala:155)", "com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scal
a:141)", "com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:141)", "com.datastax.spark
.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)", "com.datastax.spark.connector.cql.RefCountedCache
.acquire(RefCountedCache.scala:56)", "com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:73)
", "com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:101)", "com.datastax.spark.connecto
r.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:112)", "com.datastax.spark.connector.cql.Schema$.fromCassandra(Sch
ema.scala:243)", "org.apache.spark.sql.cassandra.CassandraCatalog$$anon$1.load(CassandraCatalog.scala:22)", "org.apache.spark.sql.
cassandra.CassandraCatalog$$anon$1.load(CassandraCatalog.scala:19)", "com.google.common.cache.LocalCache$LoadingValueReference.loa
dFuture(LocalCache.java:3599)", "com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)", "com.google.common.ca
che.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)", "com.google.common.cache.LocalCache$Segment.get(LocalCache.java:225
7)", "com.google.common.cache.LocalCache.get(LocalCache.java:4000)", "com.google.common.cache.LocalCache.getOrLoad(LocalCache.java
:4004)", "com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)", "org.apache.spark.sql.cassandra.Cassand
raCatalog.lookupRelation(CassandraCatalog.scala:28)", "org.apache.spark.sql.cassandra.CassandraSQLContext$$anon$2.org$apache$spark
$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(CassandraSQLContext.scala:218)", "org.apache.spark.sql.catalyst.analy
sis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:161)", "org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$
anonfun$lookupRelation$3.apply(Catalog.scala:161)", "scala.Option.getOrElse(Option.scala:120)", "org.apache.spark.sql.catalyst.ana
lysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)", "org.apache.spark.sql.cassandra.CassandraSQLContext$$anon$2.lookup
Relation(CassandraSQLContext.scala:218)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.sca
la:174)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:186)", "or
g.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:181)", "org.apache.spar
k.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:188)", "org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.appl
y(TreeNode.scala:188)", "org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)", "org.apache.spark.sql.
catalyst.trees.TreeNode.transformDown(TreeNode.scala:187)", "org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNod
e.scala:208)", "scala.collection.Iterator$$anon$11.next(Iterator.scala:328)", "scala.collection.Iterator$class.foreach(Iterator.sc
ala:727)", "scala.collection.AbstractIterator.foreach(Iterator.scala:1157)", "scala.collection.generic.Growable$class.$plus$plus$e
q(Growable.scala:48)", "scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)", "scala.collection.mutable.Arra
yBuffer.$plus$plus$eq(ArrayBuffer.scala:47)", "scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)", "scala.colle
ction.AbstractIterator.to(Iterator.scala:1157)", "scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)", "sc
ala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)", "scala.collection.TraversableOnce$class.toArray(TraversableOnce.sc
ala:252)", "scala.collection.AbstractIterator.toArray(Iterator.scala:1157)", "org.apache.spark.sql.catalyst.trees.TreeNode.transfo
rmChildrenDown(TreeNode.scala:238)", "org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:193)", "org.apache
.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:178)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelatio
ns$.apply(Analyzer.scala:181)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:171)", "or
g.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)", "org.apache.spark.
sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)", "scala.collection.LinearSeqOptimi
zed$class.foldLeft(LinearSeqOptimized.scala:111)", "scala.collection.immutable.List.foldLeft(List.scala:84)", "org.apache.spark.sq
l.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)", "org.apache.spark.sql.catalyst.rules.RuleExecutor$$a
nonfun$apply$1.apply(RuleExecutor.scala:51)", "scala.collection.immutable.List.foreach(List.scala:318)", "org.apache.spark.sql.cat
alyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)", "org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLCon
text.scala:1082)", "org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:1082)", "org.apache.spark.sql.SQLCont
ext$QueryExecution.assertAnalyzed(SQLContext.scala:1080)", "org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)", "org.apac
he.spark.sql.cassandra.CassandraSQLContext.cassandraSql(CassandraSQLContext.scala:211)", "org.apache.spark.sql.cassandra.Cassandra
SQLContext.sql(CassandraSQLContext.scala:214)", "CassSparkTest$.runJob(CassSparkTest.scala:23)", "CassSparkTest$.runJob(CassSparkT
est.scala:9)", "spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.sca
la:235)", "scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)", "scala.concurrent.impl.Future$P
romiseCompletingRunnable.run(Future.scala:24)", "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)",
"java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)", "java.lang.Thread.run(Thread.java:745)"],
"causingClass": "java.io.IOException",
"message": "java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042"
}
}
Here is the job I am running:
import org.apache.spark.{SparkContext, SparkConf}
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra.CassandraSQLContext
import org.apache.spark.sql._
import spark.jobserver._
import com.typesafe.config.Config
import com.typesafe.config.ConfigFactory
object CassSparkTest extends SparkJob {
def main(args: Array[String]) {
val sc = new SparkContext("spark://192.168.10.11:7077", "test")
val config = ConfigFactory.parseString("")
val results = runJob(sc, config)
println("Results:" + results)
}
override def validate(sc:SparkContext, config: Config): SparkJobValidation = {
SparkJobValid
}
override def runJob(sc:SparkContext, config: Config): Any = {
val sqlC = new CassandraSQLContext(sc)
val df = sqlC.sql(config.getString("input.sql"))
df.collect()
}
}
and here is my configuration file for spark-jobserver
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
spark {
# spark.master will be passed to each job's JobContext
master = "spark://192.168.10.11:7077"
# master = "mesos://vm28-hulk-pub:5050"
# master = "yarn-client"
# Default # of CPUs for jobs to use for Spark standalone cluster
job-number-cpus = 1
jobserver {
port = 2020
jar-store-rootdir = /tmp/jobserver/jars
jobdao = spark.jobserver.io.JobFileDAO
filedao {
rootdir = /tmp/spark-job-server/filedao/data
}
}
# predefined Spark contexts
# contexts {
# my-low-latency-context {
# num-cpu-cores = 1 # Number of cores to allocate. Required.
# memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, 1G, etc.
# }
# # define additional contexts here
# }
# universal context configuration. These settings can be overridden, see README.md
context-settings {
num-cpu-cores = 1 # Number of cores to allocate. Required.
memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, #1G, etc.
# in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
# spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"
spark-cassandra-connection-host="127.0.0.1"
# uris of jars to be loaded into the classpath for this context. Uris is a string list, or a string separated by commas ','
# dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
dependent-jar-uris = ["file:///home/vagrant/lib/spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar"]
# If you wish to pass any settings directly to the sparkConf as-is, add them here in passthrough,
# such as hadoop connection settings that don't use the "spark." prefix
passthrough {
#es.nodes = "192.1.1.1"
}
}
# This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
# home = "/home/spark/spark"
}
# Note that you can use this file to define settings not only for job server,
# but for your Spark jobs as well. Spark job configuration merges with this configuration file as defaults.

#vicg, first you need spark.cassandra.connection.host -- periods not dashes. Also note in the error how the IP is "127.0.1.1", not the one in the config. You can also pass the IP when you create a context, like:
curl -X POST 'localhost:8090/contexts/my-context?spark.cassandra.connection.host=127.0.0.1'
If the above don't work, try the following PR:
https://github.com/spark-jobserver/spark-jobserver/pull/164

Related

Difficulties in using a Gcloud Composer DAG to run a Spark job

I'm playing around with Gcloud Composer, trying to create a DAG that creates a DataProc cluster, runs a simple Spark job, then tears down the cluster. I am trying to run the Spark PI example job.
I understand that when calling DataProcSparkOperator I can choose only to define either the main_jar or the main_class property. When I define main_class, the job fails with the error:
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
When I choose to define the main_jar property, the job fails with the error:
Error: No main class set in JAR; please specify one with --class
Run with --help for usage help or --verbose for debug output
I'm at a bit of a loss as to how to resolve this, as I am kinda new to both Spark and DataProc.
My DAG:
import datetime as dt
from airflow import DAG, models
from airflow.contrib.operators import dataproc_operator as dpo
from airflow.utils import trigger_rule
MAIN_JAR = 'file:///usr/lib/spark/examples/jars/spark-examples.jar'
MAIN_CLASS = 'org.apache.spark.examples.SparkPi'
CLUSTER_NAME = 'quickspark-cluster-{{ ds_nodash }}'
yesterday = dt.datetime.combine(
dt.datetime.today() - dt.timedelta(1),
dt.datetime.min.time())
default_dag_args = {
'start_date': yesterday,
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': dt.timedelta(seconds=30),
'project_id': models.Variable.get('gcp_project')
}
with DAG('dataproc_spark_submit', schedule_interval='0 17 * * *',
default_args=default_dag_args) as dag:
create_dataproc_cluster = dpo.DataprocClusterCreateOperator(
project_id = default_dag_args['project_id'],
task_id = 'create_dataproc_cluster',
cluster_name = CLUSTER_NAME,
num_workers = 2,
zone = models.Variable.get('gce_zone')
)
run_spark_job = dpo.DataProcSparkOperator(
task_id = 'run_spark_job',
#main_jar = MAIN_JAR,
main_class = MAIN_CLASS,
cluster_name = CLUSTER_NAME
)
delete_dataproc_cluster = dpo.DataprocClusterDeleteOperator(
project_id = default_dag_args['project_id'],
task_id = 'delete_dataproc_cluster',
cluster_name = CLUSTER_NAME,
trigger_rule = trigger_rule.TriggerRule.ALL_DONE
)
create_dataproc_cluster >> run_spark_job >> delete_dataproc_cluster
I compared it with a successful job using the CLI and saw that, even when the class was populating the Main class or jar field, the path to the Jar was specified in Jar files:
Checking the operator I noticed there is also a dataproc_spark_jars parameter which is not mutually exclusive to main_class:
run_spark_job = dpo.DataProcSparkOperator(
task_id = 'run_spark_job',
dataproc_spark_jars = [MAIN_JAR],
main_class = MAIN_CLASS,
cluster_name = CLUSTER_NAME
)
Adding it did the trick:

Spark, kerberos, yarn-cluster -> connection to hbase

Facing one issue with Kerberos enabled Hadoop cluster.
We are trying to run a streaming job on yarn-cluster, which interacts with Kafka (direct stream), and hbase.
Somehow, we are not able to connect to hbase in the cluster mode. We use keytab to login to hbase.
This is what we do:
spark-submit --master yarn-cluster --keytab "dev.keytab" --principal "dev#IO-INT.COM" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j_executor_conf.properties -XX:+UseG1GC" --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j_driver_conf.properties -XX:+UseG1GC" --conf spark.yarn.stagingDir=hdfs:///tmp/spark/ --files "job.properties,log4j_driver_conf.properties,log4j_executor_conf.properties" service-0.0.1-SNAPSHOT.jar job.properties
To connect to hbase:
def getHbaseConnection(properties: SerializedProperties): (Connection, UserGroupInformation) = {
val config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM_VALUE);
config.set("hbase.zookeeper.property.clientPort", 2181);
config.set("hadoop.security.authentication", "kerberos");
config.set("hbase.security.authentication", "kerberos");
config.set("hbase.cluster.distributed", "true");
config.set("hbase.rpc.protection", "privacy");
config.set("hbase.regionserver.kerberos.principal", “hbase/_HOST#IO-INT.COM”);
config.set("hbase.master.kerberos.principal", “hbase/_HOST#IO-INT.COM”);
UserGroupInformation.setConfiguration(config);
var ugi: UserGroupInformation = null;
if (SparkFiles.get(properties.keytab) != null
&& (new java.io.File(SparkFiles.get(properties.keytab)).exists)) {
ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(properties.kerberosPrincipal,
SparkFiles.get(properties.keytab));
} else {
ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(properties.kerberosPrincipal,
properties.keytab);
}
val connection = ConnectionFactory.createConnection(config);
return (connection, ugi);
}
and we connect to hbase:
….
foreachRDD { rdd =>
if (!rdd.isEmpty()) {
//var ugi: UserGroupInformation = Utils.getHbaseConnection(properties)._2
rdd.foreachPartition { partition =>
val connection = Utils.getHbaseConnection(propsObj)._1
val table = …
partition.foreach { json =>
}
table.put(puts)
table.close()
connection.close()
}
}
}
Keytab file is not getting copied to yarn staging/temp directory, we are not getting that in SparkFiles.get… and if we pass keytab with --files, spark-submit is failing because it’s there in --keytab already.
error is:
This server is in the failed servers list: myserver.test.com/120.111.25.45:60020
RpcRetryingCaller{globalStartTime=1497943263013, pause=100, retries=5}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: myserver.test.com/120.111.25.45:60020
RpcRetryingCaller{globalStartTime=1497943263013, pause=100, retries=5}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: myserver.test.com/120.111.25.45:60020 at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:935)

Error while submitting a spark job using spark-jobserver

I face following error occasionally while submitting job. This error goes away if I remove the rootdir of filedao, datadao and sqldao. That means I have to restart the job-server and re-upload my jar.
{
"status": "ERROR",
"result": {
"message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/1995aeba-com.spmsoftware.distributed.job.TestJob#-1370794810]] after [10000 ms]. Sender[null] sent message of type \"spark.jobserver.JobManagerActor$StartJob\".",
"errorClass": "akka.pattern.AskTimeoutException",
"stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:745)"]
}
}
My config file is as follows:
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
# Spark Cluster / Job Server configuration
spark {
# spark.master will be passed to each job's JobContext
master = <spark_master>
# Default # of CPUs for jobs to use for Spark standalone cluster
job-number-cpus = 4
jobserver {
port = 8090
context-per-jvm = false
context-creation-timeout = 100 s
# Note: JobFileDAO is deprecated from v0.7.0 because of issues in
# production and will be removed in future, now defaults to H2 file.
jobdao = spark.jobserver.io.JobSqlDAO
filedao {
rootdir = /tmp/spark-jobserver/filedao/data
}
datadao {
rootdir = /tmp/spark-jobserver/upload
}
sqldao {
slick-driver = slick.driver.H2Driver
jdbc-driver = org.h2.Driver
rootdir = /tmp/spark-jobserver/sqldao/data
jdbc {
url = "jdbc:h2:file:/tmp/spark-jobserver/sqldao/data/h2-db"
user = ""
password = ""
}
dbcp {
enabled = false
maxactive = 20
maxidle = 10
initialsize = 10
}
}
result-chunk-size = 1m
short-timeout = 60 s
}
context-settings {
num-cpu-cores = 2 # Number of cores to allocate. Required.
memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, #1G, etc.
}
}
akka {
remote.netty.tcp {
# This controls the maximum message size, including job results, that can be sent
# maximum-frame-size = 200 MiB
}
}
# check the reference.conf in spray-can/src/main/resources for all defined settings
spray.can.server.parsing.max-content-length = 250m
I am using spark-2.0-preview version.
I have faced the same error before and was related with timeout, for sure is an syncronus request (sync=true) togheter you must provide the timeout (in seconds) who is a value relative with how long it takes to process your request.
This an example how the request should look like:
curl -k --basic -d '' 'http://localhost:5050/jobs?appName=app&classPath=Main&context=test-context&sync=true&timeout=40'
if your request needs more than 40 seconds maybe you also need to modify the application.conf located on
spark-jobserver-master/job-server/src/main/resources/application.conf
ànd on the spray.can.server section modify:
idle-timeout = 210 s
request-timeout = 200 s

where to check my yarn+spark's applications' log?

I wrote an application with yarn+spark, for simplicity, I list the following
object testKafkaSparkStreaming extends Logging {
private class Parser extends Logging{
def parse(row: String): Row =
{
val row = "/_dc.gif| |20160616063934| |39.190.5.69| |729252016040907094857083a3c7c62e"
logInfo("pengcz starting parse " + row)
}
}
def main(args: Array[String]) {
...
logInfo("main starting parse " + row)
...
}
}
When I executed:
spark-submit --master local[*] --class $CLASS $JAR ...
I can see the two log infos in the console
But when I executed:
spark-submit --master yarn --class $CLASS $JAR ...
And I opened the yarn web ui of my own application:
http://192.168.36.172:8088/cluster/app/application_1465894400511_3624
And I click logs under the page, I got:
But the page has not information including my two logs
What should I do to find my own logs?
Any advice will be appreciated!
You can use this command to see the Yarn Logs of an Application :
yarn logs -applicationId <Your Yarn Application ID> // i.e. application_1465894400511_3624

Yarn support on Ooyala Spark JobServer

Just started experimenting with the JobServer and would like to use it in our production environment.
We usually run spark jobs individually in yarn-client mode and would like to shift towards the paradigm offered by the Ooyala Spark JobServer.
I am able to run the WordCount examples shown in the official page.
I tried running submitting our custom spark job to the Spark JobServer and I got this error:
{
"status": "ERROR",
"result": {
"message": "null",
"errorClass": "scala.MatchError",
"stack": ["spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:220)",
"scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)",
"scala.concurrent.impl.Future $PromiseCompletingRunnable.run(Future.scala:24)",
"akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)",
"akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)",
"scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)",
"scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java 1339)",
"scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)",
"scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)"]
}
I had made the necessary code modifications like extending SparkJob and implementing the runJob() method.
This is the dev.conf file that I used:
# Spark Cluster / Job Server configuration
spark {
# spark.master will be passed to each job's JobContext
master = "yarn-client"
# Default # of CPUs for jobs to use for Spark standalone cluster
job-number-cpus = 4
jobserver {
port = 8090
jar-store-rootdir = /tmp/jobserver/jars
jobdao = spark.jobserver.io.JobFileDAO
filedao {
rootdir = /tmp/spark-job-server/filedao/data
}
context-creation-timeout = "60 s"
}
contexts {
my-low-latency-context {
num-cpu-cores = 1
memory-per-node = 512m
}
}
context-settings {
num-cpu-cores = 2
memory-per-node = 512m
}
home = "/data/softwares/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041"
}
spray.can.server {
parsing.max-content-length = 200m
}
spark.driver.allowMultipleContexts = true
YARN_CONF_DIR=/home/spark/conf/
Also how can I give run-time parameters for the spark job, such as --files, --jars ?
For example, I usually run our custom spark job like this:
./spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/bin/spark-submit --class com.demo.SparkDriver --master yarn-cluster --num-executors 3 --jars /tmp/api/myUtil.jar --files /tmp/myConfFile.conf,/tmp/mySchema.txt /tmp/mySparkJob.jar
Number of executors and extra jars are passed in a different way, through the config file (see dependent-jar-uris config setting).
YARN_CONF_DIR should be set in the environment and not in the .conf file.
As for other issues, the google group is the right place to ask. You may want to search it for yarn-client issues, as several other folks have figured out how to get it to work.

Resources