I am using sparklyr in a batch setup where multiple concurrent jobs with different parameters, are arriving and are processed by same sparklyr codebase. In "certain" random situations, the code gives errors (as below). I think it is under high load.
I am seeking guidance on the best way to troubleshoot it (including understanding the architecture of different components in the call chain). Therefore, beside pointing any correction in the code used to establish connection, any pointers to study further will be appreciated.
Thanks.
Stack versions:
Spark version 2.3.2.3.1.5.6091-7
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_322)
SparklyR version: sparklyr-2.1-2.11.jar
Yarn Cluster: hdp-3.1.5
Error:
2022-03-03 23:00:09 | Connecting to SPARK ...
2022-03-03 23:02:19 | Couldn't connect to SPARK (Error). Error in force(code): Failed while connecting to sparklyr to port (10980) for sessionid (38361): Sparklyr gateway did not respond while retrieving ports information after 120 seconds
Path: /usr/hdp/3.1.5.6091-7/spark2/bin/spark-submit
Parameters: --driver-memory, 3G, --executor-memory, 3G, --keytab, /etc/security/keytabs/appuser.headless.keytab, --principal, appuser#myorg.com, --class, sparklyr.Shell, '/usr/lib64/R/library/sparklyr/java/sparklyr-2.1-2.11.jar', 10980, 38361
Log: /tmp/RtmpyhpKkv/file187b82097bb55_spark.log
---- Output Log ----
22/03/03 23:00:17 INFO sparklyr: Session (38361) is starting under 127.0.0.1 port 10980
22/03/03 23:00:17 INFO sparklyr: Session (38361) found port 10980 is available
22/03/03 23:00:17 INFO sparklyr: Gateway (38361) is waiting for sparklyr client to connect to port 10980
22/03/03 23:01:17 INFO sparklyr: Gateway (38361) is terminating backend since no client has connected after 60 seconds to 192.168.1.55/10980.
22/03/03 23:01:17 INFO ShutdownHookManager: Shutdown hook called
22/03/03 23:01:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-4fec5364-e440-41a8-87c4-b5e94472bb2f
---- Error Log ----
Connection code:
conf <- spark_config()
conf$spark.executor.memory <- "10G"
conf$spark.executor.cores <- 6
conf$spark.executor.instances <- 6
conf$spark.driver.memory <- "10g"
conf$spark.driver.memoryOverhead <-"3g"
conf$spark.shuffle.service.enabled <- "true"
conf$spark.port.maxRetries <- 125
conf$spark.sql.hive.convertMetastoreOrc <- "true"
conf$spark.local.dir = '/var/log/myapp/sparkjobs'
conf$'sparklyr.shell.driver-memory' <- "3G"
conf$'sparklyr.shell.executor-memory' <- "3G"
conf$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"
conf$hive.metastore.uris = configs$DEFAULT$HIVE_METASTORE_URL
conf$spark.sql.session.timeZone <- "UTC"
# fix as per cloudera suggestion for future timeout issue
conf$spark.sql.broadcastTimeout <- 1200
conf$sparklyr.shell.keytab = "/etc/security/keytabs/appuser.headless.keytab"
conf$sparklyr.shell.principal = "appuser#myorg.com"
conf$spark.yarn.keytab= "/etc/security/keytabs/appuser.headless.keytab"
conf$spark.yarn.principal= "appuser#myorg.com"
conf$spark.sql.catalogImplementation <- "hive"
conf$sparklyr.gateway.config.retries <- 10
conf$sparklyr.connect.timeout <- 120
conf$sparklyr.gateway.port.query.attempts <- 10
conf$sparklyr.gateway.port.query.retry.interval.seconds <- 60
conf$sparklyr.gateway.port <- 10090 + round(runif(1, 1, 1000))
tryCatch
(
{
logging(paste0("Connecting to SPARK ... "))
withTimeout({ sc <- spark_connect(master = "yarn-client", spark_home = eval(SPARK_HOME_PATH), version = "2.1.0",app_name = "myjobname", config = conf) }, timeout = 540)
if (!is.null(sc)) {
return(sc)
}
},
TimeoutException = function(ex)
{
logging(paste0("Couldn't connect to SPARK (Timed up).", ex));
stop("Timeout occured");
},
error = function(err)
{
logging(paste0("Couldn't connect to SPARK (Error). ", err));
stop("Exception occured");
}
)
Related
As all know when is need to print the kafka broker id’s we can use the following cli
zookeeper-shell.sh zoo_server1:2181 <<< "ls /brokers/ids"
this cli print the following
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[1018, 1017, 1016]
Its means that we have kafka with id’s
1018
1017
1016
But our kafka names are
Kafka_confluent01
Kafka_confluent02
Kafka_confluent03
So how to know which kafka broker id ( 1018 , 1017 , 1016 ) is belong to the real host ( Kafka_confluent01 / Kafka_confluent02 / Kafka_confluent03 )
You can use kafkacat for this with the -L operator:
$ kafkacat -b kafka-01.foo.bar:9092 -L
Metadata for all topics (from broker 1: kafka-01.foo.bar:9092/1):
3 brokers:
broker 2 at kafka-02.foo.bar:9092
broker 3 at kafka-03.foo.bar:9092
broker 1 at kafka-01.foo.bar:9092 (controller)
You can get the list of brokers dynamically, using the following code.
public class KafkaBrokerInfoFetcher {
public static void main(String[] args) throws Exception {
ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);
List<String> ids = zk.getChildren("/brokers/ids", false);
for (String id : ids) {
String brokerInfo = new String(zk.getData("/brokers/ids/" + id, false, null));
System.out.println(id + ": " + brokerInfo);
}
}
}
After running the code, you will get the broker id and corresponding host.
1: {"jmx_port":-1,"timestamp":"1428512949385","host":"192.168.0.11","version":1,"port":9093}
2: {"jmx_port":-1,"timestamp":"1428512955512","host":"192.168.0.11","version":1,"port":9094}
3: {"jmx_port":-1,"timestamp":"1428512961043","host":"192.168.0.11","version":1,"port":9095}
I am using embedded cassandra to run unit tests. I notice that if any cql statements fail, then I don't see any descriptive reason for failure. For eg. I am running the following two statements which fails because I am trying to add a table without switching to a keyspace
val statement1 =
"""
|CREATE KEYSPACE test
| WITH REPLICATION = {
| 'class' : 'SimpleStrategy',
| 'replication_factor' : 1
| };
""".stripMargin
val statement3 =
"""
|CREATE TABLE users (
| bucket int,
| email text,
| firstname text,
| lastname text,
| authprovider text,
| password text,
| confirmed boolean,
| id UUID,
| hasher text,
| salt text,
| PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
""".stripMargin
val cqlStatements:CqlStatements = new CqlStatements(statement1,statement3)
")
val testCassandra = repoTestEnv.testCassandra
try {
testCassandra.start()
testCassandra.executeScripts(cqlStatements)
} finally testCassandra.stop()
But I don't see correct error. I see the following which doesn't tell exactly what is the problem
[info] c.g.n.e.c.l.WindowsCassandraNode - Apache Cassandra Node '7276' is started
[info] c.g.n.e.c.l.LocalCassandraDatabase - Apache Cassandra '3.11.1' is started (20811 ms)
[warn] c.d.d.c.Connection - /127.0.0.1:9042 did not send an authentication challenge; This is suspicious because the driver expects authentication (configured auth provider = com.datastax.driver.core.PlainTextAuthProvider)
[warn] c.d.d.c.Connection - /127.0.0.1:9042 did not send an authentication challenge; This is suspicious because the driver expects authentication (configured auth provider = com.datastax.driver.core.PlainTextAuthProvider)
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Script: CqlStatements [
CREATE KEYSPACE test
WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
,
CREATE TABLE users (
bucket int,
email text,
firstname text,
lastname text,
authprovider text,
password text,
confirmed boolean,
id UUID,
hasher text,
salt text,
PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
]
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Statement:
CREATE KEYSPACE test
WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
[info] c.g.n.e.c.Cassandra - INFO [Native-Transport-Requests-1] 2019-05-29 07:50:00,788 MigrationManager.java:310 - Create new Keyspace: KeyspaceMetadata{name=test, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Statement:
CREATE TABLE users (
bucket int,
email text,
firstname text,
lastname text,
authprovider text,
password text,
confirmed boolean,
id UUID,
hasher text,
salt text,
PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
[debug] c.g.n.e.c.t.TestCassandra - Stop TestCassandra 3.11.1
[info] c.g.n.e.c.l.LocalCassandraDatabase - Stop Apache Cassandra '3.11.1'
[debug] c.g.n.e.c.l.RunProcess - Execute 'powershell -ExecutionPolicy Unrestricted C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32\bin\stop-server.ps1 -p C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32\1da63488-2624-4141-a49e-174203b7edc4' within a directory 'C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32'
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,926 HintsService.java:220 - Paused hints dispatch
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,933 Server.java:176 - Stop listening for CQL clients
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,934 Gossiper.java:1532 - Announcing shutdown
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,938 StorageService.java:2268 - Node localhost/127.0.0.1 state jump to shutdown
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:05,941 MessagingService.java:984 - Waiting for messaging service to quiesce
[info] c.g.n.e.c.Cassandra - INFO [ACCEPT-localhost/127.0.0.1] 2019-05-29 07:50:05,948 MessagingService.java:1338 - MessagingService has terminated the accept() thread
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:06,076 HintsService.java:220 - Paused hints dispatch
[info] c.g.n.e.c.l.WindowsCassandraNode - Successfully sent ctrl+c to process with id: 7276.
[info] c.g.n.e.c.l.WindowsCassandraNode - Apache Cassandra Node '7276' is stopped
[info] c.g.n.e.c.l.LocalCassandraDatabase - Apache Cassandra '3.11.1' is stopped (3490 ms)
[info] c.g.n.e.c.l.LocalCassandraDatabase - The working directory 'C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32' was deleted.
[debug] c.g.n.e.c.t.TestCassandra - TestCassandra 3.11.1 is stopped
Unable to start TestCassandra 3.11.1
com.github.nosan.embedded.cassandra.CassandraException: Unable to start TestCassandra 3.11.1
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:128)
at UnitSpecs.RepositorySpecs.UsersRepositorySpecs.$anonfun$new$3(UsersRepositorySpecs.scala:146)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
Ideally I should get the error similar to what I would get if I was using cqlsh
Is there a way to get more descriptive errors?
I have tried to reproduce your issue, but no luck.
import com.github.nosan.embedded.cassandra.cql.CqlScript;
import com.github.nosan.embedded.cassandra.test.TestCassandra;
class Scratch {
public static void main(String[] args) {
TestCassandra testCassandra = new TestCassandra(CqlScript.statements(createKeyspace(),
createUserTable()));
testCassandra.start();
try {
System.out.println(testCassandra.getSettings());
}
finally {
testCassandra.stop();
}
}
private static String createUserTable() {
return "CREATE TABLE users ( bucket int, "
+ "email text, "
+ "firstname text, "
+ "lastname text, "
+ "authprovider text, "
+ "password text, "
+ "confirmed boolean, "
+ "id UUID, hasher text, "
+ "salt text, "
+ "PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )";
}
private static String createKeyspace() {
return "CREATE KEYSPACE test WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}";
}
}
Output:
Exception in thread "main" com.github.nosan.embedded.cassandra.CassandraException: Unable to start TestCassandra 3.11.4
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:156)
at com.github.nosan.embedded.cassandra.Scratch.main(Scratch.java:27)
Caused by: com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename
at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:48)
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:113)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:207)
at com.datastax.oss.driver.api.core.CqlSession.execute(CqlSession.java:47)
at com.datastax.oss.driver.api.core.CqlSession.execute(CqlSession.java:56)
at com.github.nosan.embedded.cassandra.test.util.CqlUtils.execute(CqlUtils.java:68)
at com.github.nosan.embedded.cassandra.test.util.CqlUtils.execute(CqlUtils.java:47)
at com.github.nosan.embedded.cassandra.test.util.CqlSessionUtils.execute(CqlSessionUtils.java:43)
at com.github.nosan.embedded.cassandra.test.CqlSessionConnection.execute(CqlSessionConnection.java:60)
at com.github.nosan.embedded.cassandra.test.DefaultConnection.execute(DefaultConnection.java:53)
at com.github.nosan.embedded.cassandra.test.TestCassandra.executeScripts(TestCassandra.java:256)
at com.github.nosan.embedded.cassandra.test.TestCassandra.doStart(TestCassandra.java:285)
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:147)
I haven't been able to find my Caused isn't printed but I have found this workaround
try {
testCassandra.start()
println(s"cassandra state is ${testCassandra.getState}")
testCassandra.executeScripts(cqlStatements)
//println(s"result of execution is ${result}")
//val settings = testCassandra.getSettings
//println(s"settings are ${settings}")
} catch {
case e:Exception => {
println(s"exception ${e} caused by ${e.getCause}")
//println(s"caused by ${e.getCause()}")
fail( new Throwable(e.getCause))
}
}finally {
testCassandra.stop()
}
the above prints
org.scalatest.exceptions.TestFailedException was thrown.
ScalaTestFailureLocation: UnitSpecs.RepositorySpecs.UsersRepositorySpecs at (UsersRepositorySpecs.scala:157)
...
Caused by: java.lang.Throwable: com.datastax.driver.core.exceptions.InvalidQueryException: No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename
I found the reason. I wasn't using TestCassandra correctly it seems. I didn't realize that if I create TestCassandra and also specify the cql statements at time of instantiation, the start method runs the queries as well. In my code, I was creating TestCassandra as follows
new TestCassandra(factory,cqlStatements)})
and was calling both start and executeScripts
testCassandra.start()
testCassandra.executeScripts(cqlStatements)
I commented that executeScripts line and I now see both Exception and Caused
I think it would be better if the APIs clearly mention that start has side effect of executing statements as well.
I have setup datastax cassandra service and have created a keyspace and my db is running fine.
Below are output from the nodetool status command:
C:\Users\xxx>cd C:\Program Files\DataStax Community\apache-cassandra\bin
C:\Program Files\DataStax Community\apache-cassandra\bin>nodetool status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 229 KB 256 100.0% d5229669-f8f2-4b06-a887-4ab91a883a74 rack1
Also , the data is created having a keyspace.
cqlsh:axiaglobal> use axiaglobal;
cqlsh:axiaglobal> describe tables;
greetings
cqlsh:axiaglobal> select * from greetings;
user | id | creation_date | greet
------+----+---------------+-------
Now when I try to connect to cassandra via Java I get the following exception:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (null))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:196)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1104)
at com.datastax.driver.core.Cluster.init(Cluster.java:121)
at com.datastax.driver.core.Cluster.connect(Cluster.java:198)
at com.datastax.driver.core.Cluster.connect(Cluster.java:226)
at com.axia.global.dao.cassandra.service.CassandraApp.main(CassandraApp.java:29)
Piece of code which makes a call to Cassandra is listed below:
package com.axia.global.dao.cassandra.service;
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.data.cassandra.core.CassandraOperations;
import org.springframework.data.cassandra.core.CassandraTemplate;
import com.axia.global.model.cassandra.Person;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import com.datastax.driver.core.querybuilder.Select;
public class CassandraApp {
private static final Logger LOG = LoggerFactory.getLogger(CassandraApp.class);
private static Cluster cluster;
private static Session session;
public static void main(String[] args) {
try {
cluster = Cluster.builder().addContactPoints("127.0.0.1").build();
session = cluster.connect("axiaglobal");
CassandraOperations cassandraOps = new CassandraTemplate(session);
Select s = QueryBuilder.select().from("greetings");
} catch (Exception e) {
e.printStackTrace();
}
}
}
I am unable to understand where I am going wrong and why my connection is failing to connect to cassandra
can some one help me out with it ?
I have tried setting up the following:
rpc_address: 0.0.0.0 broadcast_rpc_address: 1.2.3.4 and even it did not work.
Did you check you have the correct java Driver for your Cassandra version?
what version of driver and Cassandra are you using?
Check here
https://docs.datastax.com/en/developer/driver-matrix/doc/javaDrivers.html
I face following error occasionally while submitting job. This error goes away if I remove the rootdir of filedao, datadao and sqldao. That means I have to restart the job-server and re-upload my jar.
{
"status": "ERROR",
"result": {
"message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/1995aeba-com.spmsoftware.distributed.job.TestJob#-1370794810]] after [10000 ms]. Sender[null] sent message of type \"spark.jobserver.JobManagerActor$StartJob\".",
"errorClass": "akka.pattern.AskTimeoutException",
"stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:745)"]
}
}
My config file is as follows:
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
# Spark Cluster / Job Server configuration
spark {
# spark.master will be passed to each job's JobContext
master = <spark_master>
# Default # of CPUs for jobs to use for Spark standalone cluster
job-number-cpus = 4
jobserver {
port = 8090
context-per-jvm = false
context-creation-timeout = 100 s
# Note: JobFileDAO is deprecated from v0.7.0 because of issues in
# production and will be removed in future, now defaults to H2 file.
jobdao = spark.jobserver.io.JobSqlDAO
filedao {
rootdir = /tmp/spark-jobserver/filedao/data
}
datadao {
rootdir = /tmp/spark-jobserver/upload
}
sqldao {
slick-driver = slick.driver.H2Driver
jdbc-driver = org.h2.Driver
rootdir = /tmp/spark-jobserver/sqldao/data
jdbc {
url = "jdbc:h2:file:/tmp/spark-jobserver/sqldao/data/h2-db"
user = ""
password = ""
}
dbcp {
enabled = false
maxactive = 20
maxidle = 10
initialsize = 10
}
}
result-chunk-size = 1m
short-timeout = 60 s
}
context-settings {
num-cpu-cores = 2 # Number of cores to allocate. Required.
memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, #1G, etc.
}
}
akka {
remote.netty.tcp {
# This controls the maximum message size, including job results, that can be sent
# maximum-frame-size = 200 MiB
}
}
# check the reference.conf in spray-can/src/main/resources for all defined settings
spray.can.server.parsing.max-content-length = 250m
I am using spark-2.0-preview version.
I have faced the same error before and was related with timeout, for sure is an syncronus request (sync=true) togheter you must provide the timeout (in seconds) who is a value relative with how long it takes to process your request.
This an example how the request should look like:
curl -k --basic -d '' 'http://localhost:5050/jobs?appName=app&classPath=Main&context=test-context&sync=true&timeout=40'
if your request needs more than 40 seconds maybe you also need to modify the application.conf located on
spark-jobserver-master/job-server/src/main/resources/application.conf
ànd on the spray.can.server section modify:
idle-timeout = 210 s
request-timeout = 200 s
So I'm trying to run job that simply runs a query against cassandra using spark-sql, the job is submitted fine and the job starts fine. This code works when it is not being run through spark jobserver (when simply using spark submit). Could someone tell my what is wrong with my job code or configuration files that is causing the error below?
{
"status": "ERROR",
"ERROR": {
"errorClass": "java.util.concurrent.ExecutionException",
"cause": "Failed to open native connection to Cassandra at {127.0.1.1}:9042",
"stack": ["com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSes
sion(CassandraConnector.scala:155)", "com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scal
a:141)", "com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:141)", "com.datastax.spark
.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)", "com.datastax.spark.connector.cql.RefCountedCache
.acquire(RefCountedCache.scala:56)", "com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:73)
", "com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:101)", "com.datastax.spark.connecto
r.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:112)", "com.datastax.spark.connector.cql.Schema$.fromCassandra(Sch
ema.scala:243)", "org.apache.spark.sql.cassandra.CassandraCatalog$$anon$1.load(CassandraCatalog.scala:22)", "org.apache.spark.sql.
cassandra.CassandraCatalog$$anon$1.load(CassandraCatalog.scala:19)", "com.google.common.cache.LocalCache$LoadingValueReference.loa
dFuture(LocalCache.java:3599)", "com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)", "com.google.common.ca
che.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)", "com.google.common.cache.LocalCache$Segment.get(LocalCache.java:225
7)", "com.google.common.cache.LocalCache.get(LocalCache.java:4000)", "com.google.common.cache.LocalCache.getOrLoad(LocalCache.java
:4004)", "com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)", "org.apache.spark.sql.cassandra.Cassand
raCatalog.lookupRelation(CassandraCatalog.scala:28)", "org.apache.spark.sql.cassandra.CassandraSQLContext$$anon$2.org$apache$spark
$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(CassandraSQLContext.scala:218)", "org.apache.spark.sql.catalyst.analy
sis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:161)", "org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$
anonfun$lookupRelation$3.apply(Catalog.scala:161)", "scala.Option.getOrElse(Option.scala:120)", "org.apache.spark.sql.catalyst.ana
lysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)", "org.apache.spark.sql.cassandra.CassandraSQLContext$$anon$2.lookup
Relation(CassandraSQLContext.scala:218)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.sca
la:174)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:186)", "or
g.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:181)", "org.apache.spar
k.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:188)", "org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.appl
y(TreeNode.scala:188)", "org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)", "org.apache.spark.sql.
catalyst.trees.TreeNode.transformDown(TreeNode.scala:187)", "org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNod
e.scala:208)", "scala.collection.Iterator$$anon$11.next(Iterator.scala:328)", "scala.collection.Iterator$class.foreach(Iterator.sc
ala:727)", "scala.collection.AbstractIterator.foreach(Iterator.scala:1157)", "scala.collection.generic.Growable$class.$plus$plus$e
q(Growable.scala:48)", "scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)", "scala.collection.mutable.Arra
yBuffer.$plus$plus$eq(ArrayBuffer.scala:47)", "scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)", "scala.colle
ction.AbstractIterator.to(Iterator.scala:1157)", "scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)", "sc
ala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)", "scala.collection.TraversableOnce$class.toArray(TraversableOnce.sc
ala:252)", "scala.collection.AbstractIterator.toArray(Iterator.scala:1157)", "org.apache.spark.sql.catalyst.trees.TreeNode.transfo
rmChildrenDown(TreeNode.scala:238)", "org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:193)", "org.apache
.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:178)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelatio
ns$.apply(Analyzer.scala:181)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:171)", "or
g.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)", "org.apache.spark.
sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)", "scala.collection.LinearSeqOptimi
zed$class.foldLeft(LinearSeqOptimized.scala:111)", "scala.collection.immutable.List.foldLeft(List.scala:84)", "org.apache.spark.sq
l.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)", "org.apache.spark.sql.catalyst.rules.RuleExecutor$$a
nonfun$apply$1.apply(RuleExecutor.scala:51)", "scala.collection.immutable.List.foreach(List.scala:318)", "org.apache.spark.sql.cat
alyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)", "org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLCon
text.scala:1082)", "org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:1082)", "org.apache.spark.sql.SQLCont
ext$QueryExecution.assertAnalyzed(SQLContext.scala:1080)", "org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)", "org.apac
he.spark.sql.cassandra.CassandraSQLContext.cassandraSql(CassandraSQLContext.scala:211)", "org.apache.spark.sql.cassandra.Cassandra
SQLContext.sql(CassandraSQLContext.scala:214)", "CassSparkTest$.runJob(CassSparkTest.scala:23)", "CassSparkTest$.runJob(CassSparkT
est.scala:9)", "spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.sca
la:235)", "scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)", "scala.concurrent.impl.Future$P
romiseCompletingRunnable.run(Future.scala:24)", "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)",
"java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)", "java.lang.Thread.run(Thread.java:745)"],
"causingClass": "java.io.IOException",
"message": "java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042"
}
}
Here is the job I am running:
import org.apache.spark.{SparkContext, SparkConf}
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra.CassandraSQLContext
import org.apache.spark.sql._
import spark.jobserver._
import com.typesafe.config.Config
import com.typesafe.config.ConfigFactory
object CassSparkTest extends SparkJob {
def main(args: Array[String]) {
val sc = new SparkContext("spark://192.168.10.11:7077", "test")
val config = ConfigFactory.parseString("")
val results = runJob(sc, config)
println("Results:" + results)
}
override def validate(sc:SparkContext, config: Config): SparkJobValidation = {
SparkJobValid
}
override def runJob(sc:SparkContext, config: Config): Any = {
val sqlC = new CassandraSQLContext(sc)
val df = sqlC.sql(config.getString("input.sql"))
df.collect()
}
}
and here is my configuration file for spark-jobserver
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
spark {
# spark.master will be passed to each job's JobContext
master = "spark://192.168.10.11:7077"
# master = "mesos://vm28-hulk-pub:5050"
# master = "yarn-client"
# Default # of CPUs for jobs to use for Spark standalone cluster
job-number-cpus = 1
jobserver {
port = 2020
jar-store-rootdir = /tmp/jobserver/jars
jobdao = spark.jobserver.io.JobFileDAO
filedao {
rootdir = /tmp/spark-job-server/filedao/data
}
}
# predefined Spark contexts
# contexts {
# my-low-latency-context {
# num-cpu-cores = 1 # Number of cores to allocate. Required.
# memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, 1G, etc.
# }
# # define additional contexts here
# }
# universal context configuration. These settings can be overridden, see README.md
context-settings {
num-cpu-cores = 1 # Number of cores to allocate. Required.
memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, #1G, etc.
# in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
# spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"
spark-cassandra-connection-host="127.0.0.1"
# uris of jars to be loaded into the classpath for this context. Uris is a string list, or a string separated by commas ','
# dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
dependent-jar-uris = ["file:///home/vagrant/lib/spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar"]
# If you wish to pass any settings directly to the sparkConf as-is, add them here in passthrough,
# such as hadoop connection settings that don't use the "spark." prefix
passthrough {
#es.nodes = "192.1.1.1"
}
}
# This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
# home = "/home/spark/spark"
}
# Note that you can use this file to define settings not only for job server,
# but for your Spark jobs as well. Spark job configuration merges with this configuration file as defaults.
#vicg, first you need spark.cassandra.connection.host -- periods not dashes. Also note in the error how the IP is "127.0.1.1", not the one in the config. You can also pass the IP when you create a context, like:
curl -X POST 'localhost:8090/contexts/my-context?spark.cassandra.connection.host=127.0.0.1'
If the above don't work, try the following PR:
https://github.com/spark-jobserver/spark-jobserver/pull/164