Unable to connect to Cassandra cluster running on localhost - cassandra

I have set up the DataStax Cassandra service and created a keyspace, and my DB is running fine.
Below is the output from the nodetool status command:
C:\Users\xxx>cd C:\Program Files\DataStax Community\apache-cassandra\bin
C:\Program Files\DataStax Community\apache-cassandra\bin>nodetool status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 229 KB 256 100.0% d5229669-f8f2-4b06-a887-4ab91a883a74 rack1
The table was also created in the keyspace:
cqlsh:axiaglobal> use axiaglobal;
cqlsh:axiaglobal> describe tables;
greetings
cqlsh:axiaglobal> select * from greetings;
user | id | creation_date | greet
------+----+---------------+-------
Now, when I try to connect to Cassandra via Java, I get the following exception:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (null))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:196)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1104)
at com.datastax.driver.core.Cluster.init(Cluster.java:121)
at com.datastax.driver.core.Cluster.connect(Cluster.java:198)
at com.datastax.driver.core.Cluster.connect(Cluster.java:226)
at com.axia.global.dao.cassandra.service.CassandraApp.main(CassandraApp.java:29)
The piece of code that makes the call to Cassandra is listed below:
package com.axia.global.dao.cassandra.service;

import java.net.InetAddress;
import java.net.UnknownHostException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.data.cassandra.core.CassandraOperations;
import org.springframework.data.cassandra.core.CassandraTemplate;

import com.axia.global.model.cassandra.Person;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import com.datastax.driver.core.querybuilder.Select;

public class CassandraApp {

    private static final Logger LOG = LoggerFactory.getLogger(CassandraApp.class);

    private static Cluster cluster;
    private static Session session;

    public static void main(String[] args) {
        try {
            // Connect to the local node and the "axiaglobal" keyspace
            cluster = Cluster.builder().addContactPoints("127.0.0.1").build();
            session = cluster.connect("axiaglobal");
            CassandraOperations cassandraOps = new CassandraTemplate(session);
            Select s = QueryBuilder.select().from("greetings");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I am unable to understand where I am going wrong and why my connection to Cassandra is failing. Can someone help me out?
I have also tried setting the following in cassandra.yaml, but it did not work either:
rpc_address: 0.0.0.0
broadcast_rpc_address: 1.2.3.4

Did you check that you have the correct Java driver for your Cassandra version? What versions of the driver and Cassandra are you using? Check here:
https://docs.datastax.com/en/developer/driver-matrix/doc/javaDrivers.html
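Independently of the version check, note that the "(null)" in the NoHostAvailableException summary hides the real per-host cause. A minimal sketch (assuming the same DataStax Java driver API as in the question) that makes the port explicit and prints the underlying error for each contact point:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

public class ConnectionDiagnostics {
    public static void main(String[] args) {
        try {
            Cluster cluster = Cluster.builder()
                    .addContactPoints("127.0.0.1")
                    .withPort(9042) // make the native-transport port explicit
                    .build();
            Session session = cluster.connect("axiaglobal");
            session.close();
            cluster.close();
        } catch (NoHostAvailableException e) {
            // getErrors() maps each tried host to the Throwable that actually
            // failed, which is far more informative than the summary message.
            System.err.println(e.getErrors());
        }
    }
}

The per-host Throwable often points directly at the real problem, for example a protocol-version mismatch between driver and server, which is exactly what the compatibility matrix above helps rule out.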

Related

Sparklyr gateway did not respond while retrieving ports

I am using sparklyr in a batch setup where multiple concurrent jobs with different parameters arrive and are processed by the same sparklyr codebase. In certain random situations, I think under high load, the code gives errors (as below).
I am seeking guidance on the best way to troubleshoot this (including understanding the architecture of the different components in the call chain). Therefore, besides pointing out any corrections to the code used to establish the connection, any pointers for further study would be appreciated.
Thanks.
Stack versions:
Spark version 2.3.2.3.1.5.6091-7
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_322)
SparklyR version: sparklyr-2.1-2.11.jar
Yarn Cluster: hdp-3.1.5
Error:
2022-03-03 23:00:09 | Connecting to SPARK ...
2022-03-03 23:02:19 | Couldn't connect to SPARK (Error). Error in force(code): Failed while connecting to sparklyr to port (10980) for sessionid (38361): Sparklyr gateway did not respond while retrieving ports information after 120 seconds
Path: /usr/hdp/3.1.5.6091-7/spark2/bin/spark-submit
Parameters: --driver-memory, 3G, --executor-memory, 3G, --keytab, /etc/security/keytabs/appuser.headless.keytab, --principal, appuser@myorg.com, --class, sparklyr.Shell, '/usr/lib64/R/library/sparklyr/java/sparklyr-2.1-2.11.jar', 10980, 38361
Log: /tmp/RtmpyhpKkv/file187b82097bb55_spark.log
---- Output Log ----
22/03/03 23:00:17 INFO sparklyr: Session (38361) is starting under 127.0.0.1 port 10980
22/03/03 23:00:17 INFO sparklyr: Session (38361) found port 10980 is available
22/03/03 23:00:17 INFO sparklyr: Gateway (38361) is waiting for sparklyr client to connect to port 10980
22/03/03 23:01:17 INFO sparklyr: Gateway (38361) is terminating backend since no client has connected after 60 seconds to 192.168.1.55/10980.
22/03/03 23:01:17 INFO ShutdownHookManager: Shutdown hook called
22/03/03 23:01:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-4fec5364-e440-41a8-87c4-b5e94472bb2f
---- Error Log ----
Connection code:
conf <- spark_config()
conf$spark.executor.memory <- "10G"
conf$spark.executor.cores <- 6
conf$spark.executor.instances <- 6
conf$spark.driver.memory <- "10g"
conf$spark.driver.memoryOverhead <-"3g"
conf$spark.shuffle.service.enabled <- "true"
conf$spark.port.maxRetries <- 125
conf$spark.sql.hive.convertMetastoreOrc <- "true"
conf$spark.local.dir = '/var/log/myapp/sparkjobs'
conf$'sparklyr.shell.driver-memory' <- "3G"
conf$'sparklyr.shell.executor-memory' <- "3G"
conf$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"
conf$hive.metastore.uris = configs$DEFAULT$HIVE_METASTORE_URL
conf$spark.sql.session.timeZone <- "UTC"
# fix as per cloudera suggestion for future timeout issue
conf$spark.sql.broadcastTimeout <- 1200
conf$sparklyr.shell.keytab = "/etc/security/keytabs/appuser.headless.keytab"
conf$sparklyr.shell.principal = "appuser@myorg.com"
conf$spark.yarn.keytab = "/etc/security/keytabs/appuser.headless.keytab"
conf$spark.yarn.principal = "appuser@myorg.com"
conf$spark.sql.catalogImplementation <- "hive"
conf$sparklyr.gateway.config.retries <- 10
conf$sparklyr.connect.timeout <- 120
conf$sparklyr.gateway.port.query.attempts <- 10
conf$sparklyr.gateway.port.query.retry.interval.seconds <- 60
conf$sparklyr.gateway.port <- 10090 + round(runif(1, 1, 1000))
tryCatch(
  {
    logging(paste0("Connecting to SPARK ... "))
    withTimeout({ sc <- spark_connect(master = "yarn-client", spark_home = eval(SPARK_HOME_PATH), version = "2.1.0", app_name = "myjobname", config = conf) }, timeout = 540)
    if (!is.null(sc)) {
      return(sc)
    }
  },
  TimeoutException = function(ex)
  {
    logging(paste0("Couldn't connect to SPARK (Timed out). ", ex))
    stop("Timeout occurred")
  },
  error = function(err)
  {
    logging(paste0("Couldn't connect to SPARK (Error). ", err))
    stop("Exception occurred")
  }
)

Spring Data: Connect to Cassandra via SSL

I want to connect to a Cassandra cluster via SSL from a Java application using Spring Data. We have the following script, which successfully connects to the cluster. Basically, it only enables the SSL connection without specifying an SSL certificate.
mkdir .cassandra
echo "[ssl]" > .cassandra/cqlshrc
echo "validate = false" >> .cassandra/cqlshrc
cqlsh -u USER -p PASS -k KEYSPACE --debug --ssl HOSTNAME
I tried to set up the same connection options in a Spring @Configuration, via a CassandraClusterFactoryBean. Here is the snippet in question:
CassandraClusterFactoryBean factoryBean = new CassandraClusterFactoryBean();
factoryBean.setContactPoints(contactPoint);
factoryBean.setPort(9042);
factoryBean.setQueryOptions(queryOptions);
factoryBean.setAuthProvider(cassandraAuthentication());
factoryBean.setSslEnabled(true);
The cassandraAuthentication() method creates an AuthProvider with plaintext credentials. Upon trying to connect to the cluster I get the following exception:
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: <hostname>:9042 (com.datastax.driver.core.exceptions.TransportException: [<hostname>:9042] Error writing))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:268) ~[cassandra-driver-core-3.6.0.jar!/:na]
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:107) ~[cassandra-driver-core-3.6.0.jar!/:na]
at com.datastax.driver.core.Cluster$Manager.negotiateProtocolVersionAndConnect(Cluster.java:1652) ~[cassandra-driver-core-3.6.0.jar!/:na]
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1571) ~[cassandra-driver-core-3.6.0.jar!/:na]
at com.datastax.driver.core.Cluster.init(Cluster.java:208) ~[cassandra-driver-core-3.6.0.jar!/:na]
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:376) ~[cassandra-driver-core-3.6.0.jar!/:na]
at com.datastax.driver.core.Cluster.connect(Cluster.java:332) ~[cassandra-driver-core-3.6.0.jar!/:na]
at org.springframework.data.cassandra.config.CassandraCqlSessionFactoryBean.connect(CassandraCqlSessionFactoryBean.java:89) ~[spring-data-cassandra-2.1.3.RELEASE.jar!/:2.1.3.RELEASE]
at org.springframework.data.cassandra.config.CassandraCqlSessionFactoryBean.afterPropertiesSet(CassandraCqlSessionFactoryBean.java:82) ~[spring-data-cassandra-2.1.3.RELEASE.jar!/:2.1.3.RELEASE]
at org.springframework.data.cassandra.config.CassandraSessionFactoryBean.afterPropertiesSet(CassandraSessionFactoryBean.java:59) ~[spring-data-cassandra-2.1.3.RELEASE.jar!/:2.1.3.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1804) ~[spring-beans-5.1.3.RELEASE.jar!/:5.1.3.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1741) ~[spring-beans-5.1.3.RELEASE.jar!/:5.1.3.RELEASE]
... 51 common frames omitted
What settings should I add to the cluster configuration to be able to connect to the database? Thanks
If you are using CassandraClusterFactoryBean, use the code below:
@Bean
@Override
public CassandraClusterFactoryBean cluster() {
    CassandraClusterFactoryBean cluster = new CassandraClusterFactoryBean();
    PlainTextAuthProvider sap = new PlainTextAuthProvider(env.getProperty("cassandra.username"), env.getProperty("cassandra.password"));
    cluster.setContactPoints(env.getProperty("cassandra.contactpoints"));
    cluster.setPort(Integer.parseInt(env.getProperty("cassandra.port")));
    cluster.setAuthProvider(sap);
    return cluster;
}
@Bean
@Override
public CassandraClusterFactoryBean cluster() {
    CassandraClusterFactoryBean cluster = new CassandraClusterFactoryBean();
    PlainTextAuthProvider sap = new PlainTextAuthProvider(env.getProperty("cassandra.username"), env.getProperty("cassandra.password"));
    cluster.setContactPoints(env.getProperty("cassandra.contactpoints"));
    cluster.setPort(Integer.parseInt(env.getProperty("cassandra.port")));
    cluster.setAuthProvider(sap);
    cluster.setSslEnabled(true);
    return cluster;
}
Quoted from the original answer by @soUvIk, with cluster.setSslEnabled(true); added to try connecting with SSL.
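If simply enabling SSL is not enough, the cqlsh script's validate = false can be mirrored on the Java side by installing a trust-all SSLContext. This is a sketch for testing only, assuming driver 3.x and a spring-data-cassandra version whose factory bean exposes setSslOptions; production code should use a proper truststore instead:

import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

import com.datastax.driver.core.RemoteEndpointAwareJdkSSLOptions;
import com.datastax.driver.core.SSLOptions;

// Trust-all SSL options, equivalent in spirit to cqlsh's "validate = false".
private SSLOptions trustAllSslOptions() throws Exception {
    TrustManager[] trustAll = { new X509TrustManager() {
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    } };
    SSLContext sslContext = SSLContext.getInstance("TLS");
    sslContext.init(null, trustAll, new SecureRandom());
    return RemoteEndpointAwareJdkSSLOptions.builder().withSSLContext(sslContext).build();
}

and then, next to the other factory bean settings:

cluster.setSslEnabled(true);
cluster.setSslOptions(trustAllSslOptions());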

how to get descriptive error messages from embedded cassandra

I am using embedded Cassandra to run unit tests. I notice that if any CQL statements fail, I don't see any descriptive reason for the failure. For example, I am running the following two statements, which fail because I am trying to add a table without switching to a keyspace:
val statement1 =
"""
|CREATE KEYSPACE test
| WITH REPLICATION = {
| 'class' : 'SimpleStrategy',
| 'replication_factor' : 1
| };
""".stripMargin
val statement3 =
"""
|CREATE TABLE users (
| bucket int,
| email text,
| firstname text,
| lastname text,
| authprovider text,
| password text,
| confirmed boolean,
| id UUID,
| hasher text,
| salt text,
| PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
""".stripMargin
val cqlStatements: CqlStatements = new CqlStatements(statement1, statement3)
val testCassandra = repoTestEnv.testCassandra
try {
  testCassandra.start()
  testCassandra.executeScripts(cqlStatements)
} finally testCassandra.stop()
But I don't see the actual error. I see the following, which doesn't say exactly what the problem is:
[info] c.g.n.e.c.l.WindowsCassandraNode - Apache Cassandra Node '7276' is started
[info] c.g.n.e.c.l.LocalCassandraDatabase - Apache Cassandra '3.11.1' is started (20811 ms)
[warn] c.d.d.c.Connection - /127.0.0.1:9042 did not send an authentication challenge; This is suspicious because the driver expects authentication (configured auth provider = com.datastax.driver.core.PlainTextAuthProvider)
[warn] c.d.d.c.Connection - /127.0.0.1:9042 did not send an authentication challenge; This is suspicious because the driver expects authentication (configured auth provider = com.datastax.driver.core.PlainTextAuthProvider)
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Script: CqlStatements [
CREATE KEYSPACE test
WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
,
CREATE TABLE users (
bucket int,
email text,
firstname text,
lastname text,
authprovider text,
password text,
confirmed boolean,
id UUID,
hasher text,
salt text,
PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
]
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Statement:
CREATE KEYSPACE test
WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
[info] c.g.n.e.c.Cassandra - INFO [Native-Transport-Requests-1] 2019-05-29 07:50:00,788 MigrationManager.java:310 - Create new Keyspace: KeyspaceMetadata{name=test, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Statement:
CREATE TABLE users (
bucket int,
email text,
firstname text,
lastname text,
authprovider text,
password text,
confirmed boolean,
id UUID,
hasher text,
salt text,
PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
[debug] c.g.n.e.c.t.TestCassandra - Stop TestCassandra 3.11.1
[info] c.g.n.e.c.l.LocalCassandraDatabase - Stop Apache Cassandra '3.11.1'
[debug] c.g.n.e.c.l.RunProcess - Execute 'powershell -ExecutionPolicy Unrestricted C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32\bin\stop-server.ps1 -p C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32\1da63488-2624-4141-a49e-174203b7edc4' within a directory 'C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32'
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,926 HintsService.java:220 - Paused hints dispatch
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,933 Server.java:176 - Stop listening for CQL clients
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,934 Gossiper.java:1532 - Announcing shutdown
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,938 StorageService.java:2268 - Node localhost/127.0.0.1 state jump to shutdown
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:05,941 MessagingService.java:984 - Waiting for messaging service to quiesce
[info] c.g.n.e.c.Cassandra - INFO [ACCEPT-localhost/127.0.0.1] 2019-05-29 07:50:05,948 MessagingService.java:1338 - MessagingService has terminated the accept() thread
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:06,076 HintsService.java:220 - Paused hints dispatch
[info] c.g.n.e.c.l.WindowsCassandraNode - Successfully sent ctrl+c to process with id: 7276.
[info] c.g.n.e.c.l.WindowsCassandraNode - Apache Cassandra Node '7276' is stopped
[info] c.g.n.e.c.l.LocalCassandraDatabase - Apache Cassandra '3.11.1' is stopped (3490 ms)
[info] c.g.n.e.c.l.LocalCassandraDatabase - The working directory 'C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32' was deleted.
[debug] c.g.n.e.c.t.TestCassandra - TestCassandra 3.11.1 is stopped
Unable to start TestCassandra 3.11.1
com.github.nosan.embedded.cassandra.CassandraException: Unable to start TestCassandra 3.11.1
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:128)
at UnitSpecs.RepositorySpecs.UsersRepositorySpecs.$anonfun$new$3(UsersRepositorySpecs.scala:146)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
Ideally I should get an error similar to what I would get if I were using cqlsh.
Is there a way to get more descriptive errors?
I have tried to reproduce your issue, but no luck.
import com.github.nosan.embedded.cassandra.cql.CqlScript;
import com.github.nosan.embedded.cassandra.test.TestCassandra;
class Scratch {

    public static void main(String[] args) {
        TestCassandra testCassandra = new TestCassandra(CqlScript.statements(createKeyspace(),
                createUserTable()));
        testCassandra.start();
        try {
            System.out.println(testCassandra.getSettings());
        }
        finally {
            testCassandra.stop();
        }
    }

    private static String createUserTable() {
        return "CREATE TABLE users ( bucket int, "
                + "email text, "
                + "firstname text, "
                + "lastname text, "
                + "authprovider text, "
                + "password text, "
                + "confirmed boolean, "
                + "id UUID, hasher text, "
                + "salt text, "
                + "PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )";
    }

    private static String createKeyspace() {
        return "CREATE KEYSPACE test WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}";
    }
}
Output:
Exception in thread "main" com.github.nosan.embedded.cassandra.CassandraException: Unable to start TestCassandra 3.11.4
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:156)
at com.github.nosan.embedded.cassandra.Scratch.main(Scratch.java:27)
Caused by: com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename
at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:48)
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:113)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:207)
at com.datastax.oss.driver.api.core.CqlSession.execute(CqlSession.java:47)
at com.datastax.oss.driver.api.core.CqlSession.execute(CqlSession.java:56)
at com.github.nosan.embedded.cassandra.test.util.CqlUtils.execute(CqlUtils.java:68)
at com.github.nosan.embedded.cassandra.test.util.CqlUtils.execute(CqlUtils.java:47)
at com.github.nosan.embedded.cassandra.test.util.CqlSessionUtils.execute(CqlSessionUtils.java:43)
at com.github.nosan.embedded.cassandra.test.CqlSessionConnection.execute(CqlSessionConnection.java:60)
at com.github.nosan.embedded.cassandra.test.DefaultConnection.execute(DefaultConnection.java:53)
at com.github.nosan.embedded.cassandra.test.TestCassandra.executeScripts(TestCassandra.java:256)
at com.github.nosan.embedded.cassandra.test.TestCassandra.doStart(TestCassandra.java:285)
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:147)
I haven't been able to find why my Caused by isn't printed, but I have found this workaround:
try {
  testCassandra.start()
  println(s"cassandra state is ${testCassandra.getState}")
  testCassandra.executeScripts(cqlStatements)
  //println(s"result of execution is ${result}")
  //val settings = testCassandra.getSettings
  //println(s"settings are ${settings}")
} catch {
  case e: Exception => {
    println(s"exception ${e} caused by ${e.getCause}")
    //println(s"caused by ${e.getCause()}")
    fail(new Throwable(e.getCause))
  }
} finally {
  testCassandra.stop()
}
The above prints:
org.scalatest.exceptions.TestFailedException was thrown.
ScalaTestFailureLocation: UnitSpecs.RepositorySpecs.UsersRepositorySpecs at (UsersRepositorySpecs.scala:157)
...
Caused by: java.lang.Throwable: com.datastax.driver.core.exceptions.InvalidQueryException: No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename
I found the reason. I wasn't using TestCassandra correctly, it seems. I didn't realize that if I create TestCassandra and also specify the CQL statements at instantiation time, the start method runs the queries as well. In my code, I was creating TestCassandra as follows:
new TestCassandra(factory, cqlStatements)
and was calling both start and executeScripts:
testCassandra.start()
testCassandra.executeScripts(cqlStatements)
I commented out that executeScripts line, and I now see both the exception and its cause.
I think it would be better if the API clearly mentioned that start has the side effect of executing the statements as well.
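For reference, a minimal sketch of the two usages that do work, but not both at once. The table statement is keyspace-qualified here to avoid the "No keyspace has been specified" error from the original question; class and helper names are illustrative only:

import com.github.nosan.embedded.cassandra.cql.CqlScript;
import com.github.nosan.embedded.cassandra.test.TestCassandra;

class TestCassandraUsage {
    public static void main(String[] args) {
        // Option 1: scripts passed at construction; start() also executes them.
        TestCassandra withScripts = new TestCassandra(
                CqlScript.statements(createKeyspace(), createUserTable()));
        withScripts.start();
        withScripts.stop();

        // Option 2: bare instance; run the scripts yourself after start().
        TestCassandra bare = new TestCassandra();
        bare.start();
        try {
            bare.executeScripts(CqlScript.statements(createKeyspace(), createUserTable()));
        } finally {
            bare.stop();
        }
    }

    private static String createKeyspace() {
        return "CREATE KEYSPACE test WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}";
    }

    private static String createUserTable() {
        // Qualified with the keyspace so the script succeeds without a USE.
        return "CREATE TABLE test.users (bucket int, email text, PRIMARY KEY ((bucket, email)))";
    }
}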

Error when using SparkJob with NamedRddSupport

The goal is to create the following on a local instance of Spark JobServer:
object foo extends SparkJob with NamedRddSupport
Question: How can I fix the following error, which happens on every job?
{
"status": "ERROR",
"result": {
"message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/439b2467-spark.jobserver.genderPrediction#884262439]] after [10000 ms]",
"errorClass": "akka.pattern.AskTimeoutException",
"stack: ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)", "akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)", "scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)", "akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)", "akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)", "akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)", "java.lang.Thread.run(Thread.java:745)"]
}
}
A more detailed error description from the Spark JobServer:
job-server[ERROR] Exception in thread "pool-100-thread-1" java.lang.AbstractMethodError: spark.jobserver.genderPrediction$.namedObjectsPrivate()Ljava/util/concurrent/atomic/AtomicReference;
job-server[ERROR] at spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:248)
job-server[ERROR] at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
job-server[ERROR] at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
job-server[ERROR] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
job-server[ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
job-server[ERROR] at java.lang.Thread.run(Thread.java:745)
In case somebody wants to see the code:
package spark.jobserver

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkContext}
import com.typesafe.config.{Config, ConfigFactory}
import collection.JavaConversions._
import scala.io.Source

object genderPrediction extends SparkJob with NamedRddSupport
{
  // Main function
  def main(args: scala.Array[String])
  {
    val sc = new SparkContext()
    sc.hadoopConfiguration.set("fs.tachyon.impl", "tachyon.hadoop.TFS")
    val config = ConfigFactory.parseString("")
    val results = runJob(sc, config)
  }

  def validate(sc: SparkContext, config: Config): SparkJobValidation = { SparkJobValid }

  def runJob(sc: SparkContext, config: Config): Any =
  {
    return "ok";
  }
}
Version information:
Spark is 1.5.0 - SparkJobServer is latest version
Thank you all very much in advance!
Adding more explanation to @noorul's answer:
It seems like you compiled the code with an old version of SJS and you are running it with the latest.
NamedObjects were recently added. You are getting AbstractMethodError because your server expects NamedObjects support and you didn't compile the code with that.
Also: you don't need the main method there since it won't be executed by SJS.
Ensure that your compile-time and runtime library versions of dependent packages are the same.

Error when running job that queries against Cassandra via Spark SQL through Spark Jobserver

So I'm trying to run a job that simply runs a query against Cassandra using Spark SQL. The job is submitted fine and starts fine. The code works when it is not being run through Spark Jobserver (when simply using spark-submit). Could someone tell me what is wrong with my job code or configuration files that is causing the error below?
{
"status": "ERROR",
"ERROR": {
"errorClass": "java.util.concurrent.ExecutionException",
"cause": "Failed to open native connection to Cassandra at {127.0.1.1}:9042",
"stack": ["com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSes
sion(CassandraConnector.scala:155)", "com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scal
a:141)", "com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:141)", "com.datastax.spark
.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)", "com.datastax.spark.connector.cql.RefCountedCache
.acquire(RefCountedCache.scala:56)", "com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:73)
", "com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:101)", "com.datastax.spark.connecto
r.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:112)", "com.datastax.spark.connector.cql.Schema$.fromCassandra(Sch
ema.scala:243)", "org.apache.spark.sql.cassandra.CassandraCatalog$$anon$1.load(CassandraCatalog.scala:22)", "org.apache.spark.sql.
cassandra.CassandraCatalog$$anon$1.load(CassandraCatalog.scala:19)", "com.google.common.cache.LocalCache$LoadingValueReference.loa
dFuture(LocalCache.java:3599)", "com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)", "com.google.common.ca
che.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)", "com.google.common.cache.LocalCache$Segment.get(LocalCache.java:225
7)", "com.google.common.cache.LocalCache.get(LocalCache.java:4000)", "com.google.common.cache.LocalCache.getOrLoad(LocalCache.java
:4004)", "com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)", "org.apache.spark.sql.cassandra.Cassand
raCatalog.lookupRelation(CassandraCatalog.scala:28)", "org.apache.spark.sql.cassandra.CassandraSQLContext$$anon$2.org$apache$spark
$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(CassandraSQLContext.scala:218)", "org.apache.spark.sql.catalyst.analy
sis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:161)", "org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$
anonfun$lookupRelation$3.apply(Catalog.scala:161)", "scala.Option.getOrElse(Option.scala:120)", "org.apache.spark.sql.catalyst.ana
lysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)", "org.apache.spark.sql.cassandra.CassandraSQLContext$$anon$2.lookup
Relation(CassandraSQLContext.scala:218)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.sca
la:174)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:186)", "or
g.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:181)", "org.apache.spar
k.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:188)", "org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.appl
y(TreeNode.scala:188)", "org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)", "org.apache.spark.sql.
catalyst.trees.TreeNode.transformDown(TreeNode.scala:187)", "org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNod
e.scala:208)", "scala.collection.Iterator$$anon$11.next(Iterator.scala:328)", "scala.collection.Iterator$class.foreach(Iterator.sc
ala:727)", "scala.collection.AbstractIterator.foreach(Iterator.scala:1157)", "scala.collection.generic.Growable$class.$plus$plus$e
q(Growable.scala:48)", "scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)", "scala.collection.mutable.Arra
yBuffer.$plus$plus$eq(ArrayBuffer.scala:47)", "scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)", "scala.colle
ction.AbstractIterator.to(Iterator.scala:1157)", "scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)", "sc
ala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)", "scala.collection.TraversableOnce$class.toArray(TraversableOnce.sc
ala:252)", "scala.collection.AbstractIterator.toArray(Iterator.scala:1157)", "org.apache.spark.sql.catalyst.trees.TreeNode.transfo
rmChildrenDown(TreeNode.scala:238)", "org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:193)", "org.apache
.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:178)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelatio
ns$.apply(Analyzer.scala:181)", "org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:171)", "or
g.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)", "org.apache.spark.
sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)", "scala.collection.LinearSeqOptimi
zed$class.foldLeft(LinearSeqOptimized.scala:111)", "scala.collection.immutable.List.foldLeft(List.scala:84)", "org.apache.spark.sq
l.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)", "org.apache.spark.sql.catalyst.rules.RuleExecutor$$a
nonfun$apply$1.apply(RuleExecutor.scala:51)", "scala.collection.immutable.List.foreach(List.scala:318)", "org.apache.spark.sql.cat
alyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)", "org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLCon
text.scala:1082)", "org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:1082)", "org.apache.spark.sql.SQLCont
ext$QueryExecution.assertAnalyzed(SQLContext.scala:1080)", "org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)", "org.apac
he.spark.sql.cassandra.CassandraSQLContext.cassandraSql(CassandraSQLContext.scala:211)", "org.apache.spark.sql.cassandra.Cassandra
SQLContext.sql(CassandraSQLContext.scala:214)", "CassSparkTest$.runJob(CassSparkTest.scala:23)", "CassSparkTest$.runJob(CassSparkT
est.scala:9)", "spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.sca
la:235)", "scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)", "scala.concurrent.impl.Future$P
romiseCompletingRunnable.run(Future.scala:24)", "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)",
"java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)", "java.lang.Thread.run(Thread.java:745)"],
"causingClass": "java.io.IOException",
"message": "java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042"
}
}
Here is the job I am running:
import org.apache.spark.{SparkContext, SparkConf}
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra.CassandraSQLContext
import org.apache.spark.sql._
import spark.jobserver._
import com.typesafe.config.Config
import com.typesafe.config.ConfigFactory

object CassSparkTest extends SparkJob {

  def main(args: Array[String]) {
    val sc = new SparkContext("spark://192.168.10.11:7077", "test")
    val config = ConfigFactory.parseString("")
    val results = runJob(sc, config)
    println("Results:" + results)
  }

  override def validate(sc: SparkContext, config: Config): SparkJobValidation = {
    SparkJobValid
  }

  override def runJob(sc: SparkContext, config: Config): Any = {
    val sqlC = new CassandraSQLContext(sc)
    val df = sqlC.sql(config.getString("input.sql"))
    df.collect()
  }
}
and here is my configuration file for spark-jobserver
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
spark {
# spark.master will be passed to each job's JobContext
master = "spark://192.168.10.11:7077"
# master = "mesos://vm28-hulk-pub:5050"
# master = "yarn-client"
# Default # of CPUs for jobs to use for Spark standalone cluster
job-number-cpus = 1
jobserver {
port = 2020
jar-store-rootdir = /tmp/jobserver/jars
jobdao = spark.jobserver.io.JobFileDAO
filedao {
rootdir = /tmp/spark-job-server/filedao/data
}
}
# predefined Spark contexts
# contexts {
# my-low-latency-context {
# num-cpu-cores = 1 # Number of cores to allocate. Required.
# memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, 1G, etc.
# }
# # define additional contexts here
# }
# universal context configuration. These settings can be overridden, see README.md
context-settings {
num-cpu-cores = 1 # Number of cores to allocate. Required.
memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, #1G, etc.
# in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
# spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"
spark-cassandra-connection-host="127.0.0.1"
# uris of jars to be loaded into the classpath for this context. Uris is a string list, or a string separated by commas ','
# dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
dependent-jar-uris = ["file:///home/vagrant/lib/spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar"]
# If you wish to pass any settings directly to the sparkConf as-is, add them here in passthrough,
# such as hadoop connection settings that don't use the "spark." prefix
passthrough {
#es.nodes = "192.1.1.1"
}
}
# This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
# home = "/home/spark/spark"
}
# Note that you can use this file to define settings not only for job server,
# but for your Spark jobs as well. Spark job configuration merges with this configuration file as defaults.
@vicg, first you need spark.cassandra.connection.host -- periods, not dashes. Also note in the error how the IP is "127.0.1.1", not the one in the config. You can also pass the IP when you create a context, like:
curl -X POST 'localhost:8090/contexts/my-context?spark.cassandra.connection.host=127.0.0.1'
If the above doesn't work, try the following PR:
https://github.com/spark-jobserver/spark-jobserver/pull/164
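As a belt-and-braces check, the property can also be set on the SparkConf inside the job itself, so it does not depend on the jobserver context configuration. A hedged sketch (in Java for brevity; master URL and host taken from the question, class name illustrative):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConnectionHostCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://192.168.10.11:7077")
                .setAppName("CassSparkTest")
                // Periods, not dashes. Without this the connector falls back to
                // its default (localhost), which a Debian-style /etc/hosts entry
                // can resolve to 127.0.1.1 -- exactly the address in the error.
                .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Confirm the setting actually reached the context.
        System.out.println(sc.getConf().get("spark.cassandra.connection.host"));
        sc.stop();
    }
}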
