kafka configuration + zookeeper cli + get the right info for kafka host - linux

As all know when is need to print the kafka broker id’s we can use the following cli
zookeeper-shell.sh zoo_server1:2181 <<< "ls /brokers/ids"
this cli print the following
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[1018, 1017, 1016]
Its means that we have kafka with id’s
1018
1017
1016
But our kafka names are
Kafka_confluent01
Kafka_confluent02
Kafka_confluent03
So how to know which kafka broker id ( 1018 , 1017 , 1016 ) is belong to the real host ( Kafka_confluent01 / Kafka_confluent02 / Kafka_confluent03 )

You can use kafkacat for this with the -L operator:
$ kafkacat -b kafka-01.foo.bar:9092 -L
Metadata for all topics (from broker 1: kafka-01.foo.bar:9092/1):
3 brokers:
broker 2 at kafka-02.foo.bar:9092
broker 3 at kafka-03.foo.bar:9092
broker 1 at kafka-01.foo.bar:9092 (controller)

You can get the list of brokers dynamically, using the following code.
public class KafkaBrokerInfoFetcher {
public static void main(String[] args) throws Exception {
ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);
List<String> ids = zk.getChildren("/brokers/ids", false);
for (String id : ids) {
String brokerInfo = new String(zk.getData("/brokers/ids/" + id, false, null));
System.out.println(id + ": " + brokerInfo);
}
}
}
After running the code, you will get the broker id and corresponding host.
1: {"jmx_port":-1,"timestamp":"1428512949385","host":"192.168.0.11","version":1,"port":9093}
2: {"jmx_port":-1,"timestamp":"1428512955512","host":"192.168.0.11","version":1,"port":9094}
3: {"jmx_port":-1,"timestamp":"1428512961043","host":"192.168.0.11","version":1,"port":9095}

Related

Sparklyr gateway did not respond while retrieving ports

I am using sparklyr in a batch setup where multiple concurrent jobs with different parameters, are arriving and are processed by same sparklyr codebase. In "certain" random situations, the code gives errors (as below). I think it is under high load.
I am seeking guidance on the best way to troubleshoot it (including understanding the architecture of different components in the call chain). Therefore, beside pointing any correction in the code used to establish connection, any pointers to study further will be appreciated.
Thanks.
Stack versions:
Spark version 2.3.2.3.1.5.6091-7
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_322)
SparklyR version: sparklyr-2.1-2.11.jar
Yarn Cluster: hdp-3.1.5
Error:
2022-03-03 23:00:09 | Connecting to SPARK ...
2022-03-03 23:02:19 | Couldn't connect to SPARK (Error). Error in force(code): Failed while connecting to sparklyr to port (10980) for sessionid (38361): Sparklyr gateway did not respond while retrieving ports information after 120 seconds
Path: /usr/hdp/3.1.5.6091-7/spark2/bin/spark-submit
Parameters: --driver-memory, 3G, --executor-memory, 3G, --keytab, /etc/security/keytabs/appuser.headless.keytab, --principal, appuser#myorg.com, --class, sparklyr.Shell, '/usr/lib64/R/library/sparklyr/java/sparklyr-2.1-2.11.jar', 10980, 38361
Log: /tmp/RtmpyhpKkv/file187b82097bb55_spark.log
---- Output Log ----
22/03/03 23:00:17 INFO sparklyr: Session (38361) is starting under 127.0.0.1 port 10980
22/03/03 23:00:17 INFO sparklyr: Session (38361) found port 10980 is available
22/03/03 23:00:17 INFO sparklyr: Gateway (38361) is waiting for sparklyr client to connect to port 10980
22/03/03 23:01:17 INFO sparklyr: Gateway (38361) is terminating backend since no client has connected after 60 seconds to 192.168.1.55/10980.
22/03/03 23:01:17 INFO ShutdownHookManager: Shutdown hook called
22/03/03 23:01:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-4fec5364-e440-41a8-87c4-b5e94472bb2f
---- Error Log ----
Connection code:
conf <- spark_config()
conf$spark.executor.memory <- "10G"
conf$spark.executor.cores <- 6
conf$spark.executor.instances <- 6
conf$spark.driver.memory <- "10g"
conf$spark.driver.memoryOverhead <-"3g"
conf$spark.shuffle.service.enabled <- "true"
conf$spark.port.maxRetries <- 125
conf$spark.sql.hive.convertMetastoreOrc <- "true"
conf$spark.local.dir = '/var/log/myapp/sparkjobs'
conf$'sparklyr.shell.driver-memory' <- "3G"
conf$'sparklyr.shell.executor-memory' <- "3G"
conf$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"
conf$hive.metastore.uris = configs$DEFAULT$HIVE_METASTORE_URL
conf$spark.sql.session.timeZone <- "UTC"
# fix as per cloudera suggestion for future timeout issue
conf$spark.sql.broadcastTimeout <- 1200
conf$sparklyr.shell.keytab = "/etc/security/keytabs/appuser.headless.keytab"
conf$sparklyr.shell.principal = "appuser#myorg.com"
conf$spark.yarn.keytab= "/etc/security/keytabs/appuser.headless.keytab"
conf$spark.yarn.principal= "appuser#myorg.com"
conf$spark.sql.catalogImplementation <- "hive"
conf$sparklyr.gateway.config.retries <- 10
conf$sparklyr.connect.timeout <- 120
conf$sparklyr.gateway.port.query.attempts <- 10
conf$sparklyr.gateway.port.query.retry.interval.seconds <- 60
conf$sparklyr.gateway.port <- 10090 + round(runif(1, 1, 1000))
tryCatch
(
{
logging(paste0("Connecting to SPARK ... "))
withTimeout({ sc <- spark_connect(master = "yarn-client", spark_home = eval(SPARK_HOME_PATH), version = "2.1.0",app_name = "myjobname", config = conf) }, timeout = 540)
if (!is.null(sc)) {
return(sc)
}
},
TimeoutException = function(ex)
{
logging(paste0("Couldn't connect to SPARK (Timed up).", ex));
stop("Timeout occured");
},
error = function(err)
{
logging(paste0("Couldn't connect to SPARK (Error). ", err));
stop("Exception occured");
}
)

If "keyword" in message not working for logstash

I am receiving logs from 5 different sources on one single port. In fact it is a collection of files being sent through syslog from a server in realtime. The server stores logs from 4 VPN servers and one DNS server. Now the server admin started sending all 5 types of files on a single port although I asked something different. Anyways, I thought to make this also work now.
Below are the different types of samples-
------------------
<13>Sep 30 22:03:28 xx2.20.43.100 370 <134>1 2021-09-30T22:03:28+05:30 canopus.domain1.com1 PulseSecure: - - - id=firewall time="2021-09-30 22:03:28" pri=6 fw=xx2.20.43.100 vpn=ive user=System realm="google_auth" roles="" proto= src=1xx.99.110.19 dst= dstname= type=vpn op= arg="" result= sent= rcvd= agent="" duration= msg="AUT23278: User Limit realm restrictions successfully passed for /google_auth "
------------------
<134>Sep 30 22:41:43 xx2.20.43.101 1 2021-09-30T22:41:43+05:30 canopus.domain1.com2 PulseSecure: - - - id=firewall time="2021-09-30 22:41:43" pri=6 fw=xx2.20.43.101 vpn=ive user=user22 realm="google_auth" roles="Domain_check_role" proto= src=1xx.200.27.62 dst= dstname= type=vpn op= arg="" result= sent= rcvd= agent="" duration= msg="NWC24328: Transport mode switched over to SSL for user with NCIP xx2.20.210.252 "
------------------
<134>Sep 30 22:36:59 vpn-dns-1 named[130237]: 30-Sep-2021 22:36:59.172 queries: info: client #0x7f8e0f5cab50 xx2.30.16.147#63335 (ind.event.freefiremobile.com): query: ind.event.freefiremobile.com IN A + (xx2.31.0.171)
------------------
<13>Sep 30 22:40:31 xx2.20.43.101 394 <134>1 2021-09-30T22:40:31+05:30 canopus.domain1.com2 PulseSecure: - - - id=firewall time="2021-09-30 22:40:31" pri=6 fw=xx2.20.43.101 vpn=ive user=user3 realm="google_auth" roles="Domain_check_role" proto= src=1xx.168.77.166 dst= dstname= type=vpn op= arg="" result= sent= rcvd= agent="" duration= msg="NWC23508: Key Exchange number 1 occurred for user with NCIP xx2.20.214.109 "
Below is my config file-
syslog {
port => 1301
ecs_compatibility => disabled
tags => ["vpn"]
}
}
I tried to apply a condition first to get VPN logs (1st sample logline) and pass it to dissect-
filter {
if "vpn" in [tags] {
#if ([message] =~ /vpn=ive/) {
if "vpn=ive" in [message] {
dissect {
mapping => { "message" => "%{reserved} id=firewall %{message1}" }
# using id=firewall to get KV pairs in message1
}
}
}
else { drop {} }
# \/ end of filter brace
}
But when I run with this config file, I am getting mixture of all 5 types of logs in kibana. I don't see any dissect failures as well. I remember this worked in some other server for other type of log, but not working here.
Another question is, if I have to process all 5 types of logs in one config file, will below be a good approach?
if "VPN-logline" in [message] { use KV plugin and add tag of "vpn" }
else if "DNS-logline" in [message] { use JSON plugin and tag of "dns"}
else if "something-irrelevant" in [message] { drop {} }
Or can it be done in input section of config?
So, the problem was to assign every logline with the tag pf vpn. I was doing so because I had to merge this config to a larger config file that carries many more tags.Anyways, now thought to keep this config file separate only.
input {
syslog {
port => 1301
ecs_compatibility => disabled
}
}
filter {
if "vpn=ive" in [message] {
dissect {
mapping => { "message" => "%{reserved} id=firewall %{message1}" }
}
}
else { drop {} }
}
output {
elasticsearch {
hosts => "localhost"
index => "vpn1oct"
user => "elastic"
password => "xxxxxxxxxx"
}
stdout { }
}

how to get descriptive error messages from embedded cassandra

I am using embedded cassandra to run unit tests. I notice that if any cql statements fail, then I don't see any descriptive reason for failure. For eg. I am running the following two statements which fails because I am trying to add a table without switching to a keyspace
val statement1 =
"""
|CREATE KEYSPACE test
| WITH REPLICATION = {
| 'class' : 'SimpleStrategy',
| 'replication_factor' : 1
| };
""".stripMargin
val statement3 =
"""
|CREATE TABLE users (
| bucket int,
| email text,
| firstname text,
| lastname text,
| authprovider text,
| password text,
| confirmed boolean,
| id UUID,
| hasher text,
| salt text,
| PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
""".stripMargin
val cqlStatements:CqlStatements = new CqlStatements(statement1,statement3)
")
val testCassandra = repoTestEnv.testCassandra
try {
testCassandra.start()
testCassandra.executeScripts(cqlStatements)
} finally testCassandra.stop()
But I don't see correct error. I see the following which doesn't tell exactly what is the problem
[info] c.g.n.e.c.l.WindowsCassandraNode - Apache Cassandra Node '7276' is started
[info] c.g.n.e.c.l.LocalCassandraDatabase - Apache Cassandra '3.11.1' is started (20811 ms)
[warn] c.d.d.c.Connection - /127.0.0.1:9042 did not send an authentication challenge; This is suspicious because the driver expects authentication (configured auth provider = com.datastax.driver.core.PlainTextAuthProvider)
[warn] c.d.d.c.Connection - /127.0.0.1:9042 did not send an authentication challenge; This is suspicious because the driver expects authentication (configured auth provider = com.datastax.driver.core.PlainTextAuthProvider)
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Script: CqlStatements [
CREATE KEYSPACE test
WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
,
CREATE TABLE users (
bucket int,
email text,
firstname text,
lastname text,
authprovider text,
password text,
confirmed boolean,
id UUID,
hasher text,
salt text,
PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
]
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Statement:
CREATE KEYSPACE test
WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
[info] c.g.n.e.c.Cassandra - INFO [Native-Transport-Requests-1] 2019-05-29 07:50:00,788 MigrationManager.java:310 - Create new Keyspace: KeyspaceMetadata{name=test, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
[debug] c.g.n.e.c.t.u.CqlUtils - Executing Statement:
CREATE TABLE users (
bucket int,
email text,
firstname text,
lastname text,
authprovider text,
password text,
confirmed boolean,
id UUID,
hasher text,
salt text,
PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )
[debug] c.g.n.e.c.t.TestCassandra - Stop TestCassandra 3.11.1
[info] c.g.n.e.c.l.LocalCassandraDatabase - Stop Apache Cassandra '3.11.1'
[debug] c.g.n.e.c.l.RunProcess - Execute 'powershell -ExecutionPolicy Unrestricted C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32\bin\stop-server.ps1 -p C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32\1da63488-2624-4141-a49e-174203b7edc4' within a directory 'C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32'
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,926 HintsService.java:220 - Paused hints dispatch
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,933 Server.java:176 - Stop listening for CQL clients
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,934 Gossiper.java:1532 - Announcing shutdown
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:03,938 StorageService.java:2268 - Node localhost/127.0.0.1 state jump to shutdown
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:05,941 MessagingService.java:984 - Waiting for messaging service to quiesce
[info] c.g.n.e.c.Cassandra - INFO [ACCEPT-localhost/127.0.0.1] 2019-05-29 07:50:05,948 MessagingService.java:1338 - MessagingService has terminated the accept() thread
[info] c.g.n.e.c.Cassandra - INFO [StorageServiceShutdownHook] 2019-05-29 07:50:06,076 HintsService.java:220 - Paused hints dispatch
[info] c.g.n.e.c.l.WindowsCassandraNode - Successfully sent ctrl+c to process with id: 7276.
[info] c.g.n.e.c.l.WindowsCassandraNode - Apache Cassandra Node '7276' is stopped
[info] c.g.n.e.c.l.LocalCassandraDatabase - Apache Cassandra '3.11.1' is stopped (3490 ms)
[info] c.g.n.e.c.l.LocalCassandraDatabase - The working directory 'C:\Users\manu\AppData\Local\Temp\embedded-cassandra\3.11.1\0d155e04-97d5-4927-87ac-d46824a77c32' was deleted.
[debug] c.g.n.e.c.t.TestCassandra - TestCassandra 3.11.1 is stopped
Unable to start TestCassandra 3.11.1
com.github.nosan.embedded.cassandra.CassandraException: Unable to start TestCassandra 3.11.1
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:128)
at UnitSpecs.RepositorySpecs.UsersRepositorySpecs.$anonfun$new$3(UsersRepositorySpecs.scala:146)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
Ideally I should get the error similar to what I would get if I was using cqlsh
Is there a way to get more descriptive errors?
I have tried to reproduce your issue, but no luck.
import com.github.nosan.embedded.cassandra.cql.CqlScript;
import com.github.nosan.embedded.cassandra.test.TestCassandra;
class Scratch {
public static void main(String[] args) {
TestCassandra testCassandra = new TestCassandra(CqlScript.statements(createKeyspace(),
createUserTable()));
testCassandra.start();
try {
System.out.println(testCassandra.getSettings());
}
finally {
testCassandra.stop();
}
}
private static String createUserTable() {
return "CREATE TABLE users ( bucket int, "
+ "email text, "
+ "firstname text, "
+ "lastname text, "
+ "authprovider text, "
+ "password text, "
+ "confirmed boolean, "
+ "id UUID, hasher text, "
+ "salt text, "
+ "PRIMARY KEY ((bucket, email), authprovider,firstname, lastname) )";
}
private static String createKeyspace() {
return "CREATE KEYSPACE test WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}";
}
}
Output:
Exception in thread "main" com.github.nosan.embedded.cassandra.CassandraException: Unable to start TestCassandra 3.11.4
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:156)
at com.github.nosan.embedded.cassandra.Scratch.main(Scratch.java:27)
Caused by: com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename
at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:48)
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:113)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:207)
at com.datastax.oss.driver.api.core.CqlSession.execute(CqlSession.java:47)
at com.datastax.oss.driver.api.core.CqlSession.execute(CqlSession.java:56)
at com.github.nosan.embedded.cassandra.test.util.CqlUtils.execute(CqlUtils.java:68)
at com.github.nosan.embedded.cassandra.test.util.CqlUtils.execute(CqlUtils.java:47)
at com.github.nosan.embedded.cassandra.test.util.CqlSessionUtils.execute(CqlSessionUtils.java:43)
at com.github.nosan.embedded.cassandra.test.CqlSessionConnection.execute(CqlSessionConnection.java:60)
at com.github.nosan.embedded.cassandra.test.DefaultConnection.execute(DefaultConnection.java:53)
at com.github.nosan.embedded.cassandra.test.TestCassandra.executeScripts(TestCassandra.java:256)
at com.github.nosan.embedded.cassandra.test.TestCassandra.doStart(TestCassandra.java:285)
at com.github.nosan.embedded.cassandra.test.TestCassandra.start(TestCassandra.java:147)
I haven't been able to find my Caused isn't printed but I have found this workaround
try {
testCassandra.start()
println(s"cassandra state is ${testCassandra.getState}")
testCassandra.executeScripts(cqlStatements)
//println(s"result of execution is ${result}")
//val settings = testCassandra.getSettings
//println(s"settings are ${settings}")
} catch {
case e:Exception => {
println(s"exception ${e} caused by ${e.getCause}")
//println(s"caused by ${e.getCause()}")
fail( new Throwable(e.getCause))
}
}finally {
testCassandra.stop()
}
the above prints
org.scalatest.exceptions.TestFailedException was thrown.
ScalaTestFailureLocation: UnitSpecs.RepositorySpecs.UsersRepositorySpecs at (UsersRepositorySpecs.scala:157)
...
Caused by: java.lang.Throwable: com.datastax.driver.core.exceptions.InvalidQueryException: No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename
I found the reason. I wasn't using TestCassandra correctly it seems. I didn't realize that if I create TestCassandra and also specify the cql statements at time of instantiation, the start method runs the queries as well. In my code, I was creating TestCassandra as follows
new TestCassandra(factory,cqlStatements)})
and was calling both start and executeScripts
testCassandra.start()
testCassandra.executeScripts(cqlStatements)
I commented that executeScripts line and I now see both Exception and Caused
I think it would be better if the APIs clearly mention that start has side effect of executing statements as well.

File Tail Inbound Channel Adapter stderr and stdout

I am trying to a tail file using spring integration and it is working as the code below but i have two questions
#Configuration
public class RootConfiguration {
#Bean(name = PollerMetadata.DEFAULT_POLLER)
public PollerMetadata defaultPoller() {
PollerMetadata pollerMetadata = new PollerMetadata();
pollerMetadata.setTrigger(new PeriodicTrigger(10));
return pollerMetadata;
}
#Bean
public MessageChannel input() {
return new QueueChannel(50);
}
#Bean
public FileTailInboundChannelAdapterFactoryBean tailInboundChannelAdapterParser() {
FileTailInboundChannelAdapterFactoryBean x = new FileTailInboundChannelAdapterFactoryBean();
x.setAutoStartup(true);
x.setOutputChannel(input());
x.setTaskExecutor(taskExecutor());
x.setNativeOptions("-F -n +0");
x.setFile(new File("/home/shahbour/Desktop/file.txt"));
return x;
}
#Bean
#ServiceActivator(inputChannel = "input")
public LoggingHandler loggingHandler() {
return new LoggingHandler("info");
}
#Bean
public TaskExecutor taskExecutor() {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setCorePoolSize(4);
taskExecutor.afterPropertiesSet();
return taskExecutor;
}
}
Per below log i have 4 threads used for tailing the file. Do i need all of them or i can disable some. Why do i have a thread for Monitoring process java.lang.UNIXProcess#b37e761 , Reading stderr ,Reading stdout .
I am asking this because i am going to run the program on voip switch and i want to use the minimal resources possible.
2016-12-10 13:22:55.666 INFO 14862 --- [ taskExecutor-1] t.OSDelegatingFileTailingMessageProducer : Starting tail process
2016-12-10 13:22:55.665 INFO 14862 --- [ main] t.OSDelegatingFileTailingMessageProducer : started tailInboundChannelAdapterParser
2016-12-10 13:22:55.682 INFO 14862 --- [ main] o.s.i.endpoint.PollingConsumer : started rootConfiguration.loggingHandler.serviceActivator
2016-12-10 13:22:55.682 INFO 14862 --- [ main] o.s.c.support.DefaultLifecycleProcessor : Starting beans in phase 2147483647
2016-12-10 13:22:55.701 INFO 14862 --- [ main] c.t.SonusbrokerApplication : Started SonusbrokerApplication in 3.84 seconds (JVM running for 4.687)
2016-12-10 13:22:55.703 DEBUG 14862 --- [ taskExecutor-2] t.OSDelegatingFileTailingMessageProducer : Monitoring process java.lang.UNIXProcess#b37e761
2016-12-10 13:22:55.711 DEBUG 14862 --- [ taskExecutor-3] t.OSDelegatingFileTailingMessageProducer : Reading stderr
2016-12-10 13:22:55.711 DEBUG 14862 --- [ taskExecutor-4] t.OSDelegatingFileTailingMessageProducer : Reading stdout
My Second question is , is it possible to start reading the file from the begging and continue to tail , i was thinking in using native options for this -n 1000
note: the real code will be to monitor folder for new files as they are created and then start the tail process
The process monitor is needed to waitFor() the process - it doesn't use any resources except a bit of memory.
The stdout reader is needed to actually handle the data that is produced by the tail command.
The starting thread (in your case taskExecutor-1 exits after it has done its work of starting the other threads).
There is not currently an option to disable the stderr reader, but it would be easy for us to add one so there are only 2 threads at runtime.
Feel free to open a JIRA 'improvement' Issue and, of course, contributions are welcome.

Error while submitting a spark job using spark-jobserver

I face following error occasionally while submitting job. This error goes away if I remove the rootdir of filedao, datadao and sqldao. That means I have to restart the job-server and re-upload my jar.
{
"status": "ERROR",
"result": {
"message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/1995aeba-com.spmsoftware.distributed.job.TestJob#-1370794810]] after [10000 ms]. Sender[null] sent message of type \"spark.jobserver.JobManagerActor$StartJob\".",
"errorClass": "akka.pattern.AskTimeoutException",
"stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:745)"]
}
}
My config file is as follows:
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
# Spark Cluster / Job Server configuration
spark {
# spark.master will be passed to each job's JobContext
master = <spark_master>
# Default # of CPUs for jobs to use for Spark standalone cluster
job-number-cpus = 4
jobserver {
port = 8090
context-per-jvm = false
context-creation-timeout = 100 s
# Note: JobFileDAO is deprecated from v0.7.0 because of issues in
# production and will be removed in future, now defaults to H2 file.
jobdao = spark.jobserver.io.JobSqlDAO
filedao {
rootdir = /tmp/spark-jobserver/filedao/data
}
datadao {
rootdir = /tmp/spark-jobserver/upload
}
sqldao {
slick-driver = slick.driver.H2Driver
jdbc-driver = org.h2.Driver
rootdir = /tmp/spark-jobserver/sqldao/data
jdbc {
url = "jdbc:h2:file:/tmp/spark-jobserver/sqldao/data/h2-db"
user = ""
password = ""
}
dbcp {
enabled = false
maxactive = 20
maxidle = 10
initialsize = 10
}
}
result-chunk-size = 1m
short-timeout = 60 s
}
context-settings {
num-cpu-cores = 2 # Number of cores to allocate. Required.
memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, #1G, etc.
}
}
akka {
remote.netty.tcp {
# This controls the maximum message size, including job results, that can be sent
# maximum-frame-size = 200 MiB
}
}
# check the reference.conf in spray-can/src/main/resources for all defined settings
spray.can.server.parsing.max-content-length = 250m
I am using spark-2.0-preview version.
I have faced the same error before and was related with timeout, for sure is an syncronus request (sync=true) togheter you must provide the timeout (in seconds) who is a value relative with how long it takes to process your request.
This an example how the request should look like:
curl -k --basic -d '' 'http://localhost:5050/jobs?appName=app&classPath=Main&context=test-context&sync=true&timeout=40'
if your request needs more than 40 seconds maybe you also need to modify the application.conf located on
spark-jobserver-master/job-server/src/main/resources/application.conf
ànd on the spray.can.server section modify:
idle-timeout = 210 s
request-timeout = 200 s

Resources