Exception encountered during startup: Unknown commitlog version 6 - linux

Cannot start cassandra in DataStax Enterprise (single node scenario)
runing $ sudo service dse start I get:
Starting DSE daemon : dse
DSE daemon starting with just Cassandra enabled (edit /etc/default/dse to enable)
running $ nodetool status
Failed to connect to '127.0.0.1:7199': Connection refused (Connection refused)
from output.log:
ERROR 08:03:24,900 Exception encountered during startup
java.lang.IllegalStateException: Unknown commitlog version 6
at org.apache.cassandra.db.commitlog.CommitLogDescriptor.getMessagingVersion(CommitLogDescriptor.java:80)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:182)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:95)
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:151)
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:131)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:336)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:394)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:574)
java.lang.IllegalStateException: Unknown commitlog version 6
at org.apache.cassandra.db.commitlog.CommitLogDescriptor.getMessagingVersion(CommitLogDescriptor.java:80)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:182)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:95)
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:151)
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:131)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:336)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:394)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:574)
Exception encountered during startup: Unknown commitlog version 6
INFO 08:03:24,915 DSE shutting down...
What can I do ? any suggestion are wellcome

Related

Error on starting worker nodes in spark standalone cluster

I am trying to setup a spark standalone cluster with 3 nodes. Configurations for Linux servers are below:
master node with 2 core and 25GB memory
worker node 1 with 4 core and 21GB memory
worker node 2 with 8 core and 19GB memory
I have started the master node successfully and its url is spark://IP:7077
when I start any of the worker node with command ./sbin/start-worker.sh spark://IP:7077 I get the below error message:
22/11/25 09:13:58 INFO Worker: Connecting to master IP:7077...
22/11/25 09:14:54 ERROR RpcOutboxMessage: Ask terminated before connecting successfully
22/11/25 09:14:54 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to /IP:7077 timed out (120000 ms)
22/11/25 09:14:54 WARN Worker: Failed to connect to master IP:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
openjdk 11.0.17 is the java version installed on all the 3 nodes
Any solutions to resolve this issue will be helpful.

Ambari Hadoop Spark Cluster Firewall Issues

I have just rolled out a Hadoop/Spark cluster in efforts to kick start a data science program at my company. I used Ambari as the manager and installed the Hortonworks distribution (HDFS 2.7.3, Hive 1.2.1, Spark 2.1.1, as well as the other required services. By the way, I am running RHEL 7. I have 2 name nodes, 10 data nodes, 1 hive node and 1 management node (Ambari).
I built a list of firewall ports based on Apache and Ambari documentation and had my infrastructure guys push those rules. I ran into an issue with Spark wanting to pick random ports. When I attempted to run a Spark job (the traditional Pi example), it would fail, as I did not have the whole ephemeral port range open. Since we will probably be running multiple jobs, it makes sense to let Spark handle this and just choose from the ephemeral range of ports (1024 - 65535) rather than specifying a single port. I know I can pick a range, but to make it easy I just asked my guys to open the whole ephemeral range. At first my infrastructure guys balked at that, but when I told them the purpose, they went ahead and did so.
Based on that, I thought I had my issue fixed, but when I attempt to run a job, it still fails with:
Log Type: stderr
Log Upload Time: Thu Oct 12 11:31:01 -0700 2017
Log Length: 14617
Showing 4096 bytes of 14617 total. Click here for the full log.
Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:52 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:53 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:54 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:55 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:56 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:57 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:57 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:59 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:00 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:01 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:02 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:03 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:04 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:05 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:06 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:06 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:07 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:09 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:10 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:11 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:12 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:13 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:14 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:15 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:15 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:607)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:461)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:283)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:783)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:781)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:804)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
17/10/12 11:29:15 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
17/10/12 11:29:15 INFO ShutdownHookManager: Shutdown hook called
At first I thought maybe I had some sort of misconfiguration with Spark and the namenodes/datanodes. However, to test it, I simply stopped firewalld on every node and attempted the job again and it worked just fine.
So, my question - I have the entire 1024 - 65535 port range open - I can see the Spark drivers are trying to connect on those high ports (as show above - 30k - 40k range). However, for some reason when the firewall is on, it fails and when its off it works. I checked the firewall rules and sure enough, the ports are open - and those rules are working as I can access the web services for Ambair, Yarn and HFDS which are specified in the same firewalld xml rules file....
I am new to Hadoop/Spark, so I am wondering is there something I am missing? Is there some lower port under 1024 I need to account for? Here is a list of the ports below 1024 I have open, in addition to the 1024 - 65535 port range:
88
111
443
1004
1006
1019
It's quite possible I missed a lower number port that I really need and just don't know it. Above that, everything else should be handled by the 1024 - 65535 port range.
Ok, so working with some folks on the Hortonworks community, I was was able to come up with a solution. Basically, you need to have at least one port defined, but you can extend that by specifying the spark.port.MaxRetries = xxxxx. By combining this setting along with the spark.driver.port = xxxxx, you can have a range starting at the spark.driver.port and ending with the spark.port.maxRetries.
If you use Ambari as your manager, the settings are under "Custom spark2-defaults" section (I assume under a fully open-source stack install, this would just be a setting under the normal Spark config):
I was advised to separate out these ports by 32 count blocks, for example, if you start at 40000 for your driver, you should start the spark.blockManager.port at 40033, etc.. See this post at:
https://community.hortonworks.com/questions/141484/spark-jobs-failing-firewall-issue.html

Can we connect to Spark cluster from remote host via java program?

I am trying to connect to spark cluster from remote system. My java code is shown below.
JAVA Code
new SparkConf()
.setAppName("Java API demo")
.setMaster("spark://192.168.XX.XX:7077")
.set("spark.driver.host","192.168.XX.XX")
.set("spark.driver.port","9929");
It gives me this error:
Error Message
16/04/14 16:08:42 ERROR NettyTransport: failed to bind to /192.168.XX.XX:9929, shutting down Netty transport
16/04/14 16:08:42 WARN Utils: Service 'sparkDriver' could not bind on port 9929. Attempting port 9930.
16/04/14 16:08:42 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
First of all, is this possible in spark? I am using Spark 1.4 version. Thanks in advcance..

spark-submit cluster mode is not working

I am getting an error in launching the standalone Spark driver in cluster mode. As per the documentation, it is noted that cluster mode is supported in the Spark 1.2.1 release. However, it is currently not working properly for me. Please help me in fixing the issue(s) that are preventing the proper functioning of Spark.
I have 3 node spark cluster node1 , node2 and node 3
I running below command on node 1 for deploying driver
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.fst.firststep.aggregator.FirstStepMessageProcessor --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 --deploy-mode cluster --supervise file:///home/xyz/sparkstreaming-0.0.1-SNAPSHOT.jar /home/xyz/config.properties
driver gets launched on node 2 in cluster. but getting exception on node 2 that it is trying to bind to node 1 ip.
2015-02-26 08:47:32 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:32 INFO Slf4jLogger:80 - Slf4jLogger started
2015-02-26 08:47:33 ERROR NettyTransport:65 - failed to bind to ec2-xx.xx.xx.xx.compute-1.amazonaws.com/xx.xx.xx.xx:0, shutting down Netty transport
2015-02-26 08:47:33 WARN Utils:71 - Service 'Driver' could not bind on port 0. Attempting port 1.
2015-02-26 08:47:33 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:33 ERROR Remoting:65 - Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
at akka.remote.Remoting.start(Remoting.scala:201)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:33)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: ec2-xx-xx-xx.compute-1.amazonaws.com/xx.xx.xx.xx:0
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
kindly suggest
Thanks`enter code here`
It is not possible to bind to port 0. There is/are errors in your spark configuration. Specifically look at the
spark.webui.port
It is probably set to 0.

Spark 1.2.1 standalone cluster mode spark-submit is not working

I have 3 node spark cluster
node1 , node2 and node 3
I running below command on node 1 for deploying driver
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.fst.firststep.aggregator.FirstStepMessageProcessor --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 --deploy-mode cluster --supervise file:///home/xyz/sparkstreaming-0.0.1-SNAPSHOT.jar /home/xyz/config.properties
driver gets launched on node 2 in cluster. but getting exception on node 2 that it is trying to bind to node 1 ip.
2015-02-26 08:47:32 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:32 INFO Slf4jLogger:80 - Slf4jLogger started
2015-02-26 08:47:33 ERROR NettyTransport:65 - failed to bind to ec2-xx.xx.xx.xx.compute-1.amazonaws.com/xx.xx.xx.xx:0, shutting down Netty transport
2015-02-26 08:47:33 WARN Utils:71 - Service 'Driver' could not bind on port 0. Attempting port 1.
2015-02-26 08:47:33 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:33 ERROR Remoting:65 - Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
at akka.remote.Remoting.start(Remoting.scala:201)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:33)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: ec2-xx-xx-xx.compute-1.amazonaws.com/xx.xx.xx.xx:0
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
kindly suggest
Thanks
after spending lot more time.i got the answer.i did below changes
remove entry of SPARK_LOCAL_IP and SPARK_MASTER_IP
add host name and private ip address of each other nodes in etc/hosts.
use --deploy-mode cluster --supervise
thats all and it works perfectly with fully HA components(Master,Slaves and Driver)
Thanks
Cluster mode is not supported in EC2 1.2 instances where it creates a standalone cluster. Hence you can try removing
--deploy-mode cluster --supervise

Resources