Cassandra nodetool repair sometimes gets stuck

I am running nodetool repair -pr -full my_ks my_tbl in our Cassandra cluster (two DCs).
It sometimes hangs, with the debug logs below. Repair works again after restarting the Cassandra process. Any hints on the root cause of this issue?
DEBUG [GossipStage:1] 2020-07-13 14:04:22,818 FailureDetector.java:456 - Ignoring interval time of 2571566434 for /10.22.38.223
DEBUG [GossipStage:1] 2020-07-13 14:04:22,818 FailureDetector.java:456 - Ignoring interval time of 2495429260 for /10.22.38.26
DEBUG [GossipStage:1] 2020-07-13 14:04:22,818 FailureDetector.java:456 - Ignoring interval time of 2571592685 for /10.32.146.85
INFO [Thread-181] 2020-07-13 14:04:22,900 RepairRunnable.java:125 - Starting repair command #2, repairing keyspace my_ks with repair options (parallelism: parallel, primary range: true, incremental: false, job threads: 1, ColumnFamilies: [my_tbl], dataCenters: [], hosts: [], # of ranges: 256)
INFO [HANDSHAKE-/10.32.146.85] 2020-07-13 14:04:23,460 OutboundTcpConnection.java:515 - Handshaking version with /10.32.146.85
DEBUG [GossipStage:1] 2020-07-13 14:04:23,716 FailureDetector.java:456 - Ignoring interval time of 2000838464 for /10.22.38.27
DEBUG [GossipStage:1] 2020-07-13 14:04:23,716 FailureDetector.java:456 - Ignoring interval time of 2000923736 for /10.22.38.68
DEBUG [GossipStage:1] 2020-07-13 14:04:23,815 FailureDetector.java:456 - Ignoring interval time of 2100571952 for /10.32.253.232
DEBUG [GossipStage:1] 2020-07-13 14:04:25,247 FailureDetector.java:456 - Ignoring interval time of 2429005356 for /10.32.144.198
I am using Cassandra 3.9.
Edit: with trace enabled, I see the logs below:
INFO [HANDSHAKE-/10.32.168.76] 2020-07-18 02:12:41,253 OutboundTcpConnection.java:515 - Handshaking version with /10.32.168.76
INFO [HANDSHAKE-/10.32.142.195] 2020-07-18 02:12:42,260 OutboundTcpConnection.java:515 - Handshaking version with /10.32.142.195
INFO [HANDSHAKE-/10.32.144.198] 2020-07-18 02:12:42,260 OutboundTcpConnection.java:515 - Handshaking version with /10.32.144.198
ERROR [RepairTracePolling] 2020-07-18 02:12:45,836 CassandraDaemon.java:226 - Exception in thread Thread[RepairTracePolling,5,RMI Runtime]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1718) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1667) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1608) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1527) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:975) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:271) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:232) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.repair.RepairRunnable$4.runMayThrow(RepairRunnable.java:412) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.9.jar:3.9]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_72]
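For context, while the repair appears hung, a few non-destructive checks can help localize where it is stuck. These are standard nodetool commands and nothing here is specific to this cluster:

```shell
# Non-destructive checks while a repair appears hung (Cassandra 3.x).

# Are the repair-related thread pools doing work, or blocked?
nodetool tpstats | grep -iE 'AntiEntropy|Validation|Repair'

# Is streaming between replicas progressing or stalled?
nodetool netstats

# Are validation compactions still running on this node?
nodetool compactionstats
```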

Related

Cassandra 3.10 - LEAK DETECTED & Stream failed

I want to add or remove nodes in the cluster.
When I try to add or remove a node, I get LEAK DETECTED and STREAM FAILED error messages.
If I drop the indexes (pushcapabilityindx, presetsearchval) before adding or removing nodes, the node add/remove succeeds.
If there are no data updates, the data in abc_db.sub is automatically deleted after 24 hours (TTL 86400 seconds).
Also, if I do nothing at all (including no index deletion), adding or removing nodes succeeds normally after 14 days.
Where should I start troubleshooting this error?
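For reference, the drop-before/recreate-after workaround described above can be written out like this (index definitions copied from the schema below; this is only a sketch, not a fix for the underlying streaming failure, and queries relying on these indexes will fail while they are dropped):

```shell
# Drop the secondary indexes before adding/removing a node.
cqlsh -e "DROP INDEX abc_db.pushcapabilityindx;"
cqlsh -e "DROP INDEX abc_db.presetsearchval;"

# ... perform the node add/remove and wait for bootstrap/decommission
#     to complete (the node shows as UN in `nodetool status`) ...

# Recreate the indexes afterwards; they are rebuilt from local data.
cqlsh -e "CREATE INDEX pushcapabilityindx ON abc_db.sub (values(pushcapability));"
cqlsh -e "CREATE INDEX presetsearchval ON abc_db.sub (presetsearchvalue);"
```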
Ubuntu 16.04.3 LTS
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
CREATE TABLE abc_db.sub (
phoneno text,
deviceid text,
subid text,
callbackdata text,
corelator text,
duration int,
local_index bigint,
phase int,
presetsearchvalue text,
pushcapability list<text>,
pushtoken text,
pushtype text,
searchcriteria frozen<typesearchcriteria>,
PRIMARY KEY (phoneno, deviceid, subid)
) WITH CLUSTERING ORDER BY (deviceid ASC, subid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX pushcapabilityindx ON abc_db.sub (values(pushcapability));
CREATE INDEX presetsearchval ON abc_db.sub (presetsearchvalue);
INFO [main] 2019-05-09 07:57:15,741 StorageService.java:1435 - JOINING: waiting for schema information to complete
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: schema complete, ready to bootstrap
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: waiting for pending range calculation
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: calculation complete, ready to bootstrap
INFO [main] 2019-05-09 07:57:16,498 StorageService.java:1435 - JOINING: getting bootstrap token
INFO [main] 2019-05-09 07:57:16,531 StorageService.java:1435 - JOINING: sleeping 30000 ms for pending range setup
INFO [main] 2019-05-09 07:57:46,532 StorageService.java:1435 - JOINING: Starting to bootstrap...
INFO [main] 2019-05-09 07:57:47,775 StreamResultFuture.java:90 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Executing streaming plan for Bootstrap
INFO [StreamConnectionEstablisher:1] 2019-05-09 07:57:47,783 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.20.10
INFO [StreamConnectionEstablisher:1] 2019-05-09 07:57:47,786 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.20.10
INFO [STREAM-IN-/172.50.20.10:5000] 2019-05-09 07:57:48,887 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 261 files(1012.328MiB), sending 0 files(0.000KiB)
INFO [StreamConnectionEstablisher:2] 2019-05-09 07:57:48,891 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.22.10
INFO [StreamConnectionEstablisher:2] 2019-05-09 07:57:48,893 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.22.10
INFO [STREAM-IN-/172.50.22.10:5000] 2019-05-09 07:57:50,020 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 254 files(1.286GiB), sending 0 files(0.000KiB)
INFO [StreamConnectionEstablisher:3] 2019-05-09 07:57:50,022 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.21.10
INFO [StreamConnectionEstablisher:3] 2019-05-09 07:57:50,025 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.21.10
INFO [STREAM-IN-/172.50.21.10:5000] 2019-05-09 07:57:50,998 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 114 files(1.085GiB), sending 0 files(0.000KiB)
INFO [StreamReceiveTask:1] 2019-05-09 07:58:02,509 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-1-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-2-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-3-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:02,519 SecondaryIndexManager.java:385 - Index build of pushcapabilityindx,presetsearchval complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:11,213 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-4-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-5-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-6-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-7-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-8-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-9-big-Data.db')
ERROR [StreamReceiveTask:1] 2019-05-09 07:58:11,295 StreamSession.java:593 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Streaming error occurred on session with peer 172.50.22.10
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.NoSuchElementException
at org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:51) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:393) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.buildIndexesBlocking(SecondaryIndexManager.java:382) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.buildAllIndexesBlocking(SecondaryIndexManager.java:269) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:215) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
Caused by: java.util.concurrent.ExecutionException: java.util.NoSuchElementException
at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_191]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_191]
at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:386) ~[apache-cassandra-3.10.jar:3.10]
... 9 common frames omitted
Caused by: java.util.NoSuchElementException: null
at org.apache.cassandra.utils.AbstractIterator.next(AbstractIterator.java:64) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.lambda$indexPartition$20(SecondaryIndexManager.java:618) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_191]
at org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:618) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:71) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1587) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
... 6 common frames omitted
ERROR [STREAM-IN-/172.50.22.10:5000] 2019-05-09 07:58:11,305 StreamSession.java:593 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Streaming error occurred on session with peer 172.50.22.10
java.lang.RuntimeException: Outgoing stream handler has been closed
at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:143) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:655) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:523) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:317) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
INFO [StreamReceiveTask:1] 2019-05-09 07:58:11,310 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.22.10 is complete
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#441b19c1) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy#1255997234:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#4ee372b1) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#199357705:Memory#[7f12a4ad4970..7f12a4ad4a10) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#172830a5) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#983595037:Memory#[7f12a4b3f670..7f12a4b3f990) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,116 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#6f83e302) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#1808665158:Memory#[7f12a41a56d0..7f12a41a56d4) was not released before the reference was garbage collected
INFO [StreamReceiveTask:1] 2019-05-09 07:58:40,657 SecondaryIndexManager.java:365 - Submitting index build of groupchatid_idx_giinfo for data in BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-1-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-2-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:40,714 SecondaryIndexManager.java:385 - Index build of groupchatid_idx_giinfo complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:41,494 SecondaryIndexManager.java:365 - Submitting index build of groupchatid_idx_giinfo for data in BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-3-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-4-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-5-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-6-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-7-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:41,537 SecondaryIndexManager.java:385 - Index build of groupchatid_idx_giinfo complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,175 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-11-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,209 SecondaryIndexManager.java:385 - Index build of pushcapabilityindx,presetsearchval complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,972 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.20.10 is complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:45,643 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.21.10 is complete
WARN [StreamReceiveTask:1] 2019-05-09 07:58:45,664 StreamResultFuture.java:214 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Stream failed
WARN [StreamReceiveTask:1] 2019-05-09 07:58:45,665 StorageService.java:1497 - Error during bootstrap.
org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) [guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) [guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-18.0.jar:na]
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:766) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:727) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:244) [apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
ERROR [main] 2019-05-09 07:58:45,665 StorageService.java:1507 - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted.
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1502) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:962) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:681) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735) [apache-cassandra-3.10.jar:3.10]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:766) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:727) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:244) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
WARN [main] 2019-05-09 07:58:45,676 StorageService.java:1013 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
INFO [main] 2019-05-09 07:58:45,677 CassandraDaemon.java:694 - Waiting for gossip to settle before accepting client requests...
INFO [main] 2019-05-09 07:58:53,678 CassandraDaemon.java:725 - No gossip backlog; proceeding
INFO [main] 2019-05-09 07:58:53,737 NativeTransportService.java:70 - Netty using native Epoll event loop
INFO [main] 2019-05-09 07:58:53,781 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86]
INFO [main] 2019-05-09 07:58:53,781 Server.java:156 - Starting listening for CQL clients on /172.50.20.11:7042 (unencrypted)...
INFO [main] 2019-05-09 07:58:53,809 CassandraDaemon.java:528 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
I solved this issue myself.
No. | Linux OS                       | Package | Version | Result
----|--------------------------------|---------|---------|--------------
1   | Ubuntu 16.04.3 LTS             | deb     | 3.10    | Stream failed
2   | Ubuntu 16.04.3 LTS             | deb     | 3.11.4  | Stream failed
3   | Ubuntu 16.04.3 LTS             | tgz     | 3.11.4  | Successful
4   | Ubuntu 18.04.2 LTS             | deb     | 3.10    | Stream failed
5   | Ubuntu 18.04.2 LTS             | deb     | 3.11.4  | Stream failed
6   | Debian GNU/Linux 9.9 (stretch) | deb     | 3.10    | Stream failed
7   | Debian GNU/Linux 9.9 (stretch) | deb     | 3.11.4  | Stream failed
8   | Amazon Linux 2                 | rpm     | 3.11.4  | Successful
9   | CentOS 7.6                     | rpm     | 3.11.4  | Successful
In the end, I am going to install from the binary tarball.
I am not a developer, so I cannot find the cause myself.
I hope someone finds the cause and fixes it in the Ubuntu & Debian packages.
Regards,
Sungjae Yun.

Cassandra : Cannot replace address with a node that is already bootstrapped

One of the existing nodes in a 5-node Cassandra (3.9) cluster fails to come up.
I noticed the node was down and tried to restart it with:
service cassandra restart
But the node fails to come up, and I see the exception below in system.log:
ERROR [main] 2017-04-14 10:03:49,959 CassandraDaemon.java:747 - Exception encountered during startup
java.lang.RuntimeException: Cannot replace address with a node that is already bootstrapped
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:752) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:648) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:548) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:385) [apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) [apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730) [apache-cassandra-3.9.jar:3.9]
WARN [StorageServiceShutdownHook] 2017-04-14 10:03:49,963 Gossiper.java:1508 - No local state or state is in silent shutdown, not announcing shutdown
WARN [StorageServiceShutdownHook] 2017-04-14 10:51:49,539 Gossiper.java:1508 - No local state or state is in silent shutdown, not announcing shutdown
Thanks
Have a look at this guide; basically you have a dead node in the cluster, which happens all the time ;)
https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
Plus some additional background:
https://issues.apache.org/jira/browse/CASSANDRA-7356
Also check that you remove the replace address again from:
/etc/cassandra/cassandra-env.sh
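For reference, the replacement flow from that guide looks roughly like this. It is only a sketch: 10.0.0.5 stands in for the dead node's IP, and the paths match the Debian/Ubuntu package layout.

```shell
# 1. On the fresh replacement node, before the first start, add the
#    replace flag to /etc/cassandra/cassandra-env.sh:
#      JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.5"

# 2. Start Cassandra; it will bootstrap and take over the dead
#    node's tokens.
sudo service cassandra start

# 3. Watch streaming progress until bootstrap completes.
nodetool netstats

# 4. Once the node shows as UN in `nodetool status`, remove the
#    replace_address line again -- leaving it in place is exactly what
#    triggers "Cannot replace address with a node that is already
#    bootstrapped" on the next restart.
```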

Spark: Association with remote system failed. Reason: Disassociated

I have a standalone Spark job, and every time the job finishes, the warning below occurs. I don't really understand what it means or how to solve it. It would be great if you could help. Thanks
WARN [SparkWorker-0 error logger] 2016-10-08 10:18:33,395 SparkWorker-0 ExternalLogger.java:92 - Association with remote system [akka.tcp://sparkExecutor#10.47.183.30:39422] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
INFO [sparkMaster-akka.actor.default-dispatcher-4] 2016-10-08 10:18:33,406 Logging.scala:59 - Removing executor app-20161008101807-0002/5 because it is EXITED
INFO [sparkMaster-akka.actor.default-dispatcher-4] 2016-10-08 10:18:33,407 Logging.scala:59 - Launching executor app-20161008101807-0002/6 on worker worker-20161008093556-10.47.183.121-41649
WARN [sparkMaster-akka.actor.default-dispatcher-4] 2016-10-08 10:18:33,762 Logging.scala:71 - Got status update for unknown executor app-20161008100608-0001/4
INFO [sparkMaster-akka.actor.default-dispatcher-4] 2016-10-08 10:18:33,819 Logging.scala:59 - akka.tcp://sparkDriver#XXX.196.201.23:36340 got disassociated, removing it.
INFO [SparkWorker-0 logger] 2016-10-08 10:18:33,835 SparkWorker-0 ExternalLogger.java:88 - Executor app-20161008100608-0001/0 finished with state KILLED exitStatus 143
WARN [sparkMaster-akka.actor.default-dispatcher-5] 2016-10-08 10:18:33,837 Logging.scala:71 - Got status update for unknown executor app-20161008100608-0001/0
This is just the executor saying it cannot talk to anyone. I would check the connection ports, firewall rules, and the like between your hosts.
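A couple of quick connectivity checks between the hosts (the IP and port here are taken from the log above; substitute your own):

```shell
# Can this host reach the executor endpoint seen in the log?
nc -zv 10.47.183.30 39422

# Which ports are the Spark JVMs actually listening on locally?
ss -tlnp | grep -i java
```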

Cassandra: Exiting due to error while processing commit log during initialization

I was loading a large CSV into Cassandra using cassandra-loader.
The VM ran out of disk space during this process and crashed. I allocated more disk space to the VM and tried starting Cassandra, but it refused to start due to problems with SSTables and the commit log.
I could not run nodetool repair, as it only works when the node is online.
I ran sstablescrub, which took about an hour to finish, so I thought it might have fixed the problem.
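For reference, the offline scrub was run roughly like this (keyspace and table names come from the error below; the node must be stopped while it runs):

```shell
# Offline scrub of the table whose SSTable is reported corrupt.
sstablescrub keyspace1 location

# If scrub cannot repair a file, a fallback (at the cost of losing
# that SSTable's local copy until repaired from replicas) is to move
# the corrupt generation out of the data directory before starting:
# mv /var/lib/cassandra/data/keyspace1/location-*/la-1709-big-* /backup/
```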
But I still get this error in system.log:
ERROR [SSTableBatchOpen:4] 2015-10-23 18:57:45,035 SSTableReader.java:506 - Corrupt sstable /var/lib/cassandra/data/keyspace1/location-777a33d0772911e597a98b820c5778a4/la-1709-big=[TOC.txt, CompressionInfo.db, Statistics.db, Digest.adler32, Data.db, Index.db, Filter.db]; skipping table
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:125) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:86) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:142) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:101) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:187) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:179) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:664) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:458) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:363) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader$4.run(SSTableReader.java:501) ~[apache-cassandra-2.2.3.jar:2.2.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.8.0_60]
at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.8.0_60]
at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.8.0_60]
at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:96) ~[apache-cassandra-2.2.3.jar:2.2.3]
... 15 common frames omitted
INFO [main] 2015-10-23 18:57:45,193 ColumnFamilyStore.java:382 - Initializing system_auth.role_permissions
INFO [main] 2015-10-23 18:57:45,201 ColumnFamilyStore.java:382 - Initializing system_auth.resource_role_permissons_index
INFO [main] 2015-10-23 18:57:45,213 ColumnFamilyStore.java:382 - Initializing system_auth.roles
INFO [main] 2015-10-23 18:57:45,233 ColumnFamilyStore.java:382 - Initializing system_auth.role_members
INFO [main] 2015-10-23 18:57:45,240 ColumnFamilyStore.java:382 - Initializing system_traces.sessions
INFO [main] 2015-10-23 18:57:45,252 ColumnFamilyStore.java:382 - Initializing system_traces.events
INFO [main] 2015-10-23 18:57:45,265 ColumnFamilyStore.java:382 - Initializing simplex.songs
INFO [main] 2015-10-23 18:57:45,276 ColumnFamilyStore.java:382 - Initializing simplex.playlists
INFO [pool-2-thread-1] 2015-10-23 18:57:45,289 AutoSavingCache.java:187 - reading saved cache /var/lib/cassandra/saved_caches/KeyCache-ca.db
INFO [pool-2-thread-1] 2015-10-23 18:57:45,313 AutoSavingCache.java:163 - Completed loading (25 ms; 36 keys) KeyCache cache
INFO [main] 2015-10-23 18:57:45,351 CommitLog.java:168 - Replaying /var/lib/cassandra/commitlog/CommitLog-5-1445578022702.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022703.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022704.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022705.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022706.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022707.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022708.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022709.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022710.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022712.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022713.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022714.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022715.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022716.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022717.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022719.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022720.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022721.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022722.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022723.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022724.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022725.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022727.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022728.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022730.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022731.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022732.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022733.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022734.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022736.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022738.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022740.log, 
/var/lib/cassandra/commitlog/CommitLog-5-1445578022741.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022743.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022744.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022745.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022746.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022748.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022749.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022750.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022751.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022752.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022753.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022755.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022756.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022758.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022759.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022760.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022761.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022763.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022764.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022765.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022766.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022767.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022768.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022769.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022770.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022771.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022772.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022773.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022774.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022775.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022776.log, /var/lib/cassandra/commitlog/CommitLog-5-1445588991268.log, /var/lib/cassandra/commitlog/CommitLog-5-1445589094722.log, 
/var/lib/cassandra/commitlog/CommitLog-5-1445589149527.log, /var/lib/cassandra/commitlog/CommitLog-5-1445595828633.log, /var/lib/cassandra/commitlog/CommitLog-5-1445595898055.log, /var/lib/cassandra/commitlog/CommitLog-5-1445596033717.log, /var/lib/cassandra/commitlog/CommitLog-5-1445596400441.log, /var/lib/cassandra/commitlog/CommitLog-5-1445596601854.log, /var/lib/cassandra/commitlog/CommitLog-5-1445598032544.log, /var/lib/cassandra/commitlog/CommitLog-5-1445598758663.log, /var/lib/cassandra/commitlog/CommitLog-5-1445601112953.log, /var/lib/cassandra/commitlog/CommitLog-5-1445601937334.log, /var/lib/cassandra/commitlog/CommitLog-5-1445601985416.log, /var/lib/cassandra/commitlog/CommitLog-5-1445604504389.log, /var/lib/cassandra/commitlog/CommitLog-5-1445606516196.log
ERROR [main] 2015-10-23 18:59:05,091 JVMStabilityInspector.java:78 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Mutation checksum failure at 4110758 in CommitLog-5-1445578022776.log
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:622) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:492) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:388) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:147) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:273) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:513) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) [apache-cassandra-2.2.3.jar:2.2.3]
Exiting due to error while processing commit log during initialization.
How do I fix this? This is test data, so I'm okay with losing it. But how would one handle this in production to avoid data loss?
I tried setting disk_failure_policy: ignore so that I could run nodetool repair once the server was up, but the server does not start even with this setting.
I am operating a single node with a replication factor of 1. Would having more nodes and a replication factor greater than 1 have enabled me to fix an issue like this without data loss?
I am using Cassandra 2.2.3.
Since you don't care about the data, removing the files from \data\commitlogs should be the easiest solution.
Having some replication would surely help you fix this without data loss, but it comes at a price.
Suppose that despite all your efforts you cannot recover the corrupted sstable, so you decide to remove it from the file system to get Cassandra starting again. If you do not have replication, that data is lost. But if you have replication on the cluster, you can fetch the data back from the other nodes. That is exactly what nodetool repair does!
Note that nodetool repair does not repair a corrupted sstable. Basically, nodetool repair compares data from node to node to find missing or inconsistent data and then repairs it. You can find more information on how it works here.
However, nodetool repair is very expensive: it takes a long time and uses a lot of CPU, disk, and network. There is a good post about repair's benefits and drawbacks.
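With replication in place, the recovery described above boils down to a few commands. This is only a hedged sketch, not a definitive procedure: the sstable path is a made-up placeholder (your keyspace, table, and sstable generation will differ), and DRY_RUN=1 (the default) only prints what would be run instead of executing it.

```shell
# Sketch: replace a corrupted sstable by deleting it and letting
# `nodetool repair` stream the data back from replicas (requires RF > 1).
DRY_RUN="${DRY_RUN:-1}"
# Placeholder path -- locate the actual corrupted sstable from the error log.
SSTABLE_GLOB="${SSTABLE_GLOB:-/var/lib/cassandra/data/my_ks/my_tbl-*/ma-42-big-*}"

# Print the command in dry-run mode, execute it otherwise.
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

run nodetool drain                  # flush memtables, stop accepting writes
run sudo systemctl stop cassandra
run rm -f $SSTABLE_GLOB             # drop all components of the bad sstable
run sudo systemctl start cassandra
run nodetool repair my_ks my_tbl    # stream the missing data from replicas
```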
This is how I fixed the problem with commit logs.
You should only do this if you don't care about preserving the state of your commit logs.
First, try to restart Cassandra:
sudo systemctl restart cassandra
Then check the status:
systemctl status cassandra
If the status is 'exited', there is a problem. Check the Cassandra logs:
sudo less /var/log/cassandra/system.log
There you may see something like:
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Could not read commit log descriptor in file /var/lib/cassandra/commitlog/CommitLog-6-1498210233635.log
Because I don't care about preserving the state of Cassandra, I deleted all of the commit logs, and it now boots up fine:
sudo rm /var/lib/cassandra/commitlog/CommitLog*
sudo systemctl restart cassandra
systemctl status cassandra (should confirm that it is now running)
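The steps above can be collapsed into one small guarded helper that only deletes the commit logs after confirming the replay error is actually present. A sketch, assuming the default package-install paths (pass your own paths as the two arguments); `clear_commitlogs_if_corrupt` is a name I made up for illustration:

```shell
# Clear Cassandra commit logs only after confirming a replay error in the log.
# WARNING: unflushed writes in those commit logs are permanently lost -- only
# acceptable when, as above, you don't care about preserving state.
clear_commitlogs_if_corrupt() {
  log_file="$1"      # e.g. /var/log/cassandra/system.log
  commitlog_dir="$2" # e.g. /var/lib/cassandra/commitlog
  if grep -q "CommitLogReplayException" "$log_file" 2>/dev/null; then
    rm -f "$commitlog_dir"/CommitLog*
    echo "commit logs cleared; restart cassandra"
  else
    echo "no replay error found; leaving commit logs alone"
  fi
}

clear_commitlogs_if_corrupt /var/log/cassandra/system.log /var/lib/cassandra/commitlog
```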
Simply go into the commitlog directory of Cassandra and delete the log files.
It works fine.

Fail to open thrift connection to Cassandra node via spark-cassandra-connector

I've been struggling all day and haven't found a solution.
I'm trying to connect to a remote Cassandra node from a Spark Streaming application using the spark-cassandra-connector, and the application exits with an exception. Any help would be much appreciated.
2015-02-17 19:13:58 DEBUG Connection:114 - Connection[/<MY_PUBLIC_IP>:9042-2, inFlight=0, closed=false] Transport initialized and ready
2015-02-17 19:13:58 DEBUG ControlConnection:492 - [Control connection] Refreshing node list and token map
2015-02-17 19:13:59 DEBUG ControlConnection:262 - [Control connection] Refreshing schema
2015-02-17 19:14:00 DEBUG ControlConnection:492 - [Control connection] Refreshing node list and token map
2015-02-17 19:14:00 DEBUG ControlConnection:172 - [Control connection] Successfully connected to /<MY_PUBLIC_IP>:9042
2015-02-17 19:14:00 INFO Cluster:1267 - New Cassandra host /<MY_PUBLIC_IP>:9042 added
2015-02-17 19:14:00 INFO CassandraConnector:51 - Connected to Cassandra cluster: Test Cluster
2015-02-17 19:14:00 INFO LocalNodeFirstLoadBalancingPolicy:59 - Adding host <MY_PUBLIC_IP> (datacenter1)
2015-02-17 19:14:01 DEBUG Connection:114 - Connection[/<MY_PUBLIC_IP>:9042-3, inFlight=0, closed=false] Transport initialized and ready
2015-02-17 19:14:01 DEBUG Session:304 - Added connection pool for /<MY_PUBLIC_IP>:9042
2015-02-17 19:14:01 INFO LocalNodeFirstLoadBalancingPolicy:59 - Adding host <MY_PUBLIC_IP> (datacenter1)
2015-02-17 19:14:01 DEBUG Schema:55 - Retrieving database schema from cluster Test Cluster...
2015-02-17 19:14:01 DEBUG Schema:55 - 1 keyspaces fetched from cluster Test Cluster: {vehicles}
2015-02-17 19:14:02 DEBUG CassandraConnector:55 - Attempting to open thrift connection to Cassandra at <MY_PUBLIC_IP>:9160
2015-02-17 19:14:02 DEBUG Connection:428 - Connection[/<MY_PUBLIC_IP>:9042-3, inFlight=0, closed=true] closing connection
2015-02-17 19:14:02 DEBUG Cluster:1340 - Shutting down
2015-02-17 19:14:02 DEBUG Connection:428 - Connection[/<MY_PUBLIC_IP>:9042-2, inFlight=0, closed=true] closing connection
2015-02-17 19:14:02 INFO CassandraConnector:51 - Disconnected from Cassandra cluster: Test Cluster
2015-02-17 19:14:03 DEBUG CassandraConnector:55 - Attempting to open thrift connection to Cassandra at <AWS_LOCAL_IP>:9160
2015-02-17 19:14:10 DEBUG HeartbeatReceiver:50 - [actor] received message Heartbeat(localhost,[Lscala.Tuple2;#77008370,BlockManagerId(<driver>, Alon-PC, 62343, 0)) from Actor[akka://sparkDriver/temp/$b]
2015-02-17 19:14:10 DEBUG BlockManagerMasterActor:50 - [actor] received message BlockManagerHeartbeat(BlockManagerId(<driver>, Alon-PC, 62343, 0)) from Actor[akka://sparkDriver/temp/$c]
2015-02-17 19:14:10 DEBUG BlockManagerMasterActor:56 - [actor] handled message (0.491517 ms) BlockManagerHeartbeat(BlockManagerId(<driver>, Alon-PC, 62343, 0)) from Actor[akka://sparkDriver/temp/$c]
2015-02-17 19:14:10 DEBUG HeartbeatReceiver:56 - [actor] handled message (69.725123 ms) Heartbeat(localhost,[Lscala.Tuple2;#77008370,BlockManagerId(<driver>, Alon-PC, 62343, 0)) from Actor[akka://sparkDriver/temp/$b]
2015-02-17 19:14:20 DEBUG HeartbeatReceiver:50 - [actor] received message Heartbeat(localhost,[Lscala.Tuple2;#70a7cd6e,BlockManagerId(<driver>, Alon-PC, 62343, 0)) from Actor[akka://sparkDriver/temp/$d]
2015-02-17 19:14:20 DEBUG BlockManagerMasterActor:50 - [actor] received message BlockManagerHeartbeat(BlockManagerId(<driver>, Alon-PC, 62343, 0)) from Actor[akka://sparkDriver/temp/$e]
2015-02-17 19:14:20 DEBUG BlockManagerMasterActor:56 - [actor] handled message (0.348586 ms) BlockManagerHeartbeat(BlockManagerId(<driver>, Alon-PC, 62343, 0)) from Actor[akka://sparkDriver/temp/$e]
2015-02-17 19:14:20 DEBUG HeartbeatReceiver:56 - [actor] handled message (2.020429 ms) Heartbeat(localhost,[Lscala.Tuple2;#70a7cd6e,BlockManagerId(<driver>, Alon-PC, 62343, 0)) from Actor[akka://sparkDriver/temp/$d]
2015-02-17 19:14:24 ERROR ServerSideTokenRangeSplitter:88 - Failure while fetching splits from Cassandra
java.io.IOException: Failed to open thrift connection to Cassandra at <AWS_LOCAL_IP>:9160
at com.datastax.spark.connector.cql.CassandraConnector.createThriftClient(CassandraConnector.scala:132)
at com.datastax.spark.connector.cql.CassandraConnector.withCassandraClientDo(CassandraConnector.scala:141)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter.com$datastax$spark$connector$rdd$partitioner$ServerSideTokenRangeSplitter$$fetchSplits(ServerSideTokenRangeSplitter.scala:33)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter$$anonfun$1$$anonfun$apply$2.apply(ServerSideTokenRangeSplitter.scala:45)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter$$anonfun$1$$anonfun$apply$2.apply(ServerSideTokenRangeSplitter.scala:45)
at scala.util.Try$.apply(Try.scala:161)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter$$anonfun$1.apply(ServerSideTokenRangeSplitter.scala:45)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter$$anonfun$1.apply(ServerSideTokenRangeSplitter.scala:44)
at scala.collection.immutable.Stream.map(Stream.scala:376)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter.split(ServerSideTokenRangeSplitter.scala:44)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$$anonfun$com$datastax$spark$connector$rdd$partitioner$CassandraRDDPartitioner$$splitsOf$1.apply(CassandraRDDPartitioner.scala:77)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$$anonfun$com$datastax$spark$connector$rdd$partitioner$CassandraRDDPartitioner$$splitsOf$1.apply(CassandraRDDPartitioner.scala:76)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.flatmap2combiner(ParArray.scala:418)
at scala.collection.parallel.ParIterableLike$FlatMap.leaf(ParIterableLike.scala:1075)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.ParIterableLike$FlatMap.tryLeaf(ParIterableLike.scala:1071)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:444)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:514)
at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:492)
at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:64)
at scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:961)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:956)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out: connect
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createThriftClient(CassandraConnectionFactory.scala:47)
at com.datastax.spark.connector.cql.CassandraConnector.createThriftClient(CassandraConnector.scala:127)
... 41 more
Caused by: java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 45 more
Exception in thread "main" java.io.IOException: Failed to fetch splits of TokenRange(0,0,Set(CassandraNode(/<AWS_LOCAL_IP>,/<MY_PUBLIC_IP>)),None) from all endpoints: CassandraNode(/<AWS_LOCAL_IP>,/<MY_PUBLIC_IP>)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter$$anonfun$split$2.apply(ServerSideTokenRangeSplitter.scala:55)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter$$anonfun$split$2.apply(ServerSideTokenRangeSplitter.scala:49)
at scala.Option.getOrElse(Option.scala:120)
at com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter.split(ServerSideTokenRangeSplitter.scala:49)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$$anonfun$com$datastax$spark$connector$rdd$partitioner$CassandraRDDPartitioner$$splitsOf$1.apply(CassandraRDDPartitioner.scala:77)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$$anonfun$com$datastax$spark$connector$rdd$partitioner$CassandraRDDPartitioner$$splitsOf$1.apply(CassandraRDDPartitioner.scala:76)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.flatmap2combiner(ParArray.scala:418)
at scala.collection.parallel.ParIterableLike$FlatMap.leaf(ParIterableLike.scala:1075)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.ParIterableLike$FlatMap.tryLeaf(ParIterableLike.scala:1071)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:444)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:514)
at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:492)
at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:64)
at scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:961)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:956)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2015-02-17 19:14:24 DEBUG DiskBlockManager:63 - Shutdown hook called
At the beginning it looks fine (the connection succeeds, the keyspace is fetched...), but then, when it attempts to open a thrift connection, it fails, disconnects, and shuts down.
I've opened ports 9160, 9042 and 7000.
And in cassandra.yaml I set
listen_address: <AWS_LOCAL_IP>
broadcast_address: <MY_PUBLIC_IP>
What am I missing?
Ok, I was about to post this question but then I finally worked it out:
In cassandra.yaml, I had to set
rpc_address: 0.0.0.0
Other Stack Overflow questions helped me, but I'm posting this because the stack trace may help others find it.
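For reference, the full set of network settings involved might look like the following cassandra.yaml fragment. This is a sketch with placeholder addresses, not a verified config; note that when rpc_address is 0.0.0.0, Cassandra also requires broadcast_rpc_address to be set.

```yaml
listen_address: <AWS_LOCAL_IP>        # interface used for inter-node traffic
broadcast_address: <MY_PUBLIC_IP>     # address advertised to other nodes
rpc_address: 0.0.0.0                  # accept client connections on all interfaces
broadcast_rpc_address: <MY_PUBLIC_IP> # required whenever rpc_address is 0.0.0.0
```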
