Cassandra 3.10 - LEAK DETECTED & Stream failed
I want to add or remove nodes in the cluster.
When I try to add or remove a node, I get LEAK DETECTED and STREAM FAILED error messages.
If I drop the indexes - pushcapabilityindx and presetsearchval - before adding/removing nodes, the node add/remove succeeds (see the CQL sketch after the schema below).
If there is no data update, the data in abc_db.sub is automatically deleted after 24 hours (TTL 86400 seconds).
Also, even without any intervention (including the index drop), adding/removing nodes succeeds normally after 14 days.
Where should I start troubleshooting this error?
Ubuntu 16.04.3 LTS
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
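(Note: the schema below has default_time_to_live = 0, so the 24-hour expiry mentioned above would come from a per-write TTL. A minimal CQL sketch with hypothetical values:)

-- Hypothetical write; the TTL of 86400 seconds matches the 24-hour expiry described above.
INSERT INTO abc_db.sub (phoneno, deviceid, subid)
VALUES ('01012345678', 'device-1', 'sub-1')
USING TTL 86400;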
CREATE TABLE abc_db.sub (
phoneno text,
deviceid text,
subid text,
callbackdata text,
corelator text,
duration int,
local_index bigint,
phase int,
presetsearchvalue text,
pushcapability list<text>,
pushtoken text,
pushtype text,
searchcriteria frozen<typesearchcriteria>,
PRIMARY KEY (phoneno, deviceid, subid)
) WITH CLUSTERING ORDER BY (deviceid ASC, subid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX pushcapabilityindx ON abc_db.sub (values(pushcapability));
CREATE INDEX presetsearchval ON abc_db.sub (presetsearchvalue);
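For reference, the index-drop workaround described above amounts to something like this (a minimal CQL sketch; the index definitions are exactly the ones shown above):

-- Before adding/removing a node: drop both secondary indexes.
DROP INDEX IF EXISTS abc_db.pushcapabilityindx;
DROP INDEX IF EXISTS abc_db.presetsearchval;

-- ...add or remove the node and wait for streaming to complete...

-- Afterwards: recreate the indexes as before.
CREATE INDEX pushcapabilityindx ON abc_db.sub (values(pushcapability));
CREATE INDEX presetsearchval ON abc_db.sub (presetsearchvalue);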
INFO [main] 2019-05-09 07:57:15,741 StorageService.java:1435 - JOINING: waiting for schema information to complete
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: schema complete, ready to bootstrap
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: waiting for pending range calculation
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: calculation complete, ready to bootstrap
INFO [main] 2019-05-09 07:57:16,498 StorageService.java:1435 - JOINING: getting bootstrap token
INFO [main] 2019-05-09 07:57:16,531 StorageService.java:1435 - JOINING: sleeping 30000 ms for pending range setup
INFO [main] 2019-05-09 07:57:46,532 StorageService.java:1435 - JOINING: Starting to bootstrap...
INFO [main] 2019-05-09 07:57:47,775 StreamResultFuture.java:90 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Executing streaming plan for Bootstrap
INFO [StreamConnectionEstablisher:1] 2019-05-09 07:57:47,783 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.20.10
INFO [StreamConnectionEstablisher:1] 2019-05-09 07:57:47,786 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.20.10
INFO [STREAM-IN-/172.50.20.10:5000] 2019-05-09 07:57:48,887 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 261 files(1012.328MiB), sending 0 files(0.000KiB)
INFO [StreamConnectionEstablisher:2] 2019-05-09 07:57:48,891 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.22.10
INFO [StreamConnectionEstablisher:2] 2019-05-09 07:57:48,893 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.22.10
INFO [STREAM-IN-/172.50.22.10:5000] 2019-05-09 07:57:50,020 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 254 files(1.286GiB), sending 0 files(0.000KiB)
INFO [StreamConnectionEstablisher:3] 2019-05-09 07:57:50,022 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.21.10
INFO [StreamConnectionEstablisher:3] 2019-05-09 07:57:50,025 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.21.10
INFO [STREAM-IN-/172.50.21.10:5000] 2019-05-09 07:57:50,998 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 114 files(1.085GiB), sending 0 files(0.000KiB)
INFO [StreamReceiveTask:1] 2019-05-09 07:58:02,509 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-1-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-2-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-3-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:02,519 SecondaryIndexManager.java:385 - Index build of pushcapabilityindx,presetsearchval complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:11,213 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-4-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-5-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-6-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-7-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-8-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-9-big-Data.db')
ERROR [StreamReceiveTask:1] 2019-05-09 07:58:11,295 StreamSession.java:593 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Streaming error occurred on session with peer 172.50.22.10
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.NoSuchElementException
at org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:51) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:393) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.buildIndexesBlocking(SecondaryIndexManager.java:382) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.buildAllIndexesBlocking(SecondaryIndexManager.java:269) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:215) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
Caused by: java.util.concurrent.ExecutionException: java.util.NoSuchElementException
at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_191]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_191]
at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:386) ~[apache-cassandra-3.10.jar:3.10]
... 9 common frames omitted
Caused by: java.util.NoSuchElementException: null
at org.apache.cassandra.utils.AbstractIterator.next(AbstractIterator.java:64) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.lambda$indexPartition$20(SecondaryIndexManager.java:618) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_191]
at org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:618) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:71) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1587) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
... 6 common frames omitted
ERROR [STREAM-IN-/172.50.22.10:5000] 2019-05-09 07:58:11,305 StreamSession.java:593 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Streaming error occurred on session with peer 172.50.22.10
java.lang.RuntimeException: Outgoing stream handler has been closed
at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:143) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:655) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:523) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:317) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
INFO [StreamReceiveTask:1] 2019-05-09 07:58:11,310 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.22.10 is complete
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#441b19c1) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy#1255997234:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#4ee372b1) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#199357705:Memory#[7f12a4ad4970..7f12a4ad4a10) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#172830a5) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#983595037:Memory#[7f12a4b3f670..7f12a4b3f990) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,116 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#6f83e302) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#1808665158:Memory#[7f12a41a56d0..7f12a41a56d4) was not released before the reference was garbage collected
INFO [StreamReceiveTask:1] 2019-05-09 07:58:40,657 SecondaryIndexManager.java:365 - Submitting index build of groupchatid_idx_giinfo for data in BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-1-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-2-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:40,714 SecondaryIndexManager.java:385 - Index build of groupchatid_idx_giinfo complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:41,494 SecondaryIndexManager.java:365 - Submitting index build of groupchatid_idx_giinfo for data in BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-3-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-4-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-5-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-6-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-7-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:41,537 SecondaryIndexManager.java:385 - Index build of groupchatid_idx_giinfo complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,175 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-11-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,209 SecondaryIndexManager.java:385 - Index build of pushcapabilityindx,presetsearchval complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,972 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.20.10 is complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:45,643 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.21.10 is complete
WARN [StreamReceiveTask:1] 2019-05-09 07:58:45,664 StreamResultFuture.java:214 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Stream failed
WARN [StreamReceiveTask:1] 2019-05-09 07:58:45,665 StorageService.java:1497 - Error during bootstrap.
org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) [guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) [guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-18.0.jar:na]
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:766) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:727) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:244) [apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
ERROR [main] 2019-05-09 07:58:45,665 StorageService.java:1507 - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted.
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1502) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:962) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:681) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735) [apache-cassandra-3.10.jar:3.10]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:766) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:727) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:244) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
WARN [main] 2019-05-09 07:58:45,676 StorageService.java:1013 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
INFO [main] 2019-05-09 07:58:45,677 CassandraDaemon.java:694 - Waiting for gossip to settle before accepting client requests...
INFO [main] 2019-05-09 07:58:53,678 CassandraDaemon.java:725 - No gossip backlog; proceeding
INFO [main] 2019-05-09 07:58:53,737 NativeTransportService.java:70 - Netty using native Epoll event loop
INFO [main] 2019-05-09 07:58:53,781 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86]
INFO [main] 2019-05-09 07:58:53,781 Server.java:156 - Starting listening for CQL clients on /172.50.20.11:7042 (unencrypted)...
INFO [main] 2019-05-09 07:58:53,809 CassandraDaemon.java:528 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
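As the WARN line above suggests, after a failed bootstrap the joining node's state can be inspected and streaming resumed with standard nodetool commands (a minimal sketch; host/credential flags omitted):

# Check streaming progress and the ring state.
nodetool netstats
nodetool status

# Resume the interrupted bootstrap on the joining node.
nodetool bootstrap resume

(In this case resuming presumably just hits the same NoSuchElementException during the secondary index rebuild, which is what makes the index-drop workaround relevant.)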
I solved this issue myself. I tested adding a node with the following OS / package combinations:
No. | Linux OS                       | Package | Version | Result
----|--------------------------------|---------|---------|--------------
1   | Ubuntu 16.04.3 LTS             | deb     | 3.10    | Stream failed
2   | Ubuntu 16.04.3 LTS             | deb     | 3.11.4  | Stream failed
3   | Ubuntu 16.04.3 LTS             | tgz     | 3.11.4  | Successful
4   | Ubuntu 18.04.2 LTS             | deb     | 3.10    | Stream failed
5   | Ubuntu 18.04.2 LTS             | deb     | 3.11.4  | Stream failed
6   | Debian GNU/Linux 9.9 (stretch) | deb     | 3.10    | Stream failed
7   | Debian GNU/Linux 9.9 (stretch) | deb     | 3.11.4  | Stream failed
8   | Amazon Linux 2                 | rpm     | 3.11.4  | Successful
9   | CentOS 7.6                     | rpm     | 3.11.4  | Successful
Based on these results, I am going to install from the binary tarball files instead of the deb package.
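(For example, a minimal sketch of a tarball install; the URL follows the Apache release archive layout and the paths are illustrative:)

# Download and unpack the binary tarball instead of installing the deb package.
wget https://archive.apache.org/dist/cassandra/3.11.4/apache-cassandra-3.11.4-bin.tar.gz
tar xzf apache-cassandra-3.11.4-bin.tar.gz
cd apache-cassandra-3.11.4

# Copy the existing cassandra.yaml into conf/, then start in the foreground.
bin/cassandra -f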
I am not a developer, so I cannot find the root cause.
I hope someone finds the cause and fixes the deb packages for Ubuntu and Debian.
Regards.
Sungjae Yun.
Related
Pyspark job stuck in sleep and retry loop while reading from GCS
I tried running my Spark job on GKE using spark-operator and on Dataproc, but in both cases the Hadoop adaptor is able to list the files yet gets stuck in a sleep-retry loop while trying to read them from GCS. The service account has full access, and I was able to fetch the file using gsutil on the same executor container using the same service account, which seems to rule out network or permission issues. Using spark-operator version v2.4.0-v1beta1-latest.
Logs:
2019-07-12 11:33:12 INFO HadoopRDD:54 - Input split: gs://app-logs/2019/07/04/08/ip-10-1-34-63-app-json.log-2019-07-04-08-20.gz:0+295144331
2019-07-12 11:33:12 INFO HadoopRDD:54 - Input split: gs://app-logs/2019/07/04/08/ip-10-1-33-94-app-json.log-2019-07-04-08-20.gz:0+305812437
2019-07-12 11:33:12 INFO HadoopRDD:54 - Input split: gs://app-logs/2019/07/04/08/ip-10-1-34-61-app-json.log-2019-07-04-08-20.gz:0+297933921
2019-07-12 11:33:12 INFO HadoopRDD:54 - Input split: gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz:0+309553279
2019-07-12 11:33:12 INFO TorrentBroadcast:54 - Started reading broadcast variable 0
2019-07-12 11:33:12 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.1 KB, free 3.3 GB)
2019-07-12 11:33:12 INFO TorrentBroadcast:54 - Reading broadcast variable 0 took 13 ms
2019-07-12 11:33:12 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 323.8 KB, free 3.3 GB)
2019-07-12 11:33:14 INFO CodecPool:181 - Got brand-new decompressor [.gz]
2019-07-12 11:33:14 INFO CodecPool:181 - Got brand-new decompressor [.gz]
2019-07-12 11:33:14 INFO CodecPool:181 - Got brand-new decompressor [.gz]
2019-07-12 11:33:14 INFO CodecPool:181 - Got brand-new decompressor [.gz]
2019-07-12 11:42:00 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:532)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.MeteredStream.read(MeteredStream.java:134)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3393)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:169)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.read(GoogleCloudStorageReadChannel.java:370)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.read(GoogleHadoopFSInputStream.java:130)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:248)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:293)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:224)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
2019-07-12 11:42:00 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:42:00 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:42:00 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-34-61-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset (near-identical stack trace omitted)
2019-07-12 11:42:00 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-34-61-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:42:00 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-34-61-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:50:44 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset (near-identical stack trace omitted)
2019-07-12 11:50:44 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:50:44 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:55:06 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-94-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset (near-identical stack trace omitted)
2019-07-12 11:55:06 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-94-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:55:06 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-33-94-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:55:10 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-34-63-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset (near-identical stack trace omitted)
2019-07-12 11:55:10 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-34-63-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:55:10 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-34-63-app-json.log-2019-07-04-08-20.gz'
What could be causing this? I have relaxed the firewall rules as well.
You need to check whether your cluster and the GCS bucket you are reading from are in the same GCP region - reads can be slow if they are cross-regional. Also, it seems that you are processing gzipped log files, which cannot be split and need to be re-read from the beginning on each failure; this can lead to long read times if there is some network flakiness (because of cross-regional reads, for example).
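One quick way to check this (a minimal sketch; the bucket name app-logs is taken from the logs above, and the cluster-side command depends on where the job runs):

# Show bucket metadata, including its Location constraint (region).
gsutil ls -L -b gs://app-logs

# Compare with the cluster's location, e.g. for GKE clusters.
gcloud container clusters list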
"Stream Fail & NoSuchElementException: null" with Cassandra 3.10
I am having some problem with Cassandra 3.10 that when I try to join/decommission node, it cannot join/decommission the cluster with the "stream failed" exception, then it stops running. Util that time, node stays in JOINING state. version : cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4 Node 1 Error log (seed) INFO [STREAM-INIT-/10.0.120.11:44986] 2019-03-04 02:25:13,844 StreamResultFuture.java:116 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f ID#0] Creating new streaming plan for Bootstrap INFO [STREAM-INIT-/10.0.120.11:44986] 2019-03-04 02:25:13,861 StreamResultFuture.java:123 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f, ID#0] Received streaming plan for Bootstrap INFO [STREAM-INIT-/10.0.120.11:44988] 2019-03-04 02:25:13,861 StreamResultFuture.java:123 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f, ID#0] Received streaming plan for Bootstrap INFO [STREAM-IN-/10.0.120.11:44988] 2019-03-04 02:25:14,130 StreamResultFuture.java:173 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f ID#0] Prepare completed. Receiving 0 files(0.000KiB), sending 154 files(64.507MiB) ERROR [STREAM-IN-/10.0.120.11:44988] 2019-03-04 02:25:14,401 StreamSession.java:706 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Remote peer 10.0.120.11 failed stream session. INFO [STREAM-IN-/10.0.120.11:44988] 2019-03-04 02:25:14,415 StreamResultFuture.java:187 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Session with /10.0.120.11 is complete WARN [STREAM-IN-/10.0.120.11:44988EHgks ] 2019-03-04 02:25:14,417 StreamResultFuture.java:214 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Stream failed ERROR [STREAM-OUT-/10.0.120.11:44986] 2019-03-04 02:25:14,418 StreamSession.java:593 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Streaming error occurred on session with peer 10.0.120.11 org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:145) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.compress.CompressedStreamWriter.lambda$write$0(CompressedStreamWriter.java:85) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:350) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:85) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:101) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:52) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:50) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:408) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:380) ~[apache-cassandra-3.10.jar:3.10] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152] Caused by: java.io.IOException: Broken pipe at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_152] at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428) ~[na:1.8.0_152] at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493) ~[na:1.8.0_152] at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608) ~[na:1.8.0_152] at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:141) ~[apache-cassandra-3.10.jar:3.10] ... 10 common frames omitted Node 2 Error log INFO [main] 2019-03-04 02:25:12,424 StorageService.java:1435 - JOINING: Starting to bootstrap... INFO [main] 2019-03-04 02:25:13,832 StreamResultFuture.java:90 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Executing streaming plan for Bootstrap INFO [StreamConnectionEstablisher:1] 2019-03-04 02:25:13,837 StreamSession.java:266 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Starting streaming to /10.0.110.11 INFO [StreamConnectionEstablisher:1] 2019-03-04 02:25:13,842 StreamCoordinator.java:264 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f, ID#0] Beginning stream session with /10.0.110.11 INFO [STREAM-IN-/10.0.110.11:5000] 2019-03-04 02:25:14,138 StreamResultFuture.java:173 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f ID#0] Prepare completed. Receiving 154 files(64.507MiB), sending 0 files(0.000KiB) INFO [StreamReceiveTask:1] 2019-03-04 02:25:14,376 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_sub_db/sub-94fe59103afb11e9a042932fa01bd6f1/mc-1-big-Data.db'),BigTableReader(path='/cassandra/data/abc_sub_db/sub-94fe59103afb11e9a042932fa01bd6f1/mc-2-big-Data.db') ERROR [StreamReceiveTask:1] 2019-03-04 02:25:14,400 StreamSession.java:593 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Streaming error occurred on session with peer 10.0.110.11 java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.NoSuchElementException at org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:51) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:393) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.index.SecondaryIndexManager.buildIndexesBlocking(SecondaryIndexManager.java:382) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.index.SecondaryIndexManager.buildAllIndexesBlocking(SecondaryIndexManager.java:269) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:215) ~[apache-cassandra-3.10.jar:3.10] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_152] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152] Caused by: java.util.concurrent.ExecutionException: java.util.NoSuchElementException at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_152] at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_152] at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:386) ~[apache-cassandra-3.10.jar:3.10] ... 
9 common frames omitted Caused by: java.util.NoSuchElementException: null at org.apache.cassandra.utils.AbstractIterator.next(AbstractIterator.java:64) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.index.SecondaryIndexManager.lambda$indexPartition$20(SecondaryIndexManager.java:618) ~[apache-cassandra-3.10.jar:3.10] at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_152] at org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:618) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:71) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1587) ~[apache-cassandra-3.10.jar:3.10] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_152] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_152] ... 6 common frames omitted ERROR [STREAM-IN-/10.0.110.11:5000] 2019-03-04 02:25:14,407 StreamSession.java:593 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Streaming error occurred on session with peer 10.0.110.11 java.lang.RuntimeException: Outgoing stream handler has been closed at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:143) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:655) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:523) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:317) ~[apache-cassandra-3.10.jar:3.10] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152] INFO [StreamReceiveTask:1] 2019-03-04 02:25:14,409 StreamResultFuture.java:187 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Session with /10.0.110.11 is complete WARN [StreamReceiveTask:1] 2019-03-04 02:25:14,411 StreamResultFuture.java:214 - [Stream #bd205bb0-3e24-11e9-9d30-fb2ced1ba79f] Stream failed ERROR [main] 2019-03-04 02:25:14,412 StorageService.java:1507 - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted. 
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na] at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na] at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na] at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1502) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:962) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:681) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735) [apache-cassandra-3.10.jar:3.10] Caused by: org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10] at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na] at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na] at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na] at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na] at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na] at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:249) ~[apache-cassandra-3.10.jar:3.10] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_152] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_152] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152] WARN [StreamReceiveTask:1] 2019-03-04 02:25:14,412 StorageService.java:1497 - Error during bootstrap. 
org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10] at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) [guava-18.0.jar:na] at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) [guava-18.0.jar:na] at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-18.0.jar:na] at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-18.0.jar:na] at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-18.0.jar:na] at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:249) [apache-cassandra-3.10.jar:3.10] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_152] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152] WARN [main] 2019-03-04 02:25:14,420 StorageService.java:1013 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS INFO [main] 2019-03-04 02:25:14,420 CassandraDaemon.java:694 - Waiting for gossip to settle before accepting client requests... INFO [main] 2019-03-04 02:25:22,421 CassandraDaemon.java:725 - No gossip backlog; proceeding INFO [main] 2019-03-04 02:25:22,482 NativeTransportService.java:70 - Netty using native Epoll event loop INFO [main] 2019-03-04 02:25:22,531 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86] INFO [main] 2019-03-04 02:25:22,531 Server.java:156 - Starting listening for CQL clients on /10.0.120.11:7042 (unencrypted)... INFO [main] 2019-03-04 02:25:22,564 CassandraDaemon.java:528 - Not starting RPC server as requested. 
It succeeds in the following cases:
case 1. drop index > node join/decommission SUCCEEDS > create index > node join/decommission FAILS
case 2. back up the table data (COPY TO) > truncate the table > restore the data (COPY FROM) > node join/decommission SUCCEEDS > new data inserted from the service > node join/decommission FAILS
case 3. back up the table data (COPY TO) > drop the table > recreate the table > restore the data (COPY FROM) > node join/decommission SUCCEEDS > new data inserted from the service > node join/decommission FAILS
Unfortunately, I have no visibility into the new insert queries coming from the service.
Q1. I want to see which queries come into Cassandra. Can I inspect the queries issued by the service sessions?
Q2. Do you have any idea why the stream fails in Cassandra?
Q1. Enable TRACE logging for the Cassandra transport layer with nodetool setlogginglevel; the incoming queries will then show up in system.log.
Q2. Generally this error comes from a network issue between the nodes while the data is streaming, or from gossip not happening.
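A minimal sketch of both checks, assuming a package install with logs under /var/log/cassandra (adjust paths for your setup):

# Q1: trace incoming queries on the native transport, then watch the log
nodetool setlogginglevel org.apache.cassandra.transport TRACE
tail -f /var/log/cassandra/system.log
# revert afterwards so the log is not flooded
nodetool setlogginglevel org.apache.cassandra.transport INFO

# Q2: sanity-check streaming and gossip between nodes
nodetool netstats          # active/failed streams on this node
nodetool describecluster   # schema versions must agree across all nodes
nodetool statusgossip      # should print "running"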
How to understand the YARN appattempt log and diagnostics?
I am running a Spark application on a YARN cluster (on AWS EMR). The application seems to have been killed and I want to find the cause. I am trying to understand the YARN info given in the following screen. The diagnostics line in the screen seems to show that YARN killed the app because of the memory limit:

Diagnostics: Container [pid=1540,containerID=container_1488651686158_0012_02_000001] is running beyond physical memory limits. Current usage: 1.6 GB of 1.4 GB physical memory used; 3.6 GB of 6.9 GB virtual memory used. Killing container.

However, the appattempt log shows a completely different exception, something related to IO/networking. My questions are: should I trust the diagnostics in the screen or the appattempt log? Is the IO exception causing the kill, or did the out-of-memory kill cause the IO exception in the appattempt log? Is there another log/diagnostic I should look at? Thanks.

17/03/04 21:59:02 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "task-result-getter-0" java.lang.Error: java.lang.InterruptedException
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
... 2 more
17/03/04 21:59:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/04 21:59:02 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-31-9-207.ec2.internal/172.31.9.207:38437 is closed
17/03/04 21:59:02 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
17/03/04 21:59:02 ERROR DiskBlockManager: Exception while deleting local spark dir: /mnt/yarn/usercache/hadoop/appcache/application_1488651686158_0012/blockmgr-941a13d8-1b31-4347-bdec-180125b6f4ca
java.io.IOException: Failed to delete: /mnt/yarn/usercache/hadoop/appcache/application_1488651686158_0012/blockmgr-941a13d8-1b31-4347-bdec-180125b6f4ca
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
at org.apache.spark.storage.DiskBlockManager$$anonfun$org$apache$spark$storage$DiskBlockManager$$doStop$1.apply(DiskBlockManager.scala:169)
at org.apache.spark.storage.DiskBlockManager$$anonfun$org$apache$spark$storage$DiskBlockManager$$doStop$1.apply(DiskBlockManager.scala:165)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:165)
at org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:160)
at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1361)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:89)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
17/03/04 21:59:02 INFO MemoryStore: MemoryStore cleared
17/03/04 21:59:02 INFO BlockManager: BlockManager stopped
17/03/04 21:59:02 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/04 21:59:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/04 21:59:02 ERROR Utils: Uncaught exception in thread Thread-3
java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at java.nio.file.Paths.get(Paths.java:138)
at org.apache.spark.util.Utils$.isSymlink(Utils.scala:1021)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:991)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:102)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
17/03/04 21:59:02 WARN ShutdownHookManager: ShutdownHook '$anon$2' failed, java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at java.nio.file.Paths.get(Paths.java:138)
at org.apache.spark.util.Utils$.isSymlink(Utils.scala:1021)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:991)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:102)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
The information in your screenshot is the most relevant: your ApplicationMaster container ran out of memory. You need to increase yarn.app.mapreduce.am.resource.mb, which is set in mapred-site.xml. I recommend a value of 2000 (MB), since that will usually accommodate running Spark and MapReduce applications at scale.
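For reference, a minimal sketch of that change (the property name and the 2000 MB value come from the recommendation above; it goes inside the existing <configuration> element of mapred-site.xml, followed by a restart of the affected YARN services):

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2000</value>
</property>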
The container was killed because it exceeded its physical memory limit, so any attempt to reach that container afterwards fails. YARN is fine for an overall view of the process, but you should prefer the Spark History Server to analyse your job in more detail (for example, check there for unbalanced memory use across executors).
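If event logging is not already on, a minimal spark-submit sketch to feed the History Server (the property names are standard Spark configuration; the log directory and application name here are assumptions, and on EMR event logging is typically enabled by default with the History Server on port 18080):

spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs:///var/log/spark/apps \
  --class com.example.MyApp \
  myapp.jar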
unable to start DSE server
I'm using Ubuntu 16.04 LTS. I've installed DataStax with this procedure. The very first day it worked perfectly, but now it's not working. Here's the screenshot. My debug.log:

ERROR [main] 2017-01-30 13:54:08,951 DseDaemon.java:490 - Unable to start DSE server.
java.lang.RuntimeException: Unable to create thrift socket to localhost/127.0.0.1:9042
at org.apache.cassandra.thrift.CustomTThreadPoolServer$Factory.buildTServer(CustomTThreadPoolServer.java:269) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:46) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.<init>(ThriftServer.java:131) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:58) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at com.datastax.bdp.server.DseThriftServer.start(DseThriftServer.java:46) ~[dse-core-5.0.5.jar:5.0.5]
at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:486) [cassandra-all-3.0.11.1485.jar:3.0.11.1485]
at com.datastax.bdp.server.DseDaemon.start(DseDaemon.java:485) ~[dse-core-5.0.5.jar:5.0.5]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583) [cassandra-all-3.0.11.1485.jar:3.0.11.1485]
at com.datastax.bdp.DseModule.main(DseModule.java:91) [dse-core-5.0.5.jar:5.0.5]
Caused by: org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address localhost/127.0.0.1:9042.
at org.apache.cassandra.thrift.TCustomServerSocket.<init>(TCustomServerSocket.java:75) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at org.apache.cassandra.thrift.CustomTThreadPoolServer$Factory.buildTServer(CustomTThreadPoolServer.java:264) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
... 8 common frames omitted
ERROR [main] 2017-01-30 13:54:08,952 CassandraDaemon.java:709 - Exception encountered during startup
java.lang.RuntimeException: java.lang.RuntimeException: Unable to create thrift socket to localhost/127.0.0.1:9042
at com.datastax.bdp.server.DseDaemon.start(DseDaemon.java:493) ~[dse-core-5.0.5.jar:5.0.5]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583) ~[cassandra-all-3.0.11.1485.jar:3.0.11.1485]
at com.datastax.bdp.DseModule.main(DseModule.java:91) [dse-core-5.0.5.jar:5.0.5]
Caused by: java.lang.RuntimeException: Unable to create thrift socket to localhost/127.0.0.1:9042
at org.apache.cassandra.thrift.CustomTThreadPoolServer$Factory.buildTServer(CustomTThreadPoolServer.java:269) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:46) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.<init>(ThriftServer.java:131) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:58) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at com.datastax.bdp.server.DseThriftServer.start(DseThriftServer.java:46) ~[dse-core-5.0.5.jar:5.0.5]
at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:486) ~[cassandra-all-3.0.11.1485.jar:3.0.11.1485]
at com.datastax.bdp.server.DseDaemon.start(DseDaemon.java:485) ~[dse-core-5.0.5.jar:5.0.5]
... 2 common frames omitted
Caused by: org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address localhost/127.0.0.1:9042.
at org.apache.cassandra.thrift.TCustomServerSocket.<init>(TCustomServerSocket.java:75) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
at org.apache.cassandra.thrift.CustomTThreadPoolServer$Factory.buildTServer(CustomTThreadPoolServer.java:264) ~[cassandra-all-3.0.11.1485.jar:5.0.5]
... 8 common frames omitted
INFO [Daemon shutdown] 2017-01-30 13:54:08,953 DseDaemon.java:577 - DSE shutting down...
DEBUG [StorageServiceShutdownHook] 2017-01-30 13:54:08,954 StorageService.java:1210 - DRAINING: starting drain process
INFO [StorageServiceShutdownHook] 2017-01-30 13:54:08,955 HintsService.java:212 - Paused hints dispatch
INFO [Daemon shutdown] 2017-01-30 13:54:08,957 PluginManager.java:347 - Deactivating plugin: com.datastax.bdp.plugin.health.NodeHealthPlugin
INFO [Daemon shutdown] 2017-01-30 13:54:08,958 PluginManager.java:347 - Deactivating plugin: com.datastax.bdp.plugin.DseClientToolPlugin
INFO [Daemon shutdown] 2017-01-30 13:54:08,959 PluginManager.java:347 - Deactivating plugin: com.datastax.bdp.plugin.InternalQueryRouterPlugin
INFO [Daemon shutdown] 2017-01-30 13:54:08,959 PluginManager.java:347 - Deactivating plugin: com.datastax.bdp.leasemanager.LeasePlugin
INFO [Daemon shutdown] 2017-01-30 13:54:08,962 PluginManager.java:347 - Deactivating plugin: com.datastax.bdp.cassandra.cql3.CqlSlowLogPlugin
INFO [Daemon shutdown] 2017-01-30 13:54:08,966 PluginManager.java:347 - Deactivating plugin: com.datastax.bdp.cassandra.metrics.PerformanceObjectsPlugin
INFO [Daemon shutdown] 2017-01-30 13:54:08,966 PluginManager.java:347 - Deactivating plugin: com.datastax.bdp.plugin.ThreadPoolPlugin
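The log says the Thrift server could not bind localhost:9042, which is normally the native transport port rather than the Thrift rpc_port (9160). A quick diagnostic sketch, not a confirmed fix (the config path assumes a DSE package install):

# Is something already listening on 9042?
sudo lsof -iTCP:9042 -sTCP:LISTEN
# Compare the two settings; they must not point at the same port:
grep -E '^(rpc_port|native_transport_port):' /etc/dse/cassandra/cassandra.yaml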
Connection refused while using sstableloader to stream data to Cassandra
I am trying to stream some sstables to a Cassandra cluster using the sstableloader utility. I am getting a streaming error. Here is the stack:

Established connection to initial hosts
Opening sstables and calculating sections to stream
18:05:04.058 [main] DEBUG o.a.c.i.s.m.MetadataSerializer - Load metadata for /path/new/xyz/search/xyz-search-ka-1
18:05:04.073 [main] INFO o.a.c.io.sstable.SSTableReader - Opening /path/new/xyz/new/xyz_search/search/xyz_search-search-ka-1 (330768 bytes)
Streaming relevant part of /path/new/xyz/xyz_search/search/xyz_search-search-ka-1-Data.db to [/10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX]
18:05:04.411 [main] INFO o.a.c.streaming.StreamResultFuture - [Stream #ed3a0cd0-fd25-11e5-8509-63e9961cf787] Executing streaming plan for Bulk Load
Streaming relevant part of /path/xyz-search-ka-1-Data.db to [/10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX]
17:22:44.175 [main] INFO o.a.c.streaming.StreamResultFuture - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Executing streaming plan for Bulk Load
17:22:44.177 [StreamConnectionEstablisher:1] INFO o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Starting streaming to /10.XX.XX.XX
17:22:44.177 [StreamConnectionEstablisher:1] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Sending stream init for incoming stream
17:22:44.183 [StreamConnectionEstablisher:2] INFO o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Starting streaming to /10.XX.XX.XX
17:22:44.183 [StreamConnectionEstablisher:2] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Sending stream init for incoming stream
17:23:47.191 [StreamConnectionEstablisher:2] ERROR o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Streaming error occurred
java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:458) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:450) ~[na:1.8.0_45]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_45]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_45]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:62) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:236) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:79) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [cassandra-all-2.1.6.jar:2.1.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
17:23:47.202 [StreamConnectionEstablisher:2] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Closing stream connection handler on /10.XXX.XXX.XXX
17:23:47.205 [StreamConnectionEstablisher:1] ERROR o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Streaming error occurred
java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:458) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:450) ~[na:1.8.0_45]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_45]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_45]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:62) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:236) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:79) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [cassandra-all-2.1.6.jar:2.1.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]

Also, the machine where I am running sstableloader is outside the Cassandra cluster. Thanks.
After debugging a little more, I found that sstableloader also uses port 7000 while streaming sstables to the Cassandra cluster. My local machine did not have access to port 7000 on the machines in the Cassandra cluster; that is why I was getting the connection timed out exception. Anyone who encounters this: make sure the machine from which you are running sstableloader has access to ports 9160, 7000, and 9042 on all the Cassandra nodes you are trying to stream to.
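A quick way to verify this from the loader machine before retrying (a sketch; the 10.0.0.1 address is a placeholder for each of your node IPs, and check 7001 instead of 7000 if internode encryption is enabled):

for port in 9160 7000 9042; do
  nc -zv -w 5 10.0.0.1 "$port"
done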
DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Closing stream connection handler on /10.XXX.XXX.XXX
Hint: I suspect the machine 10.XXX.XXX.XXX is under heavy load. It is worth checking the /var/log/cassandra/system.log file on that machine to narrow down the root cause.
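For example, something like this on the suspect node (a sketch assuming the default package-install log path):

grep -iE 'ERROR|WARN|stream' /var/log/cassandra/system.log | tail -n 50
uptime   # quick look at load averages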