I have a test setup in which sstableloader fails to upload data if one of the Cassandra nodes is down.
Is there a way to instruct sstableloader not to connect (or open a stream) to the dead node? (I don't want to decommission/remove the node from the cluster.)
Cassandra cluster info: DataStax Community version 2.1.2, a 3-node cluster of which 2 are seed nodes.
While testing the bulk upload, one of the seed nodes was down. The keyspace has a replication factor of 2.
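For reference, a bulk load of this kind is started with a command along these lines (the data path and contact points here are illustrative, not the exact ones from the test):

# -d lists the initial contact points; the final path is the <keyspace>/<table> directory containing the SSTables
sstableloader -d 192.168.1.15,192.168.1.17 /var/lib/cassandra/staging/my_keyspace/my_table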
Exception encountered:
progress: total: 100% 0 MB/s(avg: 0 MB/s)ERROR 09:07:48 [Stream #8972f510-efe1-11e4-abad-9d409520f182] Streaming error occurred
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.7.0_65]
at sun.nio.ch.Net.connect(Net.java:465) ~[na:1.7.0_65]
at sun.nio.ch.Net.connect(Net.java:457) ~[na:1.7.0_65]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670) ~[na:1.7.0_65]
at java.nio.channels.SocketChannel.open(SocketChannel.java:184) ~[na:1.7.0_65]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:62) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:229) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:79) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:216) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [apache-cassandra-2.1.2.jar:2.1.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
progress: [/192.168.1.17]0:1/1 100% total: 100% 0 MB/s(avg: 1 MB/s)WARN 09:07:48 [Stream #8972f510-efe1-11e4-abad-9d409520f182] Stream failed
Streaming to the following hosts failed:
[/192.168.1.15]
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:121)
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208)
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184)
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:382)
at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:574)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:438)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:251)
at java.lang.Thread.run(Thread.java:745)
Thanks in advance,
Anirban.
I just figured out that I can pass an ignore list to sstableloader. After putting the dead node in the ignore list, sstableloader ran successfully in my test setup.
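For anyone else hitting this, the relevant option is -i / --ignore. A minimal sketch (the second address stands in for the dead node; the path is illustrative):

# Stream to the live contact point and skip the dead node; run a repair later so it catches up
sstableloader -d 192.168.1.17 -i 192.168.1.15 /var/lib/cassandra/staging/my_keyspace/my_table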
I have upgraded a 2-node DSE cluster from 5.0.7 to 6.7.3. After the upgrade, nodetool status shows both nodes as Up/Normal with approximately 75 GB load on each, and the cluster works for application reads and writes. However, I am getting errors during:
nodetool repair -pr (some repairs fail)
nodetool upgradesstables (running it brings a node down)
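For context, these are roughly the commands involved (the keyspace name is a placeholder):

nodetool repair -pr my_keyspace    # primary-range repair, run on each node in turn
nodetool upgradesstables           # rewrite SSTables into the current format after the upgrade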
I am also observing the following exception every 10 seconds in the system.log file:
WARN [OptionalTasks:1] 2019-07-18 08:20:14,495 CassandraRoleManager.java:386 - CassandraRoleManager skipped default role setup: some nodes were not ready
INFO [OptionalTasks:1] 2019-07-18 08:20:14,495 CassandraRoleManager.java:432 - Setup task failed with error, rescheduling
org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level ONE
at org.apache.cassandra.db.ConsistencyLevel.assureSufficientLiveNodes(ConsistencyLevel.java:392)
at org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:214)
at org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:190)
at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.<init>(StorageProxy.java:1541)
at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1524)
at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1447)
at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1325)
at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1274)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:366)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:574)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:307)
at org.apache.cassandra.cql3.QueryProcessor.lambda$processStatement$4(QueryProcessor.java:256)
at io.reactivex.internal.operators.single.SingleDefer.subscribeActual(SingleDefer.java:36)
at io.reactivex.Single.subscribe(Single.java:2700)
at io.reactivex.internal.operators.single.SingleMap.subscribeActual(SingleMap.java:34)
at io.reactivex.Single.subscribe(Single.java:2700)
at io.reactivex.Single.blockingGet(Single.java:2153)
at org.apache.cassandra.concurrent.TPCUtils.blockingGet(TPCUtils.java:75)
at org.apache.cassandra.cql3.QueryProcessor.processBlocking(QueryProcessor.java:352)
at org.apache.cassandra.auth.CassandraRoleManager.hasExistingRoles(CassandraRoleManager.java:396)
at org.apache.cassandra.auth.CassandraRoleManager.setupDefaultRole(CassandraRoleManager.java:370)
at org.apache.cassandra.auth.CassandraRoleManager.doSetupDefaultRole(CassandraRoleManager.java:428)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
I am trying to test Zeppelin 0.6.2 with Spark 2.0.1 installed on Windows Server 2012.
I started the Spark master and tested the Spark Shell.
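(For reference, a plain Spark shell can be pointed at the standalone master to confirm it is reachable outside Zeppelin; a sketch using the same SPARK_HOME and master URL as below:)

REM Start a Spark shell against the standalone master to verify connectivity
C:\spark-2.0.1-bin-hadoop2.7\bin\spark-shell.cmd --master spark://100.79.240.26:7077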
Then I configured the following in the conf\zeppelin-env.cmd file:
set SPARK_HOME=C:\spark-2.0.1-bin-hadoop2.7
set MASTER=spark://100.79.240.26:7077
I have not set HADOOP_CONF_DIR or SPARK_SUBMIT_OPTIONS (they are optional according to the documentation).
I checked the values on the Interpreter configuration page and the Spark master is OK.
When I run the "Load data into table" note of the Zeppelin tutorial, I get a connection refused error. Here is part of the error log:
INFO [2016-11-17 21:58:12,518] ({pool-1-thread-11} Paragraph.java[jobRun]:252) - run paragraph 20150210-015259_1403135953 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter#8bbfd7
INFO [2016-11-17 21:58:12,518] ({pool-1-thread-11} RemoteInterpreterProcess.java[reference]:148) - Run interpreter process [C:\zeppelin-0.6.2-bin-all\bin\interpreter.cmd, -d, C:\zeppelin-0.6.2-bin-all\interpreter\spark, -p, 50163, -l, C:\zeppelin-0.6.2-bin-all/local-repo/2C3FBS414]
INFO [2016-11-17 21:58:12,614] ({Exec Default Executor} RemoteInterpreterProcess.java[onProcessFailed]:288) - Interpreter process failed {}
org.apache.commons.exec.ExecuteException: Process exited with an error: 255 (Exit value: 255)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
at java.lang.Thread.run(Thread.java:745)
ERROR [2016-11-17 21:58:43,846] ({Thread-49} RemoteScheduler.java[getStatus]:255) - Can't get status information
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:189)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:253)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.run(RemoteScheduler.java:211)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
... 8 more
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 9 more
ERROR [2016-11-17 21:58:43,846] ({pool-1-thread-11} Job.java[run]:189) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:328)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:260)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
In the Zeppelin logs directory there is only one file, for Zeppelin itself; the interpreter is an external Spark installation, which is not logging any error because it is never reached by the interpreter process.
I read some suggestions about the max and min JVM memory settings, but I have not been able to fix it yet.
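(Those suggestions point at Zeppelin's memory environment variables; a sketch of what they would look like in conf\zeppelin-env.cmd, assuming the Windows scripts honor them, with illustrative values:)

REM Heap for the Zeppelin server and for the remote interpreter process
set ZEPPELIN_MEM=-Xms1024m -Xmx2048m
set ZEPPELIN_INTP_MEM=-Xms1024m -Xmx2048m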
Any comment will be appreciated.
Paul
I am trying to stream some SSTables to a Cassandra cluster using the sstableloader utility, but I am getting a streaming error. Here is the stack trace:
Established connection to initial hosts
Opening sstables and calculating sections to stream
18:05:04.058 [main] DEBUG o.a.c.i.s.m.MetadataSerializer - Load metadata for /path/new/xyz/search/xyz-search-ka-1
18:05:04.073 [main] INFO o.a.c.io.sstable.SSTableReader - Opening /path/new/xyz/new/xyz_search/search/xyz_search-search-ka-1 (330768 bytes)
Streaming relevant part of /path/new/xyz/xyz_search/search/xyz_search-search-ka-1-Data.db to [/10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX]
18:05:04.411 [main] INFO o.a.c.streaming.StreamResultFuture - [Stream #ed3a0cd0-fd25-11e5-8509-63e9961cf787] Executing streaming plan for Bulk Load
Streaming relevant part of /path/xyz-search-ka-1-Data.db to [/10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX]
17:22:44.175 [main] INFO o.a.c.streaming.StreamResultFuture - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Executing streaming plan for Bulk Load
17:22:44.177 [StreamConnectionEstablisher:1] INFO o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Starting streaming to /10.XX.XX.XX
17:22:44.177 [StreamConnectionEstablisher:1] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Sending stream init for incoming stream
17:22:44.183 [StreamConnectionEstablisher:2] INFO o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Starting streaming to /10.XX.XX.XX
17:22:44.183 [StreamConnectionEstablisher:2] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Sending stream init for incoming stream
17:23:47.191 [StreamConnectionEstablisher:2] ERROR o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Streaming error occurred
java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:458) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:450) ~[na:1.8.0_45]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_45]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_45]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:62) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:236) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:79) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [cassandra-all-2.1.6.jar:2.1.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
17:23:47.202 [StreamConnectionEstablisher:2] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Closing stream connection handler on /10.XXX.XXX.XXX
17:23:47.205 [StreamConnectionEstablisher:1] ERROR o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Streaming error occurred
java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:458) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:450) ~[na:1.8.0_45]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_45]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_45]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:62) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:236) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:79) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [cassandra-all-2.1.6.jar:2.1.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Also, the machine where I am running sstableloader is not part of the Cassandra cluster.
Thanks
After debugging a little more, I found that sstableloader also uses port 7000 (the inter-node storage/streaming port) when streaming SSTables to the Cassandra cluster. My local machine did not have access to port 7000 on the cluster machines, which is why I was getting the connection timed out exception.
If you encounter this, make sure the machine from which you are running sstableloader has access to ports 9160, 7000, and 9042 on all the Cassandra nodes you are trying to stream to.
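A quick way to verify this from the loader machine (the hostname is a placeholder):

# Each of these should report success if the port is reachable from the loader machine
nc -zv cassandra-node-1 7000   # inter-node storage/streaming port
nc -zv cassandra-node-1 9160   # Thrift client port
nc -zv cassandra-node-1 9042   # CQL native transport port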
DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Closing stream connection handler on /10.XXX.XXX.XXX
Hint: I suspect the machine 10.xxx.xxx.xxx is under heavy load. It is worth checking the /var/log/cassandra/system.log file on that machine to narrow down the root cause.
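For example, on the suspect node (a generic sketch, not specific to this cluster):

# Look for GC pauses, dropped messages or errors around the time of the failed stream
grep -iE 'ERROR|GCInspector|dropped' /var/log/cassandra/system.log | tail -n 50
# Check for backed-up or dropped operations in the thread pools
nodetool tpstats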
I am running a Spark Streaming application with Spark version 1.4.0.
If I kill the worker (using kill -9) while my job is running, both the worker and the executor on that node die, but the executor still shows up in the Executors tab of the UI, and the number of active tasks on it sometimes shows as negative.
Because of this, the jobs keep failing with the following exception:
16/04/01 23:54:20 WARN TaskSetManager: Lost task 141.0 in stage 19859.0 (TID 190333, 192.168.33.96): java.io.IOException: Failed to connect to /192.168.33.97:63276
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.33.97:63276
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
On relaunching the worker, a new executor is allocated, but the old (dead) executor's entry is still there, and the stages fail with the "java.io.IOException: Failed to connect to " error.
We have been running Cassandra 1.0.2 in production for many months with minimal trouble. Lately, we have started to see consistent failures on all nodes, with this error:
ERROR [CompactionExecutor:199] 2013-03-02 00:21:26,179 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:199,1,RMI Runtime]
java.io.IOError: java.io.IOException: Map failed
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:225)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:202)
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:308)
at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:174)
at org.apache.cassandra.db.compaction.CompactionManager$4.call(CompactionManager.java:275)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:758)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:217)
... 9 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:755)
... 10 more
INFO [Thread-2] 2013-03-02 00:21:26,222 MessagingService.java (line 488) Shutting down MessageService...
The node dies after this. We remove the oldest data files and start the node again. The node runs for a while, then dies again. This happens for all eight nodes in our ring.
We are using the default compaction strategy (size tiered) and we think it is the best choice for our problem domain.
Q: Why is this happening and what should we do to fix it?