Cassandra version 3.9 (https://github.com/docker-library/cassandra/blob/4bb926527d4a9eb534508fe0bbae604dee81f40a/3.9/Dockerfile)
It happened when I added 2 nodes to the cluster, and the error occurs only on those 2 nodes, roughly every 2 minutes. I ran a repair across the whole cluster, but it didn't help. The error occurs while Cassandra is running.
I see this error on 2 of the 3 nodes in the cluster.
ERROR 07:13:44 Failed to apply mutation locally : {}
java.nio.BufferOverflowException: null
at org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.9.jar:3.9]
...
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.9.jar:3.9]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
I came across the same error. The answer lies here:
Mutation of bytes is too large for the maximum size of
It was not present in the DataStax help center when the question was posted, but was added a year later. Hope it helps.
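For reference, in Cassandra 3.x the maximum allowed mutation size defaults to half of the commitlog segment size, so the usual fix for that message is raising the segment size in cassandra.yaml. A minimal sketch, with an illustrative value rather than a recommendation:

# cassandra.yaml
# The maximum mutation size defaults to half of this value,
# so a 64 MB segment allows mutations up to 32 MB.
commitlog_segment_size_in_mb: 64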
Related
After Reaper failed to run repair on 18 nodes of a Cassandra cluster, I ran a full repair of each node to fix the failed repairs. After the full repair, Reaper executed successfully, but after a few days Reaper failed again. I can see the following error in system.log:
ERROR [RMI TCP Connection(33673)-10.196.83.241] 2021-09-01 09:01:18,005 RepairRunnable.java:276 - Repair session 81540931-0b20-11ec-a7fa-8d6977dd3c87 for range [(-606604147644314041,-98440495518284645], (-3131564913406859309,-3010160047914391044]] failed with error Terminate session is called
java.io.IOException: Terminate session is called
at org.apache.cassandra.service.ActiveRepairService.terminateSessions(ActiveRepairService.java:191) ~[apache-cassandra-3.11.0.jar:3.11.0]
INFO [Native-Transport-Requests-2] 2021-09-01 09:02:52,020 Message.java:619 - Unexpected exception during request; channel = [id: 0x1e99a957, L:/10.196.18.230:9042 ! R:/10.254.252.33:62100]
io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: Connection timed out
In nodetool tpstats I can see some pending tasks:
Pool Name           Active  Pending
ReadStage                0        0
Repair#18                3       90
ValidationExecutor       3        3
Also in nodetool compactionstats there are 4 pending tasks:
-bash-4.2$ nodetool compactionstats
pending tasks: 4
- Main.visit: 1
- Main.post: 1
- Main.stream: 2
My question is: why is Reaper still failing even after a full repair? And what is the root cause of the pending repairs?
PS: the Reaper version is 2.2.3; I'm not sure whether this is a bug in Reaper!
You most likely don't have enough segments in your Reaper repair definition, or the default timeout (30 mins) is too low for your repair.
Segments (and the associated repair session) get terminated when they reach the timeout, in order to avoid stuck repairs. When tuned inappropriately, this can give the behavior you're observing.
Nodetool doesn't set a timeout on repairs, which explains why it passes there. The good news is that nothing will prevent repair from passing with Reaper once tuned correctly.
We're currently working on adaptive repairs to have Reaper deal with this situation automatically, but in the meantime you'll need to deal with this manually.
Check the list of segments in the UI and apply the following rules (a config sketch follows the list):
If fewer than 20% of segments are failing, double the timeout by adjusting the hangingRepairTimeoutMins value in the config yaml.
If more than 20% of segments are failing, double the number of segments.
Once repair passes at least twice, check the maximum duration of segments and further tune the number of segments to have them last at most 15 mins.
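As a sketch, both knobs live in Reaper's YAML configuration (values are illustrative; segmentCountPerNode is the setting that controls the number of segments):

# cassandra-reaper.yaml
# Doubled from the 30 min default, per the first rule above:
hangingRepairTimeoutMins: 60
# Doubled per the second rule above:
segmentCountPerNode: 32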
Assuming you're not running Cassandra 4.0 yet, now that you've run repair through nodetool, you have sstables that are marked as repaired, as incremental repair would do. This creates a problem because Reaper's repairs don't mark sstables as repaired, and you now have two different sstable pools (repaired and unrepaired) which cannot be compacted together.
You'll need to use the sstablerepairedset tool to mark all sstables as unrepaired and put them back in the same pool. Please read the documentation to learn how to achieve this.
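A hedged sketch of that procedure (the data path and keyspace are illustrative, and the tool must be run while the node is stopped):

# Stop the node first, then build the sstable list and mark everything unrepaired:
find /var/lib/cassandra/data/my_keyspace -name "*-Data.db" > sstables.txt
sstablerepairedset --really-set --is-unrepaired -f sstables.txt
# Restart the node afterwards.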
There could be a number of things taking place, such as Reaper being unable to connect to the nodes via JMX (for whatever reason). It isn't possible to diagnose the problem with the limited information you've provided.
You'll need to check the Reaper logs for clues on the root cause.
As a side note, this isn't related to repairs and is a client/driver/app connecting to the node on the CQL port:
INFO [Native-Transport-Requests-2] 2021-09-01 09:02:52,020 Message.java:619 - Unexpected exception during request; channel = [id: 0x1e99a957, L:/10.196.18.230:9042 ! R:/10.254.252.33:62100]
io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: Connection timed out
Cheers!
We are using Cassandra 1.2.9 + BAM 2.5 for API analysis.
We have scheduled a job to purge Cassandra data. This purge job is divided into three steps:
The 1st step is to query the original column family and insert the rows into the temporary columnFamily_purge.
The 2nd step is to delete from the original column family by adding tombstones, and to insert the data from columnFamily_purge back into the original column family.
The 3rd step is to drop the temporary columnFamily_purge.
The 1st step works well, but the 2nd step frequently crashes the Cassandra servers during the Hadoop map tasks, which makes Cassandra unavailable. The exception stacktrace is as follows:
2016-08-23 10:27:43,718 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoop for UID 47338 from the native implementation
2016-08-23 10:27:43,720 WARN org.apache.hadoop.mapred.Child: Error running child
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:390)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:244)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.deleteRow(AbstractColumnFamilyTemplate.java:173)
at org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion$RowKeyMapper.map(CassandraMapReduceRowDeletion.java:246)
at org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion$RowKeyMapper.map(CassandraMapReduceRowDeletion.java:139)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Could someone help with what may be leading to this problem? Thanks!
This can happen for 3 reasons:
1) The Cassandra servers are down. I don't think this is the case in your setup.
2) Network issues.
3) The load is higher than what the cluster can handle.
How do you delete data? Using a Hive script?
After I increased the number of open files and the max thread count, the problem was gone.
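For reference, a minimal sketch of the OS limits involved, in /etc/security/limits.conf (the user name and values are illustrative, not what the poster used):

# /etc/security/limits.conf
# Raise open-file and thread/process limits for the user running Cassandra.
cassandra - nofile 100000
cassandra - nproc 32768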
I am using Cassandra 1.1.1 with Hector 1.0.5, and am trying to insert a heavy volume of data into a column family. During execution of my program the Cassandra server crashes with an out-of-memory error, after which I am left with no option but to quit the server. This repeats for one column family in which I am trying to store HTML file contents, and I never get a chance to complete the load. The HTML content varies from 225 KB to 700 KB per row, and I am trying to insert almost 1000 records.
The program throws the following:
Exception in thread "main" me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:393)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at com.epocrates.soa.rx.util.DiseaseImporter.insertDisease(DiseaseImporter.java:207)
at com.epocrates.soa.rx.util.DiseaseImporter.batchProcess(DiseaseImporter.java:81)
at com.epocrates.soa.rx.util.DiseaseImporter.main(DiseaseImporter.java:37)
In system.log, I find the following:
java.io.IOError: java.io.IOException: Map failed
at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:127)
at org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:80)
at org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:244)
at org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:49)
at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:104)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:758)
at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:119)
... 6 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:755)
... 7 more
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:457)
at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:314)
at org.apache.cassandra.service.StorageProxy$2.apply(StorageProxy.java:119)
at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:260)
at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:193)
at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:637)
at org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:587)
at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:595)
at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3112)
at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3100)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
This means that you've run out of address space to map commitlog segments into.
Best solution: upgrade to a 64-bit JVM.
Worse solution: in cassandra.yaml, set commitlog_segment_size_in_mb and commitlog_total_space_in_mb both to 16.
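That workaround as a cassandra.yaml sketch:

# cassandra.yaml — the 32-bit workaround described above
commitlog_segment_size_in_mb: 16
commitlog_total_space_in_mb: 16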
This isn't the first time this has come up; I've opened https://issues.apache.org/jira/browse/CASSANDRA-4422 to improve the defaults.
I have a Cassandra cluster of 8 nodes, running Cassandra 1.0.8.
I'm trying to do lots of small inserts in a loop with batch_mutate(). After some time (~200K inserts) the server resets the connection with the following exception:
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:157)
at org.apache.cassandra.thrift.Cassandra$Client.send_batch_mutate(Cassandra.java:998)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:986)
at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:46)
at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:56)
at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:51)
...
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
... 25 more
Other than that, the cluster works fine and the server logs are clean.
What might be causing this problem?
Thanks!
I've found the cause: the batch mutation size was exceeding the frame size of TFramedTransport.
There are two possible solutions: either increase the thrift_framed_transport_size_in_mb configuration property in cassandra.yaml, or use a smaller batch size.
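A sketch of the first option (the value is illustrative; the default frame size in Cassandra of this era is 15 MB):

# cassandra.yaml
# Frames larger than this are rejected by the server,
# which shows up client-side as a reset connection.
thrift_framed_transport_size_in_mb: 30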
I would be thankful if an experienced user could name all possible solutions (best practices) for fixing Hector client timeouts like this:
Caused by: me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:9628)
at org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:636)
at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:608)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:388)
... 21 more
HECTOR:
From the Hector documentation (https://github.com/rantav/hector/wiki/User-Guide), I found the following setting related to timeouts:
1.) cassandraThriftSocketTimeout
CASSANDRA:
1.) rpc_timeout_in_ms: 10000 (in cassandra.yaml)
What other settings related to timeouts are available, both on the Hector side and on the Cassandra side? I have time, so I simply want to wait longer, but I have not found the settings to do so.
Thanks
Markus
From the cassandra.thrift API in the Apache Cassandra source tree, regarding TimedOutException:
"RPC timeout was exceeded. either a node failed mid-operation, or load was too high, or the requested op was too large."
In short, you were asking for too much data. What sort of query were you sending? Can you post a code snippet?
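To illustrate the usual fix, here is a hedged sketch (not the original poster's code) that raises Hector's Thrift socket timeout and breaks a large multiget_slice into smaller key batches. The cluster, keyspace, and column family names, as well as the batch sizes, are hypothetical:

import java.util.List;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Rows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.MultigetSliceQuery;

public class ChunkedMultiget {
    public static void fetchInChunks(List<String> allKeys) {
        // Raise the Thrift socket timeout (ms) so slow-but-healthy requests survive.
        CassandraHostConfigurator conf = new CassandraHostConfigurator("localhost:9160");
        conf.setCassandraThriftSocketTimeout(60000);
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", conf);
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        StringSerializer ss = StringSerializer.get();
        int batchSize = 100; // tune down if timeouts persist

        for (int i = 0; i < allKeys.size(); i += batchSize) {
            List<String> chunk = allKeys.subList(i, Math.min(i + batchSize, allKeys.size()));
            MultigetSliceQuery<String, String, String> query =
                    HFactory.createMultigetSliceQuery(keyspace, ss, ss, ss);
            query.setColumnFamily("MyColumnFamily");
            query.setKeys(chunk.toArray(new String[chunk.size()]));
            query.setRange("", "", false, 1000); // also cap the columns fetched per row
            Rows<String, String, String> rows = query.execute().get();
            // ... process rows ...
        }
    }
}

On the Cassandra side, raising rpc_timeout_in_ms (which you already found) is the complementary knob, but shrinking the request is usually the better fix.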