We just added a second DC of 7 nodes (5 JBOD SSDs each) to our Cassandra cluster, and after the new DC finished replicating we started getting periodically stuck compactions of the OpsCenter.rollup_state table. When this happens, the node is reported as down by the other nodes while it stays alive itself; nodetool drain also hangs on the node, and only a reboot helps in this situation.
The logs below were captured after the node had already been restarted; both nodes shown below got stuck in this state.
DEBUG [CompactionExecutor:14] 2019-09-03 17:03:44,456 CompactionTask.java:154 - Compacting (a43c8d71-ce53-11e9-972d-59ed5390f0df) [/cass-db1/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-581-big-Data.db:level=0, /cass-db1/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-579-big-Data.db:level=0, ]
On the other node:
DEBUG [CompactionExecutor:14] 2019-09-03 20:38:22,272 CompactionTask.java:154 - Compacting (a00354f0-ce71-11e9-91a4-3731a2137ea5) [/cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-610-big-Data.db:level=0, /cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-606-big-Data.db:level=0, ]
WARN [CompactionExecutor:14] 2019-09-03 20:38:22,273 LeveledCompactionStrategy.java:273 - Live sstable /cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-606-big-Data.db from level 0 is not on corresponding level in the leveled manifest. This is not a problem per se, but may indicate an orphaned sstable due to a failed compaction not cleaned up properly.
What is the way to solve this problem?
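For reference, a minimal sketch of commands that can help inspect, and try to cancel, a hung compaction before resorting to a reboot (this assumes nodetool still responds on the affected node, which may not be the case once it is in this state):
nodetool compactionstats                 # note the id and progress of the stuck compaction
nodetool tpstats | grep -i compaction    # check whether CompactionExecutor tasks are backing up
nodetool stop COMPACTION                 # ask Cassandra to abort the active compactions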
Related
After Reaper failed to run a repair on our 18-node Cassandra cluster, I ran a full repair on each node to fix the issue. After that full repair, Reaper executed successfully, but a few days later Reaper failed again, and I can see the following error in system.log:
ERROR [RMI TCP Connection(33673)-10.196.83.241] 2021-09-01 09:01:18,005 RepairRunnable.java:276 - Repair session 81540931-0b20-11ec-a7fa-8d6977dd3c87 for range [(-606604147644314041,-98440495518284645], (-3131564913406859309,-3010160047914391044]] failed with error Terminate session is called
java.io.IOException: Terminate session is called
at org.apache.cassandra.service.ActiveRepairService.terminateSessions(ActiveRepairService.java:191) ~[apache-cassandra-3.11.0.jar:3.11.0]
INFO [Native-Transport-Requests-2] 2021-09-01 09:02:52,020 Message.java:619 - Unexpected exception during request; channel = [id: 0x1e99a957, L:/10.196.18.230:9042 ! R:/10.254.252.33:62100]
io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: Connection timed out
In nodetool tpstats I can see some pending tasks:
Pool Name Active Pending
ReadStage 0 0
Repair#18 3 90
ValidationExecutor 3 3
Also in nodetool compactionstats there are 4 pending tasks:
-bash-4.2$ nodetool compactionstats
pending tasks: 4
- Main.visit: 1
- Main.post: 1
- Main.stream: 2
My question is: why is Reaper still failing even after a full repair, and what is the root cause of the pending repairs?
PS: The Reaper version is 2.2.3; I'm not sure if this is a bug in Reaper!
You most likely don't have enough segments in your Reaper repair definition, or the default timeout (30 mins) is too low for your repair.
Segments (and the associated repair session) get terminated when they reach the timeout, in order to avoid stuck repairs. When tuned inappropriately, this can give the behavior you're observing.
Nodetool doesn't set a timeout on repairs, which explains why it passes there. The good news is that nothing will prevent repair from passing with Reaper once tuned correctly.
We're currently working on adaptive repairs to have Reaper deal with this situation automatically, but in the meantime you'll need to deal with this manually.
Check the list of segments in the UI and apply the following rules (a config sketch follows them):
If you have fewer than 20% of segments failing, double the timeout by adjusting the hangingRepairTimeoutMins value in the config yaml.
If you have more than 20% of segments failing, double the number of segments.
Once repair passes at least twice, check the maximum duration of segments and further tune the number of segments to have them last at most 15 mins.
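As a rough sketch of where those knobs live (the file path assumes a packaged Reaper install and is not from the original answer; adjust it for your deployment and restart Reaper after changing the yaml):
grep hangingRepairTimeoutMins /etc/cassandra-reaper/cassandra-reaper.yaml
# hangingRepairTimeoutMins: 30    <- the 30 min default; double this if fewer than 20% of segments fail
# the segment count itself is set per repair run ("Segment count per node" in the Reaper UI,
# segmentCountPerNode in the REST API); double that instead if more than 20% of segments fail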
Assuming you're not running Cassandra 4.0 yet, now that you've run repair through nodetool you have sstables that are marked as repaired, just as incremental repair would do. This will create a problem, as Reaper's repairs don't mark sstables as repaired and you now have two different sstable pools (repaired and unrepaired) which cannot be compacted together.
You'll need to use the sstablerepairedset tool to mark all sstables as unrepaired and put them back in a single pool. Please read the documentation to learn how to achieve this.
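A minimal sketch of what that usually looks like on one node (the paths, service name and keyspace placeholder are assumptions, not from the original answer; the tool must be run while Cassandra is stopped, and the procedure has to be repeated on every node):
nodetool drain
sudo service cassandra stop                                    # stop the node first
find /var/lib/cassandra/data/<keyspace> -name "*-Data.db" > /tmp/sstables.txt
sudo sstablerepairedset --really-set --is-unrepaired -f /tmp/sstables.txt
sudo service cassandra start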
There could be a number of things taking place, such as Reaper not being able to connect to the nodes via JMX (for whatever reason). It isn't possible to diagnose the problem with the limited information you've provided.
You'll need to check the Reaper logs for clues on the root cause.
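As a sketch only (the log path assumes a packaged Reaper install and the JMX port assumes the Cassandra default; both are guesses on my part, so adjust them to your deployment):
tail -n 200 /var/log/cassandra-reaper/reaper.log                    # recent Reaper activity
grep -iE 'error|exception|jmx' /var/log/cassandra-reaper/reaper.log | tail -n 50
nodetool -h <node_ip> -p 7199 status                                # confirms JMX is reachable, since nodetool uses JMX too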
As a side note, this isn't related to repairs and is a client/driver/app connecting to the node on the CQL port:
INFO [Native-Transport-Requests-2] 2021-09-01 09:02:52,020 Message.java:619 - Unexpected exception during request; channel = [id: 0x1e99a957, L:/10.196.18.230:9042 ! R:/10.254.252.33:62100]
io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: Connection timed out
Cheers!
I'm trying to add a new node to our cluster (Cassandra 2.1.11, 16 nodes, 32 GB RAM, 2x3 TB HDD, 8-core CPU, 1 datacenter, 2 racks, about 700 GB of data on each node). After starting the new node, data (approx. 600 GB total) from the 16 existing nodes was successfully transferred to it and the building of secondary indexes started. The secondary index build process looks normal; I see info about the successful completion of some secondary index builds and some stream tasks:
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,153 StreamResultFuture.java:180 - [Stream #856adc90-8ddd-11e5-a4be-69bddd44a709] Session with /192.168.21.66 is complete
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,152 SecondaryIndexManager.java:174 - Index build of [docs.docs_ex_pl_ph_idx, docs.docs_lo_pl_ph_idx, docs.docs_author_login_idx, docs.docs_author_extid_idx, docs.docs_url_idx] complete
Currently 9 out of 16 streams have finished successfully, according to the logs. Everything looks fine except for one issue: this process has already lasted 5 full days. There are no errors in the logs, nothing suspicious, except the extremely slow progress.
nodetool compactionstats -H
shows
Secondary index build ... docs 882,4 MB 1,69 GB bytes 51,14%
So the index build is running and making progress, but very slowly: about 1% in half an hour or so.
The only significant difference between the new node and the existing nodes is that the Cassandra Java process has 21k open files, versus about 300 open files on any existing node, and there are 80k files in the data directory on the new node versus 300-500 files in the data directory on any existing node.
Is this normal? At this speed it looks like I'll spend 16 weeks or so adding 16 more nodes.
I know this is an old question, but we ran into this exact issue with 2.1.13 using DTCS. We were able to fix it in our test environment by increasing memtable flush thresholds to 0.7 - which didn't make any sense to us, but may be worth trying.
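Assuming the setting being referred to is memtable_cleanup_threshold in cassandra.yaml (that mapping, and the config path, are my assumptions rather than something stated in the answer), the change would look roughly like this, followed by a restart of the node:
grep -n memtable_cleanup_threshold /etc/cassandra/cassandra.yaml
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1) when left commented out
# raise it as described above:
#   memtable_cleanup_threshold: 0.7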
As part of the upgrade of DataStax Enterprise from 4.6.1 to 4.7, I am upgrading Cassandra from 2.0.12.200 to 2.1.5.469. The overall DB size is about 27 GB.
One step in the upgrade process is running nodetool upgradesstables on a seed node. However, this command has been running for more than 48 hours now, and looking at CPU and I/O, this node seems to be effectively idle.
This is the current output from nodetool compactionstats. The numbers don't seem to change over the course of a few hours:
pending tasks: 28
compaction type keyspace table completed total unit progress
Upgrade sstables prod_2_0_7_v1 image_event_index 8731948551 17452496604 bytes 50.03%
Upgrade sstables prod_2_0_7_v1 image_event_index 9121634294 18710427035 bytes 48.75%
Active compaction remaining time : 0h00m00s
And this is the output from nodetool netstats:
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 0
Responses n/a 0 0
My question is, is this normal behavior? Is there a way for me to check if this command is stuck or if it is actually still doing work?
If it is stuck, what's the best course of action?
For what it's worth, this node is an m3.large instance on Amazon EC2 with all the data on the local ephemeral disk (SSD).
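One rough way to tell a stuck task from a merely slow one (the file names and interval here are arbitrary, and the Cassandra PID is a placeholder):
nodetool compactionstats > /tmp/compaction_before.txt
sleep 600
nodetool compactionstats > /tmp/compaction_after.txt
diff /tmp/compaction_before.txt /tmp/compaction_after.txt      # identical byte counts after 10 minutes suggest a stuck task
jstack <cassandra_pid> | grep -A 20 'CompactionExecutor'       # shows whether the compaction threads are working or blocked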
I've put up a test cluster of four nodes. It's severely underpowered(!): OK CPU, only 2 GB of RAM, shared non-SSD storage. Hey, it's a test :)
I just kept it running for three days. No data going in or out; everything was just idle. It's connected to OpsCenter.
This morning, we found that one of the nodes went down around 2 am last night. The OS didn't go down (it was still responding to pings). The Cassandra log around that time is:
INFO [MemtableFlushWriter:114] 2014-07-29 02:07:34,952 Memtable.java:360 - Completed flushing /var/lib/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-107-Data.db (686 bytes) for commitlog position ReplayPosition(segmentId=1406304454537, position=29042136)
INFO [ScheduledTasks:1] 2014-07-29 02:08:24,227 GCInspector.java:116 - GC for ParNew: 276 ms for 1 collections, 648591696 used; max is 1040187392
The next entry is:
INFO [main] 2014-07-29 09:18:41,661 CassandraDaemon.java:102 - Hostname: xxxxx
i.e. when we restarted the node through OpsCenter.
Does that mean it crashed on GC, or that GC finished and something else crashed? Is there some other log I should be looking at?
Note: in the OpsCenter event log, we see this:
7/29/2014, 2:15am Warning Node reported as being down: xxxxxxx
I appreciate that the nodes are underpowered, but given the cluster is completely idle, it shouldn't crash, should it?
Using 2.1.0-rc4, by the way.
My guess is your node was shut down by the OOM killer. Because Linux overcommits RAM, it may kill applications to reclaim memory for the OS when the system is under heavy memory pressure. With 2 GB of total RAM this can happen very easily.
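A quick way to confirm or rule that out is to look for OOM-killer messages in the kernel and system logs (log locations vary by distribution):
dmesg | grep -iE 'killed process|out of memory'
grep -i 'oom' /var/log/syslog /var/log/messages 2>/dev/null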
I'm trying to simultaneously add 4 nodes to my current 2-node DC. I have vnodes turned off, as per the DataStax suggestion. Right after the major index build on each node, the following warning is printed several times in the logs:
WARN [SolrSecondaryIndex ks.cf index initializer.] 2014-06-20 09:39:59,904 CassandraUtil.java (line 108) Error Operation timed out - received only 3 responses. on attempt 1 out of 4 with CL QUORUM...
I understand what it means. But why is Cassandra expecting these nodes to fulfill the CL while they are still bootstrapping? More importantly, how does the warning affect the bootstrap? I noticed that the nodes are not doing any index build or streaming anymore, but they also remain in the "Active - Joining" state. Is there any chance that they will finish? What should I do?
I'm using DSE 4.0.3. All existing and new nodes in the DC are Search nodes. I pre-computed the tokens using the Python program for Murmur3Partitioner.
EDIT:
Although nodetool compactionstats does not show any ongoing index build on the nodes, for some reason I still see a lot of these lines in the logs:
INFO [IndexPool backpressure thread-0] 2014-06-20 12:30:31,346 IndexPool.java (line 472) Throttling at 26 index requests per second with target total queue size at 40
INFO [IndexPool backpressure thread-0] 2014-06-20 12:30:34,169 IndexPool.java (line 428) Back pressure is active with total index queue size 18586 and average processing time 2770
EDIT:
Interestingly, after digging through the log files I found the following lines on each node:
INFO [main] 2014-06-20 09:39:48,588 StorageService.java (line 1036) Bootstrap completed! for the tokens [node token]
INFO [SolrSecondaryIndex ks.cf index initializer.] 2014-06-20 11:32:07,833 AbstractSolrSecondaryIndex.java (line 411) Reindexing 1417116631 commit log updates for core ks.cf
Based on these lines, I feel a lot better that the bootstrap actually completed and that the nodes are simply re-indexing their data. I don't know, though, why the re-indexing process is not shown in nodetool compactionstats.
It appears the bootstrap completed, and the DSE Search system is running normally.
"why the re-indexing process is not being shown in nodetool compactionstats"
DSE Search is not generally exposed via the Cassandra command-line tools. The log output should show the indexing as having completed; were you able to verify that?
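A rough way to verify that from the logs, assuming the default DSE log location (the exact wording of the completion message varies by DSE version, so grep around the reindex activity rather than for a specific string):
grep -i 'AbstractSolrSecondaryIndex' /var/log/cassandra/system.log | tail -n 20
grep -i 'reindex' /var/log/cassandra/system.log | tail -n 20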