Failed to add a new node to a Cassandra cluster

I have a cluster with four nodes, each holding about 70 GB of data.
When I add a new node to the cluster, it always
warns about tombstones like this:
WARN 09:38:03 Read 2578 live and 1114 tombstoned cells in xxxtable (see tombstone_warn_threshold).
10000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
localDeletion=2147483647, ranges=[FAE69193423616A400258D99B9C0CCCFEC4A9547C1A1FC17BF569D2405705B8E:_-FAE69193423616A400258D99B9C0CCCFEC4A9547C1A1FC17BF569D2405705B8E:!,
deletedAt=1456243983944000, localDeletion=1456243983][FAE69193423616A40EC252766DDF513FBCA55ECDFAF452052E6C95D4BD641201:_-FAE69193423616A40EC252766DDF513FBCA55ECDFAF452052E6C95D4BD641201:!,
deletedAt=1460026357100000, localDeletion=1460026357][FAE69193423616A41BED8E613CD24BF3583FB6C6ABBA13F19C3E2D1824D01EF6:_-FAE69193423616A41BED8E613CD24BF3583FB6C6ABBA13F19C3E2D1824D01EF6:!, deletedAt=1458176745950000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3B06C1306E35B0ACA719D800D254E5930:_-FAE69193423616A41BED8E613CD24BF3B06C1306E35B0ACA719D800D254E5930:!, deletedAt=1458176745556000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3BA2AE7FC8340F96CC440BDDFFBCBE7D0:_-FAE69193423616A41BED8E613CD24BF3BA2AE7FC8340F96CC440BDDFFBCBE7D0:!,
deletedAt=1458176745740000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3E5A681C7ECC09A93429CEE59A76DA131:_-FAE69193423616A41BED8E613CD24BF3E5A681C7ECC09A93429CEE59A76DA131:!,
deletedAt=1458792793219000, localDeletion=
and eventually it takes a long time to start and then throws
java.lang.OutOfMemoryError: Java heap space
Following is the error log:
INFO 20:39:20 ConcurrentMarkSweep GC in 5859ms. CMS Old Gen: 6491794984 -> 6492437040; Par Eden Space: 1398145024 -> 1397906216; Par Survivor Space: 349072992 -> 336156096
INFO 20:39:20 Enqueuing flush of refresh_token: 693 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 Pool Name Active Pending Completed Blocked All Time Blocked
INFO 20:39:20 Enqueuing flush of log_user_track: 7047 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 CounterMutationStage 0 0 0 0 0
INFO 20:39:20 Enqueuing flush of userinbox: 42819 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 Enqueuing flush of messages: 7954 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 ReadStage 0 0 0 0 0
INFO 20:39:20 RequestResponseStage 0 0 6 0 0
INFO 20:39:20 Enqueuing flush of sstable_activity: 6567 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 ReadRepairStage 0 0 0 0 0
INFO 20:39:20 Enqueuing flush of convmsgs: 2132 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 MutationStage 0 0 72300 0 0
INFO 20:39:20 Enqueuing flush of sstable_activity: 1791 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 GossipStage 0 0 23655 0 0
INFO 20:39:20 Enqueuing flush of log_user_track: 1165 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 AntiEntropyStage 0 0 0 0 0
INFO 20:39:20 Enqueuing flush of sstable_activity: 2388 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 CacheCleanupExecutor 0 0 0 0 0
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid17155.hprof ...
When I run nodetool tpstats, I see that the MemtableFlushWriter and MemtablePostFlush pools have a large number of pending tasks, as shown below:
Pool Name Active Pending Completed Blocked All time blocked
CounterMutationStage 0 0 0 0 0
ReadStage 0 0 0 0 0
RequestResponseStage 0 0 8 0 0
MutationStage 0 0 1382245 0 0
ReadRepairStage 0 0 0 0 0
GossipStage 0 0 23553 0 0
CacheCleanupExecutor 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
MigrationStage 0 0 0 0 0
ValidationExecutor 0 0 0 0 0
CommitLogArchiver 0 0 0 0 0
MiscStage 0 0 0 0 0
MemtableFlushWriter 4 7459 220 0 0
MemtableReclaimMemory 0 0 231 0 0
PendingRangeCalculator 0 0 3 0 0
MemtablePostFlush 1 7464 331 0 0
CompactionExecutor 3 3 269 0 0
InternalResponseStage 0 0 0 0 0
HintedHandoff 0 0 4 0 0
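For reference, a minimal sketch of how this flush backlog and the bootstrap progress could be watched while the new node joins (the 10-second interval is only illustrative; these are standard nodetool and Linux commands):
# Watch whether the flush backlog keeps growing or drains
watch -n 10 "nodetool tpstats | grep -E 'MemtableFlushWriter|MemtablePostFlush'"
# Heap usage versus the configured maximum
nodetool info | grep -i heap
# Streaming progress from the existing nodes to the joining node
nodetool netstats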

Related

High disk I/O (read) on Cassandra nodes

We have a 3-node Cassandra cluster.
We have an application that uses a keyspace that creates a high read load on the disks. The problem is cumulative: the more days we interact with the keyspace, the more the disk reads grow.
(screenshot: high read load)
Reads go up to more than 700 MB/s. Then the storage (SAN) begins to degrade, and then the Cassandra cluster degrades as well.
UPD 25.10.2021: "I worded this a little incorrectly: the space is allocated to the virtual machine through the SAN and appears as a normal drive."
The only thing that helps is clearing the keyspace.
Output of the "tpstats" and "cfstats" commands:
[cassandra-01 ~]$ nodetool tpstats
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 1 1 1837888055 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 6789640 0 0
MutationStage 0 0 870873552 0 0
MemtableReclaimMemory 0 0 7402 0 0
PendingRangeCalculator 0 0 9 0 0
GossipStage 0 0 18939072 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 3 0 0
RequestResponseStage 0 0 1307861786 0 0
Native-Transport-Requests 0 0 2981687196 0 0
ReadRepairStage 0 0 346448 0 0
CounterMutationStage 0 0 0 0 0
MigrationStage 0 0 168 0 0
MemtablePostFlush 0 0 8193 0 0
PerDiskMemtableFlushWriter_0 0 0 7402 0 0
ValidationExecutor 0 0 21 0 0
Sampler 0 0 10988 0 0
MemtableFlushWriter 0 0 7402 0 0
InternalResponseStage 0 0 3404 0 0
ViewMutationStage 0 0 0 0 0
AntiEntropyStage 0 0 71 0 0
CacheCleanupExecutor 0 0 0 0 0
Message type Dropped
READ 7
RANGE_SLICE 0
_TRACE 0
HINT 0
MUTATION 5
COUNTER_MUTATION 0
BATCH_STORE 0
BATCH_REMOVE 0
REQUEST_RESPONSE 0
PAGED_RANGE 0
READ_REPAIR 0
[cassandra-01 ~]$ nodetool cfstats box_messages -H
Total number of tables: 73
----------------
Keyspace : box_messages
Read Count: 48847567
Read Latency: 0.055540737801741485 ms
Write Count: 69461300
Write Latency: 0.010656743870327794 ms
Pending Flushes: 0
Table: messages
SSTable count: 6
Space used (live): 3.84 GiB
Space used (total): 3.84 GiB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 10.3 MiB
SSTable Compression Ratio: 0.23265712113582082
Number of partitions (estimate): 4156030
Memtable cell count: 929912
Memtable data size: 245.04 MiB
Memtable off heap memory used: 0 bytes
Memtable switch count: 92
Local read count: 20511450
Local read latency: 0.106 ms
Local write count: 52111294
Local write latency: 0.013 ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 57318
Bloom filter false ratio: 0.00841
Bloom filter space used: 6.56 MiB
Bloom filter off heap memory used: 6.56 MiB
Index summary off heap memory used: 1.78 MiB
Compression metadata off heap memory used: 1.95 MiB
Compacted partition minimum bytes: 73
Compacted partition maximum bytes: 17084
Compacted partition mean bytes: 3287
Average live cells per slice (last five minutes): 2.0796939751354797
Maximum live cells per slice (last five minutes): 10
Average tombstones per slice (last five minutes): 1.1939751354797576
Maximum tombstones per slice (last five minutes): 2
Dropped Mutations: 5 bytes
(I'm unable to comment and hence am posting this as an answer.)
As folks mentioned, a SAN is not going to be the best fit here, and one could read through the list of anti-patterns documented here, which also applies to OSS C*.
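If it helps to narrow down where the read amplification comes from, here is a small diagnostic sketch (the keyspace and table names are taken from the cfstats output above; adjust them to your schema):
# Read latency percentiles and SSTables touched per read
nodetool tablehistograms box_messages messages
# Per-table tombstones per slice and bloom filter false ratio
nodetool tablestats box_messages.messages
# Compaction backlog, which can amplify reads as it grows
nodetool compactionstats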

Not marking nodes down due to local pause of 8478595263 > 5000000000

I have a 3-node Cassandra cluster in Kubernetes, deployed using the bitnami/cassandra Helm chart.
After some time, once the number of requests grows, I get warnings like this:
WARN [GossipTasks:1] 2020-01-09 11:39:33,070 FailureDetector.java:278 - Not marking nodes down due to local pause of 8206335128 > 5000000000
WARN [GossipTasks:1] 2020-01-09 11:39:42,238 FailureDetector.java:278 - Not marking nodes down due to local pause of 6668041401 > 5000000000
WARN [GossipTasks:1] 2020-01-09 11:40:03,341 FailureDetector.java:278 - Not marking nodes down due to local pause of 15041441083 > 5000000000
WARN [PERIODIC-COMMIT-LOG-SYNCER] 2020-01-09 11:41:55,606 NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with average duration of 11850.79ms, 1 have exceeded the configured commit interval by an average of 1850.79ms
WARN [GossipTasks:1] 2020-01-09 11:42:20,019 Gossiper.java:783 - Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
INFO [RequestResponseStage-1] 2020-01-09 11:45:36,329 Gossiper.java:1011 - InetAddress /100.96.7.7 is now UP
INFO [RequestResponseStage-1] 2020-01-09 11:45:36,330 Gossiper.java:1011 - InetAddress /100.96.7.7 is now UP
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,931 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 0 internal and 45 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 2874 ms
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,933 StatusLogger.java:47 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,949 StatusLogger.java:51 - MutationStage 0 0 226236 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,950 StatusLogger.java:51 - ViewMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,950 StatusLogger.java:51 - ReadStage 0 0 244468 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,951 StatusLogger.java:51 - RequestResponseStage 0 0 341270 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,952 StatusLogger.java:51 - ReadRepairStage 0 0 5395 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,953 StatusLogger.java:51 - CounterMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,958 StatusLogger.java:51 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,959 StatusLogger.java:51 - CompactionExecutor 0 0 686641 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,960 StatusLogger.java:51 - MemtableReclaimMemory 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,962 StatusLogger.java:51 - PendingRangeCalculator 0 0 9 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,964 StatusLogger.java:51 - GossipStage 0 0 3093860 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,966 StatusLogger.java:51 - SecondaryIndexManagement 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,970 StatusLogger.java:51 - HintsDispatcher 0 0 10 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,973 StatusLogger.java:51 - MigrationStage 0 0 6 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,973 StatusLogger.java:51 - MemtablePostFlush 0 0 717 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - ValidationExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,975 StatusLogger.java:51 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,975 StatusLogger.java:51 - MemtableFlushWriter 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,976 StatusLogger.java:51 - InternalResponseStage 0 0 869 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,977 StatusLogger.java:51 - AntiEntropyStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,978 StatusLogger.java:51 - CacheCleanupExecutor 0 0 0 0 0
INFO [Service Thread] 2020-01-09 12:11:49,292 GCInspector.java:284 - ParNew GC in 659ms. CMS Old Gen: 2056877512 -> 2057740336; Par Eden Space: 671088640 -> 0; Par Survivor Space: 2636992 -> 6187520
I tried to solve this based on the following related question, but it does not cover Kubernetes:
Cassandra Error message: Not marking nodes down due to local pause. Why?
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 245904 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 696906 0 0
MutationStage 0 0 244820 0 0
MemtableReclaimMemory 0 0 697 0 0
PendingRangeCalculator 0 0 9 0 0
GossipStage 0 0 3138625 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 10 0 0
RequestResponseStage 0 0 364305 0 0
Native-Transport-Requests 0 0 11089339 0 241
ReadRepairStage 0 0 5395 0 0
CounterMutationStage 0 0 0 0 0
MigrationStage 0 0 6 0 0
MemtablePostFlush 0 0 725 0 0
PerDiskMemtableFlushWriter_0 0 0 697 0 0
ValidationExecutor 0 0 0 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 697 0 0
InternalResponseStage 0 0 869 0 0
ViewMutationStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
Message type Dropped
READ 0
RANGE_SLICE 0
_TRACE 0
HINT 0
MUTATION 45
COUNTER_MUTATION 0
BATCH_STORE 0
BATCH_REMOVE 0
REQUEST_RESPONSE 0
PAGED_RANGE 0
READ_REPAIR 0
The "tpstats" metrics above look mostly okay, but the dropped mutations indicate that your cluster is getting overloaded. Some requests are blocked there too, and the commit log does not seem to be keeping up with the writes. You should plan a cluster expansion or start debugging why the nodes are overloading.
Based on the log entries you posted, the nodes are overloaded, which makes them unresponsive. Mutations get dropped because the commit log disks cannot keep up with the writes.
You will need to review the size of your cluster, as you might need to add more nodes to increase capacity. Cheers!
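A minimal sketch of the checks implied above, assuming kubectl access to the Bitnami pods (the pod name cassandra-0 is illustrative):
# Long GC pauses are the usual cause of "local pause" warnings
kubectl exec cassandra-0 -- nodetool gcstats
# Dropped message counts appear at the end of tpstats
kubectl exec cassandra-0 -- nodetool tpstats
# Heap usage versus the configured maximum
kubectl exec cassandra-0 -- nodetool info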

Native Transport Requests in Cassandra

I got some pointers about Native Transport Requests in Cassandra from this link: What are native transport requests in Cassandra?
As per my understanding, any query I execute in Cassandra is a Native Transport Request.
I frequently get a Request Timed Out error in Cassandra, and I observed the following in the Cassandra debug log as well as in nodetool tpstats:
/var/log/cassandra# nodetool tpstats
Pool Name Active Pending Completed Blocked All time blocked
MutationStage 0 0 186933949 0 0
ViewMutationStage 0 0 0 0 0
ReadStage 0 0 781880580 0 0
RequestResponseStage 0 0 5783147 0 0
ReadRepairStage 0 0 0 0 0
CounterMutationStage 0 0 14430168 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 366708 0 0
MemtableReclaimMemory 0 0 788 0 0
PendingRangeCalculator 0 0 1 0 0
GossipStage 0 0 0 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 0 0 0
MigrationStage 0 0 0 0 0
MemtablePostFlush 0 0 799 0 0
ValidationExecutor 0 0 0 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 788 0 0
InternalResponseStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
Native-Transport-Requests 0 0 477629331 0 1063468
Message type Dropped
READ 0
RANGE_SLICE 0
_TRACE 0
HINT 0
MUTATION 0
COUNTER_MUTATION 0
BATCH_STORE 0
BATCH_REMOVE 0
REQUEST_RESPONSE 0
PAGED_RANGE 0
READ_REPAIR 0
1) What does the All time blocked state mean?
2) What does the value 1063468 denote? How harmful is it?
3) How can this be tuned?
Each request is processed by the NTR stage before being handed off to the read/mutation stage, but it still blocks while waiting for completion. To prevent being overloaded, the stage starts to block tasks from being added to its queue in order to apply back pressure to the client. Every time a request is blocked, the all-time-blocked counter is incremented. So 1063468 requests have at some point been blocked for a period of time because too many requests were backed up.
In situations where the app has spikes of queries, this blocking is unnecessary and can cause issues, so you can increase the queue limit with something like -Dcassandra.max_queued_native_transport_requests=4096 (default 128). You can also throttle requests on the client side, but I'd try increasing the queue size first.
There may also be some requests that are exceptionally slow and are clogging up your system. If you have monitoring set up, look at high-percentile read/write coordinator latencies. You can also use nodetool proxyhistograms. There may be something in your data model or queries that is causing issues.
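For reference, a minimal sketch of where that flag is typically set, assuming a package-style install with cassandra-env.sh in the Cassandra conf directory (the value 4096 is the example from above, not a recommendation):
# conf/cassandra-env.sh -- raise the NTR queue limit from the default of 128
JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=4096"
Restart the node afterwards, then check coordinator latencies for slow outliers with nodetool proxyhistograms.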

Cassandra 2.1 write slow in a 1 TB data table

I am doing some tests in a Cassandra cluster, and I now have a table with 1 TB of data per node. When I used YCSB to do more insert operations, I found the throughput was really low (about 10,000 ops/sec) compared to an identical, new table in the same cluster (about 80,000 ops/sec). While inserting, the CPU usage was about 40% and there was almost no disk usage.
I used nodetool tpstats to get task details, and it showed:
Pool Name Active Pending Completed Blocked All time blocked
CounterMutationStage 0 0 0 0 0
ReadStage 0 0 102 0 0
RequestResponseStage 0 0 41571733 0 0
MutationStage 384 21949 82375487 0 0
ReadRepairStage 0 0 0 0 0
GossipStage 0 0 247100 0 0
CacheCleanupExecutor 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
MigrationStage 0 0 6 0 0
Sampler 0 0 0 0 0
ValidationExecutor 0 0 0 0 0
CommitLogArchiver 0 0 0 0 0
MiscStage 0 0 0 0 0
MemtableFlushWriter 16 16 4745 0 0
MemtableReclaimMemory 0 0 4745 0 0
PendingRangeCalculator 0 0 4 0 0
MemtablePostFlush 1 163 9394 0 0
CompactionExecutor 8 29 13713 0 0
InternalResponseStage 0 0 0 0 0
HintedHandoff 2 2 5 0 0
I found a large number of pending MutationStage and MemtablePostFlush tasks.
I have read some related articles about Cassandra's write limitations, but found no useful information. I want to know why there is such a huge difference in Cassandra throughput between two identical tables whose only difference is the amount of data.
In addition, I use SSDs on my server; however, this phenomenon also occurs in another cluster using HDDs.
While Cassandra was running, I found that both %user and %nice CPU utilization were about 10%, with only the compaction task running at a compaction throughput of about 80 MB/s, even though I have set the nice value to 0 for my Cassandra process.
Wild guess: your system is busy compacting the SSTables.
Check it with nodetool compactionstats.
By the way, YCSB does not use prepared statements, which makes it a poor estimator of actual application load.
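A small sketch of that check (standard nodetool commands; the throughput value is only an example):
# Pending compactions and remaining bytes
nodetool compactionstats
# Current compaction throttle in MB/s
nodetool getcompactionthroughput
# If compaction is the bottleneck during the test, the throttle can be raised; 0 means unthrottled
nodetool setcompactionthroughput 0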

Error while installing Cassandra in Windows

INFO 19:07:42,273 GC for ParNew: 2182 ms, 27013384 reclaimed leaving 215461536 used; max is 1171062784
INFO 19:07:44,382 Pool Name Active Pending
INFO 19:07:44,960 ReadStage 0 0
INFO 19:07:44,976 RequestResponseStage 0 0
INFO 19:07:44,976 ReadRepairStage 0 0
INFO 19:07:45,007 MutationStage 0 0
INFO 19:07:45,007 GossipStage 0 0
INFO 19:07:45,007 AntiEntropyStage 0 0
INFO 19:07:45,007 MigrationStage 0 0
INFO 19:07:45,007 StreamStage 0 0
INFO 19:07:45,022 MemtablePostFlusher 0 0
INFO 19:07:45,022 FlushWriter 0 0
INFO 19:07:45,022 MiscStage 0 0
INFO 19:07:45,022 FlushSorter 0 0
INFO 19:07:45,038 InternalResponseStage 0 0
INFO 19:07:45,038 HintedHandoff 0 0
INFO 19:07:45,085 CompactionManager n/a 0
INFO 19:07:45,101 MessagingService n/a 0,0
INFO 19:07:45,116 ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap
INFO 19:07:45,288 system.LocationInfo 0,0 0/0 1/1
INFO 19:07:45,304 system.HintsColumnFamily 0,0 0/0 0/1
INFO 19:07:45,319 system.Migrations 0,0 0/0 0/1
INFO 19:07:45,319 system.Schema 0,0 0/0 0/1
INFO 19:07:45,319 system.IndexInfo 0,0 0/0 0/1
After this, my installation process does not proceed any further. It just hangs, showing:
Listening for thrift clients....
That's exactly what you want to see if your Cassandra node/cluster is up and running.
Schildmeijer is correct: this is normal log output from Cassandra running after a successful installation.
If you are unsure, then try running the Cassandra CLI (see http://wiki.apache.org/cassandra/CassandraCli) and execute some commands to check that the server node responds.
Cassandra runs as a server process - you don't interact directly with it, only via the CLI or another client tool.
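For example, a minimal check with the old Thrift-based CLI, assuming the default RPC port 9160 (on Windows the script is bin\cassandra-cli.bat):
bin\cassandra-cli.bat
connect localhost/9160;
show cluster name;
show keyspaces;
If these commands return the cluster name and the system keyspaces, the node is up and responding.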
