On one of our servers, the compaction process is hanging. It has been stuck at 80% for the last 3 days. Today we did a rolling restart of the cluster (one host at a time), and it is again stuck at the same 80%. CPU usage is at 100% and there seems to be no IO issue. We are seeing the following WARNING in system.log:
BatchStatement.java (line 226) Batch of prepared statements for [****, ****] is of size 7557, exceeding specified threshold of 5120 by 2437.
I have tried to stop this compaction using nodetool, but it does not stop either.
Can someone please help?
How much disk space is left? With the STCS compaction strategy you need at least 50% of the disk free. Another possible reason is that compaction is running on a large partition for a particular key, in which case you need to delete the data for that key.
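A rough way to check that headroom on a node is sketched below. The 50% threshold is just the rule of thumb mentioned above, and the path /var/lib/cassandra/data is an assumption (use whatever data_file_directories in your cassandra.yaml points to). The function names are made up for illustration.

```python
import shutil


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Return the fraction of the volume that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def has_stcs_headroom(total_bytes: int, free_bytes: int,
                      required_free: float = 0.5) -> bool:
    """True if at least `required_free` of the volume is free.

    0.5 is the rule-of-thumb headroom for STCS, not a hard limit.
    """
    return free_fraction(total_bytes, free_bytes) >= required_free


def check_data_volume(path: str = "/var/lib/cassandra/data") -> bool:
    """Check the volume holding `path` (assumed data directory)."""
    usage = shutil.disk_usage(path)
    return has_stcs_headroom(usage.total, usage.free)
```

Calling check_data_volume() on each node gives a quick yes/no answer before you start a large compaction.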
I'm running a 12-node Cassandra cluster on AWS EC2 instances. Four of the nodes are using almost 80% of their disk space, so compaction has failed on them. Because these are EC2 instances, I can't add more disk space to the existing data volume on the fly, and I can't ask the IT team to add more nodes to scale out the cluster, since disk usage on the other nodes is below 40%. Before fixing the unbalanced cluster, is there any way to free up some disk space?
My question is: how can I find unused sstables and move them to another partition, so that compaction can run and free up some space?
Any other suggestions for freeing up disk space are welcome.
PS: I already dropped all the snapshots and backups.
If you are using vnodes, the difference in data size between nodes should not be that large. Before looking for a solution, we must find the reason for the big difference in data sizes across nodes.
Look into the logs to see whether a large SSTable is corrupted, which would cause compaction failures and growing data size, or whether anything else in the logs points to the reason for the growing disk usage.
We faced an issue in Cassandra 2.1.16 where, due to a bug, old sstable files were not removed even after compaction. We read the logs and identified the files that could be removed. This is one example where reading the logs revealed the reason for the increased data size.
So you must identify the reason before applying a solution. If the situation is dire, you can identify keyspaces/tables that are not used by your main traffic, move their sstables to a backup location, and remove them from the node. Once compaction has finished, you can bring them back.
Warning: Test any procedure before trying it in production.
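To decide which keyspaces/tables are worth moving aside, it helps to see how much space each one takes. The sketch below assumes the usual on-disk layout of <data_dir>/<keyspace>/<table>/ with sstable files directly inside the table directory. The helper names are hypothetical, and the script only reads sizes; it does not move or delete anything.

```python
import os
from typing import Dict, List, Tuple


def table_sizes(data_dir: str) -> Dict[Tuple[str, str], int]:
    """Sum the sizes of files directly inside each <keyspace>/<table> directory.

    Subdirectories such as snapshots/ and backups/ are deliberately skipped,
    so the result reflects live sstable files only.
    """
    sizes: Dict[Tuple[str, str], int] = {}
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            table_path = os.path.join(ks_path, table)
            if not os.path.isdir(table_path):
                continue
            total = 0
            for entry in os.scandir(table_path):
                if entry.is_file():
                    total += entry.stat().st_size
            sizes[(keyspace, table)] = total
    return sizes


def largest_tables(data_dir: str, limit: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Return the `limit` largest tables as ((keyspace, table), bytes) pairs."""
    ranked = sorted(table_sizes(data_dir).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Run largest_tables() against your data directory and compare the result with what your application actually reads and writes before touching any files.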
I have a running Cassandra cluster with OpsCenter. When I ran nodetool compactionstats, I suddenly saw "Key cache save of system.KeyCache", and the same appears in OpsCenter. Is there any performance impact due to this?
The key cache save reuses the compaction manager, so it shows up in the current compaction tasks, which in turn appear in compactionstats (and also in OpsCenter). It shouldn't cause any performance issues, but if it takes a really long time it might block regular compactions from completing (if your concurrent compactions setting is low).
This is really just so that when a node starts up you don't have to wait for the key cache to warm up before read performance improves, so it is not critical, and with a low hit rate it may not be very meaningful. If saves take a long time, it may be that your data model has a ton of small partitions, so the key cache has many entries that need to be serialized. In that case I would recommend setting key_cache_keys_to_save in your cassandra.yaml to something like 100 or 1000, and tuning it until your save time is more reasonable.
I have a Cassandra 2.1 cluster using Leveled Compaction Strategy.
Based on my calculation, the cluster will run out of space before compaction kicks in automatically when it reaches the next level. For that reason, I have a cron job that runs "nodetool compact" every week to run a full (major) compaction and remove tombstoned data points.
I noticed that full compaction consumes very little CPU/network resources. With a bigger data set, full compaction runs for days.
I have tried using "setcompactionthroughput" to set a higher number (128MB/s instead of the default 32MB/s), and even tried setting it to 0 (no limit), but full compaction speed doesn't seem to change at all.
Is there anything I can tune to make it faster? Thanks in advance.
There are very few cases where you should run full compaction via nodetool compact - it causes what you're likely seeing now (a single huge data file, which never naturally compacts with other sstables, even/especially when other deletions have happened).
Recovering from the state you're in isn't trivial, but it is possible. If you have a lot of CPU/IO to spare, you can try toggling from STCS to LCS: LeveledCompactionStrategy will naturally split up that huge file into thousands of tiny files, and will be much more aggressive about rewriting those files over time (so tombstones are compacted away much more regularly). This is very CPU and IO intensive, so don't do it if you're near the tipping point. It will also duplicate all data on disk for a short period, so you'll need to be under 50% disk utilization to do this.
If you're over 50% disk utilization, you've backed yourself into a corner, and you'll probably need to add more disk temporarily in order to recover.
I have a two-node Cassandra cluster, with RF of 2. So both nodes contain 100% of data.
Now, I am running short on disk space. I can remove some old data, since it was aggregated and processed before, and I don't need it anymore.
I tried running a delete query from cqlsh, but I get a timeout. I tried increasing timeouts, but it seems that running a query from cqlsh will take much more time.
How can I disable this timeout for a single query or connection? Is there any other way, besides increasing timeout, to remove some data from a node?
My Cassandra version is 3.11.0.
PS. I increased write_request_timeout_in_ms in cassandra.yaml. Is this the correct one for delete queries?
Deletes really shouldn't time out unless there is a problem related to something else. A delete just inserts a tombstone with no reads or anything, and should be fast/cheap regardless of what exists already. Reads, on the other hand, can be impacted a lot. I would guess GC problems related to reads. You could check the GC logs and maybe increase the heap and reduce CMSInitiatingOccupancyFraction (if using CMS and not G1).
So check the GC and normal logs for issues: look for WARN and ERROR in the system log, and for pause times over 1 second in the GC logs. There should be none.
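As a sketch of that GC log check, the snippet below pulls out pauses longer than 1 second. It assumes the JVM writes "Total time for which application threads were stopped: N seconds" lines (HotSpot does this when -XX:+PrintGCApplicationStoppedTime is enabled); if your GC log uses a different format, the regular expression will need adjusting. The function name is made up for illustration.

```python
import re
from typing import Iterable, List

# Matches HotSpot safepoint lines such as:
# "... Total time for which application threads were stopped: 1.2345678 seconds, ..."
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines: Iterable[str], threshold_seconds: float = 1.0) -> List[float]:
    """Return every stop-the-world pause longer than `threshold_seconds`."""
    pauses: List[float] = []
    for line in lines:
        match = STOPPED_RE.search(line)
        if match is None:
            continue
        seconds = float(match.group(1))
        if seconds > threshold_seconds:
            pauses.append(seconds)
    return pauses
```

You would typically open the GC log file and pass the file object straight in, since it iterates line by line; an empty result is what you want to see.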
After issuing delete you could try to do a force compaction (nodetool compact keyspace table) to see if it helps disk space. The delete by itself will not reduce disk space until the data has been compacted with the tombstone.
write_request_timeout_in_ms is the right setting, but if you're hitting it, something is wrong and you're just masking it. A delete should really take less than 1 millisecond in normal use.
Side note: RF=2 on a 2 node cluster is not how C* is designed to run. You have no availability on a database that sacrificed consistency for high availability.
The problem: Our Cassandra database occupies a lot of disk space. The estimated data size is about 10 GB, while the disk space occupied is about 100 GB. We do a lot of writes/deletes. We have two nodes.
Here's what we tried to do (in the order it was done):
Run compaction on both nodes - completed, but zero effect
Set gc_grace to 0.
Run repair on both nodes - one node succeeded; on the other, repair hung: the process was alive but ran for 3 days, after which we killed it.
Run compaction on both nodes - completed, but still zero effect.
Can someone help with this? What should we do next? :)
I faced a similar problem with Cassandra 2.0.9.
I succeeded in freeing disk space by running nodetool clearsnapshot on every node. It is possible to remove snapshots only for specified column families. Details on nodetool usage can be found here.
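Before clearing anything, you may want to see how much the snapshots add up to. The sketch below assumes snapshots live under <data_dir>/<keyspace>/<table>/snapshots/<tag>/, and sums file sizes per snapshot tag. Snapshot files are hard links, so the totals are apparent sizes: space is only actually released for files whose original sstable has already been compacted away. The function name is hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict


def snapshot_sizes(data_dir: str) -> Dict[str, int]:
    """Return the apparent size in bytes of each snapshot tag under data_dir."""
    totals: Dict[str, int] = defaultdict(int)
    for root, dirs, _files in os.walk(data_dir):
        if os.path.basename(root) != "snapshots":
            continue
        # Each subdirectory of snapshots/ is one snapshot tag.
        for tag in dirs:
            tag_path = os.path.join(root, tag)
            for sub_root, _sub_dirs, sub_files in os.walk(tag_path):
                for name in sub_files:
                    totals[tag] += os.path.getsize(os.path.join(sub_root, name))
        # The tags were already summed above, so stop os.walk descending into them.
        dirs[:] = []
    return dict(totals)
```

A large total for an old tag is a good hint that clearing that snapshot will give space back.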