I've got a mostly idle (till now) 4 node Cassandra cluster that's hooked up with OpsCenter. There's a table that's had very few writes in the last week or so (test cluster). It's running 2.1.0. I happened to ssh in and, out of curiosity, ran du -sh * on the data directory. Here's what I get:
4.2G commitlog
851M data
188K saved_caches
There are 136 files in the commit log directory. I flushed and then drained Cassandra, and stopped and started the service. Those files are still there. What's the best way to get rid of these? Most of the stuff is OpsCenter related, and I'm inclined to just blow them away as I don't need the test data. I'm wondering what to do in case this pops up again. Appreciate any tips.
The files in the commit log directory have a fixed size determined by your settings in cassandra.yaml. All segment files are pre-allocated at that size, so flushing, draining, or other operations on the cluster will not shrink them.
You have to change the configuration if you want to reduce their size.
Look at the configuration settings commitlog_segment_size_in_mb and commitlog_total_space_in_mb to control the size of each file and the total space occupied by all of them.
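For example, in cassandra.yaml (illustrative values, not recommendations; on 2.1 the defaults are 32MB per segment and 8192MB total on 64-bit systems):
commitlog_segment_size_in_mb: 32
commitlog_total_space_in_mb: 4096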
Related
Desired behaviour
I'm trying to configure Cassandra CDC in such a way that the commit log segments are flushed periodically to the cdc_raw directory (let's say every 10 seconds).
Based upon documentation from http://abiasforaction.net/apache-cassandra-memtable-flush/ and from https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configCDCLogging.html I found:
memtable_flush_period_in_ms – This is a CQL table property that
specifies the number of milliseconds after which a memtable should be
flushed. This property is specified on table creation.
and
Upon flushing the memtable to disk, CommitLogSegments containing data
for CDC-enabled tables are moved to the configured cdc_raw directory.
Putting those together, I would think that by setting memtable_flush_period_in_ms: 10000, Cassandra flushes its CDC changes to disk every 10 seconds, which is what I want to accomplish.
My configuration
Based on the aforementioned, I would expect the memtable to get flushed to the cdc_raw directory every 10 seconds. I'm using the following configuration:
cassandra.yaml:
cdc_enabled: true
commitlog_segment_size_in_mb: 1
commitlog_total_space_in_mb: 2
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
table configuration:
memtable_flush_period_in_ms = 10000
cdc = true
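For reference, these table properties would be set with CQL along these lines (keyspace and table names here are hypothetical):
ALTER TABLE my_keyspace.my_table
WITH cdc = true
AND memtable_flush_period_in_ms = 10000;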
Problem
The memtable is not flushed periodically to the cdc_raw directory; instead, data gets flushed to the commitlog directory when a certain size threshold is reached.
In detail, the following happens:
When a commit log segment reaches 1MB, it is closed and a new one is started in the commitlog directory. There is a maximum of 2 commit logs in the commitlog directory (see the configuration commitlog_total_space_in_mb: 2). When this threshold is reached, the oldest commit log file in the commitlog directory is moved to the cdc_raw directory.
Question
How can I flush Cassandra CDC changes to disk periodically?
CDC in the current version of Apache Cassandra is tricky.
The commit log is "global", meaning changes to any table go to the same commit log.
Your commit log segment can (and will) contain logs from tables other than the ones with CDC enabled. These include system tables.
A commit log segment is only deleted or moved to the cdc_raw directory after every log in that segment has been flushed.
So even if you configure your CDC-enabled table to flush every 10 seconds, logs from other tables remain in the same commit log segment, which prevents the segment from being moved to the CDC directory.
There is no way to change this behavior other than trying to speed up the process by reducing commitlog_segment_size_in_mb (but be careful not to reduce it below the size of a single write request).
This behavior is improved in the next major version, v4.0: you will be able to read your CDC data as soon as the commit log is synced to disk (so with periodic commit log sync, you can read your changes every commitlog_sync_period_in_ms milliseconds).
See CASSANDRA-12148 for details.
By the way, you set commitlog_total_space_in_mb to 2, which I definitely do not recommend. What you are seeing right now is Cassandra flushing every table whenever your commit log size exceeds this value, to make more space. If Cassandra cannot reclaim commit log space, it will start throwing errors and rejecting writes.
I'm running Cassandra 2.2.8 and have configured commit log archiving to run automatically.
The commitlog_archiving.properties:
archive_command=/bin/cp %path /data1/backup/%name
But I noticed it only copies files that have been rotated, never the commit log that is currently being written. For instance, the commit log file CommitLog-5-1533697321883.log is being written now; after it rotates to CommitLog-5-1533697321884.log, the file CommitLog-5-1533697321883.log gets archived. All sessions now go to the CommitLog-5-1533697321884.log file, but it is not backed up at all and would be lost in a disaster recovery.
My question is: is this the designed behaviour? What can I do to improve this situation? And is there any improvement in Cassandra 3?
Yes, this is the designed behaviour: the current commit log is incomplete by design, which is why you get access to the archived commit logs. (AFAIK, this is the same for most databases.)
If your data is critical, you may want to consider tuning your consistency levels.
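For example, with a replication factor above 1, writing at QUORUM means data that only sits in one node's active commit log still exists on other replicas. In cqlsh the session-level consistency can be raised like this (drivers expose an equivalent per-query option):
cqlsh> CONSISTENCY QUORUM;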
Since SSTables are immutable and sstablesplit has to be performed offline, i.e. with the node shut down: wouldn't it also be possible to split copies of extremely large SSTables offline in a sideline directory while keeping the node online, and then swap the oversized SSTables for the set of split files during a short restart of the node, to minimize downtime?
Or would it be better to decommission the node, spreading its data over the rest of the cluster, and then rejoin it as a new empty node?
E.g. I have some large SSTables that won't be picked up for compaction any time soon. I'd like to split these offline, say in another directory/filesystem/on another box, anywhere out of scope of the running node, while the node keeps serving from the original SSTable path. Only, it seems sstablesplit wants to find the node's configuration; can it be tricked into doing a split away from the running node?
I tried to split a copy of an SSTable file, but:
on-a-offlinebox$ sstablesplit --debug -s SOME-VALUE-IN-MB mykeyspc-mycf-*-Data.db
16:58:13.197 [main] ERROR o.a.c.config.DatabaseDescriptor - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
    at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:73) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:84) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:161) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:136) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.tools.StandaloneSplitter.main(StandaloneSplitter.java:56) [apache-cassandra-2.1.15.jar:2.1.15]
Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
Fatal configuration error; unable to start. See log for stacktrace.
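One workaround sketch, assuming the tool script passes JVM_OPTS from the environment through to the JVM (this varies by version, so treat it as unverified; the yaml path is a placeholder): point the tool at a copy of the node's cassandra.yaml via the cassandra.config system property, with the file:/// prefix the error message asks for:
on-a-offlinebox$ JVM_OPTS="-Dcassandra.config=file:///path/to/cassandra.yaml" sstablesplit --debug -s SOME-VALUE-IN-MB mykeyspc-mycf-*-Data.db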
If you can afford downtime for the node, just do it (split the tables). In any case, if you do the split on another machine/in another dir, you will need to run repair on the node after reloading the SSTables (due to the "offline" time of the rebuilt tables).
You can also try dropping the table's data files from your node and running repair; that will probably mean minimal downtime for the node:
Stop the node -> Delete big sstables -> Start the node -> Repair.
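A rough sketch of that sequence (keyspace/table names, paths, and the sstable generation are placeholders; adjust service management to your setup):
$ nodetool drain
$ sudo service cassandra stop
$ rm /var/lib/cassandra/data/my_keyspace/my_table-*/my_keyspace-my_table-ka-42-*   # all components of the big sstable (Data, Index, Filter, ...)
$ sudo service cassandra start
$ nodetool repair my_keyspace my_table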
EDIT: Since Cassandra 3.4, you can run the compact command on specific SSTables/files. On any earlier version, you can use the forceUserDefinedCompaction JMX call. You can use one of these tools, or make the JMX call yourself:
http://wiki.cyclopsgroup.org/jmxterm/manual.html
https://github.com/hancockks/cassandra-compact-cf
https://gist.github.com/jeromatron/e238e5795b3e79866b83
Example code with jmxterm:
sudo java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199
bean org.apache.cassandra.db:type=CompactionManager
run forceUserDefinedCompaction YourKeySpaceName_YourFileName.db
Also, if "big tables" problem occurs all the time, consider moving to LCS.
I started using Cassandra 3.7 and I keep having problems with the commit log. When the PC shuts down unexpectedly, after a power outage for example, the Cassandra service doesn't restart. I try to start it from the command line, but the error "cassandra could not read commit log descriptor in file" always appears.
I have to delete all the commit logs to start the Cassandra service. The problem is that I lose a lot of data. I tried increasing the replication factor to 3, but it's the same.
What can I do to decrease the amount of lost data?
PS: I only have one PC for the Cassandra database; it is not possible to add more machines.
I think your only option here is to work around the issue, since it's unlikely there is a guaranteed way to prevent commit log files from getting corrupted on a sudden power outage. Since you only have a single node, recovering the data is more difficult; increasing the replication factor to 3 on a single-node cluster is not going to help.
One thing you can try is to flush the memtables to disk more frequently. When a memtable is flushed, its entries in the commit log are discarded, so more frequent flushes reduce the amount of data that exists only in the commit log and can be lost. This will, however, not resolve the root issue.
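For example, a time-based flush can be forced per table (hypothetical keyspace/table; pick an interval that matches how much data you can afford to lose):
ALTER TABLE my_keyspace.my_table WITH memtable_flush_period_in_ms = 60000;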
I'm running a two node Datastax AMI cluster on AWS. Yesterday, Cassandra started refusing connections from everything. The system logs showed nothing. After a lot of tinkering, I discovered that the commit logs had filled up all the disk space on the allotted mount, and this seemed to be causing the connection refusal (I deleted some of the commit logs, restarted, and was able to connect).
I'm on DataStax AMI 2.5.1 and Cassandra 2.1.7
If I decide to wipe and restart everything from scratch, how do I ensure that this does not happen again?
You could try lowering the commitlog_total_space_in_mb setting in your cassandra.yaml. The default is 8192MB for 64-bit systems (it should be commented-out in your .yaml file... you'll have to un-comment it when setting it). It's usually a good idea to plan for that when sizing your disk(s).
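For example, in cassandra.yaml (an illustrative value, not a recommendation; size it against your disk):
commitlog_total_space_in_mb: 4096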
You can verify this by running a du on your commitlog directory:
$ du -d 1 -h ./commitlog
8.1G ./commitlog
Although, a smaller commit log space will cause more frequent flushes (increased disk I/O), so you'll want to keep an eye on that.
Edit 20190318
Just had a related thought (on my 4-year-old answer). I saw that it received some attention recently, and wanted to make sure that the right information is out there.
It's important to note that sometimes the commit log can grow in an "out of control" fashion. Essentially, this can happen because the write load on the node exceeds Cassandra's ability to keep up with flushing the memtables (and thus, removing old commitlog files). If you find a node with dozens of commitlog files, and the number seems to keep growing, this might be your issue.
Essentially, your memtable_cleanup_threshold may be too low. Although this property is deprecated, you can still control how it is calculated by lowering the number of memtable_flush_writers.
memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)
The documentation has been updated as of 3.x, but used to say this:
# memtable_flush_writers defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
...which (I feel) led to many folks setting this value WAY too high.
Assuming a value of 8, the memtable_cleanup_threshold is .111. When the footprint of all memtables exceeds this ratio of total memory available, flushing occurs. Too many flush (blocking) writers can prevent this from happening expediently. With a single /data dir, I recommend setting this value to 2.
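In cassandra.yaml terms, that recommendation looks like this (the comment simply restates the formula above):
memtable_flush_writers: 2
# memtable_cleanup_threshold = 1 / (2 + 1) = 0.33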
In addition to decreasing the commit log size as suggested by BryceAtNetwork23, a proper solution to ensure this won't happen again is to monitor your disk usage, so that you are alerted when it is getting full and have time to act/increase the disk size.
Seeing as you are using DataStax, you could set an alert for this in OpsCenter. Haven't used this within the cloud myself, but I imagine it would work. Alerts can be set by clicking Alerts in the top banner -> Manage Alerts -> Add Alert. Configure the mounts to watch and the thresholds to trigger on.
Or, I'm sure there are better tools to monitor disk space out there.