I am trying to flush the WAL to SST files using yb-ts-cli --server_address=<> flush_all_tablets. After the flushing is complete, I still see exactly 3G of WAL in two tables. Any idea why that might be the case?
Every yb-tserver has these default flags:
--log_min_seconds_to_retain=900
--log_min_segments_to_retain=2
So even if these logs are no longer needed because this tablet peer's memtables have been flushed to SSTables, the database still retains some of them for situations such as another peer of this tablet falling slightly behind and needing to catch up.
The above two flags also guide log retention (in addition to whatever has not yet been flushed).
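To see why fully flushed WAL can still linger, here is a minimal Python sketch of a retention rule that combines the two flags with the "not yet flushed" condition. It is a simplified model, not YugabyteDB's actual code: the Segment fields, the gc_eligible helper, and the prefix-only trimming are hypothetical, and a real tablet server can have other reasons to keep logs (for example, lagging peers).

```python
from dataclasses import dataclass

# Defaults quoted in the answer above.
LOG_MIN_SECONDS_TO_RETAIN = 900
LOG_MIN_SEGMENTS_TO_RETAIN = 2


@dataclass
class Segment:
    seqno: int           # hypothetical: increasing WAL segment number
    closed_at: float     # hypothetical: time (seconds) the segment was closed
    fully_flushed: bool  # True if all its entries are already in SSTables


def gc_eligible(segments, now,
                min_seconds=LOG_MIN_SECONDS_TO_RETAIN,
                min_segments=LOG_MIN_SEGMENTS_TO_RETAIN):
    """Return the seqnos that could be deleted under this simplified model."""
    ordered = sorted(segments, key=lambda s: s.seqno)
    # The newest `min_segments` segments are always kept.
    candidates = ordered[:-min_segments] if min_segments > 0 else ordered
    eligible = []
    for seg in candidates:
        too_young = now - seg.closed_at < min_seconds
        if not seg.fully_flushed or too_young:
            # Assumption: the log is trimmed as a prefix, so stop at the
            # first segment that must be kept.
            break
        eligible.append(seg.seqno)
    return eligible


now = 10_000.0
segments = [
    Segment(1, now - 3600, True),
    Segment(2, now - 1800, True),
    Segment(3, now - 600, True),  # flushed, but younger than 900 s
    Segment(4, now - 60, True),
    Segment(5, now - 1, True),
]
print(gc_eligible(segments, now))  # [1, 2]
```

Even though every segment in the example is flushed, segments 3 to 5 stay on disk: 4 and 5 are the newest two by count, and 3 is kept by age.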
Related
I've been reading up on Cassandra, and I get the feeling that it's REALLY not fault tolerant. Is it?
I mean, take a very simple scenario: an incoming write arrives, you write it to the WAL and to the memtable, and then you mark in the WAL that the write succeeded. Then the server crashes before the memtable fills up, so it's never flushed to disk as an SSTable. That means I just lost this write, and I won't be able to redo it since it's marked as "Done" in the WAL.
Am I missing something here, or is it really not fault tolerant? That seems very weird to me, since it's used in so many places and for so much data, which makes me think I'm missing something.
The commit log is written to before the memtable. You just write the mutation; there is no step that marks the mutation as applied to the memtable. The mutation is not removed from the commitlog until after the memtable has been completely flushed to a new SSTable.
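The ordering described above can be sketched as a toy Python model (an illustration, not Cassandra's implementation): the mutation goes to the commit log first, then to the memtable, and log entries are dropped only after a flush. As a simplifying assumption, flush() discards the whole log at once; real Cassandra discards commit log segments once every memtable with data in them has been flushed.

```python
class Node:
    """Toy model of the write path described above (not Cassandra's real code)."""

    def __init__(self):
        self.commitlog = []  # durable in this model: survives a crash
        self.memtable = {}   # volatile: lost on a crash
        self.sstables = []   # durable, immutable flushed tables

    def write(self, key, value):
        self.commitlog.append((key, value))  # 1. append the mutation to the log
        self.memtable[key] = value           # 2. apply it; no "done" marker is written

    def flush(self):
        self.sstables.append(dict(self.memtable))  # persist the memtable as an SSTable
        self.memtable = {}
        self.commitlog = []  # only now are the covered mutations dropped

    def crash_and_restart(self):
        self.memtable = {}               # memtable contents are gone...
        for key, value in self.commitlog:
            self.memtable[key] = value   # ...but replaying the log rebuilds them

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            if key in table:
                return table[key]
        return None


node = Node()
node.write("a", 1)
node.flush()
node.write("b", 2)  # acknowledged, not yet flushed
node.crash_and_restart()
print(node.read("a"), node.read("b"))  # 1 2
```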
It is important to know, though, that some commitlog sync strategies don't block the write acknowledgement on the commitlog flush, so you can still have a data-loss window that is protected only by the replication factor (RF). In those cases it's important to understand consistency levels and replication factors as part of your durability story too. In 4.0+, I think group commitlog sync is a great option between batch and periodic.
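The data-loss window can be illustrated with a small Python model contrasting the two classic modes. It is a sketch under simplifying assumptions: the acked_but_lost helper is hypothetical, sync_period stands in for the periodic sync interval configured in cassandra.yaml, syncs are assumed to happen at exact multiples of that interval, and group mode is not modeled. The result only describes this node's log; replicas on other nodes may still hold the write, which is why RF and consistency level matter.

```python
def acked_but_lost(write_times, crash_time, mode, sync_period=10.0):
    """Acknowledged writes missing from this node's commit log after a crash.

    Simplified model (an assumption, not Cassandra's implementation):
    - "batch": each write is synced to disk before it is acknowledged.
    - "periodic": writes are acknowledged immediately and the log is synced
      at every multiple of `sync_period`, so writes since the last sync are at risk.
    """
    if mode == "batch":
        return []
    if mode == "periodic":
        last_sync = (crash_time // sync_period) * sync_period
        return [t for t in write_times if last_sync < t <= crash_time]
    raise ValueError(f"unknown mode: {mode!r}")


writes = [1.0, 9.0, 12.0, 17.0]
print(acked_but_lost(writes, crash_time=18.0, mode="periodic"))  # [12.0, 17.0]
print(acked_but_lost(writes, crash_time=18.0, mode="batch"))     # []
```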
I started using Cassandra 3.7, and I always have problems with the commitlog. When the PC shuts down unexpectedly, for example because of a power outage, the Cassandra service doesn't restart. I try to start it from the command line, but the error "could not read commit log descriptor in file" always appears.
I have to delete all the commit logs to start the Cassandra service. The problem is that I lose a lot of data. I tried increasing the replication factor to 3, but the result is the same.
What can I do to decrease the amount of lost data?
P.S.: I only have one PC for the Cassandra database; it is not possible to add more PCs.
I think your best option here is to work around the issue, since it's unlikely there is a guaranteed solution to prevent commit log files from getting corrupted on a sudden power outage. Having only a single node makes it more difficult to recover the data. Increasing the replication factor to 3 on a single-node cluster is not going to help, since there is only one node to hold the replicas.
One thing you can try is to flush the memtables more frequently. When a memtable is flushed, the corresponding commit log entries are discarded, so less data is exposed to a corrupted commit log. This will not, however, resolve the root issue.
I am currently debugging a performance issue with Apache Cassandra. When the memtable for a column family fills up, it is queued to be flushed to an SSTable. This flushing happens often when you perform massive writes.
When this queue fills up, writes are blocked until the next flush completes successfully. This indicates that the node cannot handle the writes it is receiving.
Is there a metric in nodetool indicating this behaviour? In other words, I want data indicating that a node cannot keep up with the writes it is receiving.
Thanks!!
That hasn't really been true for a couple of years. The active memtable is switched, and a new memtable takes its place as the live one. New mutations go to this live memtable, while the "to be flushed" memtables are still included in local reads. The flush tasks are queued on the MemtableFlushWriter thread pool, so you can see how many are pending there (under tpstats). You can also see mutations backing up under the MutationStage.
Ultimately,
nodetool tpstats
is likely what you're looking for.
I want data indicating that a node cannot keep up with the writes it is receiving.
Your issue is likely that disk I/O cannot handle the throughput --> memtable flushes queue up --> writes are blocked.
The dstat command is your friend for investigating I/O issues. Some other Linux commands may also be handy. Read this excellent blog post from Amy Tobey: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
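The chain described above (disk too slow --> flushes queue up --> writes block) can be illustrated with a small deterministic Python simulation. It is a toy model, not Cassandra's scheduler: simulate, memtable_capacity, and max_pending_flushes are hypothetical knobs, and "room for a fixed number of memtables" stands in for the memory Cassandra allots to memtables.

```python
from collections import deque


def simulate(writes_per_tick, flushes_per_tick, ticks,
             memtable_capacity=100, max_pending_flushes=4):
    """Count ticks in which writes stall because no new memtable can be started."""
    live = 0          # mutations in the live memtable
    pending = deque()  # full memtables waiting for a flush writer
    stalled_ticks = 0
    for _ in range(ticks):
        # Flush writers drain the queue only as fast as the disk allows.
        for _ in range(min(flushes_per_tick, len(pending))):
            pending.popleft()
        stalled = False
        for _ in range(writes_per_tick):
            if live == memtable_capacity:
                if len(pending) == max_pending_flushes:
                    # No room for another memtable: the rest of this tick's
                    # writes are blocked (the model simply skips them).
                    stalled = True
                    break
                pending.append(live)  # switch: queue the full memtable...
                live = 0              # ...and start a new live one
            live += 1
        stalled_ticks += stalled
    return stalled_ticks


# Flushing 2 memtables per tick keeps up with 150 writes/tick; flushing 1 does not.
print(simulate(writes_per_tick=150, flushes_per_tick=2, ticks=50))  # 0
print(simulate(writes_per_tick=150, flushes_per_tick=1, ticks=50))
```

The point of the comparison is that the stall is driven by flush throughput (the disk), not by the write path itself, which is why the answers point at I/O tools.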
Is there a metric in nodetool indicating this behaviour?
nodetool tpstats
I believe you're looking for tp (thread pool) stats.
nodetool tpstats
Typically, blocked FlushWriters indicate that your storage system is having trouble keeping up with the write workload. Are you using spinning disks by chance? You'll also want to keep an eye on iostat in this case.
Here are the docs for tpstats: https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsTPstats.html
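If you want to watch these pools from a script, a minimal Python sketch like the following can pick the relevant rows out of nodetool tpstats output. The SAMPLE text is made up to mimic the usual column layout (Pool Name, Active, Pending, Completed, Blocked, All time blocked); the layout and pool names (MemtableFlushWriter vs. the older FlushWriter) vary between versions, so treat the parsing rule as an assumption. On a real node you could feed it the stdout of subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).

```python
def parse_tpstats(text):
    """Parse thread-pool rows of `nodetool tpstats` output into a dict.

    Assumes rows look like: <PoolName> <Active> <Pending> <Completed> <Blocked> <AllTimeBlocked>.
    Lines that don't match (headers, the dropped-messages section) are skipped.
    """
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue
        name, *numbers = parts
        try:
            active, pending, completed, blocked, all_time_blocked = map(int, numbers)
        except ValueError:
            continue
        pools[name] = {"active": active, "pending": pending, "completed": completed,
                       "blocked": blocked, "all_time_blocked": all_time_blocked}
    return pools


def backed_up(pools, names=("MutationStage", "MemtableFlushWriter", "FlushWriter")):
    """Return the watched pools that show pending or blocked tasks."""
    return sorted(
        n for n in names
        if n in pools and (pools[n]["pending"] > 0 or pools[n]["blocked"] > 0
                           or pools[n]["all_time_blocked"] > 0)
    )


# Made-up sample output, for illustration only.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                    32      1250        9876543         0                 0
ReadStage                         0         0         123456         0                 0
MemtableFlushWriter               2         6           4321         1                57

Message type           Dropped
MUTATION                   310
"""

print(backed_up(parse_tpstats(SAMPLE)))  # ['MemtableFlushWriter', 'MutationStage']
```

A momentary nonzero Pending is normal; what matters is Pending or blocked counts that stay high or keep growing across repeated samples.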
I'm running a two-node DataStax AMI cluster on AWS. Yesterday, Cassandra started refusing all connections. The system logs showed nothing. After a lot of tinkering, I discovered that the commit logs had filled up all the disk space on the allotted mount, and this seemed to be causing the connection refusals (I deleted some of the commit logs, restarted, and was able to connect).
I'm on DataStax AMI 2.5.1 and Cassandra 2.1.7
If I decide to wipe and restart everything from scratch, how do I ensure that this does not happen again?
You could try lowering the commitlog_total_space_in_mb setting in your cassandra.yaml. The default is 8192MB for 64-bit systems (it should be commented-out in your .yaml file... you'll have to un-comment it when setting it). It's usually a good idea to plan for that when sizing your disk(s).
You can verify this by running a du on your commitlog directory:
$ du -d 1 -h ./commitlog
8.1G ./commitlog
Note, though, that a smaller commit log space will cause more frequent flushes (increased disk I/O), so you'll want to keep an eye on that.
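A rough Python equivalent of that du check, compared against the configured cap, might look like this. It is a sketch: commitlog_usage is a hypothetical helper, it sums apparent file sizes (du reports allocated blocks, so the numbers can differ slightly), and the 8192 default only mirrors the value quoted above; pass whatever commitlog_total_space_in_mb is set to in your cassandra.yaml.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the apparent sizes of regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def commitlog_usage(path, total_space_in_mb=8192):
    """Return (used_mb, fraction_of_cap) for a commit log directory."""
    used_mb = dir_size_bytes(path) / (1024 * 1024)
    return used_mb, used_mb / total_space_in_mb


# Demo on a throwaway directory: one 3 MiB file against a hypothetical 6 MB cap.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "CommitLog-demo.log"), "wb") as f:
        f.write(b"\0" * (3 * 1024 * 1024))
    demo_used, demo_fraction = commitlog_usage(demo_dir, total_space_in_mb=6)
print(demo_used, demo_fraction)  # 3.0 0.5
```

On a real node you would point it at your configured commitlog_directory (for example /var/lib/cassandra/commitlog on many package installs, though that path is an assumption for your AMI).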
Edit 20190318
Just had a related thought (on my 4-year-old answer). I saw that it received some attention recently, and wanted to make sure that the right information is out there.
It's important to note that sometimes the commit log can grow in an "out of control" fashion. Essentially, this can happen because the write load on the node exceeds Cassandra's ability to keep up with flushing the memtables (and thus, removing old commitlog files). If you find a node with dozens of commitlog files, and the number seems to keep growing, this might be your issue.
Essentially, your memtable_cleanup_threshold may be too low. Although this property is deprecated, you can still control how it is calculated by lowering the number of memtable_flush_writers.
memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)
The documentation has been updated as of 3.x, but used to say this:
# memtable_flush_writers defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
...which (I feel) led to many folks setting this value WAY too high.
Assuming a value of 8, the memtable_cleanup_threshold is 0.111. When the footprint of all (non-flushing) memtables exceeds this ratio of the total space allotted to memtables, flushing occurs. Too many flush (blocking) writers can prevent this from happening expediently. With a single /data dir, I recommend setting this value to 2.
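The two rules above are easy to check in a few lines of Python. This sketch just encodes the pre-3.x documented default for memtable_flush_writers and the 1 / (memtable_flush_writers + 1) formula; the function names are illustrative and are not part of Cassandra.

```python
def default_memtable_flush_writers(num_disks, num_cores):
    """Pre-3.x documented default: smaller of (disks, cores), min 2, max 8."""
    return max(2, min(8, num_disks, num_cores))


def memtable_cleanup_threshold(memtable_flush_writers):
    """memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)."""
    if memtable_flush_writers < 1:
        raise ValueError("memtable_flush_writers must be at least 1")
    return 1 / (memtable_flush_writers + 1)


# A single data directory on a 16-core host gets 2 flush writers by default.
print(default_memtable_flush_writers(num_disks=1, num_cores=16))  # 2
for writers in (2, 8):
    print(writers, round(memtable_cleanup_threshold(writers), 3))
# 2 0.333
# 8 0.111
```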
In addition to decreasing the commitlog size as suggested by BryceAtNetwork23, a proper solution to ensure this won't happen again is to monitor the disks so that you are alerted when they are getting full and have time to act or increase the disk size.
Seeing as you are using DataStax, you could set an alert for this in OpsCenter. I haven't used this in the cloud myself, but I imagine it would work. Alerts can be set by clicking Alerts in the top banner -> Manage Alerts -> Add Alert. Configure the mounts to watch and the thresholds to trigger on.
Or, I'm sure there are better tools to monitor disk space out there.
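If you don't have OpsCenter, even a small scheduled script can provide the alert. A minimal Python sketch using the standard library's shutil.disk_usage follows; the disk_alert name, the 80% default threshold, and the path you pass in are assumptions to adapt to your commit log mount, and how you deliver the alert (mail, pager, and so on) is left out.

```python
import shutil


def disk_alert(path, warn_fraction=0.8):
    """Return (fraction_used, should_alert) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    fraction = usage.used / usage.total
    return fraction, fraction >= warn_fraction


fraction, alert = disk_alert(".")
print(f"{fraction:.1%} used, alert={alert}")
```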
When restarting a Cassandra node, a lot of time is spent replaying the commitlog to achieve consistency. In our application, it is more important to bring the node back up and running quickly than to achieve consistency. Therefore we have set “durable_writes = false” on all our manually created keyspaces to disable the commitlog. (We have not touched the system keyspaces.) Nevertheless, when we restart a node it still spends about one hour replaying the commitlog.
What is left in my commitlog?
Can I in any way investigate the content of the commitlog?
How can the commitlog be turned off (if not durable_writes = false)?
durable_writes is set per keyspace, so if any keyspaces still have it enabled, there will still be mutations in the commitlogs to replay on startup. You may want to walk through the output of DESCRIBE SCHEMA.
There are some tables (i.e. system) that you want to keep durable, but they shouldn't hold enough data to noticeably impact startup. When starting up, Cassandra logs which keyspaces/tables it is reading, so you can check which ones it is replaying.
One hour is a very long time and has a certain smell to it; there may be something else going on here that warrants additional investigation. One idea is to check the logs and make sure it is the commitlog replay that's taking time (and not, say, rebuilding index summaries). Also check that there are no old commit logs that C* lacks permission to delete, or anything else that would cause them to stick around.
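To find keyspaces that still write to the commit log, you can list each keyspace's durable_writes flag. On Cassandra 3.0+ the query SELECT keyspace_name, durable_writes FROM system_schema.keyspaces; returns it (older versions keep it in system.schema_keyspaces). The helper below is a hypothetical sketch that filters such rows; the set of system keyspaces to ignore is an assumption that varies by version, and the keyspace names in the demo are made up.

```python
# Typical system keyspaces (assumption: the exact set varies by version).
SYSTEM_KEYSPACES = frozenset({
    "system", "system_schema", "system_auth",
    "system_distributed", "system_traces",
})


def durable_keyspaces(rows, ignore=SYSTEM_KEYSPACES):
    """Return names of non-system keyspaces whose durable_writes is still true.

    `rows` is an iterable of (keyspace_name, durable_writes) pairs.
    """
    return sorted(name for name, durable in rows if durable and name not in ignore)


rows = [
    ("system", True),
    ("system_schema", True),
    ("app_events", False),  # hypothetical keyspace names
    ("app_users", True),
]
print(durable_keyspaces(rows))  # ['app_users']
```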
Run 'nodetool drain' before shutting down the node. This flushes all memtables to SSTables, so there is little or nothing left in the commitlog to replay on the next startup.