Cassandra row cache eviction policy - cassandra

I have been reading about Cassandra's row cache, and came across this post: Difference between Cassandra Row caching and Partition key caching
In the newer implementation of row cache, the whole partition doesn't need to be saved. Rather you can specify the number of rows one wants to save per partition while creating the table. However, what's the eviction policy when a write request comes? Does it still invalidate the whole partition even if only one row is modified in the given partition?

Row cache is not recommended for most cases.
And yes, it still invalidates the whole partition.
Tip: Enable a row cache only when the number of reads is much bigger
(rule of thumb is 95%) than the number of writes. Consider using the
operating system page cache instead of the row cache, because writes
to a partition invalidate the whole partition in the cache.
Source:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html
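To make the invalidation behaviour concrete, here is a minimal toy model in Python. It is not Cassandra's implementation: the class name `ToyRowCache` and its methods are hypothetical, and it only illustrates that a write to any row drops the whole cached partition, even when `rows_per_partition` limits how many rows are kept.

```python
class ToyRowCache:
    """Toy model only: caches up to rows_per_partition rows per partition."""

    def __init__(self, rows_per_partition):
        self.rows_per_partition = rows_per_partition
        self.cache = {}  # partition_key -> list of cached rows

    def on_read(self, partition_key, rows_from_disk):
        # On a miss, keep only the first N rows of the partition.
        if partition_key not in self.cache:
            self.cache[partition_key] = list(rows_from_disk[: self.rows_per_partition])
        return self.cache[partition_key]

    def on_write(self, partition_key):
        # Any write to the partition drops every cached row for it,
        # even if the modified row was not among the cached ones.
        self.cache.pop(partition_key, None)


cache = ToyRowCache(rows_per_partition=2)
cache.on_read("user42", ["row1", "row2", "row3"])
cached_before = list(cache.cache["user42"])   # only the first two rows are cached
cache.on_write("user42")                      # e.g. an update that touches row3 only
is_cached_after = "user42" in cache.cache     # the whole partition entry is gone
```

In this sketch the write to `row3` evicts `row1` and `row2` as well, which is why a write-heavy partition gets little benefit from the row cache.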

Related

Is there any side effect of increasing row_cache_size_in_mb in Cassandra?

In this config:
64 GB, 16 cores, Linux CentOS with Cassandra 3.1
row_cache_size_in_mb is currently set to zero (cassandra.yaml)
It seems to work well, since the OS page cache is used for caching reads.
So, are there any benefits/risks (JVM heap) to increasing this number
vs using Linux page caching?
Row cache is used only for tables that explicitly enable caching of row data; it is not used by default. Row cache is usually used only for mostly-read data that doesn't change very often. Otherwise, changes to the data lead to additional performance overhead from invalidating cached data and re-populating cache entries from disk. You can read more in the following document from the "best practices" series published by DataStax.
Regarding the relation between the row cache and Linux's buffer cache: the main distinction is that the row cache keeps full rows that could potentially be assembled from multiple SSTables, while the buffer cache keeps chunks of the SSTables, which are often compressed, so Cassandra needs to decompress them again and again. Also, if a partition is scattered over multiple SSTables, Cassandra needs to check all of them when reading the row.
It's all about the workload and the application query pattern.
If your application frequently reads a small subset of (hot) rows, and each row in its entirety, enabling the row cache can bring a significant performance benefit by avoiding a disk read. There are row cache hit rate JMX metrics available that can show how performance varies with row and key cache sizes for your application load.
If you haven't manually configured row cache a table description should look like below.
Default: { 'keys': 'ALL', 'rows_per_partition': 'NONE' }.
If enabled, the size should be proportional to the in-memory size of the row data and its column values over the hot subset. For a rough estimate, use nodetool cfstats: for each table, multiply the Row cache size (the number of rows in the cache) by the Compacted row mean size, then sum the results.
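The estimate above is simple arithmetic, sketched here in Python. The function name and the per-table figures are hypothetical, not taken from a real cluster; in practice the two numbers per table would be read from nodetool cfstats output.

```python
def estimate_row_cache_bytes(tables):
    """tables: iterable of (cached_row_count, compacted_row_mean_size_bytes)."""
    return sum(count * mean_size for count, mean_size in tables)


# Hypothetical figures for two tables that have row caching enabled.
hypothetical_tables = [
    (10_000, 512),    # table A: 10,000 cached rows, 512-byte mean row size
    (2_000, 4_096),   # table B: 2,000 cached rows, 4 KiB mean row size
]

total_bytes = estimate_row_cache_bytes(hypothetical_tables)
total_mb = total_bytes / (1024 * 1024)  # compare against row_cache_size_in_mb
```

With these made-up numbers the estimate comes to roughly 12.7 MB, which would be the starting point for row_cache_size_in_mb before load testing.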
As with any memory allocation it has impact on garbage collection though there are some partial or complete off heap implementation classes available. From Datastax docs :
row_cache_class_name
Default: disabled. Note: The classname of the row cache provider to use. Valid values: OHCProvider (fully off-heap) or SerializingCacheProvider (partially off-heap).
As the entire row is cached it can be expensive. One thing to note is if rows are frequently evicted from the row cache (size is set too low or row data frequently change), the garbage collector will definitely have more to do.
Bottom line: for ideal row cache use, a small set of rows must be hot. Row cache provides a benefit when the entire row is accessed at once. If an off-heap implementation is used, it poses little risk to the heap. In the end, do some load testing and capture latency metrics to determine the cache size that best fits your needs.

What are the impacts of high value row cache?

Recently I have gone through a tutorial about key cache and row cache. Can anyone help me with some real-world examples where these caches have an impact? And what is the impact if we increase these values in the config file?
On using desc table I found this
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
Your main concern is the memory profile of your application.
This diagram demonstrates how the key cache optimises the read path: it allows us to skip the partition summary and partition index and go straight to the compression offset. As for the row cache, if you get a hit, you've got your answer and don't need to go down the read path at all.
Key cache - The key cache is on by default as it only keeps the key of the row. Keys are typically smaller relative to the rest of the row so this cache can hold many entries before it's exhausted.
Row cache - The row cache holds an entire row and is useful when you have a fairly static querying pattern. The argument for the row cache is that if you read the same rows over and over, you can just keep them in memory rather than going to the SSTable (storage medium) level, and thus bypass an expensive seek on the read path. In practice, the memory slowdowns caused by using the row cache in non-optimal use cases make it an unpopular feature.
So what happens if you fill up the cache? Well, there's an eviction policy, but if you're constantly kicking stuff out of either cache to make room for new items, then the caches won't exactly be useful, as the GC-related degradation will hurt overall performance.
What about having very high cache values? This is where there are better alternatives, more on this later. Making the row cache huge would just lead to GC issues, which depending on what you're doing exactly, typically leads to an overall net-loss in performance.
One idea I've seen being utilised relatively well is having a caching layer on top of Cassandra, such as Apache Ignite or Memcached. You load hot data in the caching layer to get fast READs and you write with an application that writes to the cache layer then to C* for persistence. These architectures come with many of their own headaches but if you want to cache data for lower query latencies, the C* row cache isn't the best tool for the job.
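A minimal sketch of that write path, assuming a write-through pattern. The class `WriteThroughStore` is hypothetical, and plain dictionaries stand in for the caching layer (Ignite or Memcached) and for Cassandra; no real client API is shown.

```python
class WriteThroughStore:
    """Toy write-through layer; dicts stand in for the real systems."""

    def __init__(self):
        self.cache = {}     # stand-in for the caching layer
        self.database = {}  # stand-in for Cassandra (persistence)

    def write(self, key, value):
        # Write to the cache layer first, then to the database for persistence.
        self.cache[key] = value
        self.database[key] = value

    def read(self, key):
        # Serve hot data from the cache; fall back to the database on a miss.
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # load the row into the cache for next time
        return value


store = WriteThroughStore()
store.write("user42", {"score": 10})
hot_read = store.read("user42")

store.cache.clear()                 # simulate the cache layer losing its contents
cold_read = store.read("user42")    # falls back to the database and re-caches
```

The sketch leaves out the hard parts the answer alludes to, such as what happens when the cache write succeeds but the database write fails.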

Is update in place possible in Cassandra?

I have a table in Cassandra where I populate some rows with 1000s of entries (each row has 10000+ columns). The entries in the rows are updated very frequently; basically just one field (an integer) is updated with different values. All other column values remain unmodified. My question is, will the updates be done in-place? How good is Cassandra for frequent updates of entries?
First of all, every update is also a sequential write, so as far as Cassandra is concerned it makes no difference whether you update or write.
The real question is how quickly those writes need to be available for reading. As #john suggested, all writes first go to a mutable CQL Memtable, which resides in memory. So every update is essentially appended as a new sequential entry to the memtable for a particular CQL table. It is concurrently also written to the commitlog periodically (every 10 seconds) for durability.
When the Memtable is full or the total size limit for the commitlog is reached, Cassandra flushes all the data to an immutable Sorted String Table (SSTable). After the flush, compaction is the procedure in which the entries holding the new column values for each PK are kept and all the previous values (before the update) are removed.
Frequent flushing brings the overhead of frequent sequential writes to disk and of compaction, which can take a lot of I/O and have a serious impact on Cassandra performance.
As far as reads go, Cassandra will first try to read from the row cache (if it's enabled) or from the memtable. If that fails, it will go to the bloom filter, key cache, partition summary, partition index, and finally the SSTable, in that order. Once the data is collected for all the column values, it is aggregated in memory, the column values with the latest timestamp are returned to the client, and an entry is made in the row cache for that partition key.
So, yes, when you query a partition key, it will scan across all the SSTables for that particular CQL table, plus the memtable for any column values that have not been flushed to disk yet.
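The "latest timestamp wins" aggregation can be sketched as follows. This is a simplified model, not Cassandra's code: the function `read_partition` is hypothetical, each source is a plain dict of column name to (timestamp, value), and bloom filters, caches and indexes are left out.

```python
def read_partition(memtable, sstables):
    """Merge one partition's columns from the memtable and all SSTables.

    Each source maps column name -> (timestamp, value).
    """
    merged = {}
    for source in [memtable, *sstables]:
        for column, (timestamp, value) in source.items():
            # Keep the version with the newest timestamp for each column.
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    return {column: value for column, (_, value) in merged.items()}


# The same partition spread over two SSTables and the memtable.
sstable_old = {"name": (1, "alice"), "score": (1, 10)}
sstable_new = {"score": (5, 25)}
memtable = {"score": (9, 30)}  # most recent update, not flushed yet

result = read_partition(memtable, [sstable_old, sstable_new])
```

Here `score` resolves to the memtable value because it carries the newest timestamp, while `name` still comes from the oldest SSTable, so every source has to be consulted.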
Initially these updates are stored in an in-memory data structure called Memtable. Memtables are flushed to immutable SSTables at regular intervals.
So a single wide row will be read from various SSTables. During a process called 'compaction', the different SSTables are merged into a bigger SSTable on disk.
Increasing the thresholds for flushing Memtables is one way to optimize. If updates arrive very fast, before the Memtable is flushed to disk, I think the update should be in-place in memory, though I'm not sure.
Also each read operation checks Memtables first, if data is still there, it will be simply returned – this is the fastest possible access.
Cassandra read path:
When a read request for a row comes in to a node, the row must be combined from all SSTables on that node that contain columns from the row in question
Cassandra write path:
No, in-place updates are not possible.
As #john suggested, if you have frequent writes then you should delay the flush process. During the flush, the multiple writes to the same partition that are stored in the MemTable will be written as a single partition in the newly created SSTable.
C* is fine for heavy writes. However, you'll need to monitor the number of SSTables accessed per read. If the number is too high, then you'll need to review your compaction strategy.
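A toy illustration of why delaying the flush helps, assuming a simplified memtable keyed by partition. `ToyMemtable` is a hypothetical name, and keeping only the newest version of each column in memory is a simplification made for this sketch, not a statement about Cassandra's internals.

```python
class ToyMemtable:
    """Toy model: buffers writes in memory and flushes them as one sorted unit."""

    def __init__(self):
        self.partitions = {}  # partition_key -> {column: (timestamp, value)}

    def write(self, partition_key, column, timestamp, value):
        columns = self.partitions.setdefault(partition_key, {})
        current = columns.get(column)
        # Simplification: keep only the newest version of each column.
        if current is None or timestamp >= current[0]:
            columns[column] = (timestamp, value)

    def flush(self):
        # Emit an immutable, sorted snapshot (the "SSTable") and start afresh.
        sstable = tuple(
            (key, tuple(sorted(columns.items())))
            for key, columns in sorted(self.partitions.items())
        )
        self.partitions = {}
        return sstable


memtable = ToyMemtable()
memtable.write("p1", "counter", 1, 10)
memtable.write("p1", "counter", 2, 11)
memtable.write("p1", "counter", 3, 12)
sstable = memtable.flush()  # three writes, one partition entry on "disk"
```

Three updates that arrive before the flush end up as a single partition entry, whereas three flushes would have produced three SSTables to merge on read.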

Difference between Cassandra Row caching and Partition key caching

What is the difference between the row cache and the partition key cache? Do I need to use both for good performance?
I have already read the basic definition from the DataStax website
The partition key cache is a cache of the partition index for a
Cassandra table. Using the key cache instead of relying on the OS page
cache saves CPU time and memory. However, enabling just the key cache
results in disk (or OS page cache) activity to actually read the
requested data rows.
The row cache is similar to a traditional cache like memcached. When a
row is accessed, the entire row is pulled into memory, merging from
multiple SSTables if necessary, and cached, so that further reads
against that row can be satisfied without hitting disk at all.
Can anyone elaborate on the use cases for each? Do I need to implement both?
TL;DR : You want to use Key Cache and most likely do NOT want row cache.
Key cache helps C* know where a particular partition begins in the SSTables. This means that C* does not have to read anything to determine the right place to seek to in the file to begin reading the row. This is good for almost all use cases because it speeds up reads considerably by potentially removing the need for an IOP in the read path.
Row Cache has a much more limited use case. Row cache pulls entire partitions into memory. If any part of that partition has been modified, the entire cache for that row is invalidated. For large partitions this means the cache can be frequently caching and invalidating big pieces of memory. Because you really need mostly static partitions for this to be useful, for most use cases it is recommended that you do not use Row Cache.

physical disk space management of cassandra

Recently I have been looking into Cassandra from our new project's perspective and have learned a lot from this community and its wiki too. But I have not found anything about how updates are managed in Cassandra in terms of physical disk space management, though it seems to be very similar to record delete management using compaction.
Suppose there are 100 records with 5 column values each. When all changes are flushed to disk, all records will be written adjacently. When a delete operation is done, it is marked in the memory table first, and the record is physically deleted after some time, as set in the configuration, or when it's full. The compaction process then reclaims the space.
Now the question is: on one side, being schema-less, there is no fixed number of columns at the beginning; but on the other side, when the compaction process takes place, does it put records adjacently on disk like a traditional RDBMS to speed up the read process? For an RDBMS this is easy, because it allocates a fixed amount of space according to the declared column datatypes.
But how Cassandra exactly makes the records placement on disk in compaction process (both for update/delete) to speed up the reads?
One more question related to compaction: when there are no delete queries, but there is an update query that updates an existing record with some variable-length data or inserts an altogether new column, how does compaction make space available on disk between already existing data rows?
Rows and columns are stored in sorted order in an SSTable. This allows a compaction of multiple SSTables to output a new (sorted) SSTable with only sequential disk IO. This new SSTable is written to a new file, in free space on the disks. This process doesn't depend on the number of rows or columns, just on them being stored in sorted order. So yes, in all SSTables (even those resulting from compactions) rows and columns will be arranged in sorted order on disk.
What's more, as you hint at in your question, updates are no different from inserts: they do not overwrite the value on disk, but instead get buffered in a Memtable, then get flushed into a new SSTable. When the new SSTable eventually gets compacted with the SSTable containing the original value, the newer value will annihilate the old one, i.e. the old value will not be output from the compaction. Timestamps are used to decide which value is newest.
Deletes are handled in the same fashion, effectively inserting an "anti-value", or tombstone. The limitation of this process is that it can require significant space overhead. Deletes are effectively 'lazy', so the space doesn't get freed until some time later. Also, while the output of the compaction can be the same size as the input, the old SSTables cannot be deleted until the new one is completed, so this can reduce disk utilisation to 50%.
In the system described above, a new value for an existing key can be a different size from the existing value without padding to some pre-determined length, because the new value is not written over the old value on update but to a new SSTable.
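The merge described above can be sketched with a k-way merge over sorted inputs. This is a toy model under stated assumptions: `compact` and `TOMBSTONE` are hypothetical names, each SSTable is a list of (key, timestamp, value) tuples sorted by key, and tombstones are dropped immediately, whereas the answer notes that in practice the space is only freed some time later.

```python
import heapq

TOMBSTONE = object()  # hypothetical marker standing in for a delete


def compact(*sstables):
    """Merge sorted SSTables into one sorted SSTable, newest value per key."""
    output = []
    # heapq.merge streams already-sorted inputs in order, so each input is
    # read sequentially, front to back.
    merged = heapq.merge(*sstables, key=lambda entry: (entry[0], entry[1]))
    for key, timestamp, value in merged:
        if output and output[-1][0] == key:
            output[-1] = (key, timestamp, value)  # newer timestamp replaces older
        else:
            output.append((key, timestamp, value))
    # Simplification: drop deleted keys right away instead of waiting.
    return [entry for entry in output if entry[2] is not TOMBSTONE]


old_sstable = [("a", 1, "x"), ("b", 1, "y"), ("c", 1, "z")]
new_sstable = [("a", 2, "x2"), ("c", 3, TOMBSTONE)]  # update of a, delete of c

compacted = compact(old_sstable, new_sstable)
```

The updated value of `a` has a different length from the original, and no padding is needed because the output is a fresh list (a new file, in the real system) rather than an overwrite of the old one.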

Resources