From page 11 of the slides, memtable_allocation_type in Cassandra allows keeping memtables and key cache objects in native memory instead of the Java JVM heap. But I have found no other evidence that memtable_allocation_type can change where the key cache lives.
I'm using apache-cassandra 3.11.3 and am suffering from a low key cache hit rate. Since increasing the key cache size would lead to longer GC pauses, is there any way to move the key cache to off-heap memory?
No, right now the key cache is still in the heap.
I wouldn't say that an increase from 1/20th of the heap (or 100 MB) to something higher, like 200-300 MB, will dramatically increase garbage collection times...
Related
Is it okay to have a large number of partitions in Cassandra?
Will heap memory bloat?
Mathematically speaking, Cassandra (with the Murmur3 partitioner) supports tokens from -2^63 to +2^63 - 1, which comes out to a total of about 18.4 quintillion possible partitions. So no worries there; that's perfectly fine.
Will heap memory bloat?
No. One, there’s a pre-set size for your heap, and it won’t get any bigger than that. Two, Cassandra doesn’t keep all data or even all keys in memory.
You can configure how many keys/rows are cached on a per-table basis. Just make sure not to cache more than the heap's new-gen size, or the heap can "thrash" (constantly running GC).
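For example, here's a minimal sketch of the per-table caching option (the keyspace and table names are made up; adjust to your own schema):

    -- cache the partition key index entries for this table, but no row data
    ALTER TABLE my_ks.users
        WITH caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' };

Setting 'keys' to 'NONE' instead disables the key cache for that table entirely.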
In this config:
64 GB RAM, 16 cores, Linux CentOS with Cassandra 3.1
row_cache_size_in_mb is set to zero now (cassandra.yaml)
It seems to be working well, since the OS page cache is used for caching reads.
So, are there any benefits or risks (for the JVM heap) in increasing this number versus relying on the Linux page cache?
The row cache is used only for tables that explicitly enable caching of row data; it is not used by default. The row cache is usually reserved for the most frequently read data that doesn't change very often; otherwise, changes to the data will add performance overhead from invalidating cache entries and re-populating them from disk. You can read more in the following document from the "best practices" series published by DataStax.
Regarding the relation between the row cache and Linux's buffer cache: the main distinction is that the row cache keeps full rows that may have been assembled from multiple SSTables, while the buffer cache keeps chunks of the SSTables, which are often compressed, so Cassandra needs to decompress them again and again. Also, if a partition is scattered over multiple SSTables, Cassandra will need to check all of them when reading the row.
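If you do decide that a table's hot rows are small and mostly static, the row cache is enabled per table; a sketch with a hypothetical table name (row_cache_size_in_mb in cassandra.yaml must also be set to a non-zero value for this to take effect):

    -- keep the first 100 rows of each partition of this table in the row cache
    ALTER TABLE my_ks.product_catalog
        WITH caching = { 'keys': 'ALL', 'rows_per_partition': '100' };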
It's all about the workload and the application's query pattern.
If your application frequently reads a small (hot) subset of rows, and each row in its entirety, enabling the row cache can bring a significant performance benefit by avoiding disk reads. There are row cache hit rate JMX metrics available that can show how performance varies with the row and key cache sizes under your application's load.
If you haven't manually configured the row cache, a table description should show the default below.
Default: { 'keys': 'ALL', 'rows_per_partition': 'NONE' }.
If enabled, the cache size should be proportional to the in-memory size of the row data and its column values over the hot subset. For a rough estimate, use nodetool cfstats: multiply the "Row cache size" (the number of rows in the cache) by the "Compacted row mean size", and sum the results across tables.
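For example (with made-up numbers): if cfstats reports a row cache size of 50,000 rows and a compacted row mean size of 2 KB for a table, that table alone would need roughly 50,000 x 2 KB ≈ 100 MB of row cache.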
As with any memory allocation, it has an impact on garbage collection, though there are partially or fully off-heap implementation classes available. From the DataStax docs:
row_cache_class_name
Default: disabled. The classname of the row cache provider to use. Valid values: OHCProvider (fully off-heap) or SerializingCacheProvider (partially off-heap).
As the entire row is cached, it can be expensive. One thing to note: if rows are frequently evicted from the row cache (because the size is set too low or the row data changes frequently), the garbage collector will definitely have more to do.
Bottom line: for ideal row cache use, a small set of rows must be hot, and the row cache provides the most benefit when an entire row is accessed at once. If an off-heap implementation is used, it poses little risk to the heap. In the end, do some load testing and capture latency metrics to determine the cache size that best fits your needs.
We have an Azure Redis Cache (Standard, 2.5 GB). We observe the following behaviour:
Every now and then, we observe large drops in memory usage. It appears that lots of resources are being evicted.
Things to note:
Eviction policy is LRU
Available cache size is 2.5 GB
No application code that would evict such large amounts of memory (largest objects are ~80kb and most are significantly smaller)
Observed memory drops represent tens of thousands of keys
We seldom use explicit expiry dates on cached objects, and when we do they are always < 1 hour.
My question is: apart from application logic explicitly evicting keys, are there any other circumstances in which Redis would evict large numbers of keys?
The memory cleanup may not represent evictions.
You say "it appears" that lots of resources are being evicted, but if you are just relying on the reclaimed memory for that appearance, you may be chasing ghosts. Have you checked how this graph overlays with the Total Keys metric available in the Azure Portal? Overlaying the two series should allow you to see whether or not the memory reclamation really is due to eviction or if it's due to another process like Azure perhaps calling MEMORY PURGE periodically on the cache instance to clean up dirty pages?
Can you change your Redis eviction policy to noeviction and see if that addresses your problem? Doing so means you will have to manage all content yourself. https://redis.io/topics/lru-cache has more details.
What is the difference between the row cache and the partition key cache? Do I need to use both from a performance perspective?
I have already read the basic definitions on the DataStax website:
The partition key cache is a cache of the partition index for a Cassandra table. Using the key cache instead of relying on the OS page cache saves CPU time and memory. However, enabling just the key cache results in disk (or OS page cache) activity to actually read the requested data rows.
The row cache is similar to a traditional cache like memcached. When a row is accessed, the entire row is pulled into memory, merging from multiple SSTables if necessary, and cached, so that further reads against that row can be satisfied without hitting disk at all.
Can anyone elaborate on their use cases? Do I need to implement both?
TL;DR: You want to use the key cache, and you most likely do NOT want the row cache.
The key cache helps C* know where a particular partition begins in the SSTables. This means that C* does not have to read anything to determine the right place to seek to in the file to begin reading the row. This is good for almost all use cases because it speeds up reads considerably by potentially removing the need for an IOP in the read path.
The row cache has a much more limited use case. The row cache pulls entire partitions into memory, and if any part of that partition is modified, the entire cache entry for that partition is invalidated. For large partitions, this means the cache may be frequently caching and invalidating big pieces of memory. Because you really need mostly static partitions for this to be useful, it is recommended for most use cases that you do not use the row cache.
There is an in-memory option introduced in Cassandra by DataStax Enterprise 4.0:
http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/inMemory.html
But the size of an in-memory table is limited to 1 GB.
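For reference, a minimal sketch of how such a table is declared, going by that doc (the keyspace, table, and columns are made up, and exact option names can vary between DSE versions):

    -- DSE's in-memory option is selected through the MemoryOnlyStrategy compaction class
    CREATE TABLE demo.lookup_by_id (
        id uuid PRIMARY KEY,
        payload text
    ) WITH compaction = { 'class': 'MemoryOnlyStrategy' };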
Does anyone know the reasoning behind limiting it to 1 GB? And is it possible to extend an in-memory table to a larger size, such as 64 GB?
To answer your question: today it's not possible to bypass this limitation.
In-memory tables are stored within the JVM heap, and regardless of the amount of memory available on a single node, allocating more than 8 GB to the JVM heap is not recommended.
The main reason for this limitation is that the Java garbage collector slows down when dealing with huge amounts of memory.
However, if you consider Cassandra as a distributed system, 1 GB is not the real limitation:
(nodes*allocated_memory)/ReplicationFactor
where allocated_memory is at most 1 GB per node -- so your table may contain many GB in memory, allocated across different nodes.
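For example, in a hypothetical 12-node cluster with a replication factor of 3, that works out to (12 * 1 GB) / 3 = 4 GB of distinct in-memory data across the cluster.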
I think this will improve in the future, but dealing with 64 GB in memory could be a real problem when you need to flush data to disk. One more consideration that creates a limitation: avoid TTLs when working with in-memory tables. A TTL creates tombstones, and a tombstone is not deallocated until the gc_grace_seconds period passes -- so with the default value of 10 days, each tombstone will keep its portion of memory busy and unavailable, possibly for a long time.
HTH,
Carlo