I'm investigating a production Cassandra 1.1 performance problem:
Background: read latencies are going above a second. The ring is spread over 2 data centers, 5 nodes in each, on the east and west coasts. The nodes have 64 GB of RAM. Row caching is disabled, the JVM heap size is set to 8 GB, and key caching is enabled with a max capacity of 2 GB.
Problem: the key cache hit rate is abysmal, nearly 0%, and despite all the misses, the cache is not filling up:
(from "nodetool info", here's the key cache info for 2 of the nodes):
Key Cache : size 172992 (bytes), capacity 2147483616 (bytes), 112226 hits, 81631832 requests, 0.000 recent hit rate, 14400 save period in seconds
Key Cache : size 166896 (bytes), capacity 2147483616 (bytes), 94182 hits, 62270620 requests, 0.000 recent hit rate, 14400 save period in seconds
Has anyone seen this before, where there are lots of key cache misses and lots of room in the key cache, and yet the cache is not being populated? Thanks in advance.
The key cache only stores the on-disk locations of keys that actually exist, so reads for non-existent data can neither hit it nor populate it. Instead, you should look into why reads for non-existent data are not being short-circuited at the bloom filter level.
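A quick way to check this is to look at the per-column-family bloom filter counters in nodetool cfstats, for example (piping through grep just to narrow the output):
nodetool cfstats | grep -i bloom
If the bloom filter false-positive ratio is high for the column families being read, reads for missing keys are falling through to disk instead of being rejected up front.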
Related
I am designing a Redis datastore with ~3000 sorted set keys, each holding 60-300 members of around 250 bytes each.
used_memory_overhead = 1055498028 bytes and used_memory_dataset = 9681332 bytes. This ratio seems way too high. used_memory_dataset_perc is less than 1%. Memory usage is exceeding the max of 1.16G and causing keys to be evicted.
Do sorted sets really have 99% memory overhead? Will I just have to find another solution? I just want a list of values sorted by a field in the value.
Here's the output of INFO memory. used_memory_dataset_perc just keeps decreasing until it's <1%, and eventually the max memory is exceeded:
# Memory
used_memory:399243696
used_memory_human:380.75M
used_memory_rss:493936640
used_memory_rss_human:471.05M
used_memory_peak:1249248448
used_memory_peak_human:1.16G
used_memory_peak_perc:31.96%
used_memory_overhead:390394038
used_memory_startup:4263448
used_memory_dataset:8849658
used_memory_dataset_perc:2.24%
allocator_allocated:399390096
allocator_active:477728768
allocator_resident:499613696
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:1248854016
maxmemory_human:1.16G
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.20
allocator_frag_bytes:78338672
allocator_rss_ratio:1.05
allocator_rss_bytes:21884928
rss_overhead_ratio:0.99
rss_overhead_bytes:-5677056
mem_fragmentation_ratio:1.24
mem_fragmentation_bytes:94804256
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:385555150
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
In case it is relevant, I am using AWS ElastiCache.
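For what it's worth, per-key overhead can also be sampled directly from redis-cli; the key name below is just an example, not one of my real keys:
MEMORY USAGE some:zset:key SAMPLES 0
OBJECT ENCODING some:zset:key
MEMORY USAGE reports the full footprint of one sorted set including internal overhead (SAMPLES 0 walks every member), and OBJECT ENCODING shows whether the set is stored as a compact ziplist or as a skiplist, which carries much higher per-member overhead.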
I am running a 5-node Apache Cassandra cluster (3.11.4), with 48 GB RAM, a 12 GB heap, and 6 vCPUs per node. I can see a lot of load (18 GB) on the Cassandra server nodes even when there is no data processing going on. I also see a lot of GC pauses, because of which I get "NoHostAvailable" exceptions when I try to push data to Cassandra.
Please suggest how to reduce this load and how I can avoid the "NoHostAvailable" connection failures.
ID : a65c8072-636a-480d-8774-2c5704361bec
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 18.07 GiB
Generation No : 1576158587
Uptime (seconds) : 205965
Heap Memory (MB) : 3729.16 / 11980.81
Off Heap Memory (MB) : 12.81
Data Center : dc1
Rack : rack1
Exceptions : 21
Key Cache : entries 2704, size 5.59 MiB, capacity 100 MiB, 1966 hits, 4715 requests, 0.417 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache : entries 25, size 1.56 MiB, capacity 480 MiB, 4207149 misses, 4342386 requests, 0.031 recent hit rate, NaN microseconds miss latency
Percent Repaired : 34.58708788430304%
Token : (invoke with -T/--tokens to see all 256 tokens)
If you have 48 GB of RAM, I recommend increasing the heap to at least 16 or 20 GB. Also make sure that you are using G1 GC (Cassandra 3.11 ships with CMS settings by default in jvm.options).
But NoHostAvailable may also depend on the consistency level that you are using, among other factors.
On the other hand, you may consider throttling your application - sometimes pushing more slowly leads to better overall throughput.
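As a rough sketch, in Cassandra 3.11 this is done in conf/jvm.options - comment out the CMS lines and uncomment the G1 ones, something like the following (the heap size here is only an example):
#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
-Xms16G
-Xmx16G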
My Cassandra application entails primarily counter writes and reads. As such, having a counter cache is important to performance. I increased the counter cache size in cassandra.yaml from 1000 to 3500 and did a cassandra service restart. The results were not what I expected. Disk use went way up, throughput went way down and it appears the counter cache is not being utilized at all based on what I'm seeing in nodetool info (see below). It's been almost two hours now and performance is still very bad.
I saw this same pattern yesterday when I increased the counter cache from 0 to 1000. It went quite a while without using the counter cache at all, and then for some reason it started using it. My question is: is there something I need to do to activate counter cache utilization?
Here are my settings in cassandra.yaml for the counter cache:
counter_cache_size_in_mb: 3500
counter_cache_save_period: 7200
counter_cache_keys_to_save: (currently left unset)
Here's what I get out of nodetool info after about 90 minutes:
Gossip active : true
Thrift active : false
Native Transport active: false
Load : 1.64 TiB
Generation No : 1559914322
Uptime (seconds) : 6869
Heap Memory (MB) : 15796.00 / 20480.00
Off Heap Memory (MB) : 1265.64
Data Center : WDC07
Rack : R10
Exceptions : 0
Key Cache : entries 1345871, size 1.79 GiB, capacity 1.95 GiB, 67936405 hits, 83407954 requests, 0.815 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 5294462, size 778.34 MiB, capacity 3.42 GiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache : entries 24064, size 1.47 GiB, capacity 1.47 GiB, 65602315 misses, 83689310 requests, 0.216 recent hit rate, 3968.677 microseconds miss latency
Percent Repaired : 8.561186035383143%
Token : (invoke with -T/--tokens to see all 256 tokens)
Here's a nodetool info on the Counter Cache prior to increasing the size:
Counter Cache : entries 6802239, size 1000 MiB, capacity 1000 MiB, 57154988 hits, 435820358 requests, 0.131 recent hit rate, 7200 save period in seconds
Update:
I've been running for several days now, trying various values of the counter cache size on various nodes. It is consistent: the counter cache isn't enabled until it reaches capacity. That's just how it works, as far as I can tell. If anybody knows a way to enable the cache before it is full, let me know. I'm setting it very high because that seems optimal, but it means the cache is down for several hours while it fills up, and while it's down my disks are absolutely maxed out with read requests...
Another update:
Further running shows that occasionally the counter cache does kick in before it fills up. I really don't know why that is. I don't see a pattern yet. I would love to know the criteria for when this does and does not work.
One last update:
While the counter cache is filling up, native transport is disabled for the node as well. With the counter cache set to 3.5 GB, I'm now 24 hours in with the node in this low-performance state and native transport disabled.
I have found a way to avoid, 100% of the time, the counter cache not being enabled and native transport being disabled. This approach avoids the serious performance problems I encountered while waiting for the counter cache to become enabled (sometimes for hours in my case, since I want a large counter cache):
1. Prior to starting Cassandra, set the cassandra.yaml field counter_cache_size_in_mb to 0.
2. After starting Cassandra and getting it up and running, use nodetool commands to set the cache sizes:
Example command:
nodetool setcachecapacity 2000 0 1000
In this example, the first value of 2000 sets the key cache capacity (in MB), the second value of 0 is the row cache capacity, and the third value of 1000 is the counter cache capacity.
Take measurements and decide whether those are the optimal values. If not, you can repeat step two with new values as needed, without restarting Cassandra.
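To verify that the counter cache is actually active after step 2, I just watch the counters in nodetool info, e.g.:
nodetool info | grep "Counter Cache"
Once the hits/requests numbers on that line start climbing, the cache is being used.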
Further details:
Some things that don't work:
Setting the counter_cache_size_in_mb value when the counter cache is not yet enabled. This is the case where you started Cassandra with a non-zero value for counter_cache_size_in_mb in cassandra.yaml and you have not yet reached that size threshold. If you do this, the counter cache will never be enabled. Just don't do this. I would call this a defect, but it is the way things currently work.
Testing that I did:
I tested this on five separate nodes, multiple times, with multiple values, both initially when Cassandra was just coming up and after some period of time. The method I have described worked in every case. I guess I should have saved some screenshots of nodetool info to show the results.
One last thing: if Cassandra developers are watching, could they please consider tweaking the code so that this workaround isn't necessary?
The YCSB Endpoint benchmark would have you believe that Cassandra is the golden child of NoSQL databases. However, recreating the results on our own boxes (8 cores with hyperthreading, 60 GB memory, two 500 GB SSDs), we are seeing dismal read throughput for workload B (read-mostly, i.e. 95% reads, 5% updates).
The cassandra.yaml settings are exactly the same as the Endpoint settings, barring the different IP addresses and our disk configuration (1 SSD for data, 1 for the commit log). While their throughput is ~38,000 operations per second, ours is ~16,000 regardless (more or less) of the number of threads/client nodes. That is, one worker node with 256 threads will report ~16,000 ops/sec, while 4 nodes will each report ~4,000 ops/sec.
I've set the readahead value to 8 KB for the SSD data drive. I'll put the custom workload file below.
When analyzing disk I/O and CPU usage with iostat, read throughput is consistently ~200,000 KB/s, which suggests that the YCSB cluster throughput should be higher (records are 100 bytes). About 25-30% of CPU time is spent in %iowait and 10-25% in user time.
top and nload show no obvious bottleneck (<50% memory usage, and 10-50 Mbit/s on a 10 Gb/s link).
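For reference, the readahead setting and the iostat sampling were done roughly like this (the device name is just an example):
blockdev --setra 16 /dev/sdb    # 16 x 512-byte sectors = 8 KB readahead on the data SSD
blockdev --getra /dev/sdb       # confirm the new value
iostat -xm 5                    # per-device MB/s and %iowait every 5 seconds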
# The name of the workload class to use
workload=com.yahoo.ycsb.workloads.CoreWorkload
# There is no default setting for recordcount but it is
# required to be set.
# The number of records in the table to be inserted in
# the load phase or the number of records already in the
# table before the run phase.
recordcount=2000000000
# There is no default setting for operationcount but it is
# required to be set.
# The number of operations to use during the run phase.
operationcount=9000000
# The offset of the first insertion
insertstart=0
insertcount=500000000
core_workload_insertion_retry_limit = 10
core_workload_insertion_retry_interval = 1
# The number of fields in a record
fieldcount=10
# The size of each field (in bytes)
fieldlength=10
# Should read all fields
readallfields=true
# Should write all fields on update
writeallfields=false
fieldlengthdistribution=constant
readproportion=0.95
updateproportion=0.05
insertproportion=0
readmodifywriteproportion=0
scanproportion=0
maxscanlength=1000
scanlengthdistribution=uniform
insertorder=hashed
requestdistribution=zipfian
hotspotdatafraction=0.2
hotspotopnfraction=0.8
table=usertable
measurementtype=histogram
histogram.buckets=1000
timeseries.granularity=1000
The key was increasing native_transport_max_threads in the cassandra.yaml file.
Along with the increased settings mentioned in the comments (more connections in the YCSB client as well as higher concurrent reads/writes in Cassandra), throughput jumped to ~80,000 ops/sec.
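For anyone hunting for the knobs, the cassandra.yaml entries involved look roughly like this (the values are illustrative, not tuned recommendations):
native_transport_max_threads: 256
concurrent_reads: 64
concurrent_writes: 64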
I've got pretty unusual latency patterns in my production setup:
The whole cluster (3 machines: 48 GB RAM, 7,500 rpm disks, 6 cores) shows latency spikes every 10 minutes, on all machines at the same time.
See this screenshot.
I checked the logfiles, and it seems that no compactions are taking place at that time.
I've got 2k reads and 5k reads/sec. No optimizations have been made so far.
Caching is set to "ALL"; the hit rate for the row cache is at ~0.7.
Any ideas? Is tuning memtable size an option?
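For reference, the log check was along these lines (grepping for flushes as well, since periodic memtable flushes could produce a similar 10-minute pattern; paths assume a default package install):
grep -i compact /var/log/cassandra/system.log | tail -20
grep -i flush /var/log/cassandra/system.log | tail -20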
Best,
Tobias