If I set row_cache_size_in_mb = 5 GB in the cassandra.yaml file, does Cassandra reserve 5 GB of heap memory? - cassandra

I am running my Cassandra cluster with 32 GB of memory on each node
and a row cache capacity (row_cache_size_in_mb) of 5 GB.
I just want to know: is 5 GB of RAM reserved for row caching from my heap?

Cassandra will let the row cache grow to that size over time rather than reserving it up front. You can use nodetool info to see the current cache size and limit, and nodetool setcachecapacity to change it at runtime. Note that the limit is only an estimate, so heap usage can grow a bit larger. I would also make sure to test that the row cache is actually improving things, since in a lot of cases having no row cache is faster.
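For example (the capacities below are just illustrative numbers; on versions with a counter cache, setcachecapacity takes the key, row, and counter cache sizes in MB):

```sh
# Show the current row cache size and limit (look at the "Row Cache" line)
nodetool info

# Change cache capacities at runtime: <key-cache-MB> <row-cache-MB> <counter-cache-MB>
# e.g. keep a 100 MB key cache and set the row cache to 5120 MB (5 GB)
nodetool setcachecapacity 100 5120 50
```

Keep in mind that setcachecapacity only affects the running node; to make the change permanent you still need to update row_cache_size_in_mb in cassandra.yaml.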

Related

Right Spark executor memory size given certain data size

A lot of the discussion I found on the internet about resource allocation was about the max memory config for --executor-memory, taking into account a few memory overheads.
But I would imagine that for a simple job like reading in a 100 MB file and then counting the number of rows, with a cluster that has a total of 500 GB of memory available across nodes, I shouldn't ask for a number of executors and a memory allocation that, with all memory overheads accounted for, could take all 500 GB of memory, right? Even one executor with 3 GB or 5 GB of memory seems like overkill. How should I think about the right memory size for a job?
Thank you!
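For concreteness, something like the hypothetical submission below is what I have in mind: a single small executor for the 100 MB file, checked afterwards in the Spark UI. The script name, input path, and overhead setting are placeholders, not recommendations:

```sh
# Hypothetical starting point for a ~100 MB input: one small executor,
# then inspect the Executors/Storage tabs in the Spark UI and adjust from there.
spark-submit \
  --master yarn \
  --num-executors 1 \
  --executor-cores 2 \
  --executor-memory 2g \
  --conf spark.executor.memoryOverhead=512m \
  count_rows.py hdfs:///data/input_100mb.csv
```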

Cassandra: the available RAM is used to the maximum

I want to run a simple query over approximately 10 million rows.
I have 32 GB of RAM (20 GB of it free). Cassandra uses so much memory that the available RAM is exhausted and the process is killed.
How can I optimize Cassandra? I have read about "Tuning Java resources" and changing the Java heap sizing, but I still have no solution.
Cassandra will use up as much memory as is available to it on the system. It's a greedy process and will use any available memory for caching, similar to the way the kernel page cache works. Don't worry if Cassandra is using all your host's memory; it will just be in cache and will be released to other processes if necessary.
If your query is suffering from timeouts, this is probably from reading too much data from a single partition, so that the query doesn't return in under read_request_timeout_in_ms. If this is the case, you should look at making your partition sizes smaller.
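To confirm whether large partitions are the issue, you can look at the per-table partition statistics; the keyspace and table names below are placeholders:

```sh
# Max/mean partition sizes per table
# (cfstats on older releases, tablestats on newer ones)
nodetool cfstats my_keyspace.my_table

# Distribution of partition sizes and cell counts per partition
# (cfhistograms on older releases, tablehistograms on newer ones)
nodetool cfhistograms my_keyspace my_table
```

If the max partition size is in the hundreds of megabytes or more, that is usually the first thing to fix.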

What should I think about when increasing disk size on Cassandra nodes?

I run a 10-node Cassandra cluster in production: 99% writes, 1% reads, 0% deletes. The nodes have 32 GB RAM; C* runs with an 8 GB heap. Each node has an SSD for the commitlog and 2x4 TB spinning disks for data (SSTables). The schema uses key caching only. The C* version is 2.1.2.
The cluster is predicted to run out of free disk space before too long, so its storage capacity needs to be increased. The client prefers increasing disk size over adding more nodes, so the plan is to take the 2x4 TB spinning disks in each node and replace them with 3x6 TB spinning disks.
Are there any obvious pitfalls/caveats to be aware of here? Like:
Can C* handle up to 18 TB of data per node with this amount of RAM?
Is it feasible to increase the disk size by mounting a new (larger) disk, copying all SSTables to it, and then mounting it on the same mount point as the original (smaller) disk (to replace it)?
I would recommend adding nodes instead of increasing the data size of your current nodes. Adding nodes would take advantage of Cassandra's distribution feature by keeping nodes small and easily replaceable.
Furthermore, the recommended data size for a single node in a cluster on spinning disks is around 1 TB. Once you go higher than that, I can only imagine that performance will decrease significantly.
Not to mention that if a node loses its data, it will take a long time to recover, as it has to stream a huge amount of data from the other nodes.
Can C* handle up to 18 TB data size per node with this amount of RAM?
This depends heavily on your workload.
Is it feasible to increase the disk size by mounting a new (larger) disk, copying all SSTables to it, and then mounting it on the same mount point as the original (smaller) disk (to replace it)?
I don't see a reason why it would not work.
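As a rough sketch of how that swap could go (device names, mount points, and service commands below are hypothetical and depend on your install), the usual pattern is drain, copy, remount:

```sh
# Flush memtables and stop the node cleanly
nodetool drain
sudo service cassandra stop

# Copy the existing data onto the new, larger disk (mounted temporarily)
sudo mount /dev/sdc1 /mnt/newdisk                 # hypothetical new 6 TB disk
sudo rsync -aH /var/lib/cassandra/data/ /mnt/newdisk/

# Swap the mount so the new disk serves the original data_file_directories path
sudo umount /mnt/newdisk
sudo umount /var/lib/cassandra/data
sudo mount /dev/sdc1 /var/lib/cassandra/data      # update /etc/fstab as well
sudo service cassandra start
```

Do this one node at a time, and make sure the node is back up and healthy (nodetool status) before moving on to the next.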
It's an anti-pattern in Cassandra: being a distributed database is a key feature of Cassandra.

Cassandra out of heap memory

I have 4 Cassandra nodes in a cluster and one column family which has 10 columns, where rows cannot grow very wide (maybe 1000 columns max).
I have "peak" writes where I insert up to 500,000 records in a 5-10 minute window.
I use the node.js driver node-cassandra-cql.
3 nodes are working fine, but one node crashes every time under heavy writes.
All nodes currently have around 1.5 GB of data, and the problematic node has 1.9 GB.
All nodes have max heap space of 1 GB (the machines have 4 GB of RAM, so the default Cassandra config file calculated this amount of heap).
I use the default Cassandra configuration, except that I increased the write/read timeouts.
Question: does anyone know what the reason for this could be?
Is the heap size really that small?
What should I configure, and how, for this use case (heavy writes in a short time range, with the cluster doing nothing or only small writes the rest of the time)?
I haven't tried increasing the heap size manually; first I would like to know if there is something else to configure instead of just increasing it.
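Before touching the heap, I plan to check during the next peak whether the problematic node is dropping mutations or blocking on memtable flushes, along these lines (the log path is the default for a package install):

```sh
# Thread pool stats: look for pending/blocked flush writers and dropped MUTATION messages
nodetool tpstats

# Heap usage and general node health while the bulk insert is running
nodetool info

# Long GC pauses and flush pressure show up in the system log
grep -iE "GCInspector|flush" /var/log/cassandra/system.log | tail -n 50
```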

Cassandra didn't read data from SSTables

I am testing Cassandra with YCSB, using workloadc (100% reads),
and iostat always shows 0 reads.
Configuration:
data is on sdb, 24 GB of data, 8 GB heap size, default memtable size,
row cache and key cache disabled.
My thought was that uniform requests would miss the memtable and have to look up the data in the SSTables,
so iostat on the data directory should not be zero.
How could an 8 GB heap's memtables store all 24 GB of data?
Has anybody hit the same problem?
There's no magic going on here. Your request workload must not be as random as you thought.
I happen to have a copy of YCSB checked out, and workloadc uses requestdistribution=zipfian, which is NOT uniform.
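If you really want a uniform access pattern, you can override the distribution on the YCSB command line; the binding name and host below are placeholders that depend on your YCSB version and setup:

```sh
# Force a uniform key distribution instead of workloadc's zipfian default
bin/ycsb run cassandra-cql -P workloads/workloadc \
  -p hosts=127.0.0.1 \
  -p requestdistribution=uniform
```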
How much total memory is on the machine? If you have 32 GB or more of RAM on the machine, then it could also be the OS page cache, which is outside of the Cassandra process (i.e. not the heap). In scenarios like that, the OS (assuming it's Linux) will wind up caching the entire 24 GB in memory and you'll see little disk activity.
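To check whether the page cache explains it, compare the cached memory with your data size and, if you like, drop the cache before re-running the test (reasonable on a benchmark box, not on production):

```sh
# See how much RAM is currently sitting in the page cache
free -h

# Drop clean caches (root only), then re-run the workload and watch the data disk
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
iostat -x sdb 5
```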
