We have set up a single-node Cassandra 3.11 cluster with JDK 1.8 on an EC2 t2.large instance, which has 2 vCPUs and 7 GB of RAM.
We are facing an issue where Cassandra keeps hitting 100% CPU even though we do not have much load.
We have 7 GB of RAM, but Cassandra is not using it; it only uses 1.7-1.8 GB.
What configuration do we need to change so that CPU utilization stops reaching 100%?
What is the best configuration to get better performance out of Cassandra?
Right now we are only able to get about 100-120 reads and 50-100 writes per second.
Please help us understand the issue and the ways to improve the performance configuration.
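For context, 1.7-1.8 GB is roughly what cassandra-env.sh auto-sizes the heap to on a 7 GB machine (the default is capped at about 1/4 of RAM), so the unused memory is expected unless the heap is raised explicitly. A minimal sketch of doing that, assuming a package install with the config under /etc/cassandra:

    # In /etc/cassandra/cassandra-env.sh (or jvm.options on some installs),
    # set the heap explicitly instead of relying on the auto-calculation:
    MAX_HEAP_SIZE="3G"       # roughly half of the 7 GB available on this instance
    HEAP_NEWSIZE="200M"      # rule of thumb from the file's comments: ~100 MB per core
    # Restart the node and verify the new heap:
    #   sudo systemctl restart cassandra
    #   nodetool info | grep "Heap Memory"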
Related
I would like to know about hardware limits (in TB) for cluster planning, specific to my use case. I have read a few threads and documents on the topic, but some of the content seems to be over 5 years old, so I thought of giving it a shot again:
Use case: building a time-series Cassandra cluster with occasional bulk loads from data sources that are gigabytes in size. However, end users will mostly be reading data from the cluster; updates or deletes on rows will be quite rare.
I have an initial hardware configuration for the Cassandra cluster:
2 x 12 cores
128 GB RAM
3.27 TB SAS HDD
This is the initial plan I have come up with:
Reconsidering the setup now, and after reading the post:
Should I split my nodes further, with less RAM, fewer vCPUs, and smaller HDDs each?
If yes, what would be a good fit for my case?
We are facing an OOM error when trying to execute multiple SQL query sessions via a scheduled job.
Detailed error:
The error message is: org.postgresql.util.PSQLException:ERROR: Out of memory (seg6 slice5 sungpmsh0:40002 pid=13610)
Detail: VM protect failed to allocate 65584 bytes from system, VM Protect 5835 MB available
What we tried:
After reading the Pivotal support docs, we did some basic troubleshooting and validated two memory parameters.
Current settings in GPDB:
gp_vmem_protect_limit: 8 GB
statement_mem: based on the vmprotect limit; from what we have read, it governs the memory a query can use on a segment.
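For reference, a quick sketch of how to confirm what those two parameters are actually set to on the cluster (run as the Greenplum admin user):

    # Cluster-wide values as stored in the configuration:
    gpconfig -s gp_vmem_protect_limit
    gpconfig -s statement_mem
    # Or from a live session:
    psql -c "SHOW gp_vmem_protect_limit;"
    psql -c "SHOW statement_mem;"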
Test 2: tuned the SQL queries. Also, what exactly should I tune here? Please guide.
Based on these sources:
https://discuss.pivotal.io/hc/en-us/articles/201947018-Pivotal-Greenplum-GPDB-Memory-Configuration
https://discuss.pivotal.io/hc/en-us/articles/204268778-What-are-VM-Protect-failed-to-allocate-d-bytes-d-MB-available-error-
But we are still getting the same OOM error.
Do we need to increase the vmprotect limit? If yes, by how much should we increase it?
How should we handle concurrency in GPDB?
How much swap do we need to add when we are already running with 30 GB of RAM? We have currently added 15 GB of swap; is that OK?
What is the query to identify host connections to the Greenplum database?
Thanks in advance
Do we need to increase the vmprotect limit? If yes, by how much should we increase it?
There is a nice calculator for setting gp_vmem_protect_limit on Greenplum.org. The setting depends on how much memory and swap you have, and on how many segments per host.
http://greenplum.org/calc/
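If the calculator is not handy, here is a sketch of the formula the Greenplum documentation publishes, worked through with your numbers; the 8 primary segments per host below is an assumption, so substitute your actual segment count before applying anything:

    # Formula from the Greenplum docs:
    #   gp_vmem = ((SWAP + RAM) - (7.5GB + 0.05 * RAM)) / 1.7
    #   gp_vmem_protect_limit = gp_vmem / primary_segments_per_host   (in MB)
    # With 30 GB RAM and 15 GB swap:
    #   gp_vmem = ((15 + 30) - (7.5 + 1.5)) / 1.7  ~= 21 GB
    #   per segment = 21 GB * 1024 / 8 segments    ~= 2700 MB
    gpconfig -c gp_vmem_protect_limit -v 2700   # only if 8 primaries per host is correct
    # gp_vmem_protect_limit requires a restart to take effect: gpstop -r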
You can get OOM errors for several reasons:
Bad queries
Bad table distribution (skew; see the check sketched below)
Bad settings (like gp_vmem_protect_limit)
Not enough resources (RAM)
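A quick way to check the skew point, sketched with a hypothetical table name (my_table is a placeholder):

    # Row-count distribution across segments; one segment holding far more rows
    # than the rest points at a bad distribution key:
    psql -c "SELECT gp_segment_id, count(*) FROM my_table GROUP BY 1 ORDER BY 2 DESC;"
    # gp_toolkit also ships skew views:
    psql -c "SELECT * FROM gp_toolkit.gp_skew_coefficients;"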
How should we handle concurrency in GPDB?
More RAM, fewer segments per host, and workload management to limit the number of concurrent queries running.
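For the workload-management piece, a minimal sketch using a resource queue (the queue and role names are placeholders):

    # Cap the number of concurrently running statements for a given set of roles:
    psql -c "CREATE RESOURCE QUEUE batch_queue WITH (ACTIVE_STATEMENTS=5);"
    psql -c "ALTER ROLE etl_user RESOURCE QUEUE batch_queue;"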
How much swap do we need to add when we are already running with 30 GB of RAM? We have currently added 15 GB of swap; is that OK?
Only 30 GB of RAM? That is pretty small. You can add more swap, but it will slow queries down compared to real RAM; I wouldn't use much more than 8 GB of swap.
I recommend 256 GB of RAM or more, especially if you are worried about concurrency.
What is the query to identify host connections to the Greenplum database?
select * from pg_stat_activity;
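To group the sessions by the host they come from, a slightly narrower sketch over the same view:

    psql -c "SELECT client_addr, usename, count(*) FROM pg_stat_activity GROUP BY 1, 2 ORDER BY 3 DESC;"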
I deployed a Cassandra 2.2 ring composed of 4 nodes in the cloud, each with 8 vCPUs and 8 GB of RAM. I am now running tests with the cassandra-stress and YCSB tools to measure its performance. I am mainly interested in read requests, with a small amount of writes (95%/5%).
Running the experiments, I noticed that even with a high number of threads (or clients) the CPU (and disk) does not saturate, staying at around 60% utilisation.
I am trying to figure out where the bottleneck in my system is. From the hardware point of view, everything seems OK to me.
dstat
I also looked at the Cassandra configuration file for tuning parameters that could increase the system's throughput. I increased the concurrent_reads/concurrent_writes values, but it did not improve performance.
The log file also does not contain any warnings.
What could be limiting my system?
Thanks
You might want to consider running cassandra-stress from outside the cluster and on multiple instances, as described in:
Usage of the Cassandra tool cassandra-stress
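As an illustration, a stress run driven from a separate load-generator box might look like the sketch below; the node IPs, counts, and thread figure are placeholders, and the mixed workload assumes an earlier "cassandra-stress write" run has already populated the standard1 table:

    # 95% reads / 5% writes against all four nodes, run from outside the ring:
    cassandra-stress mixed ratio\(read=95,write=5\) n=1000000 cl=ONE \
        -node 10.0.0.11,10.0.0.12,10.0.0.13,10.0.0.14 \
        -rate threads=200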
The following is a screenshot of htop on my dev server (sorted by MEM% used):
I have only one Cassandra instance running, but there are many Cassandra processes in htop, together taking up 16 GB of RAM.
The server is not being used in production, so there are no queries running on it at the moment.
I don't understand why so many Cassandra processes are running on my system, or how I can control this. Any suggestions will be highly appreciated.
Cassandra is a greedy process; it won't release RAM unless asked to.
You do not need to worry about the used RAM. If any other process requests memory, the Cassandra process will release it.
Cassandra can typically take up to 16 GB of RAM, which is the minimum production recommendation from a performance point of view. Along with the JVM heap, a number of other allocations (off-heap structures) account for the memory here, and as mentioned above it is a memory-intensive technology.
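As a side note, htop lists each JVM thread as its own row by default, so the many entries are usually one Cassandra process with many threads. A quick sketch of seeing what the single process actually holds, assuming a default install:

    # Heap and off-heap usage as reported by the node itself:
    nodetool info | grep -i "heap"
    # Resident memory of the single JVM process:
    ps -o pid,rss,cmd -C java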
I have successfully installed a multi-node Cassandra cluster with 10 nodes.
The nodetool status command shows every node is UP and NORMAL,
but the performance I am getting is very bad.
Here are my results:
Operations/second = 4000
Read latency = 13 ms
Write latency = 10 ms
I am using YCSB to measure performance.
Tuning I have done so far:
Consistency level = 1
Replication factor = 3
Heap size = 4 GB
My hardware:
Each node is a VM running CentOS
2 GHz CPU with 8 cores
8 GB RAM
1 Gbps network
Please let me know what other settings I can tweak to get maximum performance out of my cluster.
If you have 1 physical system with 10 VMs running on it and 1 disk, the performance of any (non in-memory) database will be bad. Spinning disks especially (no matter how expensive they are) are going to be a major contention point. With a really good SSD you may be able to pull off a few instances, but performance stress testing will likely always hit either that or a CPU bottleneck (if things are configured correctly for the system).
There is a pretty good chance that with 4 GB heaps and a stress workload you are going to be hitting GC and memory issues; do you have any monitoring around that? You can use VisualVM and connect to ip:7199 (the IP is set in cassandra-env.sh).
8 GB of RAM per VM is on the minimum-spec end. You want at least 8 GB of JVM heap, with room left over for off-heap structures and the OS; a 16 GB system is likely sufficient. Once again, the shared disk will kill performance, so it will only go so far, but you should be able to do far better than 4k ops/sec.
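If a full VisualVM setup is too heavy, a lighter sketch of checking GC pressure from the command line (the PID lookup assumes a single Cassandra JVM per VM):

    # GC statistics as tracked by Cassandra itself:
    nodetool gcstats
    # Or sample the JVM directly every second (occupancy and GC time columns):
    jstat -gcutil "$(pgrep -f CassandraDaemon)" 1000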