How to enable G1 GC in Apache Cassandra 3.11.4

I am using Apache Cassandra 3.11.4 and Java 8, with 48 GB of RAM, and I am facing a lot of GC issues.
I wanted to move from CMS to G1, so I made the changes below in jvm.options: I commented out all of the CMS settings and enabled G1.
### CMS Settings
####-XX:+UseParNewGC
####-XX:+UseConcMarkSweepGC
####-XX:+CMSParallelRemarkEnabled
####-XX:SurvivorRatio=8
####-XX:MaxTenuringThreshold=1
####-XX:CMSInitiatingOccupancyFraction=75
####-XX:+UseCMSInitiatingOccupancyOnly
####-XX:CMSWaitDuration=10000
####-XX:+CMSParallelInitialMarkEnabled
####-XX:+CMSEdenChunksRecordAlways
#### some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
####-XX:+CMSClassUnloadingEnabled
### GC logging options -- uncomment to enable
--XX:+PrintGCDetails
--XX:+PrintGCDateStamps
--XX:+PrintHeapAtGC
--XX:+PrintTenuringDistribution
--XX:+PrintGCApplicationStoppedTime
--XX:+PrintPromotionFailure
--XX:PrintFLSStatistics=1
--Xloggc:/var/log/cassandra/gc.log
--XX:+UseGCLogFileRotation
--XX:NumberOfGCLogFiles=10
--XX:GCLogFileSize=10M
But I am still getting CMS as the GC. Please suggest how to change from CMS to G1 GC.

So that's happening because CMS is the default garbage collector for Cassandra on Java 8. Even if you comment those settings out, it's still going to use CMS unless you've added the G1 settings.
In between the CMS and GC logging sections, there should be one more section in that file, the G1 section, which looks like this:
### G1 Settings (experimental, comment previous section and uncomment section below to enable)
## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
#
## Have the JVM do less remembered set work during STW, instead
## preferring concurrent GC. Reduces p99.9 latency.
-XX:G1RSetUpdatingPauseTimePercent=5
#
## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
## 200ms is the JVM default and lowest viable setting
## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
-XX:MaxGCPauseMillis=500
## Optional G1 Settings
# Save CPU time on large (>= 16GB) heaps by delaying region scanning
# until the heap is 70% full. The default in Hotspot 8u40 is 40%.
#-XX:InitiatingHeapOccupancyPercent=70
# For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
# Otherwise equal to the number of cores when 8 or less.
# Machines with > 10 cores should try setting these to <= full cores.
#-XX:ParallelGCThreads=16
# By default, ConcGCThreads is 1/4 of ParallelGCThreads.
# Setting both to the same value can reduce STW durations.
#-XX:ConcGCThreads=16
If you make sure that UseG1GC, G1RSetUpdatingPauseTimePercent, and MaxGCPauseMillis are uncommented, then Cassandra should start with G1 as the garbage collector.
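If you want to double-check which collector the running process actually picked up, one quick way (a sketch, assuming jcmd from the same JDK is on the PATH and the process matches "CassandraDaemon") is:
jcmd $(pgrep -f CassandraDaemon) VM.flags | tr ' ' '\n' | grep -E 'UseG1GC|UseConcMarkSweepGC'
If G1 is active you should see -XX:+UseG1GC in the output; if you still see -XX:+UseConcMarkSweepGC, the old flags are still being applied from somewhere (jvm.options or cassandra-env.sh).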

Related

Multiple Cassandra nodes go down

We have a 12-node Cassandra cluster across 2 different datacenters. We are migrating data from a SQL DB to Cassandra through a .NET application, and there is another .NET app that reads data from Cassandra. Recently we have been seeing one node or another going down (nodetool status shows DN and the service is stopped on it). Below is the output of nodetool status. We have to restart the service to get it working again, but it stops again.
https://ibb.co/4P1T453
Path to the log: https://pastebin.com/FeN6uDGv
So in looking through your pastebin, I see a few things that can be adjusted.
First I'm reasonably sure that this is your primary issue:
Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out,
especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
From GNU Error Codes:
Macro: int ENOMEM
“Cannot allocate memory.” The system cannot allocate more virtual
memory because its capacity is full.
-Xms12G, -Xmx12G, -Xmn3000M,
How much RAM is on your instance? From what I'm seeing, your node is dying from an OOM (out-of-memory) error. My guess is that you're allocating too much RAM to the heap, leaving not enough for the OS and page cache. In fact, I wouldn't allocate much more than 50%-60% of RAM to the heap.
For example, I mostly build instances on 16GB of RAM, and I've found that a 10GB max heap is about as high as you'd want to go on that.
-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC
In fact, as you're using CMS GC, I wouldn't go higher than 8GB for max heap size.
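To make that concrete, a minimal sketch of what the heap lines in jvm.options would look like with the 8GB suggestion (keeping your existing new-gen size, which is reasonable):
# equal min/max heap avoids resize pauses; 8G leaves RAM for the OS page cache
-Xms8G
-Xmx8G
-Xmn3000M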
Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low,
recommended value: 1048575, you can change it with sysctl.
This means you haven't adjusted your limits.conf or sysctl.conf. Check through the guide (DSE 6.0 - Recommended Production Settings), but generally it's a good idea to add the following to these files:
/etc/limits.conf
* - memlock unlimited
* - nofile 100000
* - nproc 32768
* - as unlimited
/etc/sysctl.conf
vm.max_map_count = 1048575
Note: After adjusting sysctl.conf, you'll want to run a sudo sysctl -p or reboot.
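To verify the new limits are actually in effect, a quick check (run these as the user Cassandra runs as, in a fresh session):
sysctl vm.max_map_count    # should print vm.max_map_count = 1048575
ulimit -l                  # max locked memory; should print unlimited
ulimit -n                  # open files; should print 100000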
Is swap disabled? : false,
You will want to disable swap. If Cassandra starts swapping contents of RAM to disk, things will get really slow. Run a swapoff -a and then edit /etc/fstab and remove any swap entries.
tl;dr summary
Set your initial and max heap sizes to 8GB (heap new size is fine).
Modify your limits.conf and sysctl.conf files appropriately.
Disable swap.
It's also a good idea to get on the latest version of 3.11 (3.11.4).
Hope this helps!

Cassandra hints are high due to GC pause events - this is creating a cycle

Our clusters had been running great for days, and all of a sudden this morning hints are high, and they will not go down naturally.
What we are seeing is that GC is taking longer than 200 ms, which is causing nodes to be marked DOWN and thereby increasing the hints. This is a vicious circle, and I am not sure how to fix it.
Machine config: 128 GB RAM, 2 TB hard disk, 24 cores. JVM heap size 16 GB, MaxGCPauseMillis 200 ms, ParallelGCThreads 8.
Please let me know how I can tune the GC config to break this GC-pause-to-hints loop.
We increased the JVM heap size to 32 GB and moved memtables off-heap for write-heavy customers.
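For reference, the kind of change that describes would look roughly like this (a sketch, not the exact files; memtable_allocation_type also accepts offheap_buffers):
In jvm.options:
-Xms32G
-Xmx32G
In cassandra.yaml:
memtable_allocation_type: offheap_objects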

Why does the JVM calculate the PS Survivor Space size so low for the parallel collector?

I am using the JDK 1.6.0_16 JVM for a Java application hosted on an 80-core Linux Intel machine.
While starting the Java application I have only two options configured:
-Xms2048m -Xmx8000m in the JVM options (after the java command).
I see that PS Old Gen is calculated as 5.21G and PS Eden is calculated as 2.6G, but the PS Survivor Space is 25MB.
I have exactly the same JVM in production, and there the PS Survivor Space size is shown as 888MB. I am seeing these sizes in the Java Mission Control Memory tab.
The cache size (output of /proc/cpuinfo) shows 24656 on both the UAT and production boxes.
I don't think it makes any difference to the JVM, but I'll also mention that there was very low load on the machine at the time the JVM was started.
Can you please advise what parameters does JVM consider for calculating the PS Survivor Space size?
From the Oracle GC tuning article 1 and article 2:
Survivor Space Sizing
The SurvivorRatio parameter can be used to tune the size of the survivor spaces, but this is often not important for performance. For example, -XX:SurvivorRatio=6 sets the ratio between eden and a survivor space to 1:6.
In other words, each survivor space will be one-sixth the size of eden, and thus one-eighth the size of the young generation (not one-seventh, because there are two survivor spaces).
If survivor spaces are too small, copying collection overflows directly into the tenured generation. If survivor spaces are too large, they will be uselessly empty.
The NewSize and MaxNewSize parameters control the new generation’s minimum and maximum size. Regulate the new generation size by setting these parameters equal. The bigger the younger generation, the less often minor collections occur.
NewRatio: The size of the young generation relative to the old generation is controlled by NewRatio. For example, setting -XX:NewRatio=3 means that the ratio between the young and old generation is 1:3, and the combined size of eden and the survivor spaces will be one-fourth of the heap.
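To make that arithmetic concrete for the flags in the question, assuming the parallel collector's documented defaults of NewRatio=2 and SurvivorRatio=8 (before any adaptive resizing):
young gen = 8000 MB / (NewRatio + 1) ≈ 2666 MB
old gen   = 8000 MB - 2666 MB ≈ 5333 MB (the ~5.21G you see)
survivor  = 2666 MB / (SurvivorRatio + 2) ≈ 266 MB each
eden      = 2666 MB - 2 x 266 MB ≈ 2133 MB
The 25MB vs 888MB difference between your two boxes is consistent with the parallel collector's adaptive size policy (on by default) resizing eden and the survivor spaces at runtime based on how much actually survives, so identical flags can end up with very different survivor sizes.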
As correctly noted by Peter Lawrey, survivor sizing depends on the type of your application. From the GC tuning article by Oracle, here are the guidelines:
First decide the maximum heap size you can afford to give the virtual machine. Then plot your performance metric against young generation sizes to find the best setting.
If the total heap size is fixed, then increasing the young generation size requires reducing the tenured generation size. Keep the tenured generation large enough to hold all the live data used by the application at any given time, plus some amount of slack space (10 to 20% or more).
Subject to the previously stated constraint on the tenured generation: Grant plenty of memory to the young generation and increase the young generation size as you increase the number of processors, because allocation can be parallelized. The default is calculated from NewRatio and the -Xmx setting
Can you please advise what parameters does JVM consider for calculating the PS Survivor Space size?
It must be large enough to never actually fill up after a collection of the Eden space, otherwise you will get full GCs, which is undesirable.
What the optimal survivor space size is depends on your application. I suggest you test your application under realistic loads with a larger Eden and survivor space than you imagine useful, see how much of that space ever gets used, and add 50% to 100% based on what you see is used.
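A low-overhead way to see how much survivor space is actually being used under load (these logging flags exist on JDK 6 and only affect GC logging; the log path is just an example):
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-Xloggc:/tmp/gc.log
PrintTenuringDistribution shows, for each young collection, how many bytes sit in each age bucket of the survivor space, which tells you how large the survivor spaces really need to be.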
The machine has 256G of physical memory out of which ~200G
The default heap size is 32 GB and I suggest you use this default unless you have a good reason to reduce it.
-XX:SurvivorRatio=1
This is usually a bad idea and having a high survivor ratio like 8 is usually better.
Setting the value to 8 didn't have any effect
Most likely you have a low allocation rate. I usually set a large young space, like -Xmn8g or even -Xmn24g, but whether this is a good or bad idea depends on your application.

JVM GC settings for a simple use case

I'm struggling with getting the right settings for my JVM.
Here's the use case:
Tomcat is serving requests (300 req/s), but they are very fast (key-value lookups), so I don't have any performance problems. Everything works fine until I have to refresh the data it's serving, which happens every 3 hours. You can imagine I have a big HashMap and I'm just doing lookups. During a data reload I create a temporary HashMap and then swap it in. I need to load quite a lot of data (~800MB in memory every time).
The problem I have is that during those loads, Tomcat stops responding from time to time.
Initially the problem was promotion failures and full GCs, but I got around those problems by tweaking the settings.
As you might notice, I've already decreased the occupancy at which the CMS collector kicks in. I don't get any promotion failures or anything like that any more. The young generation is reasonably small so that minor collections stay fast. I've increased the SurvivorRatio because all the request objects die young, and anything that doesn't should be promoted to the old generation (the data being loaded).
But I'm still seeing 503 errors in Tomcat during the data load. In gc.log my minor collections started to get slow during this process; they are now in seconds rather than milliseconds. I've tried slowing down the load process to give the GC a breather, but it doesn't seem to work...
The problem is worst the moment I reach the capacity of the old generation: CMS kicks in, frees up memory, and then the subsequent allocations are pretty slow. I don't see any errors in gc.log any more.
What can I do differently? I know fragmentation might be a problem, but I'm not getting promotion failures. The machine is an 8-core server. Does decreasing the number of GC threads make any sense? Would setting a lower thread priority for the data-loading thread make sense?
Is there a way to kick off the CMS collector periodically in the background? The data that's being swapped out could actually be garbage collected immediately.
I'm open to any suggestions!
Here are my JVM settings.
-Xms14g
-Xmx14g
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+AlwaysPreTouch
-XX:MaxNewSize=256m
-XX:NewSize=256m
-XX:MaxPermSize=128m
-XX:PermSize=128m
-XX:SurvivorRatio=24
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=88
-XX:+UseCompressedStrings
-XX:+DisableExplicitGC
JDK 1.6.33
Tomcat 6
gc.log snippet:
line 7 the data load starts
line 20 it stops
http://safebin.net/9124
Looking at the attached log and seeing those huge increases in minor GC times leads me to believe that your machine is under extremely heavy load from processes other than the JVM.
My reasoning is that when your minor GC is taking place, all application threads are stopped. Hence, nothing your application does should be able to affect the minor GC times, given that your new gen is constant in size.
However, if there is a lot of load from other processes on the machine during this time, the GC threads will compete for execution time and you could see this behavior.
Could you check the CPU usage from other processes when your data load is running?
Edit: Looking a bit more at the logs, I've come up with another possible explanation.
It seems that the target survivor space is full (ParNew goes down to exactly 10048K on each "slow" GC). That would mean that objects are promoted to the old gen directly, which could be what slows this down. I would try increasing the size of the new gen and lowering the survivor ratio. You could even try running without setting the new gen size or the survivor ratio at all and see how the JVM manages to optimize this (although beware that the JVM usually does a poor job of optimizing for bursts like this).
Your load lasts about 90 s and is interrupted by a GC every 1 s or so, yet you have a 14G heap whose steady-state occupancy (assuming the surrounding log lines are steady state) is only about 5G, which means you have a lot of memory going to waste. I think the previous answer looks to be correct (based on the data presented) when it says your survivor spaces are too small. If the app really does nothing but lookups the rest of the time, then a perfectly reasonable strategy would be something like
tenuring threshold = 0 (or 1)
eden size > 2x the working set so maybe 1.5-2G (i.e. allow the current live data and the working copy to reside entirely in eden)
tenured = whatever is left
The point here is to try to completely avoid a young collection during the load phase. However, a tenuring threshold of 0 would mean the previous version would likely be in tenured, and you'd eventually see a possibly lengthy collection to clean it up. Another option might be to go the other way round: make tenured big enough to fit 2-3 versions of the data and give eden the rest, with a view to minimising the frequency of young collections and helping tenured be collected as quickly as possible.
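A rough flag sketch of the first option (the sizes are guesses derived from the ~800MB figure in the question, so treat them as starting points to measure, not a recipe):
-Xmn2g                      # eden + survivors; room for the live map plus the copy being loaded
-XX:SurvivorRatio=30        # keep the survivor spaces tiny, since almost nothing should be copied
-XX:MaxTenuringThreshold=1  # anything that survives a young GC gets promoted almost immediately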
What works best really depends on what else the app is doing the rest of the time.
The CMS trigger seems quite high for a large heap, by the way; if you only start collecting at 88%, does it have time to finish the job before a full GC is forced? I suppose it might be quite safe if you're actually doing very little allocation most of the time.

Swappiness in JVM

I stumbled upon an interesting problem last week when I noticed that one of our production servers, which runs Apache HTTP Server in front of Tomcat, stopped, reporting an HTTP outage.
On investigating further, it seemed to be due to the JVM causing memory pages to be swapped out quickly. This resulted in the swap space being completely filled, causing a memory issue the next time a page was moved to swap.
Investigating further, it seems that there is a JVM swappiness factor set to 60% by default on some of our Linux distributions. Based on some research, this could be a high value for a web service with high traffic. Our swap space was set to 2 GB.
The swap details are:
Filename Type Size Used Priority
/dev/sda3 partition 2096472 1261420 -1
From /proc/meminfo
SwapCached: 944668 kB
Our JVM properties are as follows:
-Xmx6g -Xms4g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:PermSize=512M -XX:MaxPermSize=1024M -XX:NewSize=2g -XX:MaxNewSize=2g -XX:ParallelGCThreads=8
The server runs with 12 GB RAM.
Swappiness does not go well with the JVM's GC process. Thus I tried reducing the swappiness to 0, but that did not change anything. We still see cases where the entire swap space is consumed, resulting in an OutOfMemory error.
How can I tune the JVM performance?
The swappiness factor is actually a system-wide setting in Linux, not JVM-specific.
My personal experience with some kinds of applications is that they need swap to be turned off in order to work without hiccups, period. While I can't say for sure if your application belongs to this category, I have seen apps being swapped out despite enough free RAM being available. As you pointed out, GC and swap don't mix well. Swappiness is just an indication to the OS, so its impact on how much gets swapped out may be larger or smaller depending on circumstances. My suggestion would be to try turning swap completely off.
In order to do this, you need to be root or have sudo access. Comment out the line describing swap in /etc/fstab, which will prevent swap from being turned on after a reboot. Then, in order to turn swap off for the current server run, run swapoff -a. This may take up to a few minutes if there's data in swap that needs to be pulled back into RAM. Then, check the last line of the output of free to ensure that the total size of available swap is 0. After this, observe your app in order to determine if turning swap off solved your issue.
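In command form, the steps above look roughly like this (device names and the fstab edit will differ per system; the sed line is just one way to comment out the swap entry):
sudo swapoff -a                                  # turn swap off for the running system
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab   # comment out swap entries so it stays off after reboot
free -m                                          # the Swap line should now show 0 total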
If your physical RAM is 12 GB and your heap is set to only 6 GB (max), then you have plenty of RAM. Is this server dedicated to the Tomcat instance?
Take a look at the verbose GC log files to see what the memory usage is over time. Check your access log files to confirm whether the access requests have increased over time.
