Prevent java.lang.OutOfMemoryError: unable to create new native thread - multithreading

We are encountering OOM errors when our system is low on memory, even though there is still something like 2 GB of free memory.
At first I suspected a thread-count limit, so I wrote a small test that creates threads while varying -XX:ThreadStackSize. The amount of memory dedicated to the stack clearly affects the maximum number of threads, but I can't explain how.
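A minimal sketch of the kind of test I mean (the actual code isn't shown here, so the class name and reporting interval are illustrative):

    // Keep creating idle threads until the JVM fails.
    // Run with, e.g.: java -XX:ThreadStackSize=2200k ThreadLimitTest
    import java.util.concurrent.locks.LockSupport;

    public class ThreadLimitTest {
        public static void main(String[] args) {
            int count = 0;
            try {
                while (true) {
                    Thread t = new Thread(() -> LockSupport.park()); // parked, idle worker
                    t.setDaemon(true);
                    t.start();
                    count++;
                    if (count % 1000 == 0) {
                        System.out.println("threads created: " + count);
                    }
                }
            } catch (OutOfMemoryError e) {
                System.out.println("Failed after " + count + " threads: " + e);
            }
        }
    }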
Results:
-XX:ThreadStackSize=2000k -> 11000 threads (which seems to be a hard limit)
-XX:ThreadStackSize=2100k -> 11000 threads
-XX:ThreadStackSize=2200k -> 9000 threads
-XX:ThreadStackSize=2300k -> 4000 threads
-XX:ThreadStackSize=2400k -> 1 thread
It seems strange to be able to create just one thread with a 2400 KB stack while there are still 2 GB of free memory.
And the process's resident memory seems to top out at about 160 MB, while its virtual memory can grow to 9 TB.
I don't understand what the limit is, or what is acceptable and what is not.
This is a problem when we have to decide whether we can add a JVM application to a server.
It seems that heap + perm + ThreadStackSize * threads is not a sufficient estimate: by that formula, adding a 1 GB application to a server with 2 GB of usable memory looks fine at first glance, yet it fails.
Why do I get out-of-memory errors before reaching the maximum memory, and how do you calculate the amount of memory a JVM application needs?

Related

How to make G1GC garbage collector run more frequently?

I have an OpenJDK 11.0.17 + Spring Boot application, and this is my GC configuration: -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -Xms512m -Xmx8400m. While the application is running, heap usage increases with incoming web traffic. But I am noticing that when there is no incoming traffic and no processing going on, total heap usage stays high (say 7 GB out of the 8 GB max heap) for long periods. When I take a heap dump, total usage drops to less than 2% (around 512 MB). If heap usage keeps increasing, the application is killed with an OOM (out of memory) error.
How can I make the G1GC garbage collector run more frequently? Or is there a way to tune the GC so the application doesn't get OOM-killed?
But I am noticing that when there is no incoming traffic and no processing going on, total heap usage stays high (say 7 GB out of the 8 GB max heap) for long periods.
By default G1GC only triggers a collection when heap occupancy exceeds certain thresholds, i.e. it only triggers while objects are being allocated. Lowering -XX:InitiatingHeapOccupancyPercent makes those collections start earlier.
Alternatively, if you upgrade to JDK 12 or newer you can use -XX:G1PeriodicGCInterval to get time-triggered collections in addition to allocation-triggered ones.
Another option is to switch to ZGC or Shenandoah which have options to trigger idle/background GCs to reduce heap size during idle periods.
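For illustration, those flag combinations might look like this (values are examples, not recommendations):

    # JDK 11: start concurrent marking cycles earlier (default threshold is 45)
    java -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=25 -Xms512m -Xmx8400m -jar app.jar

    # JDK 12+: additionally allow time-triggered collections (interval in milliseconds)
    java -XX:+UseG1GC -XX:G1PeriodicGCInterval=60000 -Xms512m -Xmx8400m -jar app.jar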
Or is there a way to tune the GC so the application doesn't get OOM-killed?
That problem is generally separate from high heap occupancy during idle periods. The GCs will attempt emergency collections before issuing an OOM, so even if there was lots of floating garbage before the failed allocation, it would have been collected before that error was thrown.
One possibility is that there's enough heap for steady-state operation but some load spike caused a large allocation that exceeded the available memory.
But you should enable GC logging to figure out the exact cause and what behavior led up to it. There are multiple things that can cause OOMs, and some of them aren't even heap-related.
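For example, with JDK 11's unified logging (the log file name is just an example):

    java -Xlog:gc*:file=gc.log:time,uptime,level,tags \
         -XX:+HeapDumpOnOutOfMemoryError -jar app.jar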

node.js heap memory and used heap size [pm2]

I am currently running node.js using pm2.
And recently, I was able to check "custom metrics" using the pm2 monit command.
Here, information such as Heap size, used heap size, and active requests are shown.
I don't know how the heap size is determined. I checked pm2 running on different servers:
one was set to 95 MiB and the other to 55 MiB, and the used heap size differed accordingly.
Also, is the heap usage closer to 100% the better?
While searching Stack Overflow for related information, I came across the following post:
What does Heap Usage mean in PM2
Also, what does "active requests" mean? It is continuously zero.
Thank you!
[Edit]
env : ubuntu18.04 [ ec2 - t3.micro ]
node version : v10.15
[Additional]
server memory : 1GB [ 40~50% used ]
cpu : vCPU (2) [ 1~2% used ]
The heap is the RAM used by the program you're asking PM2 to manage and monitor. Heap space, in Javascript and similar language runtimes, is allocated when your program creates objects and released upon garbage collection. Your runtime asks your OS for more heap space whenever it needs it: when active allocations exceed the free space. So your heap size will probably grow as your program starts up. That's normal.
Most programs allocate and release lots of objects as they do their work, so you should not try to optimize the % usage of your heap. When your program is running at a steady state (that is, after it has started up), you'll find the % utilization creeping up until garbage collection happens, and then dropping back. For example, a nodejs/express web server allocates req and res objects for each incoming request, uses them, then drops them so the garbage collector can reclaim their RAM.
If your allocated heap size keeps growing, over minutes or hours, you probably have a memory leak. That is a programming bug: a problem you should do your best to solve. Look up how to diagnose memory leaks in your application's language. Other than that, don't worry too much about heap usage.
Active requests count work being done via various asynchronous objects like file writers and TCP connections. Unless your program is very busy it stays near zero.
Keep an eye on loop delay if your program does computations. If it creeps up, some computation function is hogging the JavaScript event loop.

How garbage collector works with Xmx and Xms values

I have some doubts about how the JVM garbage collector works with different values of Xmx and Xms and different machine memory sizes.
How would the garbage collector work in the following scenarios?
1. Machine memory size = 7.5 GB
Xmx = 1024 MB
Number of processes = 16
Xms = 512 MB
I know 16 * 512 MB already exceeds the machine memory size. How would the garbage collector work in this scenario? I think memory usage would reach the entire 7.5 GB in this case. Will the processes be able to do anything, or will they all be stuck?
2. Machine memory size = 7.5 GB
Xmx = 320 MB
Xms is not defined.
Number of processes = 16
Here, 16 * 320 MB should be well under 7.5 GB, yet memory usage again reaches 7.5 GB. Is that possible, or do I probably have a memory leak in my application?
So basically I want to understand: when does the garbage collector run? Does it run whenever the memory used by the application reaches exactly the Xmx value, or are they not related at all?
There are a couple of things to understand here and then apply to your situation.
Each JVM process has its own virtual address space, which is protected from other processes by the operating system. The OS maps physical ranges of addresses (called pages) to the virtual address space of each process. When more physical pages are required than are available, pages that have not been used for a while are written to disk (called paging) and can then be reused. When the data of those saved pages is required again, it is read back into the same or a different physical page. By doing this you can easily run 16 or more JVMs, all with a heap of 1 GB, on a machine with 8 GB of physical memory. The problem is that the more paging to disk is required, the more you degrade the performance of your applications, since disk IO is orders of magnitude slower than RAM access. This is also the reason the heap space of a single JVM should not be bigger than physical memory.
The reason for having the -Xms and -Xmx options is so you can specify the initial and maximum size of the heap. As your application runs and requires more heap space, the JVM is able to increase the heap size within these bounds. A lot of the time these values are set to be the same, to eliminate the overhead of resizing the heap while the application is running. Most operating systems only allocate physical pages when they're required, so in your situation making -Xms small won't change the amount of paging that occurs.
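For example (sizes purely illustrative):

    # Fixed-size heap: initial and maximum are equal, so no resizing at runtime
    java -Xms1024m -Xmx1024m -jar app.jar

    # Growable heap: starts at 512 MB and may grow up to 1 GB as needed
    java -Xms512m -Xmx1024m -jar app.jar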
The key point here is it's the virtual memory system of the operating system that makes it possible to appear to be using more memory than you physically have in your machine.

Mono 4.2.2 garbage collection really slow/leaking on Linux with multiple threads?

I have an app that processes 3+ GB of data into 300 MB of data. If I run each independent dataset sequentially on the main thread, memory usage tops out at about 3.5 GB and everything works fine.
If I run each dataset concurrently on 10 threads, I see the following:
Virtual memory usage climbs steadily until allocations fail and it crashes (I can see the GC trying to run in the stack trace).
CPU utilization is 1000% for periods, then goes down to 100% for minutes, and then cycles back up. The app is easily 10x slower when run with multiple threads, even though they are completely independent.
This is the Mono 4.2.2 build for Linux with large-heap support, running on 128 GB of RAM with 40 logical processors. I am running mono-sgen and have tried all the custom GC settings I could think of (concurrent mark-and-sweep, max heap size, etc.).
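For reference, the kind of sgen settings tried look roughly like this (an illustrative example based on my reading of the sgen documentation, not the exact command used):

    MONO_GC_PARAMS="major=marksweep-conc,max-heap-size=64g" mono-sgen MyApp.exe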
These problems do not happen on Windows. If I rewrite the code to do significant object pooling, I get farther into the dataset before running out of memory, but the outcome is the same. I have verified that I have no memory leaks using multiple tools and good old printf debugging.
My best theory is that lots of allocations across lots of threads are a weak spot for the GC, and that most of that wall-clock time is spent with my worker threads suspended.
Does anyone have any experience with this? Is there a way I can help the GC get out of that 100% rut it gets stuck in, and to not run out of memory?

vm/min_free_kbytes - Why Keep Minimum Reserved Memory?

According to this article:
/proc/sys/vm/min_free_kbytes: This controls the amount of memory that is kept free for use by special reserves including “atomic” allocations (those which cannot wait for reclaim)
My question is: what does "those which cannot wait for reclaim" mean? In other words, I would like to understand why there's a need to tell the system to always keep a certain minimum amount of memory free, and under what circumstances this memory will be used. (It must be used by something; I don't see the need otherwise.)
My second question: does setting this value to something higher than 4 MB (its current value on my system) lead to better performance? We have a server that occasionally exhibits very poor shell performance (e.g. ls -l takes 10-15 seconds to execute) when certain processes get going. Would setting this number higher lead to better shell performance?
(link is dead, looks like it's now here)
That text is referring to atomic allocations, which are requests for memory that must be satisfied without giving up control (i.e. the current thread can not be suspended). This happens most often in interrupt routines, but it applies to all cases where memory is needed while holding an essential lock. These allocations must be immediate, as you can't afford to wait for the swapper to free up memory.
See Linux-MM for a more thorough explanation, but here is the memory allocation process in short:
_alloc_pages first iterates over each memory zone looking for the first one that contains eligible free pages
_alloc_pages then wakes up the kswapd task [..to..] tap into the reserve memory pools maintained for each zone.
If the memory allocation still does not succeed, _alloc_pages will either give up [..] In this process _alloc_pages executes a cond_resched() which may cause a sleep, which is why this branch is forbidden to allocations with GFP_ATOMIC.
min_free_kbytes is unlikely to help much with the described "ls -l takes 10-15 seconds to execute"; that is likely caused by general memory pressure and swapping rather than zone exhaustion. The min_free_kbytes setting only needs to allow enough free pages to handle immediate requests. As soon as normal operation is resumed, the swapper process can be run to rebalance the memory zones. The only time I've had to increase min_free_kbytes is after enabling jumbo frames on a network card that didn't support dma scattering.
To expand on your second question a bit: you will get better results from tuning vm.swappiness and the dirty ratios mentioned in the linked article. However, be aware that optimizing for "ls -l" performance may make other processes slower. Never optimize for a non-primary use case.
All Linux systems will attempt to make use of all physical memory available to the system, often through the creation of a filesystem buffer cache, which, put simply, is an I/O buffer that helps improve system performance. Technically this memory is not in use, even though it is allocated for caching.
"wait for reclaim", in your question, refers to the process of reclaiming that cache memory that is "not in use" so that it can be allocated to a process. This is supposed to be transparent but in the real world there are many processes that do not wait for this memory to become available. Java is a good example, especially where a large minimum heap size has been set. The process tries to allocate the memory and if it is not instantly available in one large contiguous (atomic?) chunk, the process dies.
Reserving a certain amount of memory with min_free_kbytes allows this memory to be instantly available and reduces the memory pressure when new processes need to start, run and finish while there is a high memory load and a full buffer cache.
4 MB does seem rather low, because if the buffer cache is full, any process that wants an immediate allocation of more than 4 MB will likely fail. The setting is very tunable and system-specific, but if you have a few GB of memory available it can't hurt to bump the reserve up to 128 MB. I'm not sure what effect it will have on shell interactivity, but it's likely to be positive.
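As a rough sketch of how that bump is typically applied (run as root; 131072 kB = 128 MB):

    # Inspect the current reserve (value is in kB)
    cat /proc/sys/vm/min_free_kbytes

    # Raise it to 128 MB on the running system
    sysctl -w vm.min_free_kbytes=131072

    # Persist the change across reboots
    echo "vm.min_free_kbytes = 131072" >> /etc/sysctl.conf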
This memory is kept free from use by normal processes. As @Arno mentioned, the special processes that can use it include interrupt routines, which must run now (as it's an interrupt) and finish before any other processes can run (atomic). This can include things like swapping memory out to disk when memory is full.
If memory fills up, an interrupt-driven (memory management) process runs to swap some memory out to disk and free it for use by normal processes. But if vm.min_free_kbytes is too small for it to run, the system locks up: the interrupt process must run first to free memory so other processes can run, yet it is stuck because it doesn't have enough reserved memory (vm.min_free_kbytes) to do its task, resulting in a deadlock.
Also see:
https://www.linbit.com/en/kernel-min_free_kbytes/ and
https://askubuntu.com/questions/41778/computer-freezing-on-almost-full-ram-possibly-disk-cache-problem (where the memory management process has so little memory to work with that it swaps little by little, taking so long that it feels like a freeze)