Is Linux RSS not equivalent to Java Xmx + MaxMetaspaceSize? [duplicate] - linux

In my ps -eo snapshot, one process occupies 2.1 GB of memory.
The maximum size of its heap is 768 MB and the maximum size of its metaspace is 256 MB.
So I assumed the process could not occupy more than 1024 MB (768 + 256), but it does.
What does RSS include besides the heap and metaspace? And how can I inspect what is inside the RSS, the way a heap analyzer lets me inspect the heap?

The RSS covers all resident memory the process uses for any purpose, including the JVM itself, shared libraries, thread stacks, direct memory, memory-mapped files, native memory use, and native graphics components. The heap and metaspace are just two memory regions among them.
Note the virtual memory size is 15 GB.
To see what the memory is used for, you can dump /proc/{pid}/smaps, which shows all the memory regions (there will be hundreds) and how much of each one is resident. (IntelliJ running on my machine has 403 memory regions.)
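As a rough sketch of how such a dump could be summarized, the Python below sums the Rss: lines of an smaps dump per mapping name. The function names rss_by_mapping and top_consumers are hypothetical helpers, not part of any tool mentioned above, and the parsing assumes the usual Linux smaps layout (a header line per region followed by "Key: value kB" lines).

```python
import re
from collections import defaultdict

# Matches a region header line such as:
# 7ffc00000000-7ffc00021000 rw-p 00000000 00:00 0      [stack]
# Fields: address range, permissions, offset, device, inode, optional name.
_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")


def rss_by_mapping(smaps_text):
    """Return {mapping name: resident KiB}, summed over all regions."""
    totals = defaultdict(int)
    name = "[anon]"
    for line in smaps_text.splitlines():
        header = _HEADER.match(line)
        if header:
            # Regions without a path or label are anonymous memory.
            name = header.group(1).strip() or "[anon]"
        elif line.startswith("Rss:"):
            totals[name] += int(line.split()[1])
    return dict(totals)


def top_consumers(pid, limit=10):
    """Read /proc/<pid>/smaps (Linux only) and list the largest mappings."""
    with open(f"/proc/{pid}/smaps") as f:
        totals = rss_by_mapping(f.read())
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```

Calling top_consumers(pid) for the Java process should show the heap and thread stacks as anonymous regions alongside the shared libraries and mapped files; reading another user's smaps typically requires matching privileges.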

Related

How garbage collector works with Xmx and Xms values

I have some doubts about how the JVM garbage collector works with different values of Xmx and Xms and different machine memory sizes.
How would the garbage collector work in the following scenarios?
1. Machine memory size = 7.5GB
Xmx = 1024Mb
Number of processes = 16
Xms = 512Mb
I know 16*512 MB already exceeds the machine memory size. How would the garbage collector work in this scenario? I think the memory usage would be the entire 7.5 GB in this case. Would the processes be able to do anything, or would they all be stuck?
2. Machine memory size = 7.5GB
Xmx = 320MB
Xms is not defined.
Number of Processes = 16
Here, 16*320 MB should be less than 7.5 GB, but in my case memory usage again reaches 7.5 GB. Is that possible, or do I probably have a memory leak in my application?
So, basically, I want to understand when the garbage collector runs. Does it run whenever the memory used by the application reaches exactly the Xmx value, or are the two not related at all?
There are a couple of things to understand here and then apply to your situation.
Each JVM process has its own virtual address space, which is protected from other processes by the operating system. The OS maps physical ranges of addresses (called pages) to the virtual address space of each process. When more physical pages are required than are available, pages that have not been used for a while are written to disk (called paging) and can then be reused. When the data of these saved pages is required again, it is read back into the same or a different physical page. By doing this you can easily run 16 or more JVMs, each with a 1 GB heap, on a machine with 8 GB of physical memory. The problem is that the more paging to disk is required, the more you degrade the performance of your applications, since disk I/O is orders of magnitude slower than RAM access. This is also the reason that the heap space of a single JVM should not be bigger than physical memory.
The reason for having the -Xms and -Xmx options is so you can specify the initial and maximum size of the heap. As your application runs and requires more heap space, the JVM is able to increase the heap size within these bounds. These values are often set to be the same to eliminate the overhead of resizing the heap while the application is running. Most operating systems only allocate physical pages when they're required, so in your situation making -Xms small won't change the amount of paging that occurs.
The key point here is it's the virtual memory system of the operating system that makes it possible to appear to be using more memory than you physically have in your machine.
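The arithmetic behind the two scenarios in the question can be sketched as follows. This is only an illustration: heap_demand_gb is a hypothetical helper, and it adds up heap sizes alone, ignoring all non-heap memory each JVM also uses.

```python
def heap_demand_gb(processes, heap_mb):
    """Total heap demand in GB if every process used heap_mb of heap."""
    return processes * heap_mb / 1024


RAM_GB = 7.5  # machine memory from the question

scenarios = [
    ("scenario 1, initial heaps (Xms=512m)", 16, 512),
    ("scenario 1, maximum heaps (Xmx=1024m)", 16, 1024),
    ("scenario 2, maximum heaps (Xmx=320m)", 16, 320),
]

for label, processes, heap_mb in scenarios:
    demand = heap_demand_gb(processes, heap_mb)
    verdict = "exceeds" if demand > RAM_GB else "fits within"
    print(f"{label}: {demand:.1f} GB {verdict} {RAM_GB} GB of RAM")
```

Whether a sum that exceeds physical RAM actually causes paging depends on how many of those pages each JVM has really touched, which is the point made above about lazy allocation.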

Why does a JVM report more committed memory than the linux process resident set size?

When running a Java app (in YARN) with native memory tracking enabled (-XX:NativeMemoryTracking=detail see https://docs.oracle.com/javase/8/docs/technotes/guides/vm/nmt-8.html and https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html), I can see how much memory the JVM is using in different categories.
My app on jdk 1.8.0_45 shows:
Native Memory Tracking:
Total: reserved=4023326KB, committed=2762382KB
- Java Heap (reserved=1331200KB, committed=1331200KB)
(mmap: reserved=1331200KB, committed=1331200KB)
- Class (reserved=1108143KB, committed=64559KB)
(classes #8621)
(malloc=6319KB #17371)
(mmap: reserved=1101824KB, committed=58240KB)
- Thread (reserved=1190668KB, committed=1190668KB)
(thread #1154)
(stack: reserved=1185284KB, committed=1185284KB)
(malloc=3809KB #5771)
(arena=1575KB #2306)
- Code (reserved=255744KB, committed=38384KB)
(malloc=6144KB #8858)
(mmap: reserved=249600KB, committed=32240KB)
- GC (reserved=54995KB, committed=54995KB)
(malloc=5775KB #217)
(mmap: reserved=49220KB, committed=49220KB)
- Compiler (reserved=267KB, committed=267KB)
(malloc=137KB #333)
(arena=131KB #3)
- Internal (reserved=65106KB, committed=65106KB)
(malloc=65074KB #29652)
(mmap: reserved=32KB, committed=32KB)
- Symbol (reserved=13622KB, committed=13622KB)
(malloc=12016KB #128199)
(arena=1606KB #1)
- Native Memory Tracking (reserved=3361KB, committed=3361KB)
(malloc=287KB #3994)
(tracking overhead=3075KB)
- Arena Chunk (reserved=220KB, committed=220KB)
(malloc=220KB)
This shows 2.7GB of committed memory, including 1.3GB of allocated heap and almost 1.2GB of allocated thread stacks (using many threads).
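A small sketch for ranking the categories of such a report by committed size, assuming the summary lines keep the "- Name (reserved=...KB, committed=...KB)" shape shown above. The names nmt_categories and largest_committed are hypothetical helpers, not part of the JDK tooling.

```python
import re

# Matches category lines such as:
# - Java Heap (reserved=1331200KB, committed=1331200KB)
_CATEGORY = re.compile(r"^-\s*(.+?)\s*\(reserved=(\d+)KB, committed=(\d+)KB\)")


def nmt_categories(report):
    """Return {category: (reserved_kb, committed_kb)} from NMT summary text."""
    result = {}
    for line in report.splitlines():
        match = _CATEGORY.match(line.strip())
        if match:
            result[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return result


def largest_committed(report, limit=3):
    """Return the categories with the most committed memory, largest first."""
    categories = nmt_categories(report)
    ranked = sorted(categories.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(name, committed) for name, (_, committed) in ranked[:limit]]
```

Applied to the report above, the per-category committed values add up to the reported total of 2762382KB, with Java Heap and Thread far ahead of everything else.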
However, when running ps ax -o pid,rss | grep <mypid> or top it shows only 1.6GB of RES/rss resident memory. Checking swap says none in use:
free -m
             total       used       free     shared    buffers     cached
Mem:        129180      99348      29831          0       2689      73024
-/+ buffers/cache:      23633     105546
Swap:        15624          0      15624
Why does the JVM indicate 2.7GB memory is committed when only 1.6GB is resident? Where did the rest go?
I'm beginning to suspect that stack memory (unlike the JVM heap) seems to be precommitted without becoming resident and over time becomes resident only up to the high water mark of actual stack usage.
Yes, at least on Linux, mmap is lazy unless told otherwise. Anonymous pages are only backed by physical memory once they're written to (reads are not sufficient due to the zero-page optimization).
GC heap memory effectively gets touched by the copying collector or by pre-zeroing (-XX:+AlwaysPreTouch), so it is generally resident. Thread stacks, on the other hand, aren't affected by this.
For further confirmation you can use pmap -x <java pid> and cross-reference the RSS of various address ranges with the output from the virtual memory map from NMT.
Reserved memory has been mmapped with PROT_NONE, which means the virtual address space ranges have entries in the kernel's VMA structs and thus will not be used by other mmap/malloc calls. Accessing them still causes a page fault that is forwarded to the process as SIGSEGV, i.e. accessing them is an error.
This is important to have contiguous address ranges available for future use, which in turn simplifies pointer arithmetic.
Committed-but-not-backed-by-storage memory has been mapped with - for example - PROT_READ | PROT_WRITE but accessing it still causes a page fault. But that page fault is silently handled by the kernel by backing it with actual memory and returning to execution as if nothing happened.
I.e. it's an implementation detail/optimization that won't be noticed by the process itself.
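The lazy backing described here can be observed from user space. The following is a minimal sketch, assuming Linux with /proc mounted; demo_lazy_mapping and the 64 MiB default are illustrative choices, not anything taken from the JVM. It maps anonymous read/write memory, then writes one byte per page, and samples the resident set size at each step.

```python
import mmap

PAGE = mmap.PAGESIZE


def statm_rss_bytes(statm_text, page_size=PAGE):
    """The second field of /proc/<pid>/statm is the resident size in pages."""
    return int(statm_text.split()[1]) * page_size


def current_rss():
    """Resident set size of this process in bytes (Linux only)."""
    with open("/proc/self/statm") as f:
        return statm_rss_bytes(f.read())


def demo_lazy_mapping(size=64 * 1024 * 1024):
    """Return RSS before mapping, after mapping, and after touching pages."""
    before = current_rss()
    # Anonymous private mapping: address space is committed, nothing is backed.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    mapped = current_rss()
    for offset in range(0, size, PAGE):
        region[offset] = 1  # a write fault makes the kernel back this page
    touched = current_rss()
    region.close()
    return before, mapped, touched
```

On a typical Linux system the second value should stay close to the first, while the third should be larger by roughly the mapping size, which mirrors the gap between NMT's committed figure and the RSS that ps reports.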
To give a breakdown of the concepts:
Used Heap: the amount of memory occupied by live objects according to the last GC
Committed: Address ranges that have been mapped with something other than PROT_NONE. They may or may not be backed by physical memory or swap due to lazy allocation and paging.
Reserved: The total address range that has been pre-mapped via mmap for a particular memory pool.
The reserved − committed difference consists of PROT_NONE mappings, which are guaranteed to not be backed by physical memory.
Resident: Pages which are currently in physical RAM. This means code, stacks, part of the committed memory pools but also portions of mmapped files which have recently been accessed and allocations outside the control of the JVM.
Virtual: The sum of all virtual address mappings. Covers committed, reserved memory pools but also mapped files or shared memory. This number is rarely informative since the JVM can reserve very large address ranges in advance or mmap large files.

How is total memory in Java calculated

If I have 8GB RAM and I use the following on a 64-bit JVM
max heap size 6144MB
max perm gen space 2048MB
stack size 2MB
Q1: Is perm gen space allocated from the max heap, or is it separate?
Q2: If separate, will the JVM start with the above settings, or will it give an error because heap + permgen + stack + program data would exceed the total RAM?
First of all remember that the parameter you set with -Xmx (since that's the way I suppose you are setting your heap size) is the size of heap available to your Java code, not the amount of memory the JVM will consume. The difference comes from housekeeping structures that the JVM keeps (garbage collector structures, JIT overhead etc.), sometimes memory allocated by native code, buffers, and so on. The size of this additional memory depends on JVM version, the app you are running, and other factors, but I've seen JVMs allocate twice as much RAM as the heap size visible to the application. For the average case, I usually consider 50% to be a safe margin, with 20-30% acceptable. If you set your heap size to be close to amount of RAM in your machine, you will hit the swap and performance will suffer.
Now for the enumerated questions:
Perm gen is a separate space from the heap, at least in Oracle's JDK 6. It is separate because it is subject to completely different memory management rules than the regular heap. By the way, 2 GB of perm gen space is huge - are you sure you really need it?
Regarding the second question, see above. If this is Oracle's JDK, you are likely to run into trouble: perm gen and heap add up, and there will be additional memory, usually on the order of 20-50% of your 6 GB heap, so together with heap and perm space this will be more than your RAM. At first this setup may work, but once both the heap and perm gen usage come close to their configured limits, you could run out of memory.
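A back-of-the-envelope version of this estimate, using the settings from the question, might look like the sketch below. The helper jvm_budget_mb is hypothetical, the 20% and 50% overhead fractions are the rule of thumb from the answer above rather than measured values, and the thread count is an invented example because the question does not state one.

```python
def jvm_budget_mb(heap_mb, perm_mb, overhead_fraction, threads=0, stack_mb=0):
    """Rough worst-case footprint: heap + perm gen + JVM overhead + stacks."""
    overhead_mb = heap_mb * overhead_fraction
    return heap_mb + perm_mb + overhead_mb + threads * stack_mb


RAM_MB = 8 * 1024     # 8 GB machine from the question
HEAP_MB = 6144        # max heap size
PERM_MB = 2048        # max perm gen space
STACK_MB = 2          # stack size per thread
EXAMPLE_THREADS = 100  # assumption for illustration only

for fraction in (0.2, 0.5):
    total = jvm_budget_mb(HEAP_MB, PERM_MB, fraction, EXAMPLE_THREADS, STACK_MB)
    print(f"overhead {fraction:.0%}: about {total:.0f} MB of {RAM_MB} MB RAM")
```

Heap and perm gen alone already add up to 8192 MB, the full RAM, before any overhead or thread stacks are counted.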
Heap and permgen are different memory areas of the JVM, so with these settings you will be consuming virtually all the memory on the system. It is always better to leave about 20% of RAM free for the OS and other tasks to run properly.
Also, 2 GB for perm space is a huge figure. Have you looked at jar optimisation, so that only relevant classes are present on the classpath?
This depends on the JVM and the version of the JVM.
In HotSpot Java 6, PermGen space is independent of the max heap size argument (-Xmx and -Xms control only the Young/OldGen sizes). The PermGen space size is given by -XX:PermSize and -XX:MaxPermSize. See Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning.
UPDATE: In HotSpot Java 8, there is no PermGen space anymore. Class metadata lives in Metaspace, which is allocated from native memory (and can be capped with -XX:MaxMetaspaceSize), while interned strings and class statics reside in the regular heap.

Linux stack resident memory not reclaimed after stack unwind

Linux doesn't reclaim resident memory that is no longer used if it was allocated on the stack.
Heap
I dynamically allocate (malloc/mmap) 1GB on heap.
Before the allocation:
$ top
virtual memory 1GB
resident memory ~ 0
memset 1GB
$ top
virtual memory 1GB
resident memory 1GB
deallocate (free/munmap) of 1GB - reclaimed as expected
$ top
virtual memory 1GB
resident memory ~ 0
Stack
I dynamically allocate 1GB on stack.
Before:
$ top
virtual memory 1GB
resident memory ~ 0
memset 1GB
$ top
virtual memory 1GB
resident memory 1GB
deallocate (stack unwind) of 1GB - resident memory is still 1GB, even after deallocating! Why?
$ top
virtual memory 1GB
resident memory 1GB
Why does the resident memory stay in use (the physical pages are still held) after the stack unwinds?
The heap segment allocation is done with mmap and the stack segment allocation is done with mmap - so why is there a difference in reclaim behavior?
Because the OS assumes that once you have used that much stack, you will probably do so again. The OS can't really know [from outside your application] what your application is about to do in the future. It would be rather difficult to figure out when it's OK to free some of the stack, and you get all sorts of interesting race conditions in the OS where you have to stop the application from running simply to reduce its stack - and then it suddenly needs it again, so it needs to be allocated.
Using mmap, on the other hand, there is a distinct munmap to tell the OS "I have no interest in this memory". So it gets freed then and there [as part of the munmap call itself - specifically, in zap_pte_range the pages themselves are freed and given back to the OS].
It shouldn't really be a big issue, unless the following conditions are fulfilled:
1. You are running on an embedded system that doesn't have swap.
2. Your application runs for a long period of time after it has returned from using a lot of stack (assuming you actually do need this much memory as stack, you will have to have that memory available WHEN it's needed, so it's obviously only a problem if the application then doesn't need the stack later on and that period is long - whatever your definition of long is).
3. Your system doesn't have enough RAM to fulfil the RAM needs of other applications.
The reason I say that is that although the stack is using that much memory, if the application isn't using the RAM for a long time and the system is running low on memory, it will be swapped out to disk - to be swapped in at a later stage IF it's needed.
I would also say that using such large amounts of stackspace is generally considered a bad idea. Running out of space on stack [either hitting the limit or "there just isn't enough memory available"] is nearly always fatal.
So whilst I often suggest using stack-space to store temporary variables, I think 1GB of stack is quite excessive. A few megabytes should be acceptable, but hundreds of megabytes or more is probably a sign of "you should probably store things in another way".

Increase of virtual memory without increase of VmSize

I searched for my problem on Google and on this site, but I still don't understand the solution.
I have a piece of an MPI program which receives (RECV) some data. The program crashes on big arrays with an insufficient virtual memory error, so I started looking at the /proc/self/status file.
Before MPI_RECV it was:
Name: model.exe
VmPeak: 841640 kB
VmSize: 841640 kB
VmHWM: 15100 kB
VmRSS: 15100 kB
VmData: 760692 kB
And after:
Name: model.exe
VmPeak: 841640 kB
VmSize: 841640 kB
VmHWM: 719980 kB
VmRSS: 719980 kB
VmData: 760692 kB
I tested it on Ubuntu and saw this memory increase in System Monitor, but I was confused that VmSize (and VmPeak) did not change.
So the question is: what is the indicator of real memory usage?
Does it mean that the true indicator is VmRSS (and VmSize is only memory that is allocated but not yet used)?
(The possible solution to your problem is the last paragraph)
Memory allocation on most modern operating systems with virtual memory is a two-phase process. First, a portion of the virtual address space of the process is reserved and the virtual memory size of the process (VmSize) increases accordingly. This creates entries in the so-called process page table. Pages are initially not associated with physical memory frames, i.e. no physical memory is actually used. Whenever some part of this allocated portion is actually read from or written to, a page fault occurs and the operating system installs (maps) a free page from physical memory. This increases the resident set size of the process (VmRSS). When some other process needs memory, the OS might store the content of some infrequently used page (the definition of "infrequently used page" is highly implementation-dependent) to some persistent storage (a hard drive in most cases, or generally the swap device) and then unmap it. This process decreases the RSS but leaves VmSize intact. If this page is later accessed, a page fault occurs again and it is brought back. The virtual memory size only decreases when virtual memory allocations are freed. Note that VmSize also counts memory-mapped files (i.e. the executable file and all shared libraries it links to, or other explicitly mapped files) and shared memory blocks.
There are two generic types of memory in a process - statically allocated memory and heap memory. The statically allocated memory keeps all constants and global/static variables. It is part of the data segment, whose size is shown by the VmData metric. The data segment also hosts part of the program heap, where dynamic memory is allocated. The data segment is contiguous, i.e. it starts at a certain location and grows upwards towards the stack (which starts at a very high address and then grows downwards). The problem with the heap in the data segment is that it is managed by a special heap allocator that takes care of subdividing the contiguous data segment into smaller memory chunks. On the other hand, in Linux dynamic memory can also be allocated by directly mapping virtual memory. This is usually done only for large allocations in order to conserve memory, since it only allows memory in multiples of the page size (usually 4 KiB) to be allocated.
The stack is also an important source of heavy memory usage, especially if big arrays are allocated in the automatic (stack) storage. The stack starts near the very top of the usable virtual address space and grows downwards. In some cases it could reach the top of the data segment or it could reach the end of some other virtual allocation. Bad things happen then. The stack size is accounted in the VmStack metric and also in the VmSize.
One can summarise it as so:
VmSize accounts for all virtual memory allocations (file mappings, shared memory, heap memory, whatever memory) and grows almost every time new memory is allocated. Almost, because if the new heap memory allocation is made in the place of a freed old allocation in the data segment, no new virtual memory is allocated. It decreases whenever virtual allocations are freed. VmPeak tracks the max value of VmSize - it can only increase over time.
VmRSS grows as memory is being accessed and decreases as memory is paged out to the swap device.
VmData grows as the data segment part of the heap is being utilised. It almost never shrinks as current heap allocators keep the freed memory in case future allocations need it.
If you are running on a cluster with InfiniBand or other RDMA-based fabrics, another kind of memory comes into play - the locked (registered) memory (VmLck). This is memory which is not allowed to be paged out. How it grows and shrinks depends on the MPI implementation. Some never unregister an already registered block (the technical details about why are too complex to be described here), others do so in order to play better with the virtual memory manager.
In your case you say that you are running into a virtual memory size limit. This could mean that this limit is set too low or that you are running into an OS-imposed limit. First, Linux (and most Unixes) have means to impose artificial restrictions through the ulimit mechanism. Running ulimit -v in the shell tells you what the limit on the virtual memory size is in KiB. You can set the limit using ulimit -v <value in KiB>. This only applies to processes spawned by the current shell and to their children, grandchildren and so on. You need to instruct mpiexec (or mpirun) to propagate this value to all other processes if they are to be launched on remote nodes. If you are running your program under the control of a workload manager like LSF, Sun/Oracle Grid Engine, Torque/PBS, etc., there are job parameters which control the virtual memory size limit. And last but not least, 32-bit processes are usually restricted to 2 GiB of usable virtual memory.
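The limit that ulimit -v reports can also be queried from inside a process. The sketch below uses Python's standard resource module and assumes, as is the case for bash on Linux, that ulimit -v corresponds to RLIMIT_AS; to_kib and address_space_limit_kib are hypothetical helper names.

```python
import resource


def to_kib(value):
    """Convert a limit in bytes to KiB, or None when the limit is unlimited."""
    if value == resource.RLIM_INFINITY:
        return None
    return value // 1024


def address_space_limit_kib():
    """Return the (soft, hard) virtual memory limits of this process in KiB."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return to_kib(soft), to_kib(hard)


if __name__ == "__main__":
    soft, hard = address_space_limit_kib()
    print("soft limit:", "unlimited" if soft is None else f"{soft} KiB")
    print("hard limit:", "unlimited" if hard is None else f"{hard} KiB")
```

Running this inside a job launched by mpiexec or a workload manager is one way to check whether the limit set in the submitting shell actually reached the remote processes.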

Resources