What is the difference between heap and swap memory in Ubuntu (or any OS)? How does this affect running Cassandra?
Heap memory is what the JVM uses; swap is what the OS uses to push infrequently used pages out to disk and save memory. It is strongly recommended to disable swap on C* hosts: old-generation objects in the JVM may get pushed out to disk, and when a GC occurs and touches them it will be very slow. If it can, C* will pin its memory to prevent itself from being swapped, but you should disable swap anyway.
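For illustration, here is a minimal sketch of that kind of memory pinning, done in the same spirit as C*: calling mlockall(2) through the JNA library. The class name and hard-coded MCL_* values (which assume Linux) are mine, not C*'s actual code.

import com.sun.jna.Native;

public class MemoryLock {
    static { Native.register("c"); }               // direct-map natives against libc
    private static native int mlockall(int flags); // lock all pages in RAM

    private static final int MCL_CURRENT = 1;      // assumed Linux values
    private static final int MCL_FUTURE  = 2;

    public static void main(String[] args) {
        int rc = mlockall(MCL_CURRENT | MCL_FUTURE);
        System.out.println(rc == 0 ? "memory locked, cannot be swapped"
                                   : "mlockall failed (missing CAP_IPC_LOCK?)");
    }
}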
Related
I have some doubts about how the JVM garbage collector would work with different values of Xmx and Xms relative to the machine's memory size.
How would the garbage collector work in the following scenarios?
1. Machine memory size = 7.5 GB
Xmx = 1024 MB
Number of processes = 16
Xms = 512 MB
I know 16 × 512 MB = 8 GB already exceeds the machine's memory. How would the garbage collector work in this scenario? I think memory usage would reach the entire 7.5 GB. Would the processes still be able to do anything, or would they all be stuck?
2. Machine memory size = 7.5 GB
Xmx = 320 MB
Xms is not defined.
Number of processes = 16
Here 16 × 320 MB = 5 GB, which is less than 7.5 GB, but in my case memory usage again reaches 7.5 GB. Is that possible, or do I probably have a memory leak in my application?
So basically I want to understand when the garbage collector runs. Does it run whenever the memory used by the application reaches exactly the Xmx value, or are the two not related at all?
There are a couple of things to understand here and then consider for your situation.
Each JVM process has its own virtual address space, which is protected from other processes by the operating system. The OS maps physical ranges of addresses (called pages) to the virtual address space of each process. When more physical pages are required than are available, pages that have not been used for a while are written to disk (called paging) and can then be reused. When the data of these saved pages is required again, it is read back into the same or a different physical page. By doing this you can easily run 16 or more JVMs, each with a 1 GB heap, on a machine with 8 GB of physical memory. The problem is that the more paging to disk is required, the more you degrade the performance of your applications, since disk I/O is orders of magnitude slower than RAM access. This is also the reason the heap of a single JVM should not be bigger than physical memory.
The reason for having the -Xms and -Xmx options is so you can specify the initial and maximum size of the heap. As your application runs and requires more heap space, the JVM is able to grow the heap within these bounds. A lot of the time these values are set to be the same, to eliminate the overhead of resizing the heap while the application is running. Most operating systems only allocate physical pages when they are actually touched, so in your situation making -Xms small won't change the amount of paging that occurs.
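To watch this for yourself, here is a minimal sketch (the class name is mine) that prints the maximum and currently committed heap; started with, say, java -Xms512m -Xmx1024m HeapSizes, the committed size begins near -Xms and grows toward -Xmx as the application allocates:

public class HeapSizes {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("max heap (-Xmx):       " + rt.maxMemory()   / mb + " MB");
        System.out.println("committed heap so far: " + rt.totalMemory() / mb + " MB");
        System.out.println("free within committed: " + rt.freeMemory()  / mb + " MB");
    }
}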
The key point here is it's the virtual memory system of the operating system that makes it possible to appear to be using more memory than you physically have in your machine.
I have read this official post on the Hazelcast High Density Memory.
Am I right in assuming that this HD memory still consumes memory from the JVM in which the application is running, rather than another JVM being created on the server and used solely for the hz instance?
And that the only difference in this native-memory configuration is that the memory is allocated off-heap rather than with the default on-heap allocation?
HDMS, the Hazelcast High Density Memory Store, allocates memory in the same process address space as the Java heap. That means the process still owns all the memory, but the Java heap is otherwise independent, and the Hazelcast-allocated space (off-heap, outside the Java heap) is not subject to garbage collection. Values are serialized and the resulting byte stream is copied into native memory; when reading, it is copied back into the Java heap area and sent to the requestor.
Imagine HDMS as a fancy malloc implementation :)
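To make the copy-in/copy-out behaviour concrete, here is an illustrative sketch using plain NIO (not the actual HDMS API): the serialized bytes are copied into a direct, off-heap buffer that the garbage collector does not scan, and a read copies them back onto the Java heap.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapValueDemo {
    public static void main(String[] args) {
        // "Serialize" a value into a byte stream.
        byte[] serialized = "some value".getBytes(StandardCharsets.UTF_8);

        // Write path: copy the byte stream into native (off-heap) memory.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(serialized.length);
        offHeap.put(serialized);

        // Read path: copy the bytes back onto the Java heap and deserialize.
        offHeap.flip();
        byte[] onHeap = new byte[offHeap.remaining()];
        offHeap.get(onHeap);
        System.out.println(new String(onHeap, StandardCharsets.UTF_8));
    }
}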
HDMS, or High Density Memory Store, is part of the Hazelcast Enterprise HD offering. HDMS is a way for Java software to access multiple terabytes of memory per node without struggling with long and unpredictable garbage collection pauses. This memory store provides the benefits of off-heap memory using a number of high-performance memory management techniques. HDMS works around garbage collection limitations so that applications can utilize hardware memory more efficiently without the need for extra clusters. It is designed as a pluggable memory manager that enables multiple memory stores for different data structures such as IMap and JCache.
I am very new to Linux memory management. While reading some documents on the topic, I had some basic questions.
Below is my config:
vm.swappiness=10
vm.vfs_cache_pressure=140
vm.min_free_kbytes=2013265
My understanding is that if free memory falls below vm.min_free_kbytes, the OS will reclaim memory.
Is memory reclaim the deletion of unwanted files, or the copying of data from RAM to swap?
If it is copying from RAM to swap, what happens if I am not using swap?
Is swappiness always greater than vm.min_free_kbytes?
What is the significance of vm.vfs_cache_pressure?
Memory reclaim is the mechanism of creating more free RAM pages by moving the data residing in them somewhere else. It has nothing to do with deleting files. When more RAM is needed, data is either dropped from RAM (discarded, if it can be re-fetched later, e.g. from a file on disk) or copied to the swap area (so it remains re-fetchable).
If there is no swap file, but some data would have to be saved to the (nonexistent) swap area, then an out-of-memory condition occurs. Typically the process trying to get the memory (via malloc() and similar) is notified: the allocation fails and returns NULL. The process can choose what to do, or it may even crash. If the memory is needed by the kernel itself (normally quite rare), a panic occurs and the system locks up completely.
swappiness is, as a percentage, the tendency of the kernel to use swap even when not strictly needed, in order to keep plenty of RAM ready for memory requests. Simply put, a swappiness of 100 means the kernel tries to swap all the time, while a swappiness of 0 means the kernel tries not to swap (there are some special values, however). min_free_kbytes is given in real kilobytes, not a percentage; it is the minimum amount that should always stay free for the kernel to work well (with your vm.min_free_kbytes=2013265 above, that is roughly 2 GB). Even starting a memory reclaim can itself require a little more RAM: it would be catastrophic if, to get some memory, you needed just a little memory but didn't have it! :-)
vfs_cache_pressure is again a percentage. It indicates how aggressively the kernel tries to get rid of the (memory) cache used for the file system (VFS = virtual file system). The file system cache is quite a good candidate to throw away, because it holds information that can easily be read back from disk. Unfortunately, if the computer needs the file system frequently, it will then have to read the same data again and again; caching is a big performance boost. Of course, if a system does little disk I/O, this cache is the best candidate to throw away when the system is hungry for memory.
All these things are succinctly explained here: https://www.kernel.org/doc/Documentation/sysctl/vm.txt
If I have 8 GB of RAM and I use the following settings on a 64-bit JVM:
max heap size 6144 MB
max perm gen space 2048 MB
stack size 2 MB
Q1: Is perm gen space allocated out of the max heap, or is it separate?
Q2: If separate, will the JVM start with the above settings, or will it give an error because heap + permgen + stack + program data would exceed the total RAM?
First of all, remember that the parameter you set with -Xmx (since that's how I suppose you are setting your heap size) is the size of the heap available to your Java code, not the amount of memory the JVM will consume overall. The difference comes from housekeeping structures the JVM keeps (garbage collector structures, JIT overhead, etc.), memory allocated by native code, buffers, and so on. The size of this additional memory depends on the JVM version, the app you are running, and other factors, but I've seen JVMs allocate twice as much RAM as the heap size visible to the application. For the average case, I usually consider 50% a safe margin, with 20-30% acceptable. If you set your heap size close to the amount of RAM in your machine, you will hit swap and performance will suffer.
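One quick way to see that the heap is only part of the footprint is the standard MemoryMXBean, which reports heap and non-heap usage (JIT code cache, class metadata and the like) separately; a small sketch, with a class name of my own:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class Footprint {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        // Heap usage is bounded by -Xmx; non-heap usage comes on top of it.
        System.out.println("heap:     " + mem.getHeapMemoryUsage());
        System.out.println("non-heap: " + mem.getNonHeapMemoryUsage());
    }
}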
Now for the enumerated questions:
Perm gen is a separate space from the heap, at least in Oracle's JDK 6. It is separate because it is governed by completely different memory management rules than the regular heap. By the way, 2 GB of permgen space is huge - are you sure you really need it?
Regarding the second question, see above. If this is Oracle's JDK, you are likely to run into trouble: perm and heap add up, and on top of them comes additional memory, usually on the order of 20-50% of your 6 GB heap; together with the heap and perm space this will be more than your RAM. At first this setup may work, but once both the heap and the perm gen usage approach their configured limits, you could run out of memory.
Heap and permgen are different memory areas of the JVM, so with these settings you would be consuming virtually all the memory on the system. It is always better to leave about 20% of RAM free for the OS and other tasks to execute properly.
Also, 2 GB for perm space is a huge figure. Have you looked at optimising your jars so that only relevant classes are present on the classpath?
This depends on the JVM and the version of the JVM.
In HotSpot Java 6, the PermGen space is independent of the max heap size arguments (-Xmx and -Xms control only the young/old generation sizes). The PermGen space size is set with the -XX:PermSize and -XX:MaxPermSize flags. See Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning.
UPDATE: In HotSpot Java 8 there is no PermGen space anymore; class metadata resides in a native memory area called Metaspace, while ordinary objects live in the young/old generation spaces as before.
After reading the two books Understanding Linux Network Internals and Understanding the Linux Kernel, as well as other references, I am quite confused and need some clarification about the "memory cache" and "memory pool" techniques.
1) Are they the same or different techniques?
2) If not, what is the difference, or what are their distinct goals?
3) Also, how does the Slab Allocator come in?
Regarding the slab allocator:
Imagine memory is flat, i.e. you have a block of 4 GB of contiguous memory. One of your programs then requests 256 bytes of memory, so the memory allocator has to choose a suitable block of 256 bytes out of those 4 GB. Now your memory looks something like
<============256bytes=======================>
(each = is a contiguous block of memory). Some time passes, many programs working with memory request more blocks of 256 bytes, or larger or smaller ones, so in the end your memory might look like:
<==256==256=256=86=68=121===>
So the memory gets fragmented, and soon there is no trace of your beautiful 4 GB block: this is fragmentation. Now, what the slab allocator does is keep track of allocated objects; once they are no longer used, it reports the memory as free while in fact retaining it in some sort of list (you might want to read about free lists).
Now imagine that the first program relinquishes its 256 bytes and a new one then wants 256 bytes. Instead of allocating a fresh chunk of main memory, the allocator can reuse the recently freed 256 bytes without the burden of searching physical memory for an appropriately sized contiguous block. This is essentially how you implement a memory cache. It reduces overall memory fragmentation: otherwise you might end up in a situation where memory is so fragmented that it is unusable, and the memory manager has to do some magic to get you a block of the appropriate size. A slab allocator proactively combats (but doesn't eliminate) this problem.
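As a toy illustration of that free-list idea (in Java rather than kernel C, with names of my own choosing): released buffers are parked on a list and handed back out on the next request instead of being allocated afresh.

import java.util.ArrayDeque;
import java.util.Deque;

public class BufferCache {
    private final int size;
    private final Deque<byte[]> freeList = new ArrayDeque<>();

    public BufferCache(int size) { this.size = size; }

    public byte[] acquire() {
        byte[] buf = freeList.poll();        // reuse a previously freed buffer if any
        return (buf != null) ? buf : new byte[size];
    }

    public void release(byte[] buf) {
        freeList.push(buf);                  // "freeing" just parks it for reuse
    }
}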
The Linux memory allocator, a.k.a. the slab allocator, maintains lists/pools of frequently used memory objects of similar or approximate size. The slab gives the programmer extra flexibility to create their own pool of frequently used memory objects of the same size, label it as they want, allocate from it, deallocate to it, and finally destroy it. This cache is known to your driver and private to it. But there is a problem: under memory pressure there is a high chance of allocation failures, which may be unacceptable in some drivers. The answer is to always keep some memory reserved in hand so that we never feel the memory crunch. Since the kmem cache is the more generic pool mechanism, we need someone who always maintains the minimum required memory, and that is our buddy the memory pool.
Lookaside caches - The cache manager in the Linux kernel is sometimes called the slab allocator. You may end up allocating many objects of the same size over and over; with this mechanism you can preallocate a number of same-sized objects and use them later, without the need to allocate and free many objects over and over.
A memory pool is just a form of lookaside cache that tries to always keep a list of memory around for use in emergencies: when the memory pool is created, the allocation functions (slab allocators) build a pool of preallocated objects, so you can acquire them whenever you need them.
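A sketch of the memory-pool idea on top of such a cache (again illustrative Java with hypothetical names, not the kernel's actual mempool API): a minimum number of objects is preallocated up front, and requests fall back to that reserve when a normal allocation fails under memory pressure.

import java.util.ArrayDeque;
import java.util.Deque;

public class MemPool {
    private final Deque<byte[]> reserve = new ArrayDeque<>();
    private final int objectSize;

    public MemPool(int minObjects, int objectSize) {
        this.objectSize = objectSize;
        for (int i = 0; i < minObjects; i++) {
            reserve.push(new byte[objectSize]);   // build the emergency reserve
        }
    }

    public byte[] acquire() {
        try {
            return new byte[objectSize];          // normal allocation path
        } catch (OutOfMemoryError oom) {
            byte[] buf = reserve.poll();          // emergency path under pressure
            if (buf == null) throw oom;           // reserve exhausted as well
            return buf;
        }
    }

    public void release(byte[] buf) {
        reserve.push(buf);                        // returned objects refill the reserve
    }
}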