Can h2o allocate more memory to a standalone cluster? - python-3.x

I want to increase the h2o cluster memory to 64 GB. Is that possible? If not, does the limit have to be less than or equal to my system memory? If it is possible, how much can I allocate?
import h2o
h2o.init(nthreads=-1,max_mem_size='16g')
Thanks

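The max_mem_size parameter is passed straight through as the -Xmx option, which sets the maximum Java heap size of the h2o backend process.
Because Java is a garbage-collected language, the collector regularly touches the whole heap, so you never want to make the Java heap larger than about 90% of physical memory; otherwise you run the risk of uncontrollable swapping. In practice, a 64 GB heap is only sensible on a machine with noticeably more than 64 GB of physical RAM.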
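As a rough illustration of the 90% rule of thumb above, the following sketch turns a physical memory size into a string that could be passed as max_mem_size. The helper names suggest_max_mem_size and physical_ram_bytes are hypothetical and not part of h2o, and the os.sysconf lookup assumes a Linux host.

```python
import os


def physical_ram_bytes():
    """Physical RAM in bytes (assumption: Linux, where these sysconf names exist)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def suggest_max_mem_size(total_ram_bytes, fraction=0.9):
    """Return a size string such as '57g', capped at `fraction` of physical RAM.

    Hypothetical helper: it only applies the rule of thumb from the answer,
    rounding down to whole gigabytes.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    gib = int(total_ram_bytes * fraction) // (1024 ** 3)
    if gib < 1:
        raise ValueError("not enough memory for a 1 GB heap")
    return f"{gib}g"
```

On a Linux machine this could be used as h2o.init(nthreads=-1, max_mem_size=suggest_max_mem_size(physical_ram_bytes())). With 64 GB of RAM the helper returns '57g', which is why asking for a full 64 GB heap on a 64 GB machine is not advisable.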

Related

When does Node garbage collect?

I have a NodeJS server running on a small VM with 256MB of RAM and I notice the memory usage keeps growing as the server receives new requests. I read that an issue on small environments is that Node doesn't know about the memory constraints and therefore doesn't try to garbage collect until much later (for instance, maybe it would only start garbage collecting once it reaches 512MB of used RAM). Is that really the case?
I also tried using various flags such as --max-old-space-size but didn't see much change, so I'm not sure whether I have an actual memory leak or whether Node just doesn't GC as soon as possible.
This might not be a complete answer, but it comes from experience and might provide some pointers. Memory leaks in NodeJS are among the most challenging bugs a developer can face.
But before we talk about memory leaks, to answer your question: unless you explicitly configure --max-old-space-size, default memory limits take over. Since certain phases of garbage collection in Node are expensive (and sometimes blocking), Node delays some of the expensive GC cycles (e.g. mark-sweep collection) depending on how much memory is available to it. I have seen that on a machine with 16 GB of memory it would easily let the memory go as high as 800 MB before significant garbage collections happen. But I am sure that doesn't make ~800 MB any special limit. It really depends on how much memory is available and what kind of application you are running. For example, it is entirely possible that complex computations, caches (e.g. big DB connection pools) or buggy logging libraries always take a lot of memory by themselves.
If you are monitoring your NodeJS process's memory footprint, then for some time after the server starts up, while everything warms up (Express loads all the modules and creates some startup objects, caches warm up and all of your memory-hungry modules become active), it might appear as if there is a memory leak because the memory keeps climbing, sometimes as high as ~1 GB. Then you would see that it stabilizes (this limit used to be lower in versions before v8).
But sometimes there are actual memory leaks (which might be hard to spot if there is no specific pattern to them).
In your case, 256 MB seems to just meet the minimum RAM requirements for NodeJS and might not really be enough. Before you start getting anxious about a memory leak, you might want to bump it up to 1.5 GB and then monitor everything.
Some good resources on NodeJS's memory model and memory leak.
Node.js Under the Hood
Memory Leaks in NodeJS
Can garbage collection happen while the main thread is busy?
Understanding and Debugging Memory Leaks in Your Node.js Applications
Some debugging tools to help spot the memory leaks
Node inspector | Chrome
llnode
gcore

How garbage collector works with Xmx and Xms values

I have some doubts about how the JVM garbage collector works with different values of Xmx and Xms and different machine memory sizes.
How would the garbage collector work in the following scenarios?
1. Machine memory size = 7.5GB
Xmx = 1024Mb
Number of processes = 16
Xms = 512Mb
I know 16*512MB already exceeds the machine memory size. How would the garbage collector work in this scenario? I think the memory usage would be the entire 7.5GB in this case. Will the processes be able to do anything, or will they all be stuck?
2. Machine memory size = 7.5GB
Xmx = 320MB
Xms is not defined.
Number of Processes = 16
In this case, 16*320MB should be less than 7.5GB. But in my case, memory usage is again reaching 7.5GB. Is that possible? Or do I probably have a memory leak in my application?
So, basically, I want to understand when the garbage collector runs. Does it run whenever the memory used by the application reaches exactly the Xmx value? Or are they not related at all?
There's a couple of things to understand here and then consider in your situation.
Each JVM process has its own virtual address space, which is protected from other processes by the operating system. The OS maps physical ranges of addresses (called pages) to the virtual address space of each process. When more physical pages are required than are available, pages that have not been used for a while are written to disk (called paging) and can then be reused. When the data of these saved pages is required again, it is read back into the same or a different physical page. By doing this you can easily run 16 or more JVMs, all with a heap of 1GB, on a machine with 8GB of physical memory. The problem is that the more paging to disk is required, the more you degrade the performance of your applications, since disk IO is orders of magnitude slower than RAM access. This is also the reason that the heap space of a single JVM should not be bigger than physical memory.
The reason for having the -Xms and -Xmx options is so you can specify the initial and maximum size of the heap. As your application runs and requires more heap space, the JVM is able to increase the heap size within these bounds. Often these values are set to be the same to eliminate the overhead of having to resize the heap while the application is running. Most operating systems only allocate physical pages when they're required, so in your situation making -Xms small won't change the amount of paging that occurs.
The key point here is it's the virtual memory system of the operating system that makes it possible to appear to be using more memory than you physically have in your machine.
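The arithmetic behind the two scenarios in the question can be sketched as follows. This is only an illustration: heap_budget_mb is a hypothetical helper, 7.5GB is taken as 7680MB, and the totals cover Java heap only, so the non-heap memory each JVM also uses is not included.

```python
def heap_budget_mb(processes, xmx_mb, ram_mb, xms_mb=None):
    """Compare the combined heap bounds of several JVMs with physical RAM.

    Hypothetical helper: only -Xms/-Xmx are counted; JVM overhead outside
    the heap (thread stacks, JIT, GC structures) is deliberately ignored.
    """
    max_total = processes * xmx_mb
    initial_total = None if xms_mb is None else processes * xms_mb
    return {
        "initial_total_mb": initial_total,
        "max_total_mb": max_total,
        "initial_fits": None if initial_total is None else initial_total <= ram_mb,
        "max_fits": max_total <= ram_mb,
    }


RAM_MB = 7680  # 7.5GB expressed in MB

# Scenario 1: 16 processes, Xms=512MB, Xmx=1024MB
scenario_1 = heap_budget_mb(16, xmx_mb=1024, ram_mb=RAM_MB, xms_mb=512)

# Scenario 2: 16 processes, Xmx=320MB, Xms not defined
scenario_2 = heap_budget_mb(16, xmx_mb=320, ram_mb=RAM_MB)
```

In scenario 1 even the initial heaps add up to 8192MB, more than the 7680MB of RAM, so the machine can only cope through paging. In scenario 2 the maximum heaps add up to 5120MB, which fits, so memory usage reaching 7.5GB would have to come from memory outside the Java heaps or from other processes.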

Used and Cached Memory In Spark

I would like to know whether Spark uses the Linux cached memory or the Linux used memory when we call the cache/persist method.
I'm asking this because we have a cluster and we see that the machines stay at only 50% used memory and 50% cached memory, even when we run long jobs.
Thank you in advance,
Cached/buffered memory is memory that Linux uses for disk caching. When you read a file, it is always read into the memory cache. You can consider cached memory as free memory. The JVM process of a Spark executor does not use cached memory directly. If you see that only 50% of memory is used on your machine, it means that the Spark executor definitely doesn't take more than 50% of memory. You can use the top or ps utilities to see how much memory the Spark executor actually takes. Usually it is a little bit more than the current size of the heap.
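To see the used versus cached split described above without top or ps, one option is to read /proc/meminfo. The sketch below is a simplified illustration, assuming a Linux host; parse_meminfo and used_and_cached_kb are hypothetical helper names, and "used" is approximated as MemTotal minus MemFree, Buffers and Cached.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB or counts)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def used_and_cached_kb(info):
    """Return (used_kb, cached_kb); cached memory is treated as reclaimable."""
    cached = info.get("Buffers", 0) + info.get("Cached", 0)
    used = info["MemTotal"] - info["MemFree"] - cached
    return used, cached


def read_meminfo(path="/proc/meminfo"):
    """Read the live values (assumption: Linux, where this file exists)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling used_and_cached_kb(read_meminfo()) on an executor node gives the two figures in kB. Executor heaps are counted in the first number, while the second is mostly the Linux page cache.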

What is Hazelcast HD Memory? - on/off heap?

I have read this official post on the Hazelcast High Density Memory.
Am I right in assuming that this HD memory still consumes memory from the JVM process in which the application is running, rather than creating another JVM on the server and using it solely for the hz instance?
And that the only difference in this native memory configuration is that the memory is allocated off-heap rather than with the default on-heap allocation?
HDMS, or Hazelcast High-Density Memory Store, allocates memory in the same process space as the Java heap. That means the process still owns all the memory, but the Java heap is otherwise independent and the space allocated by Hazelcast (off-heap / non-Java-heap) is not subject to garbage collection. Values are serialized and the resulting byte stream is copied to native memory; when reading, it is copied back into the Java heap area and sent to the requestor.
Imagine HDMS as a fancy malloc implementation :)
HDMS, or High-Density Memory Store, is part of the Hazelcast Enterprise HD offering. HDMS is a way for Java software to access multiple terabytes of memory per node without struggling with long and unpredictable garbage collection pauses. This memory store provides the benefits of "off-heap" memory using a variety of high-performance memory management techniques. HDMS solves problems related to garbage collection limitations so that applications can utilize hardware memory more efficiently without the need for extra clusters. It is designed as a pluggable memory manager which enables multiple memory stores for different data structures like IMap and JCache.

How is total memory in Java calculated

If I have 8GB RAM and I use the following on a 64-bit JVM
max heap size 6144MB
max perm gen space 2048MB
stack size 2MB
Q1: Is perm gen space allocated from the max heap, or is it separate?
Q2: If it is separate, will the JVM start with the above settings, or will it give an error because heap + permgen + stack + program data would be above the total RAM?
First of all remember that the parameter you set with -Xmx (since that's the way I suppose you are setting your heap size) is the size of heap available to your Java code, not the amount of memory the JVM will consume. The difference comes from housekeeping structures that the JVM keeps (garbage collector structures, JIT overhead etc.), sometimes memory allocated by native code, buffers, and so on. The size of this additional memory depends on JVM version, the app you are running, and other factors, but I've seen JVMs allocate twice as much RAM as the heap size visible to the application. For the average case, I usually consider 50% to be a safe margin, with 20-30% acceptable. If you set your heap size to be close to amount of RAM in your machine, you will hit the swap and performance will suffer.
Now for the enumerated questions:
Perm gen is a separate space from the heap, at least in Oracle's JDK 6. It is separate because it is subject to completely different memory management rules than the regular heap. By the way, 2 GB of perm gen space is huge. Are you sure you really need it?
Regarding the second question, see above. If this is Oracle's JDK, you are likely to run into trouble: perm gen and heap add up, and there will be additional memory on top, usually on the order of 20-50% of your 6 GB heap, so together with heap and perm space this will be more than your RAM. At first this setup may work, but once both the heap and perm gen usages come close to their configured limits, you could run out of memory.
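The estimate in this answer can be written out as a small calculation. It is only a back-of-the-envelope sketch: estimate_jvm_footprint_mb is a hypothetical helper, the 20-50% overhead range is the rule of thumb quoted above, and the thread count is left as a parameter because the question does not state it.

```python
def estimate_jvm_footprint_mb(heap_mb, permgen_mb, overhead_fraction,
                              threads=0, stack_mb=2):
    """Rough worst-case JVM memory use in MB.

    Hypothetical helper: heap + perm gen + overhead expressed as a fraction
    of the heap + one stack per thread (threads is an assumption to fill in).
    """
    overhead_mb = heap_mb * overhead_fraction
    return heap_mb + permgen_mb + overhead_mb + threads * stack_mb


RAM_MB = 8 * 1024  # 8GB of physical RAM

# Settings from the question: 6144MB heap, 2048MB perm gen, 2MB stacks
low_estimate = estimate_jvm_footprint_mb(6144, 2048, 0.2)
high_estimate = estimate_jvm_footprint_mb(6144, 2048, 0.5)
```

Heap and perm gen alone already come to 8192MB, exactly the size of the RAM, so with any overhead at all both estimates exceed physical memory once the configured limits are actually reached.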
Heap and permgen are different memory areas of the JVM, so their sizes add up. As such, you will be consuming virtually all the memory on the system. It is always better to leave about 20% of RAM free so that the OS and other tasks can run properly.
Also, 2 GB for perm space is a huge figure. Have you looked at jar optimisation, meaning that only relevant classes are present in the classpath?
This depends on the JVM and the version of the JVM.
In HotSpot Java 6, PermGen space is independent of the max heap size argument (-Xmx and -Xms control only the Young/OldGen sizes). The PermGen space size is given by -XX:PermSize and -XX:MaxPermSize. See Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning.
UPDATE: In HotSpot Java 8 there is no PermGen space anymore. Class metadata now lives in Metaspace, which is allocated from native memory outside the heap (and can be limited with -XX:MaxMetaspaceSize), while interned strings and class statics reside in the regular heap.
