It is known that some applications aren't aware of Linux kernel isolation and virtualization features such as cgroups. This includes system utils like top, free and ps, but also platforms like Java.
I've recently read an article which suggests that when running JVMs in Kubernetes, you should enforce manual limits on the Java heap size to avoid errors.
I cannot find anywhere whether this is also true for NodeJS. Do I need to do something similar and set --max_old_space_size=XXX on my NodeJS application in Kubernetes?
A NodeJS process will try to allocate memory regardless of the container limits, just like Java.
Setting a limit on the process will help stop the OS from killing the process, particularly in constrained environments where Node might try to allocate past the memory limit even though Node could probably run inside the limit.
If you are running an app that is close to using the memory limit then adding the memory limit settings just changes the failure scenario. NodeJS and the JVM will have a chance to exit with an out of memory error (OOM) rather than be killed by the operating system. The process will likely slow to a crawl as it nears the memory limit and the garbage collector tries the best it can to keep the process below the limit.
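As a minimal sketch of what that looks like (the 768 value and server.js are made up for this example, not something from the question): if the pod's memory limit is 1GiB, start Node with an old-space cap comfortably below it, for example in the container's command:

node --max_old_space_size=768 server.js    # pod memory limit 1Gi, headroom left for the rest of the process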
Note that the old space is only one of multiple memory spaces in NodeJS. Only the new space (semi spaces) and old space can be limited.
--max_semi_space_size (max size of a semi-space (in MBytes), the new space consists of two semi-spaces)
type: int default: 0
--max_old_space_size (max size of the old space (in Mbytes))
type: int default: 0
The other heap spaces are generally small and static enough not to worry about.
Modules that run native code can allocate memory outside the heap and can't be limited by an option.
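For reference, the flag descriptions above are what node --v8-options prints; you can check the limits and defaults for your own Node version (flag spelling and defaults vary between releases, so treat the grep pattern as a sketch):

node --v8-options | grep -E 'space[_-]size'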
Related
I am currently running node.js using pm2.
And recently, I was able to check "custom metrics" using the pm2 monit command.
Here, information such as Heap size, used heap size, and active requests are shown.
I don't know how the heap size is determined. Actually, I checked pm2 running on different servers.
Each was set to 95 MiB / 55 MiB, and accordingly, the used heap size was different.
Also, is it better for heap usage to be closer to 100%?
While searching Stack Overflow for related information, I saw the following article:
What does Heap Usage mean in PM2
Also, what does "active requests" mean? It is continuously zero.
Thank you!
[Edit]
env : ubuntu18.04 [ ec2 - t3.micro ]
node version : v10.15
[Additional]
server memory : 1GB [ 40~50% used ]
cpu : vCPU (2) [ 1~2% used ]
The heap is the RAM used by the program you're asking PM2 to manage and monitor. Heap space, in Javascript and similar language runtimes, is allocated when your program creates objects and released upon garbage collection. Your runtime asks your OS for more heap space whenever it needs it: when active allocations exceed the free space. So your heap size will probably grow as your program starts up. That's normal.
Most programs allocate and release lots of objects as they do their work, so you should not try to optimize the % usage of your heap. When your program is running at a steady state, that is, after it has started up, you'll find the % utilization creeping up until garbage collection happens, and then dropping back. For example, a nodejs/express web server allocates req and res objects for each incoming request, then uses them, then drops them so the garbage collector can reclaim their RAM.
If your allocated heap size keeps growing, over minutes or hours, you probably have a memory leak. That is a programming bug: a problem you should do your best to solve. You should look up how to track down memory leaks in your application's language. Other than that, don't worry too much about heap usage.
Active requests count work being done via various asynchronous objects like file writers and TCP connections. Unless your program is very busy it stays near zero.
Keep an eye on loop delay if your program does computations. If it creeps up, some computation function is hogging the JavaScript event loop.
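If you do decide to cap the heap of a pm2-managed process (for example while chasing a suspected leak), the V8 flag can be passed through pm2; app.js and the 256 value below are placeholders, not something taken from your setup:

pm2 start app.js --node-args="--max-old-space-size=256"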
I was considering changing yarn.nodemanager.resource.memory-mb to a value higher than the RAM available on my machine. Doing a quick search revealed that not many people are doing this.
Many long-lived applications on YARN are bound to have a JVM heap allocation in which some of the memory is frequently used and some of it is rarely used. In that case, it would make perfect sense for such applications to have some of their infrequently used memory portions swapped to disk, with the freed physical memory reallocated to other applications that need it.
Given the above background, can someone either corroborate my reasoning or offer an alternate perspective? Can you please also clarify how the parameter yarn.nodemanager.vmem-pmem-ratio would work in the above case?
This is not a good idea. Trying to use more memory than what is available will eventually crash your Node Manager hosts.
There already is a feature called opportunistic containers which uses spare memory not used by the NMs and adds more containers to those hosts. Refer to:
YARN-1011 [Umbrella] Schedule containers based on utilization of currently allocated containers
In addition, Pepperdata has a product that does almost the same thing if you can't wait for YARN-1011.
https://www.pepperdata.com/products/capacity-optimizer/
As for yarn.nodemanager.vmem-pmem-ratio, don't rely on it; enforcing virtual memory limits that way is not recommended anymore.
YARN-782 vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way
We run Node processes inside Docker containers with hard memory caps of 1GB, 2GB, or 4GB. Each container generally just runs a single Node process (plus maybe a tiny shell script wrapper). Let's assume for the purposes of this question that the Node process never forks more processes.
For our larger containers, if we don't set --max_old_space_size ourselves, then in the version of Node we use (on a 64-bit machine) it defaults to 1400MB. (This will change to 2048MB in a later version of Node.)
Ideally we want our Node process to use as much of the container as possible without going over and running out of memory. The question is: what number should we use? My understanding is that this particular flag tunes the size of one of the largest pools of memory used by Node, but it's not the only pool; e.g., there's a "non-old" part of the heap, there's the stack, etc. How much should I subtract from the container's size when setting this flag in order to stay away from the cgroup memory limit but still make maximal use of the amount of memory allowed in this container?
I do note that from the same place where kMaxOldSpaceSizeHugeMemoryDevice is defined, it looks like the default "max semi space" is 16MB and the default "max executable size" is 512MB. So I suspect this means I should subtract at least 528MB from the container's memory limit when determining the value for this flag. But surely there are other ways that Node uses memory?
(To be more specific, we are a hosting service that sells containers of particular sizes to our users, most of which use them for Node processes. We'd like to be able to advise our customers as to what flag to set so that they neither are killed by our limits nor pay us for capacity that Node's configuration doesn't let them actually use.)
There is, unfortunately, no particularly satisfactory answer to this question.
The constants you've found control the size of the garbage-collected heap, but as you've already guessed, there are many ways to consume memory that's not part of that heap:
For example, big strings and big TypedArrays are typically managed by the embedder (i.e. node and its modules, not V8 itself), and live outside the GC'ed heap.
Node modules, in general, can consume whatever memory they want. Presumably you don't want to restrict what modules your customers can run, but that implies that you also can't predict how much memory those modules are going to require.
V8 also uses temporary memory outside the GC'ed heap for parsing and compilation. How much depends on the code that's being run; anything from a few kilobytes up to a gigabyte or more (e.g. for huge asm.js codebases) is possible. These are relatively short-lived memory consumption peaks, so on the one hand you probably don't want to limit long-lived heap memory to account for them, but on the other hand that means they can make your processes run into the system limit.
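Putting that together with the question's own arithmetic (at least 528MB for the default semi-spaces and executable size), a hedged starting point, to be tuned against the observed memory use of your real workloads rather than taken as a guarantee, might be something like:

# hypothetical 2048MiB container: 2048 minus the 528MiB of default reservations, minus extra slack for strings, native modules and parser/compiler peaks
node --max_old_space_size=1400 app.js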
I need your help investigating an issue with Erlang memory consumption. How typical, isn't it?
We have two different deployment schemes.
In the first scheme we run many identical nodes on small virtual machines (in Amazon AWS),
one node per machine. Each machine has 4GB of RAM.
In the other deployment scheme we run these nodes on big bare-metal machines (with 64GB of RAM), with many nodes per machine. In this deployment the nodes are isolated in Docker containers (with the memory limit set to 4GB).
I've noticed that the heaps of processes in dockerized nodes hog up to 3 times more RAM than the heaps in non-dockerized nodes under identical load. I suspect that garbage collection in non-dockerized nodes is more aggressive.
Unfortunately, I don't have any garbage collection statistics, but I would like to obtain some ASAP.
To give more information, I should say that we are using HiPE R17.1 on Ubuntu 14.04 with the stock kernel. In both schemes we are running 8 schedulers per node, and using the default fullsweep_after flag.
My blind guess is that Erlang's default garbage collection relies (somehow) on /proc/meminfo (which is not accurate in a dockerized environment).
I am not a C guy and not familiar with the emulator internals, so could someone point me to the places in the Erlang sources that are responsible for garbage collection, and to some emulator options I can use to tweak this behavior?
Unfortunately, VMs often try to be smarter with memory management than necessary, and that does not always play nicely with the Erlang memory management model. Erlang tends to allocate and release a large number of small chunks of memory, which is very different from normal applications, which usually allocate and release a small number of big chunks of memory.
One of those technologies is Transparent Huge Pages (THP), which some OSes enable by default and which causes Erlang nodes running in such VMs to grow (until they crash).
https://access.redhat.com/solutions/46111
https://www.digitalocean.com/company/blog/transparent-huge-pages-and-alternative-memory-allocators/
https://docs.mongodb.org/manual/tutorial/transparent-huge-pages/
So, ensuring THP is switched off is the first thing you can check.
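On most Linux distributions that check looks roughly like this (the sysfs path can differ, for example older Red Hat kernels use redhat_transparent_hugepage, and the change below does not survive a reboot):

cat /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled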
The other thing is to try tweaking the memory options used when starting the Erlang VM itself; for example, see this post:
Erlang: discrepancy of memory usage figures
Resulting options that worked for us:
-MBas aobf -MBlmbcs 512 -MEas aobf -MElmbcs 512
Some more theory about memory allocators:
http://www.erlang-factory.com/static/upload/media/139454517145429lukaslarsson.pdf
And a more detailed description of the memory allocator flags:
http://erlang.org/doc/man/erts_alloc.html
The first thing to know is that garbage collection in Erlang is per process. Each process is GC'ed in its own time, independently of the others. So garbage collection in your system depends only on the data in your processes, not on the operating system itself.
That said, there could be some differences between memory consumption from the Erlang point of view and from the system point of view. That's why comparing erlang:memory to what your system is saying is always a good idea (it could show you binary leaks, or other memory problems).
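A rough sketch of that comparison (the <beam_pid> placeholder is whatever PID your beam.smp process has): note the total from inside the node and put it next to the resident set size the OS reports:

erlang:memory(total).            % in the node's shell
ps -o rss= -p <beam_pid>         # from the host or container, RSS in KiB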
If you would like to understand a little more about the Erlang internals, I would recommend these two talks:
https://www.youtube.com/watch?v=QbzH0L_0pxI
https://www.youtube.com/watch?v=YuPaX11vZyI
And for somewhat better debugging of your memory management, I would recommend starting with http://ferd.github.io/recon/
Dear all, I am using Red Hat Linux. How do I set the maximum memory for a particular process? For example, I have to cap the maximum memory usage of Eclipse alone. Is it possible to do this? Please give me some solutions.
ulimit -v 102400
eclipse
...gives eclipse 100MiB of virtual memory (102400 KiB).
You can't control memory usage; you can only control virtual memory size, not the amount of actual memory used, as that is extremely complicated (perhaps impossible) to know for a single process on an operating system which supports virtual memory.
Not all memory used appears in the process's virtual address space at a given instant, for example kernel usage and disc caching. A process can change which pages it has mapped in as often as it likes (e.g. via mmap()). Some of a process's address space is also mapped in but not actually used, or is shared with one or more other processes. This makes measuring per-process memory usage a fairly unachievable goal in practice.
And putting a cap on the VM size is not a good idea either, as that will result in the process being killed if it attempts to use more.
The right way of doing this, in this case (for a Java process such as Eclipse), is to set the maximum heap size (via the various well-documented JVM startup options). However, experience suggests that you should not set it to less than 1GB.
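For Eclipse specifically, that usually means raising -Xmx, either in eclipse.ini or after -vmargs on the command line; the 1024m value here is just an example at the lower bound suggested above:

eclipse -vmargs -Xmx1024m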