I have an application written in Node using many features such as the cluster module.
I need to know the memory usage of my app at a specific time. What I am thinking of is looping through the active workers and summing the output of all of them, but I don't know if the resulting value will be correct. Can anyone help me, please?
In fact, I also can't work out what the three values "rss", "heapTotal", and "heapUsed" mean. I googled it, and what I found is that the important ones to monitor are "heapTotal" and "heapUsed". Is this correct?
RSS is the resident set size, the portion of the process's memory held in RAM (as opposed to the swap space or the part held in the filesystem).
The heap is the portion of memory from which newly allocated objects come (think of malloc in C, or new in JavaScript).
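For the idea of summing across cluster workers, here is a minimal sketch, written in Python rather than Node, that assumes each worker has already sent the master the three numbers from process.memoryUsage(). The report values and the total_usage helper are made up for illustration.

```python
# Hypothetical per-worker reports, shaped like Node's process.memoryUsage() output (bytes).
reports = {
    "worker-1": {"rss": 50_000_000, "heapTotal": 30_000_000, "heapUsed": 20_000_000},
    "worker-2": {"rss": 60_000_000, "heapTotal": 35_000_000, "heapUsed": 25_000_000},
}


def total_usage(reports):
    """Sum each memory field across all workers."""
    totals = {"rss": 0, "heapTotal": 0, "heapUsed": 0}
    for usage in reports.values():
        for key in totals:
            totals[key] += usage[key]
    return totals


totals = total_usage(reports)
print(totals)
```

Each worker is a separate process with its own heap, so the heap sums are meaningful. The summed rss may overstate real RAM use if the workers share pages.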
Good Tutorial
More about heap Wikipedia.
Happy Helping!
Above is a picture summarizing my understanding of the memoryHeaps and their memoryTypes reported by Vulkan for a given system setup. Thanks to the answers on this topic shared by #NicolBolas 1, 2, 3 and an answer by #krOoze 4.
Still, I have a few outstanding questions that I would like help with. I have indicated them in red and elaborated on them below, per the comment from #NicolBolas.
Questions
1. Why are there 9 memoryType in sysRam when there are only 4x RAMs? What is the physical meaning of each memoryType? How to use each of these memoryType?
2. Why are there 2 memory types for GPU RAM? Does this mean each memoryType of the GPU RAM is 6144MB/2 = 3072MB?
3. Is there a size limit to each memoryTypes? If yes, how to discover their limits?
4. Why are the free memory reported by Vulkan and cat /proc/meminfo different?
Thanks for your help in advance.
Why are there 9 memoryType in sysRam when there are only 4x RAMs? What is the physical meaning of each memoryType? How to use each of these memoryType?
Why are there 2 memory types for GPU RAM?
I don't know what you mean by "4x RAMs"; I suspect you're talking about how many physical memory sticks are in your machine. Memory types (or heaps for that matter) don't care about such things.
As for the rest, it is always important to remember how memory works in Vulkan. Heaps represent actual physical RAM to one degree or another. Memory types represent ways of allocating that memory. But uses of memory have their own memory type restrictions.
For example, if an image has the color attachment usage parameter, the implementation can force you to use a specific memory type for the memory backing that image. And images that don't have color attachment can be restricted to using other memory types, but not that one. And so forth.
Apparently, NVIDIA does this for certain combinations of usage and formats. Simply querying the available memory types isn't enough to know how to go about allocating memory. You have to figure out what buffers and images (complete with format and usage parameters) you will use. And then you have to query what restrictions the implementation imposes on them.
Your application must adapt to these restrictions.
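As a rough illustration of adapting to those restrictions, the sketch below picks a memory type the usual way: the driver returns a memoryTypeBits mask for a specific buffer or image (via vkGetBufferMemoryRequirements or vkGetImageMemoryRequirements), and the application chooses an allowed type that also has the properties it wants. It is plain Python with no Vulkan binding; the memory_types table and the masks are invented for the example.

```python
# Bit values of VkMemoryPropertyFlagBits from the Vulkan headers.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4

# Hypothetical memory types as (propertyFlags, heapIndex), in driver-reported order.
memory_types = [
    (0, 1),
    (DEVICE_LOCAL, 0),
    (HOST_VISIBLE | HOST_COHERENT, 1),
]


def find_memory_type(memory_type_bits, required_flags, memory_types):
    """Return the first type index the resource allows that has all required flags."""
    for index, (flags, _heap_index) in enumerate(memory_types):
        allowed = memory_type_bits & (1 << index)
        if allowed and (flags & required_flags) == required_flags:
            return index
    return None


# Pretend the driver said this image may only use types 0 and 1.
chosen = find_memory_type(0b011, DEVICE_LOCAL, memory_types)
# Pretend another resource may only use type 0, which is not device-local.
blocked = find_memory_type(0b001, DEVICE_LOCAL, memory_types)
print(chosen, blocked)
```

A result of None means no reported type satisfies both the resource's restriction and the requested properties, so the application has to relax its request.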
Is there a size limit to each memoryTypes?
It wouldn't make sense for there to be such a thing. Memory types define how memory is allocated, not how much storage is available. The latter is the job of memory heaps.
Why are the free memory reported by Vulkan and cat /proc/meminfo different?
Vulkan has no API to report free memory, only total memory. Asking for the amount of free memory is folly. Memory (or at least, the virtual pages in your application) is shared by all threads in your application. And GPU memory especially is shared among all processes on the machine. By the time you get an answer back, the amount of memory may have changed. So when you go to allocate memory based on what you were told was available, it may not be available anymore.
Better to allocate first and deal with failure to allocate if it happens.
You can ask for the total memory so that you can decide how you want to allocate chunks of memory. But attempting the allocation and handling failure is how you determine what is and is not available, not querying a size.
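The "allocate first, handle failure" approach could look roughly like the sketch below. FakeHeap and OutOfDeviceMemory are stand-ins invented for the example (the latter playing the role of VK_ERROR_OUT_OF_DEVICE_MEMORY), and halving the request on failure is just one possible fallback policy, not something Vulkan prescribes.

```python
class OutOfDeviceMemory(Exception):
    """Stand-in for VK_ERROR_OUT_OF_DEVICE_MEMORY."""


class FakeHeap:
    """Hypothetical allocator whose free space is unknown to the caller."""

    def __init__(self, free_bytes):
        self._free = free_bytes

    def allocate(self, size):
        if size > self._free:
            raise OutOfDeviceMemory(size)
        self._free -= size
        return size


def allocate_with_fallback(heap, wanted, minimum):
    """Try the wanted size, halving after each failure; give up below minimum."""
    size = wanted
    while size >= minimum:
        try:
            return heap.allocate(size)
        except OutOfDeviceMemory:
            size //= 2
    return None


heap = FakeHeap(free_bytes=300)
first = allocate_with_fallback(heap, wanted=1024, minimum=64)
second = allocate_with_fallback(heap, wanted=1024, minimum=64)
print(first, second)
```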
[metaquestion] Why is X in Vulkan?
Because it is allowed by the Vulkan specification. The rest is an implementation detail; only the implementer/vendor knows for sure, and it may depend on how well they slept.
Why are there 9 memoryType in sysRam when there are only 4x RAMs? What is the physical meaning of each memoryType? How to use each of these memoryType?
Answered in Why does vkGetPhysicalDeviceMemoryProperties return multiple identical memory types?. One for VkBuffers, one for VkImages, and one per depth format (i.e. 7). Equals 9; mystery solved.
Why are there 2 memory types for GPU RAM? Does this mean each memoryType of the GPU RAM is 6144MB/2 = 3072MB?
Likely a similar reason as in 1. I speculate one is for VkBuffers and one for VkImages. Someone with an NVIDIA GPU could test this with vkGetXMemoryRequirements.
It does not necessarily mean RAM/2. That is not completely out of the question, but in that case the implementer should expose a separate heap instead.
Is there a size limit to each memoryTypes? If yes, how to discover their limits?
Roughly the heap size. You may get significantly less due to fragmentation and due to other processes sharing the same heap. Your implementation may also allocate some for its own internal needs.
You discover the limit when you get VK_ERROR_OUT_OF_DEVICE_MEMORY. (BTW, this mostly works the same as on the CPU side, where you get bad_alloc.)
There is also a limit on the size of a single allocation (allocating more than 4 GB is not recommended) and on the number of allocations (maxMemoryAllocationCount).
Why are the free memory reported by Vulkan and cat /proc/meminfo different?
AFAIK Vulkan does not report free memory. The VkMemoryHeap shows total memory:
size is the total memory size in bytes in the heap.
You don't know anything about the memory types in Vulkan until you ask the driver.
I think the biggest misunderstanding you have is that the memory types are physically separate. As shown, you have two memory heaps, assume 0 is CPU memory, 1 is GPU. Within those heaps, you have different memory types. Each memory type occupies space within its own heap, and can use all the heap space or share it with other types. For each type you'll have different internal allocation methods with different alignment requirements and different allowed uses. There are multiple queries related to memory types including vkGetBufferMemoryRequirements, vkGetImageMemoryRequirements, and others. It all depends on what you're using the memory for.
Also, those memory types are driver dependent and will vary between vendors (that looks like the current NVIDIA layout).
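To make the heap/type relationship concrete, here is a small sketch that groups memory types by the heap they draw from. The type-to-heap table and the 16000 MB system heap are invented; only the 6144 MB GPU figure comes from the question. Real values come from vkGetPhysicalDeviceMemoryProperties.

```python
from collections import defaultdict

MB = 1024 * 1024

# Hypothetical layout: heap sizes in bytes, and the heapIndex of each memory type.
heap_sizes = [16000 * MB, 6144 * MB]
type_heap_index = [0, 0, 0, 1, 1]


def types_per_heap(type_heap_index):
    """Map each heap index to the list of memory type indices that allocate from it."""
    groups = defaultdict(list)
    for type_index, heap_index in enumerate(type_heap_index):
        groups[heap_index].append(type_index)
    return dict(groups)


groups = types_per_heap(type_heap_index)
for heap_index, type_indices in groups.items():
    size_mb = heap_sizes[heap_index] // MB
    print(f"heap {heap_index}: {size_mb} MB shared by types {type_indices}")
```

The types listed under one heap all allocate from that heap's storage; the heap size is not split evenly between them.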
I was considering changing yarn.nodemanager.resource.memory-mb to a value higher than the RAM available on my machine. Doing a quick search revealed that not many people are doing this.
Many long-lived applications on YARN are bound to have a JVM heap allocation in which some memory is used frequently and some rarely. In this case, it would make perfect sense for such applications to have the infrequently used portions of their memory swapped to disk, with the freed physical memory reallocated to other applications that need it.
Given the above background, can someone please either corroborate my reasoning or offer an alternate perspective? Also, can you clarify how the parameter yarn.nodemanager.vmem-pmem-ratio would work in the above case?
This is not a good idea. Trying to use more memory than what is available will eventually crash your Node Manager hosts.
There already is a feature called opportunistic containers which uses spare memory not used by the NMs and adds more containers to those hosts. Refer to:
YARN-1011 [Umbrella] Schedule containers based on utilization of currently allocated containers
In addition, Pepperdata has a product that does almost the same thing if you can't wait for YARN-1011.
https://www.pepperdata.com/products/capacity-optimizer/
As for yarn.nodemanager.vmem-pmem-ratio, don't enable this as it's not recommended anymore.
YARN-782 vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way
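For readers unsure what the ratio expresses, a worked example may help. As I understand the property, a container's virtual memory use is allowed to exceed its physical memory allocation by this ratio. The default of 2.1 used below is what I recall from yarn-default.xml, so treat both the formula and the default as assumptions to check against your Hadoop version.

```python
def vmem_limit_mb(container_pmem_mb, vmem_pmem_ratio=2.1):
    """Virtual memory ceiling for a container, assuming limit = physical allocation * ratio."""
    return container_pmem_mb * vmem_pmem_ratio


# A container granted 4096 MB of physical memory (illustrative figure).
limit = vmem_limit_mb(4096)
print(f"virtual memory limit: {limit:.1f} MB")
```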
We run Node processes inside Docker containers with hard memory caps of 1GB, 2GB, or 4GB. Each container generally just runs a single Node process (plus maybe a tiny shell script wrapper). Let's assume for the purposes of this question that the Node process never forks more processes.
For our larger containers, if we don't set --max_old_space_size ourselves, then in the version of Node we use (on a 64-bit machine) it defaults to 1400MB. (This will change to 2048MB in a later version of Node.)
Ideally we want our Node process to use as much of the container as possible without going over and running out of memory. The question is — what number should we use? My understanding is that this particular flag tunes the size of one of the largest pools of memory used by Node, but it's not the only pool — eg, there's a "non-old" part of the heap, there's stack, etc. How much should I subtract from the container's size when setting this flag in order to stay away from the cgroup memory limit but still make maximal use of the amount of memory allowed in this container?
I do note that from the same place where kMaxOldSpaceSizeHugeMemoryDevice is defined, it looks like the default "max semi space" is 16MB and the default "max executable size" is 512MB. So I suspect this means I should subtract at least 528MB from the container's memory limit when determining the value for this flag. But surely there are other ways that Node uses memory?
(To be more specific, we are a hosting service that sells containers of particular sizes to our users, most of which use them for Node processes. We'd like to be able to advise our customers as to what flag to set so that they neither are killed by our limits nor pay us for capacity that Node's configuration doesn't let them actually use.)
There is, unfortunately, no particularly satisfactory answer to this question.
The constants you've found control the size of the garbage-collected heap, but as you've already guessed, there are many ways to consume memory that's not part of that heap:
For example, big strings and big TypedArrays are typically managed by the embedder (i.e. node and its modules, not V8 itself), and outside the GC'ed heap.
Node modules, in general, can consume whatever memory they want. Presumably you don't want to restrict what modules your customers can run, but that implies that you also can't predict how much memory those modules are going to require.
V8 also uses temporary memory outside the GC'ed heap for parsing and compilation. Numbers depend on the code that's being run, from a few kilobytes up to a gigabyte or more (e.g. for huge asm.js codebases) anything is possible. These are relatively short-lived memory consumption peaks, so on the one hand you probably don't want to limit long-lived heap memory to account for them, but on the other hand that means they can make your processes run into the system limit.
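With those caveats in mind, the subtraction proposed in the question can still serve as a starting point. The sketch below only does that arithmetic: the 16MB and 512MB figures are the ones quoted in the question, while headroom_mb is a hypothetical extra margin for the off-heap consumers listed above, which nobody can size in general.

```python
def suggested_old_space_mb(container_mb, semi_space_mb=16, executable_mb=512, headroom_mb=0):
    """Container limit minus the known V8 pools and an optional guessed margin."""
    budget = container_mb - semi_space_mb - executable_mb - headroom_mb
    if budget <= 0:
        raise ValueError("container too small for these reservations")
    return budget


for container_mb in (1024, 2048, 4096):
    value = suggested_old_space_mb(container_mb)
    print(f"{container_mb}MB container: --max_old_space_size={value}")

with_margin = suggested_old_space_mb(4096, headroom_mb=256)
```

The result is an upper bound on what to pass, not a guarantee: a process with large off-heap buffers or a compilation spike can still hit the cgroup limit.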
I am writing a small process monitor script in Perl that reads values from the proc file system. Right now I am able to fetch the number of threads, the process state, and the number of bytes read and written using the /proc/[pid]/status and /proc/[pid]/io files. Now I want to calculate the memory usage of a process. After searching, I learned that memory usage is available in /proc/[pid]/statm, but I still can't figure out which fields from that file are needed to calculate it. Can anyone help me with this? Thanks in advance.
You likely want resident or size. From kernel.org.
size: total program size. This is the whole program, including stuff never swapped in.
resident: resident set size. Stuff in RAM at the current moment (this does not include pages swapped out).
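A minimal sketch of reading those two fields, in Python rather than Perl. The values in statm are counted in pages, with size first and resident second, so they are multiplied by the page size to get bytes. The sample line is made up; the live read at the end only runs where /proc exists.

```python
import os


def parse_statm(text, page_size):
    """Convert the size and resident fields of a statm line from pages to bytes."""
    fields = text.split()
    size_pages = int(fields[0])
    resident_pages = int(fields[1])
    return {
        "size_bytes": size_pages * page_size,
        "resident_bytes": resident_pages * page_size,
    }


# Made-up statm content: size, resident, then fields this sketch ignores.
sample = parse_statm("2500 300 150 10 0 400 0", page_size=4096)
print(sample)

# On Linux, the same function works on the current process.
if os.path.exists("/proc/self/statm"):
    with open("/proc/self/statm") as statm_file:
        print(parse_statm(statm_file.read(), os.sysconf("SC_PAGE_SIZE")))
```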
It is extremely difficult to know what the "memory usage" of a process is. VM size and RSS are known, measurable values.
But what you probably want is something else. In practice, "VM size" seems too high and RSS often seems too low.
The main problems are:
Multiple processes can share the same pages. You can add up the RSS of all running processes, and end up with much more than the physical memory of your machine (this is before kernel data structures are counted)
Private pages belonging to the process can be swapped out. Or they might not be initialised yet. Do they count?
How exactly do you count memory-mapped file pages? Dirty ones? Clean ones? MAP_SHARED or MAP_PRIVATE ones?
So you really need to think about what counts as "memory usage".
It seems to me that logically:
Private pages which are not shared with any other processes (NB: private pages can STILL be copy-on-write!) must count even if swapped out
Shared pages should count divided by the number of processes they're shared by e.g. a page shared by two processes counts half
File-backed pages which are resident can count in the same way
File-backed non-resident pages can be ignored
If the same page is mapped more than once into the address-space of the same process, it can be ignored the 2nd and subsequent time. This means that if proc 1 has page X mapped twice, and proc 2 has page X mapped once, they are both "charged" half a page.
I don't know of any utility which does this. It seems nontrivial though, and involves (at least) reading /proc/pid/pagemap and possibly some other /proc interfaces, some of which are root-only.
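The charging rule itself is easy to state in code, even though gathering the real mapping data is the hard part. The sketch below takes an invented list of (pid, page) mappings and ignores residency, swap, and file backing entirely; it only shows the proportional split and the "mapped twice counts once" rule.

```python
from collections import defaultdict


def proportional_charges(mappings):
    """Charge each page equally to the distinct processes that map it.

    mappings is an iterable of (pid, page_id) pairs; repeats within one
    process are collapsed by the set, so they are charged only once.
    """
    sharers = defaultdict(set)
    for pid, page_id in mappings:
        sharers[page_id].add(pid)

    charges = defaultdict(float)
    for pids in sharers.values():
        for pid in pids:
            charges[pid] += 1 / len(pids)
    return dict(charges)


# Proc 1 maps page X twice and has a private page Y; proc 2 maps X once.
charges = proportional_charges([(1, "X"), (1, "X"), (2, "X"), (1, "Y")])
print(charges)
```

Both processes are charged half a page for X, matching the example above, and proc 1 is charged a full page for Y.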
Another (less simple, but more precise) possibility would be to parse the /proc/123/maps file, perhaps by using the pmap utility. It gives you information about the "virtual memory" (i.e. the address space of the process).
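As a rough sketch of that parsing, the function below sums the address ranges in maps-formatted text, which gives the amount of address space mapped (virtual, not resident). The two sample lines are invented but follow the usual "start-end perms offset dev inode path" layout.

```python
def mapped_bytes(maps_text):
    """Sum the sizes of all address ranges in /proc/<pid>/maps-style text."""
    total = 0
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        address_range = line.split()[0]
        start, end = (int(part, 16) for part in address_range.split("-"))
        total += end - start
    return total


# Invented sample in the maps format.
sample_maps = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat
7ffd1000-7ffd3000 rw-p 00000000 00:00 0 [stack]
"""

total = mapped_bytes(sample_maps)
print(total)
```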
I need to find out how many pages of memory a process allocates.
Each page is 4096 bytes, but I'm having trouble locating the correct value for the process's memory usage. When I look in gnome-system-monitor, there are a few values to choose from under the memory map.
Thanks.
The point of this is to divide the memory usage by the page count and verify the page size.
It's hard to determine the exact amount of memory allocated: there are pages shared with other processes (read-only parts of libraries), never-used memory allocated by brk and anonymous mmap, mmapped files that are not fetched from disk completely because efficient processing algorithms touch only a small part of the file, swapped-out pages, dirty pages waiting to be written to disk, and so on.
If you want to deal with all this complexity and figure out the true count of pages, the detailed information is available at /proc/<pid>/smaps, and there are tools, like mem_usage.py or smem.pl (easy to find with a search), that turn it into a more-or-less usable summary.
This would be the "Resident Set Size", assuming your process doesn't use swap.
Note that a process may allocate far more memory ("Virtual Memory Size"), but as long as it does not write to that memory, it is not backed by physical memory, be it in RAM or on disk.
Some system tools, like top, display a huge value for "swap" for each process. This is of course completely wrong: the value is the difference between VMS and RSS, and most likely consists of those allocated but unused memory pages.
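For a feel of what such tools do, here is a minimal sketch that totals one field across all mappings in smaps-formatted text. Rss and Pss are real smaps fields reported in kB (Pss being the kernel's proportional share of shared pages); the sample text is invented and shortened.

```python
def sum_smaps_field(smaps_text, field):
    """Total one per-mapping field (in kB) across smaps-formatted text."""
    prefix = field + ":"
    total_kb = 0
    for line in smaps_text.splitlines():
        if line.startswith(prefix):
            total_kb += int(line.split()[1])
    return total_kb


# Invented, shortened sample in the smaps format.
sample_smaps = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat
Size:                 44 kB
Rss:                  40 kB
Pss:                  20 kB
7ffd1000-7ffd3000 rw-p 00000000 00:00 0 [stack]
Size:                  8 kB
Rss:                   8 kB
Pss:                   8 kB
"""

rss_kb = sum_smaps_field(sample_smaps, "Rss")
pss_kb = sum_smaps_field(sample_smaps, "Pss")
print(rss_kb, pss_kb)
```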
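To tie this back to the page-count question, the sketch below pulls VmSize and VmRSS (both in kB) out of /proc/<pid>/status-style text, converts the resident size to pages, and computes the VMS minus RSS gap described above. The sample text and the 4096-byte page size are assumptions for the example.

```python
def parse_status_kb(status_text):
    """Extract VmSize and VmRSS (kB) from /proc/<pid>/status-style text."""
    values = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            values[key] = int(rest.split()[0])
    return values


# Invented sample in the status format.
sample_status = "Name:\tcat\nVmSize:\t    8000 kB\nVmRSS:\t     600 kB\n"

values = parse_status_kb(sample_status)
page_size = 4096  # assumed page size in bytes
resident_pages = values["VmRSS"] * 1024 // page_size
gap_kb = values["VmSize"] - values["VmRSS"]
print(resident_pages, gap_kb)
```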