Vxworks memory allocation failure even though there is enough memory - malloc

I am rather new to vxworks, and I am building an RTP application, which needs to allocate some memory dynamically. I have configured the kernel for a memory size of 750MB.
I am allocating memory in blocks 10 numbers each of size 32MB in the very beginning of the program, but after the 5th or 6th block allocation, I get an allocation failure with message memPartAlloc: block too big 15912260 bytes (0x10 aligned) in partition 0xe004608 on the console.
How could memory allocation be failing when there is enough memory available? I do not think memory had fragmented enough for allocation to fail right in the beginning of my program and as per output of memShow(), there is indeed enough free memory to satisfy the request.
If memory has indeed fragmented due to any strange reason, is there some way to compact free space and continue in Vxworks?

This is an old question, so this answer may be moot now, and is to an extent based on speculation based on the limited information in the question.
Whilst the kernel maybe configured to support 750MB, this will be the total memory available. Some of this will be used by the OS image, although we wont expect much, and we can assume that at least 700MB should be available for use.
Some extra memory will be used to provide the stacks for each task - how much is very application dependant, as it is specified in the taskSpawn. You can check this, but again, is unlikely to make significant difference.
Lets be generous, and assume that you really only have 650MB. This should, in theory, be plenty.
And yet we have this error:
memPartAlloc: block too big 15912260 bytes (0x10 aligned) in partition 0xe004608
What can be happening? And what does this mean?
This error tells you that the memory allocator could not allocate memory, as the request was too large. Interestingly, the request is 15912260, which is not 32MB, it is actually a shade over 15MB. So it would be worth checking what you are actually requesting.
Secondly, this error message is coming from memPartAlloc. Are you using allocating memory using malloc() or memPartAlloc()? The distinction matters, since malloc will allocate memory from the system memory partition, whereas memPartAlloc allocates memory from a user-specifed, and created, partition.
If you are using memPartAlloc, ensure that you are allocating memory from the correct partition, and that it has been created with enough memory to fulfill the request.
EDIT:
As it appears that this was an RTP, you should also confirm that the RTP has a large enough heap allocated. This is specified via an environment variable, as this answer describes.

Related

What happens to allocated pages that are mostly empty?

If a process initially has a number of pages allocated to it in the heap, but a lot of the data in the pages has been deallocated, is there some sort of optimization that the OS does to consolidate the data into one page so that the other pages can be freed?
In general, nothing happens, the heap will continue to have "holes" in it.
Since the (virtual) memory addresses known by a process must remain valid, the operating system cannot perform "heap compaction" on its own. However, some runtimes like .Net do it.
If you are using C or C++, all you can hope for by default is that malloc() will be able to reuse previously deallocated chunks. But if your usage pattern is "allocate a lot of small objects then deallocate half of them at random," the memory utilization will probably not decrease much from the peak.
If a process initially has a number of pages allocated to it in the heap
A process will not initially have pages allocates in a heap.
is there some sort of optimization that the OS does to consolidate the data into one page so that the other pages can be freed
The operating system has no knowledge of user heaps. It allocates pages to the process. What that process does with those pages is up to it (i.e., use them for a heap, stack, code, etc.).
A process's heap manager can consolidate freed chunks of memory. When this occurs, it is normally done to fight heap fragmentation. However, I have never seen a heap manager on a paging system that unmaps pages once they are mapped by the operating system.
The heap of a process never has holes on it. The heap is part of the data segment allocated to a process, that grows dynamically upwards to the top of the stack segment, basically with the use of the sbrk(2) system call (that fixes a new size to the data segment) so the heap is a continuous segment (at least in terms of virtual address space) of allocated pages. malloc(3) never returns the heap space (or part of it) to the system. See malloc(3) for info about this. While there are memory allocators that allow a process to have several heaps (by means of allocating new memory segments, by use of the mmap(2) system call) the segments allocated by a memory allocator are commonly never returned back to the system.
What happens is that the memory allocator reuses the heap space allocated with sbrk(2) and mmap(2) and manages memory for being reused, but it is never returned back to the system.
But don't fear, as this is handled in a good and profitable way by the system, anyway.
That should not affect the overall system management, except from the fact that it consumes virtual address space, and probably page contents will end in the swap device if you don't use them until the process references them again and makes the system to reload them from the swap device(s). If your process doesn't reuse the holes it creates in the heap, the most probable destination is for the system to move them to the swap device and continue reusing it for other processes.
At this moment, I don't know if the system optimices swap allocation by not swapping out zeroed pages, as it does, for example, with text segments of executables (they never go to a swap device, because their contents are already swapped off in the executable file ---this was the reason you couldn't erase in ancient unices a program executable, or the reason there's not need anymore to use the sticky bit in frequently used programs---) but I think it doesn't (and the reason is that it's most improbable the unused pages will be zeroed by the application)
Be warned only in the case you have a 15Gb single process' heap use in your system and 90% of heap use is not in use most of the time. But think better in optimising the allocation resources because a process that consumes 15Gb of heap while most of the time 90%+ is unused, seems to be a poor design. If you have no other chance, simply provide enough swap space to your system to afford that.

Vulkan memoryHeaps and their memoryTypes

Above is a picture summarizing my understanding on memoryHeap and their memoryTypes generated by Vulkan for a given system setup. Thanks to the answers on this topics shared by #NicolBolas 1, 2, 3 and an answer by #krOoze 4.
Still, I have a few outstanding questions that I like help on and I have indicated them in red and elaborated below per comment of #NicolBolas.
Questions
Why are there 9 memoryType in sysRam when there are only 4x RAMs?
What is the physical meaning of each memoryType? How to use each of
these memoryType?
Why are there 2 memory types for GPU RAM? Does this mean each
memoryType of the GPU RAM is 6144MB/2 = 3072MB?
Is there a size limit to each memoryTypes? If yes, how to discover
their limits?
Why are the free memory reported by Vulkan and cat /proc/meminfo
different?
Thanks for your help in advance.
Why are there 9 memoryType in sysRam when there are only 4x RAMs? What is the physical meaning of each memoryType? How to use each of these memoryType?
Why are there 2 memory types for GPU RAM?
I don't know what you mean by "4x RAMs"; I suspect you're talking about how many physical memory sticks are in your machine. Memory types (or heaps for that matter) don't care about such things.
As for the rest, it is always important to remember how memory works in Vulkan. Heaps represent actual physical RAM to one degree or another. Memory types represent ways of allocating that memory. But uses of memory have their own memory type restrictions.
For example, if an image has the color attachment usage parameter, the implementation can force you to use a specific memory type for the memory backing that image. And images that don't have color attachment can be restricted to using other memory types, but not that one. And so forth.
Apparently, NVIDIA does this for certain combinations of usage and formats. Simply querying the available memory types isn't enough to know how to go about allocating memory. You have to figure out what buffers and images (complete with format and usage parameters) you will use. And then you have to query what restrictions the implementation imposes on them.
Your application must adapt to these restrictions.
Is there a size limit to each memoryTypes?
It wouldn't make sense for there to be such a thing. Memory types define how memory is allocated, not how much storage is available. The latter is the job of memory heaps.
Why are the free memory reported by Vulkan and cat /proc/meminfo different?
Vulkan has no API to report free memory, only total memory. Asking for the amount of free memory is folly. Memory (or at least, virtual pages in your application) are shared by all threads in your application. And GPU memory especially is shared among all processes on the machine. By the time you get an answer back, the amount of memory may have changed. So when you go to allocate memory based on what you were told was available, it may not be available anymore.
Better to allocate first and deal with failure to allocate if it happens.
You can ask for the total memory so that you can decide on how you want to allocate chunks of memory. But that's how you determine what is and is not available, not by querying a size.
[metaquestion] Why is X in Vulkan?
Because it is allowed by the Vulkan specification. Rest is implementation detail, and only the implementer\vendor knows for sure, and may depend on how well he slept.
Why are there 9 memoryType in sysRam when there are only 4x RAMs? What is the physical meaning of each memoryType? How to use each of these memoryType?
Answered in Why does vkGetPhysicalDeviceMemoryProperties return multiple identical memory types?. One for VkBuffers, one for VkImages, and one per depth format (i.e. 7). Equals 9; mystery solved.
Why are there 2 memory types for GPU RAM? Does this mean each memoryType of the GPU RAM is 6144MB/2 = 3072MB?
Likely similar reason as 1. I speculate one for VkBuffers, one for VkImages. Someone with NVIDIA could test with vkGetXMemoryRequirements.
It does not neccessarily mean RAM/2. It is not completely out of the question, but then again implementer should instead expose separate Heap if that is so.
Is there a size limit to each memoryTypes? If yes, how to discover their limits?
Roughly the Heap size. You may get significantly less due to fragmentation. And due to other processes sharing the same. Your impl may also allocate some itself for its internal needs.
You discover the limit when you get VK_ERROR_OUT_OF_DEVICE_MEMORY. (BTW mostly works the same as on CPU side, where you get bad_alloc).
There is limit to size of single allocation (not recommended to allocate > 4 GB), and to the count of allocations too (maxMemoryAllocationCount).
Why are the free memory reported by Vulkan and cat /proc/meminfo different?
AFAIK Vulkan does not report free memory. The VkMemoryHeap shows total memory:
size is the total memory size in bytes in the heap.
You don't know anything about the memory types in Vulkan until you ask the driver.
I think the biggest misunderstanding you have is that the memory types are physically separate. As shown, you have two memory heaps, assume 0 is CPU memory, 1 is GPU. Within those heaps, you have different memory types. Each memory type occupies space within its own heap, and can use all the heap space or share it with other types. For each type you'll have different internal allocation methods with different alignment requirements and different allowed uses. There are multiple queries related to memory types including vkGetBufferMemoryRequirements, vkGetImageMemoryRequirements, and others. It all depends on what you're using the memory for.
Also, those memory types are driver dependent, and will vary between vendors (that looks like the current nVidia layout).

What is memory reclaim in linux

I am very new to Linux memory management. While reading some documents on the topic, I had some basic questions.
Below is my config:
vm.swappiness=10
vm.vfs_cache_pressure=140
vm.min_free_kbytes=2013265
My understanding is, if free memory falls below vm.min_free_kbytes, then the OS will reclaim memory.
Is Memory reclaim a deletion of unwanted files or copying to Swap memory from RAM?
If it's copying to Swap memory from RAM, then if I am not using Swap memory, what will happen?
Is swappiness always greater than the vm.min_free_kbytes?
What is the significance of vm.vfs_cache_pressure?
Memory reclaim is the mechanism of creating more free RAM pages, by throwing somewhere else the data residing in them. It has nothing to do with files. When more RAM is needed, data is dropped from RAM (trashed away, if it can be refetched) or copied to the swap file (so the data will be refetchable).
If there is not a swap file, but some data should be saved to the (non existent) swap area, then an out-of-memory error happens. Typically, this is notified to the process which is trying to get the memory (via alloc() and similar) - the alloc() fails and returns NULL. The process can choose what to do, or even crash. If the memory is needed by the kernel itself (normally quite rare), a PANIC happens and the system locks completely.
swappiness is, in percentage, the tendency of the kernel to use the swap, even if not strictly needed, in order to have plenty of ram ready for memory requests. Simply put, a 100% swappiness means the kernel tries to always swap, a swappiness of 0 means the kernel tries to not do swap (there are some special values however). min_free_kbytes indicates real kilobytes, it is not a percentage, and it is the minimum amount that should always be free in order to let the kernel to work well. Even starting a memory reclaim could require some more ram to do the job: it would be catastrophic if, to get some memory, you need just a little memory but you don't have it! :-)
vfs_cache_pressure is again a percentage. It indicates how much the kernel tries to get rid of (memory) cache used for the file system (vfs=virtual file system). The cache for the filesystem is quite a good candidate to throw away, because it keeps information easily readable from the disk. Unfortunately, if the computer needs frequently to use the file system, it has to read, and read again, and read again always the same data. Caching is a big performance boost. Of course, if a system does little disk I/O, then this cache is the best candidate to throw away when memory hungry.
All this things are succintly explained here: https://www.kernel.org/doc/Documentation/sysctl/vm.txt

How much memory did Linux give to malloc()?

This is a Linux system question, not a coding question. When I use "top" to check the memory usage of my program, it reports a value 3-4 times as large as the actual heap allocation as given by Valgrind's Massif, a memory profiler. It's a large program, and the difference is hundreds of megabytes. The Valgrind manual gives only a partial explanation:
(Massif) does not directly measure memory allocated with
lower-level system calls such as mmap, mremap, and brk.
Heap allocation functions such as malloc are built on top of these
system calls. For example, when needed, an allocator will typically
call mmap to allocate a large chunk of memory, and then hand over
pieces of that memory chunk to the client program in response to calls
to malloc et al. Massif directly measures only these higher-level
malloc et al calls, not the lower-level system calls.
Fine, but how much memory am I really taking away from the system? I need to be able to run as many instances of this program as possible on one machine, so I need to know how much of that memory is still available. Page alignment etc. cannot explain a difference of hundreds of megabytes in reported memory usage.
Also, what determines the block size of the underlying mmap() call? I'm seeing blocks of 64MB at a time being taken according to top, which seems bizarrely large.
Any malloc implementation will be optimised for applications with huge memory requirements, because apps with low requirements run just fine anyway, and virtual memory is cheap.
For example, you will find malloc implementations that use a block of memory for up to 1024 mallocs of up to 16 bytes, another block for up to 1024 mallocs of up to 32 bytes, and so on. With a few mallocs this is inefficient but still cheap. With gazillions of mallocs, it makes malloc very efficient.
So saying "4 times as much" can be completely pointless. Tell us how many megabytes more than you thought.

Difference between "memory cache" and "memory pool"

By reading "understanding linux network internals" and "understanding linux kernel" the two books as well as other references, I am quite confused and need some clarifications about the "memory cache" and "memory pool" techniques.
1) Are they the same or different techniques?
2) If not the same, what makes the difference, or the distinct goals?
3) Also, how does the Slab Allocator come in?
Regarding the slab allocator:
So imagine memory is flat that is you have a block of 4 gigs contiguous memory. Then one of your programs reqeuests a 256 bytes of memory so what the memory allocator has to do is choose a suitable block of 256 bytes from this 4 gigs. So now you your memory looks something like
<============256bytes=======================>
(each = is a contiguous block of memory). Some time passes and a lot of programs operating with the memory require more 256 blocks or more or less so in the end your memory might look like:
<==256==256=256=86=68=121===>
so it gets fragmented and then there is no trace of your beautiful 4gig block of memory - this is fragmentation. Now, what the slab allocator would do is keep track of allocated objects and once they are not used anymore it will say that the memory is free when in fact it will be retained in some sort of List (You might wanna read about FreeLists).
So now imagine that the first program relinquish the 256 bytes allocated and then a new would like to have 256 bytes so instead of allocating a new chunk of the main memory it might re-use the lastly freed 256 bytes without having to go through the burden of searching the physical memory for appropriate contiguous block of space. This is how you essentially implement the memory cache. This is done so that memory fragmentation is reduced overall because you might end up in situation where memory is so fragmented that it is unusable and the memory-manager has to do some magic to get you block of appropriate size. Where as using a slab allocator pro-actively combats (but doesn't eliminate) the problem.
Linux memory allocator A.K.A slab allocator maintains the frequently used list/pool of memory objects of similar or approximate size. slab is giving extra flexibility to programmer to create their own pool of frequently used memory objects of same size and label it as programmer want,allocate, deallocate and finally destroy it.This cache is known to your driver and private to it.But there is a problem, during memory pressure there are high chances of allocation failures which could be not acceptable in some drivers, then what to do better always reserve some memory handy so that we never feel the memory crunch, since kmem cache is more generic pool mechanism we need some one who can always maintain minimum required memory and that's our buddy memory pool .
Lookaside Caches - The cache manager in the Linux kernel is sometimes called the slab allocator. You might end up allocating many objects of the same size over and over so by using this mechanism you just can allocate many objects in the same size and then use them later, without the need to allocate many objects over and over.
Memory Pool is just a form of lookaside cache that tries to always keep a list of memory around for use in emergencies, so when the memory pool is created, the allocation functions (slab allocators) create a pool of preallocated objects so you can acquire them when you need.

Resources