Linux swap space never releases memory

I am using Linux kernel 2.6.38 and running a process that allocates 4 GB of memory on a machine with 4 GB of RAM, so when I run my application it takes around 0.5 GB from swap space. However, my application runs for a very long time and accesses the data in swap space several times.
(Edited)
To clarify what I am doing:
I am running Linux 2.6.38 with 4 GB of RAM.
Without any applications running, the system occupies around 500 MB of RAM.
I created a simple application that allocates 4 GB of memory, walks across the allocated memory, and changes its values many times (a loop of 10 iterations).
Obviously, the application needs swap space in order to run.
When I run my application, swap usage keeps accumulating, becomes full after a few iterations, and the process is killed.
After the process is killed, the swap space remains full as well.
I tested my application on more recent kernels and it works fine; swap usage does not accumulate.
Is this a bug in this kernel version (2.6.38)? Is there a fix for it?

There's no memory leak.
You're assuming that when your application needs more memory than what's available, parts of it are written to swap. This is not necessarily true.
The system may (and generally will) write other, completely unrelated processes to swap, because they're not currently in use.
Since this swap space does not belong to your application, it will remain in use after your application exits.
This swap space may stay in use for a long time, since Linux doesn't preemptively load those pages back when there's free RAM.
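One way to check which processes actually own the swapped-out pages is to read the VmSwap field of /proc/<pid>/status. The following is a minimal sketch, assuming a kernel that exposes VmSwap; the helper names `vmswap_kb` and `swap_by_process` are made up for illustration.

```python
import os
import re


def vmswap_kb(status_text):
    """Return the VmSwap value (in kB) from /proc/<pid>/status text, or 0."""
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def swap_by_process(proc_root="/proc"):
    """Map pid -> swapped-out kB for every process we are allowed to inspect."""
    usage = {}
    if not os.path.isdir(proc_root):
        return usage
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                kb = vmswap_kb(fh.read())
        except OSError:
            continue  # process exited or access was denied
        if kb:
            usage[int(entry)] = kb
    return usage


if __name__ == "__main__":
    # Print the ten processes with the most swapped-out memory.
    top = sorted(swap_by_process().items(), key=lambda item: -item[1])[:10]
    for pid, kb in top:
        print(pid, kb, "kB")
```

If the large entries belong to long-running daemons rather than to the test program, that matches the explanation above.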

I'm not sure my response will answer your question but I asked myself a similar question a while back.
To summarise, when Linux allocates memory (RAM or swap) it only frees it when it's needed. That means even after the process has terminated, the allocated memory will remain until another process needs the space.
However, if you want to free the swap, you can do it manually:
sudo swapoff -a
Do not forget to turn it back on
sudo swapon -a
You can find more information at that link and that one
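Note that swapoff has to pull every swapped-out page back into RAM, so it is worth checking first that the pages will fit. The following is a rough sketch of such a check, assuming /proc/meminfo-style input; `parse_meminfo` and `swapoff_fits` are hypothetical helper names, and the comparison is only a heuristic.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kB} dictionary."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swapoff_fits(meminfo):
    """Return True if the swap currently in use would fit into available RAM."""
    swap_used = meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)
    # MemAvailable is missing on older kernels; fall back to MemFree there.
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return swap_used <= available


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            print("swapoff looks safe:", swapoff_fits(parse_meminfo(fh.read())))
    except OSError:
        print("/proc/meminfo is not readable on this system")
```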

Related

Allocate memory to a process that other process can not use in linux

To limit memory resources for a particular process we can use ulimit as well as cgroups.
I want to understand the following: if, using cgroups, I have allocated ~700 MB of memory to process A on a system with 1 GB of RAM, and some other process B requires ~400 MB of memory, what will happen?
If process A is allocated ~750 MB of memory but is using only 200 MB, can process B use the memory allocated to A?
If not, how can I achieve the scenario where "a fixed amount of memory is assigned to a process that no other process can use"?
EDIT
Is it possible to lock physical memory for a process? Or can only virtual memory be locked so that no other process can access it?
There is one multimedia application that must remain alive and be able to use the maximum system resources in terms of memory. This is what I need to achieve.
Thanks.
Processes use virtual memory (not RAM directly), so each has a virtual address space. See also setrlimit(2) (called by the ulimit shell builtin). Perhaps RLIMIT_RSS & RLIMIT_MEMLOCK are relevant, though according to setrlimit(2) RLIMIT_RSS has no effect on current Linux kernels. Of course, you could limit some other process, e.g. using RLIMIT_AS or RLIMIT_DATA, perhaps through pam_limits(8) & limits.conf(5).
You could lock some virtual memory into RAM using mlock(2); this ensures that the locked pages stay resident in RAM for the calling process and are not swapped out.
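As an illustration of mlock(2), here is a small sketch that pins a single anonymous page from Python through ctypes. It assumes a Unix-like system whose C library exports mlock and munlock; the function name `lock_one_page` is hypothetical, and a real multimedia application would more likely call mlock or mlockall directly from C.

```python
import ctypes
import mmap
import os


def lock_one_page():
    """Try to mlock() a single anonymous page; return True on success."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    size = mmap.PAGESIZE
    buf = mmap.mmap(-1, size)              # anonymous, page-aligned mapping
    view = ctypes.c_char.from_buffer(buf)  # needed to obtain the address
    try:
        addr = ctypes.addressof(view)
        if libc.mlock(addr, size) != 0:
            # Often EPERM or ENOMEM when RLIMIT_MEMLOCK is too small.
            print("mlock failed:", os.strerror(ctypes.get_errno()))
            return False
        buf[0:1] = b"x"                    # the page is resident and pinned here
        libc.munlock(addr, size)
        return True
    finally:
        del view                           # drop the buffer export before closing
        buf.close()


if __name__ == "__main__":
    print("locked one page:", lock_one_page())
```

How much memory an unprivileged process may lock is bounded by RLIMIT_MEMLOCK, so locking hundreds of megabytes usually requires raising that limit first.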
If you want to improve performance, you might also use madvise(2) & posix_fadvise(2).
See also ionice(1) & renice(1)
BTW, you might consider using hypervisors like Xen, they are able to reserve RAM.
Finally, you might be wrong in believing that your manual tuning could do better than a carefully configured kernel scheduler.
What other processes will run on the same system, and what do you want to happen if the multimedia program needs memory that other processes are using?
You could weight the multimedia process so the OOM killer only picks it as a last choice after every other non-essential process. You might see a dropped frame if the kernel takes some time killing something to free up memory.
According to this article, you can adjust the OOM killer weight of a process by writing to /proc/pid/oom_adj, e.g. with
echo -17 > /proc/2592/oom_adj
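On newer kernels oom_adj is deprecated in favour of /proc/pid/oom_score_adj, which ranges from -1000 to 1000. The sketch below shows the scaling between the two as I understand the kernel applies it; treat the constants and the rounding as assumptions to verify against your kernel's documentation. The function name `oom_adj_to_score_adj` is hypothetical.

```python
OOM_DISABLE = -17          # legacy oom_adj value that exempts a process
OOM_ADJUST_MAX = 15        # largest legacy oom_adj value
OOM_SCORE_ADJ_MAX = 1000   # largest oom_score_adj value


def oom_adj_to_score_adj(oom_adj):
    """Scale a legacy oom_adj value to the oom_score_adj range."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj must be between -17 and 15")
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX
    # int() truncates toward zero, like integer division in C.
    return int(oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE)


if __name__ == "__main__":
    for value in (-17, -8, 0, 15):
        print(value, "->", oom_adj_to_score_adj(value))
```

Under this mapping, the `echo -17` above corresponds to writing -1000 to oom_score_adj.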

vm/min_free_kbytes - Why Keep Minimum Reserved Memory?

According to this article:
/proc/sys/vm/min_free_kbytes: This controls the amount of memory that is kept free for use by special reserves including “atomic” allocations (those which cannot wait for reclaim)
My question is: what is meant by "those which cannot wait for reclaim"? In other words, I would like to understand why there's a need to tell the system to always keep a certain minimum amount of memory free, and under what circumstances this memory will be used. [It must be used by something; I don't see the need otherwise.]
My second question: does setting this to something higher than 4 MB (the value on my system) lead to better performance? We have a server which occasionally exhibits very poor shell performance (e.g. ls -l takes 10-15 seconds to execute) when certain processes get going. Would setting this number to something higher lead to better shell performance?
(link is dead, looks like it's now here)
That text is referring to atomic allocations, which are requests for memory that must be satisfied without giving up control (i.e. the current thread can not be suspended). This happens most often in interrupt routines, but it applies to all cases where memory is needed while holding an essential lock. These allocations must be immediate, as you can't afford to wait for the swapper to free up memory.
See Linux-MM for a more thorough explanation, but here is the memory allocation process in short:
_alloc_pages first iterates over each memory zone looking for the first one that contains eligible free pages
_alloc_pages then wakes up the kswapd task [..to..] tap into the reserve memory pools maintained for each zone.
If the memory allocation still does not succeed, _alloc_pages will either give up [..] In this process _alloc_pages executes a cond_resched() which may cause a sleep, which is why this branch is forbidden to allocations with GFP_ATOMIC.
min_free_kbytes is unlikely to help much with the described "ls -l takes 10-15 seconds to execute"; that is likely caused by general memory pressure and swapping rather than zone exhaustion. The min_free_kbytes setting only needs to allow enough free pages to handle immediate requests. As soon as normal operation is resumed, the swapper process can be run to rebalance the memory zones. The only time I've had to increase min_free_kbytes is after enabling jumbo frames on a network card that didn't support dma scattering.
To expand on your second question a bit, you will have better results tuning vm.swappiness and the dirty ratios mentioned in the linked article. However, be aware that optimizing for "ls -l" performance may cause other processes to become slower. Never optimize for a non-primary use case.
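Before changing any of these tunables it helps to record their current values. A minimal sketch, assuming the usual /proc/sys/vm layout; `parse_tunable` and `read_vm_tunable` are hypothetical helper names.

```python
import os


def parse_tunable(text):
    """Return the first integer in a sysctl file's text, or None."""
    try:
        return int(text.split()[0])
    except (ValueError, IndexError):
        return None


def read_vm_tunable(name, root="/proc/sys/vm"):
    """Return the integer value of a vm sysctl, or None if it cannot be read."""
    try:
        with open(os.path.join(root, name)) as fh:
            return parse_tunable(fh.read())
    except OSError:
        return None


if __name__ == "__main__":
    names = ("min_free_kbytes", "swappiness",
             "dirty_ratio", "dirty_background_ratio")
    for name in names:
        print(name, "=", read_vm_tunable(name))
```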
All Linux systems will attempt to make use of all physical memory available to the system, often through the creation of a filesystem buffer cache, which, put simply, is an I/O buffer to help improve system performance. Technically this memory is not in use, even though it is allocated for caching.
"wait for reclaim", in your question, refers to the process of reclaiming that cache memory that is "not in use" so that it can be allocated to a process. This is supposed to be transparent but in the real world there are many processes that do not wait for this memory to become available. Java is a good example, especially where a large minimum heap size has been set. The process tries to allocate the memory and if it is not instantly available in one large contiguous (atomic?) chunk, the process dies.
Reserving a certain amount of memory with min_free_kbytes allows this memory to be instantly available and reduces the memory pressure when new processes need to start, run and finish while there is a high memory load and a full buffer cache.
4MB does seem rather low because if the buffer cache is full, any process that wants an immediate allocation of more than 4MB will likely fail. The setting is very tunable and system-specific, but if you have a few GB of memory available it can't hurt to bump up the reserve memory to 128MB. I'm not sure what effect it will have on shell interactivity, but likely positive.
This memory is kept free from use by normal processes. As @Arno mentioned, the special processes that can run include interrupt routines, which must run immediately (as it's an interrupt) and finish before any other processes can run (atomic). This can include things like swapping out memory to disk when memory is full.
If memory fills up, an interrupt (memory management) process runs to swap some memory out to disk so it can free memory for use by normal processes. But if vm.min_free_kbytes is too small for it to run, the system locks up. This is because this process must run first to free memory so others can run, but it is stuck because it doesn't have enough reserved memory (vm.min_free_kbytes) to do its task, resulting in a deadlock.
Also see:
https://www.linbit.com/en/kernel-min_free_kbytes/ and
https://askubuntu.com/questions/41778/computer-freezing-on-almost-full-ram-possibly-disk-cache-problem (where the memory management process has so little memory to work with it takes so long to swap little by little that it feels like a freeze.)

Memory Allocation for Threads and Processes

Suppose I have a process which has been allocated some space in RAM. If the process creates a thread (it has to, in fact), the thread will also need some space for its execution, won't it?
So will it increase the size of the space which has been allocated to that process, or will space for the thread be created somewhere else? If so, where in RAM? Does it need to be contiguous with the space already held by the process?
There'll be some overhead in the scheduler (in the kernel) somewhere since it needs to maintain information about the thread.
There'll also be some overhead in the process-specific area, since you'll need a stack for each thread and you don't want to put stuff into the kernel-specific space that the user code needs to get at.
All modern operating systems have, for quite some time now, distinguished between the memory needed by a process and the memory physically allocated in RAM.
The OS creates a large virtual address space for each process. That address space is independent of how many threads are created inside the process.
In Windows for example, and for optimization reasons, part of that address space is reserved for OS and kernel libraries and is shared amongst all processes for efficiency.
The other part is dedicated to the application user code and libraries.
Once a process's logistics and resources are created, the process is ready to start, which happens by starting the first thread in the process; that thread begins executing at the process's main entry point.
For a thread to start executing, it needs a stack, among other requirements. In Windows, the default size of that stack is about 1 MB. This means that, if not changed, each thread will require about 1 MB of memory for its own housekeeping (stack, TLS, etc.).
When the process needs memory to be allocated, the OS decides how this memory is going to be allocated physically in RAM. The process or application does not see physical RAM addresses. It only sees virtual addresses from the virtual address space assigned to it.
The OS uses a page file located on disk, in addition to RAM, to satisfy memory requests. Less RAM means more pressure on the page file. When the process touches a piece of memory that is not in RAM, the OS looks for it in the page file; this event is called a page fault.
This topic is very extensive, but I tried to give as much of an overview as I can.

Why the process is getting killed at 4GB?

I have written a program which works on a huge set of data. My CPU and OS (Ubuntu) are both 64-bit and I have 4 GB of RAM. Using "top" (%Mem field), I saw that the process's memory consumption went up to around 87%, i.e. 3.4+ GB, and then it got killed.
I then checked how much memory a process can access using "ulimit -m", which comes out to be "unlimited".
Now, since both the OS and CPU are 64-bit and there is also a swap partition, the OS should have used virtual memory, i.e. [ >3.4 GB + y GB from swap space ] in total, and only if the process required more memory than that should it have been killed.
So, I have the following questions:
How much physical memory can a process theoretically access on a 64-bit machine? My answer is 2^48 bytes.
If less than 2^48 bytes of physical memory exists, then the OS should use virtual memory, correct?
If the answer to the above question is yes, then the OS should have used swap space as well, so why did it kill the process without even using it? I don't think we have to use specific system calls while coding our program to make this happen.
Please suggest.
It's not only the data size that could be the reason. For example, do ulimit -a and check the max stack size. Have you got a kill reason? Set 'ulimit -c 20000' to get a core file, it shows you the reason when you examine it with gdb.
Check with file and ldd that your executable is indeed 64 bits.
Check also the resource limits. From inside the process, you could use getrlimit system call (and setrlimit to change them, when possible). From a bash shell, try ulimit -a. From a zsh shell try limit.
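The same limits can be inspected from inside a process. A minimal sketch using Python's standard resource module, which wraps getrlimit; the helper name `memory_limits` is made up for illustration.

```python
import resource


def memory_limits():
    """Return {limit name: (soft, hard)} for a few memory-related limits."""
    names = ["RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_STACK", "RLIMIT_CORE"]
    limits = {}
    for name in names:
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = tuple(
            "unlimited" if value == resource.RLIM_INFINITY else value
            for value in (soft, hard)
        )
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in memory_limits().items():
        print(name, "soft:", soft, "hard:", hard)
```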
Check also that your process indeed eats the memory you believe it does consume. If its pid is 1234 you could try pmap 1234. From inside the process you could read the /proc/self/maps or /proc/1234/maps (which you can read from a terminal). There is also the /proc/self/smaps or /proc/1234/smaps and /proc/self/status or /proc/1234/status and other files inside your /proc/self/ ...
Check with free that you have the memory (and the swap space) you believe you have. You can add some temporary swap space with swapon /tmp/someswapfile (after using mkswap to initialize it).
I was routinely able, a few months (and a couple of years) ago, to run a 7Gb process (a huge cc1 compilation), under Gnu/Linux/Debian/Sid/AMD64, on a machine with 8Gb RAM.
And you could try with a tiny test program, which e.g. allocates with malloc several memory chunks of e.g. 32Mb each. Don't forget to write some bytes inside (at least at each megabyte).
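The answer suggests malloc in a small C program; the sketch below shows the same idea in Python using anonymous mmap regions instead, which is a substitution on my part. The name `allocate_and_touch` and the default sizes are illustrative assumptions.

```python
import mmap

CHUNK = 32 * 1024 * 1024   # 32 MB per chunk, as suggested above
STEP = 1024 * 1024         # touch one byte per megabyte


def allocate_and_touch(n_chunks, chunk_size=CHUNK, step=STEP):
    """Map up to n_chunks anonymous regions and write one byte per step."""
    chunks = []
    for _ in range(n_chunks):
        try:
            chunk = mmap.mmap(-1, chunk_size)   # anonymous, zero-filled mapping
        except (OSError, MemoryError, ValueError):
            break                               # the system refused the mapping
        for offset in range(0, chunk_size, step):
            chunk[offset] = 1   # a write forces the kernel to back that page
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    held = allocate_and_touch(8)
    print("mapped", len(held), "chunks of", CHUNK // (1024 * 1024), "MB")
    for chunk in held:
        chunk.close()
```

Watching the process with top or pmap while it runs shows how the virtual size and the resident size diverge, since only the touched pages become resident.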
Standard C++ containers like std::map or std::vector are rumored to consume more memory than we usually think.
Buy more RAM if needed. It is quite cheap these days.
Literally everything has to fit into the addressable space, including your graphics adapters, OS kernel, BIOS, etc., and the amount that can be addressed can't be extended by swap either.
It is also worth noting that the process itself needs to be 64-bit. And some operating systems may become unstable, and therefore kill the process, if it uses excessive RAM.

What happens if memory leaks on rootfs?

I have a Linux system running entirely on rootfs (which, as I understand it, is an instance of ramfs). There's no hard disk and no swap, and I have a process that leaks memory continuously. The virtual memory eventually grows to 4 times the size of physical memory, as shown by top. I can't understand what's happening. rootfs is supposed to use RAM only, right? If I have no disk to swap to, how does the virtual memory grow to 4 times the physical memory?
Not all allocated memory has to be backed by a block device; older versions of the malloc(3) man page describe this behavior as a bug:
BUGS
By default, Linux follows an optimistic memory allocation
strategy. This means that when malloc() returns non-NULL
there is no guarantee that the memory really is available.
This is a really bad bug. In case it turns out that the
system is out of memory, one or more processes will be killed
by the infamous OOM killer. In case Linux is employed under
circumstances where it would be less desirable to suddenly
lose some randomly picked processes, and moreover the kernel
version is sufficiently recent, one can switch off this
overcommitting behavior using a command like:
# echo 2 > /proc/sys/vm/overcommit_memory
See also the kernel Documentation directory, files
vm/overcommit-accounting and sysctl/vm.txt.
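To see which policy a system is currently using, the value of /proc/sys/vm/overcommit_memory can be read and interpreted. A minimal sketch; the mode descriptions are my own paraphrase of the kernel's overcommit documentation, and the helper names `describe_overcommit` and `current_overcommit` are hypothetical.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting, allocations fail beyond the commit limit",
}


def describe_overcommit(text):
    """Translate the content of overcommit_memory into a description."""
    try:
        mode = int(text.strip())
    except ValueError:
        return "unknown"
    return OVERCOMMIT_MODES.get(mode, "unknown")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Describe the overcommit mode of the running system, if readable."""
    try:
        with open(path) as fh:
            return describe_overcommit(fh.read())
    except OSError:
        return "unknown"


if __name__ == "__main__":
    print("overcommit policy:", current_overcommit())
```

With mode 0 or 1 and no swap, a leaking process can keep growing its virtual size well past physical RAM, as long as it never touches most of the pages it allocates.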
