GNU c++: how to declare that heap can be swapped out when inactive?

GNU c++: how to declare that heap can be swapped out when inactive? - gnu

My code is built in GNU c++. It's heavily multi-threaded and allocates a fairly large amount of RAM. Problem is, I routinely have a dozen of different versions of the code "running" at the same time - the "running" being in quotes because only one is actually running at any given time: all the others are suspended, with all of their threads suspended.
With that many copies of the code in the RAM, I sometimes run out of memory (with disastrous consequences)... while my page file always stays at a measly 2-3% of use. I would like to make my code swappable to the page file, thus keeping space in actual RAM available for other programs as soon as I suspend it... but I have no idea of how to do that in g++.
Any idea how to do that? Many thanks!!

Related

Following memory allocation in gdb

Why is memory consumption jumping unpredictably as I step through a program in the gdb debugger? I'm trying to use gdb to find out why a program is using far more memory than it should, and it's not cooperating.
I step through the source code while monitoring process memory usage, but I can't find what line(s) allocate the memory for two reasons:
Reported memory usage only jumps up in increments of (usually, but not always exactly) 64 MB. I suspect I'm seeing the effects of some memory manager I don't know about which reserves 64 MB at a time and masks multiple smaller allocations.
The jump doesn't happen at a consistent location in code. Not only does it occur on different lines during different gdb runs; it also sometimes happens in illogical places like the closing bracket of a (c++) function. Is it possible that gdb itself is affecting memory allocations?
Any ideas/suggestions for more effective tools to help me drill down to the code lines that are really responsible for these memory allocations?
Here's some relevant system info: I'm running x86_64-redhat-linux-gnu version 7.2-64.el6-5.2 on a virtual CentOS Linux machine under Windows. The program is built on a remote server via a complicated build script, so tracking down exactly what options were used at any point is itself a bit of a chore. I'm monitoring memory usage both with the top utility ("virt" or virtual memory column) and by reading the real-time monitoring file /proc/<pid>/status, and they agree. Since this program uses a large suite of third-party libraries, there may be one or more overridden malloc() functions involved somewhere that I don't know about--hunting them down is part of this task.

gdb, left to its own devices, will not affect the memory use of your program, though a run under gdb may differ from a standalone run for other reasons.
However, this also depends on the way you use gdb. If you are just setting simple breakpoints, stepping, and printing things, then you are ok. But sometimes, to evaluate an expression, gdb will allocate memory in the inferior. For example, if you have a breakpoint condition like strcmp(arg, "string") == 0, then gdb will allocate memory for that string constant. There are other cases like this as well.

This answer is in several parts because there were several things going on:
Valgrind with the Massif module (a memory profiler) was much more helpful than gdb for this problem. Sometimes a quick look with the debugger works, sometimes it doesn't. http://valgrind.org/docs/manual/ms-manual.html
top is a poor tool for profiling memory usage because it only reports virtual memory allocations, which in this case were about 3x the actual heap memory usage. Virtual memory is mapped and made available by the Unix kernel when a process asks for a memory block, but it's not necessarily used. The underlying system call is mmap(). I still don't know how to check the block size. top can only tell you what the Unix kernel knows about your memory consumption, which isn't enough to be helpful. Don't use it (or the memory files under /proc/) to do detailed memory profiling.
Memory allocation when stepping out of a function was caused by autolocks--that's a thread lock class whose destructor releases the lock when it goes out of scope. Then a different thread goes into action and allocates some memory, leaving the operator (me) mystified. Non-repeatability is probably because some threads were waiting for external resources like Internet connections.

How to stop page cache for disk I/O in my linux system?

Here is my system based on Linux2.6.32.12:
1 It contains 20 processes which occupy a lot of usr cpu
2 It needs to write data on rate 100M/s to disk and those data would not be used recently.
What I expect:
It can run steadily and disk I/O would not affect my system.
My problem:
At the beginning, the system run as I thought. But as the time passed, Linux would cache a lot data for the disk I/O, that lead to physical memory reducing. At last, there will be not enough memory, then Linux will swap in/out my processes. It will cause I/O problem that a lot cpu time was used to I/O.
What I have try:
I try to solved the problem, by "fsync" everytime I write a large block.But the physical memory is still decreasing while cached increasing.
How to stop page cache here, it's useless for me
More infomation:
When Top show free 46963m, all is well including cpu %wa is low and vmstat shows no si or so.
When Top show free 273m, %wa is so high which affect my processes and vmstat shows a lot si and so.

I'm not sure that changing something will affect overall performance.
Maybe you might use posix_fadvise(2) and sync_file_range(2) in your program (and more rarely fsync(2) or fdatasync(2) or sync(2) or syncfs(2), ...). Also look at madvise(2), mlock(2) and munlock(2), and of course mmap(2) and munmap(2). Perhaps ionice(1) could help.
In the reader process, you might perhaps use readhahead(2) (perhaps in a separate thread).
Upgrading your kernel (to a 3.6 or better) could certainly help: Linux has improved significantly on these points since 2.6.32 which is really old.

To drop pagecache you can do the following:
"echo 1 > /proc/sys/vm/drop_caches"
drop_caches are usually 0. And, can be changed as per need. As you've identified yourself, that you need to free pagecache, so this is how to do it. You can also take a look at dirty_writeback_centisecs (and it's related tunables)(http://lxr.linux.no/linux+*/Documentation/sysctl/vm.txt#L129) to make quick writeback, but note it might have consequences, as it calls up kernel flasher thread to write out dirty pages. Also, note the uses of dirty_expire_centices, which defines how much time some data needs to be eligible for writeout.

Reclaim memory after program exit

Here is my problem: after running a suite of programs, free tells me that after execution there is about 1 GB less memory free. After some searches I found SO: What really happens when you dont free after malloc which (as I understand it) makes clear that missing memory deallocations should not be the problem... (is that correct?)
top does not show any processes that use significant amounts of memory.
How can I find out 'what happend' to the memory, i.e. which program allocated it and why it is not free after program execution?
Where does free collect its information?
(I am running a recent Ubuntu version)

Yes, memory used by your program is freed after your program exits.
The statistics in "free" are confusing, but the fact is that the memory IS available to other programs:
http://kevinclosson.wordpress.com/2009/11/17/linux-free-memory-is-it-free-or-reclaimable-yes-when-i-want-free-memory-i-want-free-memory/
http://sourcefrog.net/weblog/software/linux-kernel/free-mem.html
Here's an event better link:
http://www.linuxatemyram.com/

free (1) is a misnomer, it should more correctly be called unused, because that's what it shows. Or maybe it should be called physicalfree (or, more precisely, the "free" column in the output should be named "unused").
You'll note that "buffers" and "cached" tends to go up as "free" goes down. Memory does not disappear, it just gets assigned to a different "bucket".
The difference between free memory and unused memory is that while both are "free", the unused memory is truly so (no physical memory in use) whereas the simply "free" memory is often moved into the buffer cache. That is for example the case for all executable images and libraries, anything that is read-only or read-execute. If the same file is loaded again later, the "free" page is mapped into the process again and no data must be loaded.
Note that "unused" is actually a bad thing, although it is not immediately obvious (it sounds good, doesn't it?). Free (but physically used) memory serves a purpose, whereas free (unused) memory means you could as well have saved on money for RAM. Therefore, having unused memory (e.g. by purging pages) is exactly what you don't want.
Stunningly, under Windows there exists a lot of "memory optimizer" tools which cost real money and which do just that...
About reclaiming memory, the way this works is easy: The OS simply removes the references to all pages in the working set. If a page is shared with another process, nothing spectacular happens. If it belongs to a non-anonymous mapping and is not writeable (or writeable and not written), it goes into the buffer cache. Otherwise, it goes zap poof.
This removes any memory allocated with malloc as well as the memory used by executables and file mappings, and (since all memory is based on pages) everything else.

It is probably your OS using up that space for its own purposes.
For example, many modern OS's will keep programs loaded in memory after they terminate, in case you want to start them up again. If their guess is right, it saves a lot of time at the cost of some memory that wasn't being used anyway. Some OS's will even speculatively load some commonly used programs.
CPU utilization works the same way. Often your OS will speculatively do some work when the CPU would otherwise be "idle".

Does an Application memory leak cause an Operating System memory leak?

When we say a program leaks memory, say a new without a delete in c++, does it really leak? I mean, when the program ends, is that memory still allocated to some non-running program and can't be used, or does the OS know what memory was requested by each program, and release it when the program ends? If I run that program a lot of times, will I run out of memory?

No, in all practical operating systems, when a program exits, all its resources are reclaimed by the OS. Memory leaks become a more serious issue in programs that might continue running for an extended time and/or functions that may be called often from the same program.

On operating systems with protected memory (Mac OS 10+, all Unix-clones such as Linux, and NT-based Windows systems meaning Windows 2000 and younger), the memory gets released when the program ends.
If you run any program often enough without closing it in between (running more and more instances at the same time), you will eventually run out of memory, regardless of whether there is a memory leak or not, so that's also true of programs with memory leaks. Obviously, programs leaking memory will fill the memory faster than an identical program without memory leaks, but how many times you can run it without filling the memory depends much rather on how much memory that program needs for normal operation than whether there's a memory leak or not. That comparison is really not worth anything unless you are comparing two completely identical programs, one with a memory leak and one without.
Memory leaks become the most serious when you have a program running for a very long time. Classic examples of this is server software, such as web servers. With games or spreadsheet programs or word processors, for instance, memory leaks aren't nearly as serious because you close those programs eventually, freeing up the memory. But of course memory leaks are nasty little beasts which should always be tackled as a matter of principle.
But as stated earlier, all modern operating systems release the memory when the program closes, so even with a memory leak, you won't fill up the memory if you're continuously opening and closing the program.

Leaked memory is returned by the OS after the execution has stopped.
That's why it isn't always a big problem with desktop applications, but its a big problem with servers and services (they tend to run long times.).
Lets look at the following scenario:
Program A ask memory from the OS
The OS marks the block X as been used by A and returns it to the program.
The program should have a pointer to X.
The program returns the memory.
The OS marks the block as free. Using the block now results in a access violation.
Program A ends and all memory used by A is marked unused.
Nothing wrong with that.
But if the memory is allocated in a loop and the delete is forgotten, you run into real problems:
Program A ask memory from the OS
The OS marks the block X as been used by A and returns it to the program.
The program should have a pointer to X.
Goto 1
If the OS runs out of memory, the program probably will crash.

No. Once the OS finishes closing the program, the memory comes back (given a reasonably modern OS). The problem is with long-running processes.

When the process ends, the memory gets cleared as well. The problem is that if a program leaks memory, it will requests more and more of the OS to run, and can possibly crash the OS.

It's more leaking in the sense that the code itself has no more grip on the piece of memory.

The OS can release the memory when the program ends. If a leak exists in a program then it is just an issue whilst the program is running. This is a problem for long running programs such as server processes. Or for example, if your web browser had a memory leak and you kept it running for days then it would gradually consume more memory.

As far as I know, on most OS when a program is started it receives a defined segment of memory which will be completely liberated once the program is ended.
Memory leaks are one of the main reason why garbage collector algorithms were invented since, once plugged into the runtime, they become responsible in reclaiming the memory that is no longer accessible by a program.

Memory leaks don't persist past end of execution so a "solution" to any memory leak is to simply end program execution. Obviously this is more of an issue on certain types of software. Having a database server which needs to go offline every 8 hours due to memory leaks is more of an issue than a video game which needs to be restarted after 8 hours of continual play.
The term "leak" refers to the fact that over time memory consumption will grow without any increased benefit. The "leaked" memory is memory neither used by the program nor usable by the OS (and other programs).
Sadly memory leaks are very common in unmanaged code. I have had firefox running for a couple days now and memory usage is 424MB despite only having 4 tabs open. If I closed firefox and re-opened the same tabs memory usage would likely be <100MB. Thus 300+ MB has "leaked".

Preallocate memory for a program in Linux before it gets started

I have a program that repeatedly solves large systems of linear equations using cholesky decomposition. Characterising is that I sometimes need to store the complete factorisation which can exceed about 20 GB of memory. The factorisation happens inside a library that I call. Furthermore, this matrix and the resulting factorisation changes quite frequently and as such the memory requirements as well.
I am not the only person to use this compute-node. Therefore, is there a way to start the program under Linux and preallocate free memory for the process?
Something like: $: prealloc -m 25G ./program

I'll stick my neck out and say that I don't think that there is such a way under Linux. I think that the philosophy of Linux (and every other multi-tasking o/s I've used or heard of) is to provide the programmer (and the program) with the illusion that they have the whole of the computer's memory to play with and to make it very difficult indeed for a programmer to interfere with the o/s.
Instead, I think that you should plan to modify your program to grab the memory it will (or may) require when it starts up, that is, do the memory management yourself in whatever your chosen language is. How easy this might be for you, considering calls to a library, I don't know.

I've never heard of such a way. Usually it would be bad for other users on the node if one program went ahead and hogged all available memory. It's not good practice.
But opinions aside, I would probably write my program in such a way that it acts like a small environment that is able to make multiple runs of the routine in question without ending. It would allocate lots of memory on startup, then wait for user commands (through a minimal shell) and make the runs requested with the allocated memory pool. It would hold on to the pool until the user requests termination.
Of course this requires you to have an interactive session on the node, which you may not have.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string