Multithreaded 64-bit process goes "Out of memory"

I have a memory- and CPU-intensive application which runs several hundred concurrent threads that do text processing. There is a lot going on in the background: processing files, logging to disk, and so on. The application is compiled for the x64 platform under XE2.
Occasionally it crashes, and I've been trying to debug the issue for a few days, without success. Here is the bug report: http://pastebin.com/raw.php?i=sSUXCznT
I tried running it under the debugger, and it reports an Out of memory exception after a while. At the point of the crash it was using 670 MB of RAM, and the machine has 32 GB of RAM in total.
I was thinking it might be fragmentation, but if I'm reading this bug report correctly, it says largest free block: 8185.75 GB, which indicates that fragmentation isn't the issue here.
The application isn't leaking memory anywhere (at least not that I know of); I have ReportMemoryLeaksOnShutdown enabled and it comes up clean.
Since I don't have any other ideas about why it would crash with an Out of memory exception, I would appreciate some hints to get me on the right path to fixing this.

Try setting a breakpoint in System.pas, in procedure Error(errorCode: TRuntimeError);. Your application should stop there when the out-of-memory condition occurs. When you get there, skip the ErrorAt call (using Debug > "Set next statement" from the context menu); that silently ignores the exception so you can examine the call stack more easily. Keep stepping with F7 to leave the error-handling functions until you have a useful stack trace.

Related

How to inspect nodejs stack & code segment memory usage

I am experiencing a 'slow' memory increase in my Node process, which runs for long periods of time (roughly 1 GB over 2 months), but the heap stays constant (which implies that my code segment or stack is growing). I also tried calling the garbage collector manually, but memory usage remains the same.
How can I investigate this further? I want to confirm my theory and figure out why my code segment / stack is growing.
I am using Node v8 LTS (I know it is EOL as of this year; I just need to know whether there's a way to figure this out).
(V8 developer here.)
Code generated by V8 is on the heap, so if the heap isn't growing, that means that code isn't growing either.
The stack size is limited by the operating system, usually to 1-8 MB. Since operating systems simply kill processes that run into the stack limit, V8 imposes an even lower limit (a little less than a megabyte, I think it's 984KB currently) onto itself, and will throw a RangeError if that's ever exceeded. So a growing stack can't be your problem either.
Since you say that the heap memory reported by Node/V8 remains constant, that also means that most "how to debug memory leaks in Node" tutorials don't apply to your situation; and that probably also means that the leak is not in your (JavaScript) code.
That leaves C++ "heap memory" (which is very different from V8's managed "heap"!) as the most likely culprit. Node itself as well as native extensions can freely allocate memory that they manage themselves. Maybe something doesn't get cleaned up properly there. That could simply be an upstream bug; or it could be that something in your code is accidentally holding on to some embedder memory.
What I would try first is to update Node and any native extensions you have installed. Maybe the leak has already been found and fixed.
If that doesn't help, then you could try to investigate where the memory is going. For instance, you could compile everything from source with LSan enabled, and see if that reports anything. It would probably be helpful to construct a stress-test, e.g. a fake client that floods (a test instance of) your server with real-looking requests, to try to trigger inspectable instances of the leak in seconds or minutes rather than months. Crafting such a fake client might also help narrow down where things go wrong (e.g.: maybe you'll notice that one type of request does not trigger the leak but another type of request does).
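To make that concrete, here is a minimal C++ sketch (illustrative only; the handle_request function and the constants are invented, not taken from the question): memory obtained with new lives on the C++ heap that V8's garbage collector never sees, so a leak like this grows the process while the reported JS heap stays flat. Built with -fsanitize=leak (a standard GCC/Clang flag), LeakSanitizer lists the unfreed blocks at exit, which is the kind of report the LSan suggestion above is after.

    #include <cstring>

    // Hypothetical native-side code: the allocation below is invisible to
    // V8's garbage collector, so the process grows while the JS heap doesn't.
    void handle_request(const char* payload, std::size_t len) {
        char* copy = new char[len];      // lives on the C++ heap, not V8's heap
        std::memcpy(copy, payload, len);
        // ... hand `copy` to something that is supposed to free it but never
        // does; the pointer is lost when this function returns -> a real leak.
    }

    int main() {
        // Crude stand-in for the stress test: flood the code with fake
        // requests so the leak becomes visible in seconds instead of months.
        for (int i = 0; i < 100000; ++i)
            handle_request("fake request", 13);
        return 0;    // with -fsanitize=leak, LeakSanitizer reports the
                     // unfreed blocks here at process exit
    }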

Following memory allocation in gdb

Why is memory consumption jumping unpredictably as I step through a program in the gdb debugger? I'm trying to use gdb to find out why a program is using far more memory than it should, and it's not cooperating.
I step through the source code while monitoring process memory usage, but I can't find what line(s) allocate the memory for two reasons:
Reported memory usage only jumps up in increments of (usually, but not always exactly) 64 MB. I suspect I'm seeing the effects of some memory manager I don't know about which reserves 64 MB at a time and masks multiple smaller allocations.
The jump doesn't happen at a consistent location in the code. Not only does it occur on different lines during different gdb runs; it also sometimes happens in illogical places, like the closing brace of a (C++) function. Is it possible that gdb itself is affecting memory allocations?
Any ideas/suggestions for more effective tools to help me drill down to the code lines that are really responsible for these memory allocations?
Here's some relevant system info: I'm running x86_64-redhat-linux-gnu version 7.2-64.el6-5.2 on a virtual CentOS Linux machine under Windows. The program is built on a remote server via a complicated build script, so tracking down exactly what options were used at any point is itself a bit of a chore. I'm monitoring memory usage both with the top utility ("virt" or virtual memory column) and by reading the real-time monitoring file /proc/<pid>/status, and they agree. Since this program uses a large suite of third-party libraries, there may be one or more overridden malloc() functions involved somewhere that I don't know about--hunting them down is part of this task.
gdb, left to its own devices, will not affect the memory use of your program, though a run under gdb may differ from a standalone run for other reasons.
However, this also depends on the way you use gdb. If you are just setting simple breakpoints, stepping, and printing things, then you are ok. But sometimes, to evaluate an expression, gdb will allocate memory in the inferior. For example, if you have a breakpoint condition like strcmp(arg, "string") == 0, then gdb will allocate memory for that string constant. There are other cases like this as well.
This answer is in several parts because there were several things going on:
Valgrind with the Massif module (a memory profiler) was much more helpful than gdb for this problem. Sometimes a quick look with the debugger works, sometimes it doesn't. http://valgrind.org/docs/manual/ms-manual.html
top is a poor tool for profiling memory usage because it only reports virtual memory allocations, which in this case were about 3x the actual heap memory usage. Virtual memory is mapped and made available by the Unix kernel when a process asks for a memory block, but it's not necessarily used. The underlying system call is mmap(). I still don't know how to check the block size. top can only tell you what the Unix kernel knows about your memory consumption, which isn't enough to be helpful. Don't use it (or the memory files under /proc/) to do detailed memory profiling.
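To see that gap between virtual and resident memory for yourself, here is a small Linux-only C++ sketch (mine, not part of the original answer; it assumes glibc, where large malloc() calls are served by mmap()): it prints VmSize and VmRSS from /proc/self/status before the allocation, after the allocation, and after touching the pages.

    // Illustrative only: a large allocation raises virtual memory (VmSize,
    // what top's VIRT column shows) immediately, but resident memory (VmRSS)
    // only grows once the pages are actually written to.
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    static void print_mem(const char* label) {
        std::FILE* f = std::fopen("/proc/self/status", "r");
        if (!f) return;
        char line[256];
        while (std::fgets(line, sizeof line, f)) {
            if (std::strncmp(line, "VmSize:", 7) == 0 ||
                std::strncmp(line, "VmRSS:", 6) == 0)
                std::printf("%s  %s", label, line);
        }
        std::fclose(f);
    }

    int main() {
        print_mem("before malloc");
        const std::size_t n = 256u * 1024 * 1024;      // 256 MB
        char* p = static_cast<char*>(std::malloc(n));
        if (!p) return 1;
        print_mem("after  malloc");    // VmSize jumps, VmRSS barely moves
        std::memset(p, 1, n);          // touch every page
        print_mem("after  memset");    // now VmRSS catches up
        std::free(p);
        return 0;
    }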
Memory allocation when stepping out of a function was caused by autolocks--that's a thread lock class whose destructor releases the lock when it goes out of scope. Then a different thread goes into action and allocates some memory, leaving the operator (me) mystified. Non-repeatability is probably because some threads were waiting for external resources like Internet connections.
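The "autolock" class is specific to that code base, but the pattern is the standard RAII lock guard; a rough C++ sketch (illustrative, using std::lock_guard rather than the project's own class) shows why an allocation can appear to happen at a closing brace:

    // Sketch of the RAII-lock pattern described above, with std::lock_guard
    // standing in for the project's "autolock" class. The mutex is released
    // in the guard's destructor, i.e. at the closing brace -- which is exactly
    // where a thread that was blocked on the mutex can wake up and allocate.
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex m;
    std::vector<int> shared_data;

    void producer() {
        std::lock_guard<std::mutex> lock(m);   // acquires the mutex
        shared_data.push_back(42);
    }                                          // ~lock_guard() releases it here;
                                               // a debugger stepping over this
                                               // brace may see another thread's
                                               // allocation "happen" here

    void consumer() {
        std::lock_guard<std::mutex> lock(m);   // may block until producer's
        shared_data.reserve(1 << 20);          // brace, then makes a large
    }                                          // allocation of its own

    int main() {
        std::thread t(consumer);
        producer();
        t.join();
        return 0;
    }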

Linux memory usage history

I had a problem in which my server began failing some of its normal processes and checks because the server's memory was completely exhausted.
I looked through the logs and found that what got killed were some Java processes.
I used the "top" command to see which processes were taking up the most memory right now (after the issue was fixed), and it was a Java process. So, in essence, I can tell which processes are taking up the most memory right now.
What I want to know is whether there is a way to see which processes were taking up the most memory at the time the failures started happening. Perhaps Linux keeps a log of memory usage at particular times? I really have no idea, but it would be great if I could see that kind of detail.
@Andy has answered your question. However, I'd like to add that for future reference you should use a monitoring tool, something like these. They will show you what happened in the run-up to a crash, since you obviously cannot watch all your servers all the time. Hope it helps.
Are you saying the kernel OOM killer went off? What does the log in dmesg say? Note that you can constrain a JVM to use a fixed heap size, which means it will fail affirmatively when the heap is full instead of letting the kernel kill something else. But the general answer to your question is no: there's no way to reliably run anything at the moment of an OOM failure, because the system is out of memory! At best, you can use a separate process to poll the process table and log process sizes, to catch memory-leak conditions and the like.
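For the "poll the process table" idea, a minimal illustrative C++ sketch (mine, Linux-specific; run it periodically from cron or a loop and keep the output) could read each process's resident memory out of /proc/<pid>/status:

    // Illustrative only: log the name and resident memory (VmRSS) of every
    // process. Keep the output with timestamps; after the next OOM event you
    // can see which process was growing.
    #include <cctype>
    #include <cstdio>
    #include <dirent.h>
    #include <string>

    int main() {
        DIR* proc = opendir("/proc");
        if (!proc) return 1;
        dirent* e;
        while ((e = readdir(proc)) != nullptr) {
            if (!std::isdigit(static_cast<unsigned char>(e->d_name[0])))
                continue;                       // only numeric entries are PIDs
            std::string path = std::string("/proc/") + e->d_name + "/status";
            std::FILE* f = std::fopen(path.c_str(), "r");
            if (!f) continue;                   // process may have exited
            char line[256], name[128] = "?";
            long rss_kb = -1;
            while (std::fgets(line, sizeof line, f)) {
                std::sscanf(line, "Name: %127s", name);
                std::sscanf(line, "VmRSS: %ld", &rss_kb);
            }
            std::fclose(f);
            if (rss_kb >= 0)                    // kernel threads have no VmRSS
                std::printf("pid=%s name=%s rss_kb=%ld\n",
                            e->d_name, name, rss_kb);
        }
        closedir(proc);
        return 0;
    }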
There is no history of memory usage in Linux by default, but you can collect one with a simple command-line tool like sar.
Regarding your problem with memory:
If it was the OOM killer that made a mess on the machine, then you have one good option to make sure it doesn't happen again (after reducing the JVM heap size, of course).
By default the Linux kernel allows more memory to be allocated than it actually has (overcommit). In some cases this can lead to the OOM killer killing the most memory-hungry process when there is no memory left for kernel tasks.
This behavior is controlled by the vm.overcommit_memory sysctl parameter.
So you can try setting vm.overcommit_memory = 2 in sysctl.conf and then running sysctl -p.
This forbids overcommitting and makes it much less likely that the OOM killer will do nasty things. You can also think about adding a little bit of swap space (if you don't have any already) and setting vm.swappiness to some really low value (for example 5; the default is 60), so that in normal operation your application won't go into swap, but if you are really short on memory it will start using swap temporarily, and you will be able to see that with free.
WARNING: this can lead to processes receiving "Cannot allocate memory" errors if your server is overloaded memory-wise. In that case:
Try to restrict memory usage by applications
Move part of them to another machine

Does an Application memory leak cause an Operating System memory leak?

When we say a program leaks memory (say, a new without a delete in C++), does it really leak? I mean, when the program ends, is that memory still allocated to some non-running program and unusable, or does the OS know what memory was requested by each program and release it when the program ends? If I run that program many times, will I run out of memory?
No, in all practical operating systems, when a program exits, all its resources are reclaimed by the OS. Memory leaks become a more serious issue in programs that might continue running for an extended time and/or functions that may be called often from the same program.
On operating systems with protected memory (Mac OS 10+, all Unix clones such as Linux, and NT-based Windows systems, meaning Windows 2000 and later), the memory is released when the program ends.
If you run any program often enough without closing it in between (running more and more instances at the same time), you will eventually run out of memory regardless of whether there is a memory leak or not, so this is true of leaky and leak-free programs alike. Obviously, a program that leaks memory will fill the memory faster than an identical program without the leak, but how many times you can run it before memory fills up depends far more on how much memory the program needs for normal operation than on whether it leaks. That comparison really only means anything between two otherwise identical programs, one with a memory leak and one without.
Memory leaks become most serious when you have a program that runs for a very long time. A classic example is server software, such as web servers. With games or spreadsheets or word processors, for instance, memory leaks aren't nearly as serious, because you eventually close those programs, freeing the memory. But of course memory leaks are nasty little beasts which should always be tackled as a matter of principle.
But as stated earlier, all modern operating systems release the memory when the program closes, so even with a memory leak, you won't fill up the memory if you're continuously opening and closing the program.
Leaked memory is reclaimed by the OS after execution has stopped.
That's why it isn't always a big problem with desktop applications, but it is a big problem with servers and services (they tend to run for a long time).
Let's look at the following scenario:
1. Program A asks the OS for memory.
2. The OS marks block X as being used by A and returns it to the program.
3. The program now holds a pointer to X.
4. The program returns the memory.
5. The OS marks the block as free. Using the block now results in an access violation.
6. Program A ends, and all memory used by A is marked unused.
Nothing wrong with that.
But if the memory is allocated in a loop and the delete is forgotten, you run into real problems:
1. Program A asks the OS for memory.
2. The OS marks block X as being used by A and returns it to the program.
3. The program now holds a pointer to X.
4. Go to 1.
If the OS runs out of memory, the program will probably crash.
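In C++ terms, the leaky loop above might look like the following sketch (hypothetical code, just to illustrate the steps): each iteration gets a block and then loses the only pointer to it, so it can never be freed while the process is alive, yet the OS still reclaims everything the moment the process exits.

    // Hypothetical example of the leaky loop described above.
    void leak_forever() {
        while (true) {
            int* block = new int[1024];   // steps 1-2: ask for and receive block X
            block[0] = 0;                 // step 3: we hold the only pointer to it
            // step 4: loop again -- the pointer goes out of scope, delete[] is
            // never called, so block X cannot be returned while this process runs.
            // The fix is simply `delete[] block;` here (or a std::vector / smart
            // pointer, which frees the memory automatically).
        }
    }

    int main() {
        leak_forever();   // eventually fails with std::bad_alloc or gets the
                          // process killed by the OS; either way the OS
                          // reclaims every byte once the process is gone
    }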
No. Once the OS finishes closing the program, the memory comes back (given a reasonably modern OS). The problem is with long-running processes.
When the process ends, the memory is cleaned up as well. The problem is that if a program leaks memory, it will request more and more memory from the OS as it runs, and can eventually starve the rest of the system.
It's "leaking" more in the sense that the code itself no longer has any handle on that piece of memory.
The OS releases the memory when the program ends. If a leak exists in a program, it is only an issue while the program is running. This is a problem for long-running programs such as server processes. For example, if your web browser had a memory leak and you kept it running for days, it would gradually consume more and more memory.
As far as I know, on most OSes, when a program is started it receives a defined segment of memory, which is completely released once the program ends.
Memory leaks are one of the main reasons why garbage-collection algorithms were invented: once plugged into the runtime, they become responsible for reclaiming memory that is no longer reachable by the program.
Memory leaks don't persist past the end of execution, so the "solution" to any memory leak is simply to end program execution. Obviously this is more of an issue for certain types of software: a database server which needs to go offline every 8 hours because of memory leaks is a bigger problem than a video game which needs to be restarted after 8 hours of continuous play.
The term "leak" refers to the fact that over time memory consumption will grow without any increased benefit. The "leaked" memory is memory neither used by the program nor usable by the OS (and other programs).
Sadly, memory leaks are very common in unmanaged code. I have had Firefox running for a couple of days now and memory usage is 424 MB, despite only having 4 tabs open. If I closed Firefox and reopened the same tabs, memory usage would likely be under 100 MB. So 300+ MB has "leaked".

ARM/Linux memory leak: Can a user program retain memory after terminating?

I've got a memory leak somewhere, but it doesn't appear to be related to my program. I'm making this bold statement based on the fact that once my program terminates, whether by seg-faulting, exiting, or aborting, the memory isn't recovered. If my program were the culprit, I would assume the MMU would recover everything, but this doesn't appear to be the case.
My question is:
On a small Linux system (64 MB of RAM) running a program that uses only stack memory and a few calls to malloc(), what could cause free memory to run down and stay down even after my program terminates?
A related question is here:
This all started when the code in question was redirecting its stdout and stderr to a file. After a few hours it aborted with a segmentation fault. A quick (naive?) look at /proc/meminfo showed that there wasn't much available memory, so I assumed something was leaking.
It appears I don't have a memory leak (see here) but it does lead me to some new questions...
It turns out that writing to block devices can use quite a lot of physical memory; on my system there was only 64 MB, so writing hundreds of megabytes to a USB drive increased the cached, active, and inactive memory pools quite a bit.
These memory pools are released back to the free memory pool as soon as the device is unmounted.
The exact cause of my segmentation fault remains a small mystery, but I know its occurrence can be reduced by understanding the virtual memory system better, particularly the use of block devices.
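As an illustration of why a quick glance at /proc/meminfo can mislead, here is a small C++ sketch (mine, not part of the original post) that prints the fields discussed above; a small MemFree next to a large Cached value usually means page cache from exactly this kind of block-device writing, not a leak.

    // Illustrative only: print the /proc/meminfo fields relevant here. Cached,
    // active, and inactive pages are reclaimable; they are given back when the
    // device is unmounted or when something else needs the memory.
    #include <cstdio>
    #include <cstring>

    int main() {
        std::FILE* f = std::fopen("/proc/meminfo", "r");
        if (!f) return 1;
        char line[256];
        while (std::fgets(line, sizeof line, f)) {
            if (std::strncmp(line, "MemTotal:", 9) == 0 ||
                std::strncmp(line, "MemFree:", 8) == 0 ||
                std::strncmp(line, "Cached:", 7) == 0 ||
                std::strncmp(line, "Active:", 7) == 0 ||
                std::strncmp(line, "Inactive:", 9) == 0)
                std::fputs(line, stdout);
        }
        std::fclose(f);
        return 0;
    }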

Resources