I am trying to detect a memory leak in a webapp. I have taken a heap dump of the app at the time of the crash and am using Eclipse MAT to parse the dump.
The collated info from the parsing leads to these 2 conclusions:
The objects occupying the most memory don't have GC roots. Essentially, whenever a GC happens, they get cleaned up.
The objects under GC roots occupy significantly less memory, so they may not be the root cause of the memory leak(?).
So does this mean there is no leak happening, and the crash happens simply because of an out-of-memory error?
EDIT: Adding environment info
I am running a Java webapp on Tomcat 6.
The webapp is based on OpenReports (a reporting tool).
Adding the incoming reference list of the biggest object:
http://imgur.com/lYrju
Here, each HashMap instance has a reference from com.opensymphony.xwork2 and is not garbage collected. Could this be the source of the problem? The Tomcat logs say:
SEVERE: The web application [/openreports] created a ThreadLocal with key of type [com.opensymphony.xwork2.ActionContext.ActionContextThreadLocal] (value [com.opensymphony.xwork2.ActionContext$ActionContextThreadLocal#7c45901a]) and a value of type [com.opensymphony.xwork2.ActionContext] (value [com.opensymphony.xwork2.ActionContext#3af7dab3]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
SEVERE: The web application [/openreports] created a ThreadLocal with key of type [com.opensymphony.xwork2.inject.ContainerImpl$10] (value [com.opensymphony.xwork2.inject.ContainerImpl$10#258c27bd]) and a value of type [com.opensymphony.xwork2.inject.InternalContext[]] (value [[Lcom.opensymphony.xwork2.inject.InternalContext;#1484fc8d]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
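The general pattern behind those warnings is a ThreadLocal that gets set on one of Tomcat's pooled worker threads and is never removed, so its value stays strongly reachable long after the request (and even after the webapp is stopped). As a rough sketch of the defensive fix, using a hypothetical ThreadLocal rather than the real xwork2 classes (the actual fix depends on the Struts/XWork version):
import java.io.IOException;
import javax.servlet.*;

// Hypothetical sketch: clear a ThreadLocal after every request so the pooled
// Tomcat worker thread does not keep its value alive indefinitely.
public class ThreadLocalCleanupFilter implements Filter {

    // Stand-in for something like ActionContext's internal ThreadLocal.
    private static final ThreadLocal<Object> SOME_CONTEXT = new ThreadLocal<Object>();

    public void init(FilterConfig config) {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        try {
            chain.doFilter(req, res);
        } finally {
            // remove(), not set(null): this drops the thread's map entry entirely
            SOME_CONTEXT.remove();
        }
    }

    public void destroy() {}
}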
EDIT: Adding the stack trace of the OOM error
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.lang.StringBuffer.toString(StringBuffer.java:585)
at java.io.StringWriter.toString(StringWriter.java:193)
at org.displaytag.tags.TableTag.writeExport(TableTag.java:1503)
at org.displaytag.tags.TableTag.doExport(TableTag.java:1454)
at org.displaytag.tags.TableTag.doEndTag(TableTag.java:1309)
at org.efs.openreports.engine.QueryReportEngine.generateReport(QueryReportEngine.java:198)
at org.efs.openreports.util.ScheduledReportJob.execute(ScheduledReportJob.java:173)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
10:01:04,193 ERROR ErrorLogger - Job (90.70|1338960412084 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.lang.OutOfMemoryError: Java heap space]
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.lang.StringBuffer.toString(StringBuffer.java:585)
at java.io.StringWriter.toString(StringWriter.java:193)
at org.displaytag.tags.TableTag.writeExport(TableTag.java:1503)
at org.displaytag.tags.TableTag.doExport(TableTag.java:1454)
at org.displaytag.tags.TableTag.doEndTag(TableTag.java:1309)
at org.efs.openreports.engine.QueryReportEngine.generateReport(QueryReportEngine.java:198)
at org.efs.openreports.util.ScheduledReportJob.execute(ScheduledReportJob.java:173)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
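For what it's worth, the trace shows displaytag rendering the export into a StringWriter, and the OOM fires when StringBuffer.toString() copies the whole buffer into a new String (the Arrays.copyOfRange frame), so a single large scheduled report can briefly need roughly twice its own size in heap even without a leak. A rough sketch of that difference, using plain java.io rather than displaytag's actual API:
import java.io.*;

public class ExportSketch {

    // Buffered approach (what the stack trace shows): the whole report is held
    // in a StringWriter, then copied again by toString() -> roughly 2x peak memory.
    static void bufferedExport(int rows, Writer out) throws IOException {
        StringWriter buffer = new StringWriter();
        for (int i = 0; i < rows; i++) {
            buffer.write("row-" + i + ";some;exported;columns\n");
        }
        out.write(buffer.toString()); // the extra full copy happens here
    }

    // Streaming approach: each row goes straight to the destination, so peak
    // heap use stays roughly constant regardless of report size.
    static void streamingExport(int rows, Writer out) throws IOException {
        for (int i = 0; i < rows; i++) {
            out.write("row-" + i + ";some;exported;columns\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Writer sink = new BufferedWriter(new FileWriter("report.csv"));
        streamingExport(1000000, sink);
        sink.close();
    }
}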
Are you running an ASP.NET WebForms (not MVC) application? We had a similar issue in our application, and these are the points we ran across. First, I should state that our web farm is composed of a lot of application pools, one for each client of course. Each app pool was given a specific memory limit in order to ensure that clients did not spiral memory out of control and affect other worker processes.
1) Even when objects are not rooted, if they are over 85K they will be placed on the Large Object Heap, which .NET's GC does not compact. That means that if you have an object that is 100K, it will sit on the LOH. When the object is cleaned up, you will get your 100K back and the GC might decide to put something else in that hole. Since the LOH is not compacted, you can never be guaranteed that the space gets completely filled. This leaves holes that are never filled up completely and leads to memory fragmentation. The only way to resolve this is to examine your dump file, see which objects are taking up the most space, and identify classes/collections that can be pared down. You will likely see a lot of object arrays and strings, as those are typically rather large in cache managers and the like.
2) Are you disposing of instances? Waiting for the finalizer thread to come along and clean up instances should not be the default case, because that can lead to terrible GC performance. Make sure you are disposing of all instances of classes that implement the IDisposable interface anywhere in their inheritance. Calling Dispose early means the resources are released in a deterministic fashion, whereas relying on the finalizer means cleanup can happen much, much later (if at all).
3) If you are getting OutOfMemoryExceptions, it could be for one of two reasons: you ran out of managed memory (rare in the case of a worker process with a limit set, as it will be recycled), OR you ran out of virtual memory, which is even worse since you are limited to 2GB of virtual memory if you are running a 32-bit IIS application. We also saw this problem in production, where the GC was allocating 64MB chunks of heap (32GB in Workstation Mode), and once it was close to the 2GB limit and could not allocate more space, it threw the exception (see: Should we use "workstation" garbage collection or "server" garbage collection?). If this is the case, you are probably leaking a managed resource that is rooted to something static, or you did not clean up event handlers by removing them with the -= operator.
If you could post your dump file or at least the relevant parts of it, it would be easier to see where the problem lies. But from your short description, these are the items I would look at.
I have a Node.js server running two instances in cluster mode (via pm2).
The two instances are obviously identical: they execute the same code and load the same data.
Yet memory usage differs by over 100%:
Instance 1: 303,592 kB
Instance 2: 614,404 kB
Is there any way the OS (Linux) could cause this behavior? The machine has plenty of RAM, so I would rule out memory shortage.
Have the two servers been running for the same amount of time? Did they answer the same requests?
Node.js is a garbage-collected runtime. Memory use over time is not constant. The garbage collector kicks in depending on allocation behavior, heap size and limit, idleness, and possibly other factors. Maybe your instance 1 has just done a major round of garbage collection, and instance 2 is about to do one? Have you watched their memory usage over time?
I would like to understand the GC process a little bit better in Node.js/V8.
Could you provide some information for the following questions:
When GC is triggered, does this block the event loop of Node.js?
Is GC running in its own process, or is it just a sub-task of the event loop?
When spawning Node.js processes via pm2 (cluster mode), does each instance really have its own process, or is the GC shared between the instances?
For logging purposes I am using Grafana (https://github.com/RuntimeTools/appmetrics-statsd); can someone explain the differences / more details of these gauges:
gc.size: the size of the JavaScript heap in bytes.
gc.used: the amount of memory used on the JavaScript heap in bytes.
Are there any scenarios where the GC does not free memory (gc.used), in relation to stress tests?
The questions are related to an issue that I am currently facing. The GC's used memory keeps rising and is never released (a classic memory leak). The problem is that it only appears when we get a lot of requests.
I played around with max-old-space-size to avoid pm2 restarts, but it looks like the GC is no longer freeing memory and the whole application is getting really slow...
Any ideas?
OK, some questions I already figured out:
gc.size = total_heap_size (https://nodejs.org/api/v8.html -> getHeapStatistics),
gc.used = used_heap_size
It seems normal that once gc.size hits a plateau it never goes down again =>
Memory usage doesn't decrease in node.js? What's going on?
Why is garbage collection expensive? The V8 JavaScript engine employs a stop-the-world garbage collector mechanism. In practice, it means that the program stops execution while garbage collection is in progress.
https://blog.risingstack.com/finding-a-memory-leak-in-node-js/
This may have been asked a few different ways, but this is a relatively new field to me, so forgive me if it is redundant and point me on my way.
Essentially, I have created a data collection engine that takes high-speed data (up to thousands of points a second) and stores it in a database.
The database is dynamic, so the statements being fed to it are dynamically created in code as well, which in turn requires a great deal of string manipulation. All of the strings, however, are declared within the scope of asynchronous event handler methods, so they should fall out of scope as soon as the method completes.
As this application runs, its memory usage (according to Task Manager / Process Explorer) slowly but steadily increases, so it would seem that something is not getting properly disposed of and/or collected.
If I attach CDB -p (yes, I am loading sos.dll from the CLR) and do a !dumpheap, I see that the majority of the memory is being used by System.String; likewise, if I do !dumpheap -type System.String and then !do the addresses, I see the exact strings (the SQL statements).
However, if I do a !gcroot on any of the addresses, I get "Found 0 unique roots (run '!GCRoot -all' to see all roots)." If I then try that as it suggests, I get "Invalid argument -all". O.o
So after some googling, and some arguments to the effect that unrooted objects will eventually be collected by the GC so this is not an issue, I looked closer, and it appears 84% of my problem is sitting on the LOH (which, depending on which thread you read, may or may not get processed for GC unless there is a memory constraint on the machine or I explicitly tell it to collect, which everything I can find considers bad practice).
So what I need to know is: is it essentially true that this is not a memory leak and the system is simply leaving stuff there until it HAS to be reclaimed? And if so, how do I tell whether or not I have a legitimate memory leak?
This is my first time working with a debugger external to the application, as I have never had to address this sort of issue before, so I am very new to that part; this is a learning experience.
The application is written in VS2012 Pro, C#; it is multi-threaded, and a console application wraps the API for testing, but it will eventually be a Windows service.
What you read is true: managed applications use a memory model where objects pile up until you reach a certain memory threshold (calculated from the amount of physical memory on your system and your application's actual growth rate), after which all(*) "dead" objects get squished out by the rest of the useful memory, leaving one contiguous block for allocation speed.
So yes, don't worry about your memory steadily increasing until you're several tens of MB up and no collection has taken place.
(*) - It is actually more complicated than that, because of multiple memory pools (based on object size and lifetime), so that the system isn't constantly probing very long-lived objects, and because of finalizers. When an object has a finalizer, instead of being freed it is moved to a special queue, the finalizer queue, where it waits for the finalizer to run on the finalizer thread (keep in mind the GC runs on a separate thread), and only then does it finally get freed.
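(The JVM behaves analogously, for what it's worth. Since this answer describes .NET, here is only a hedged Java illustration of the same "finalizers delay reclamation" point: an object with a finalizer survives the first collection, sits on the finalization queue until the finalizer thread has run it, and only a later collection can actually free its memory.)
// Hedged Java analogue of the finalizer-queue behaviour described above.
public class FinalizerDelay {
    static volatile boolean finalizerRan = false;

    static class Heavy {
        private final byte[] payload = new byte[10 * 1024 * 1024]; // make it visible in heap stats

        @Override
        protected void finalize() {
            finalizerRan = true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Heavy h = new Heavy();
        h = null;                   // unreachable from our code now
        System.gc();                // first GC: the object only moves to the finalization queue
        System.runFinalization();   // ask the finalizer thread to drain the queue
        Thread.sleep(100);
        System.gc();                // a later GC can finally reclaim the 10 MB payload
        System.out.println("finalizer ran: " + finalizerRan);
    }
}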
I've implemented a web service using Camel's Jetty component through Akka (as an endpoint), which forwards received messages to an actor pool with the following setup:
def receive = _route()
def lowerBound = 5
def upperBound = 20
def rampupRate = 0.1
def partialFill = true
def selectionCount = 1
def instance() = Actor.actorOf[Processor]
Processor is a class that processes the received message and replies with the result. The app has been working normally and flawlessly on my local machine; however, after deploying it on an EC2 micro instance (512 MB of memory, CentOS-like OS), the OS's oom-killer kills the process due to running out of memory (not a JVM OOM) after 30 calls or so, regardless of the frequency of the calls.
Profiling the application locally doesn't show any significant memory leaks, if there are any at all. Due to some difficulties I could not perform proper profiling on the remote machine, but watching "top"'s output I observed something interesting: the free memory stays around 400 MB after the app is initialized, and afterwards it bounces between 380 MB and 400 MB, which seems pretty natural (GC, etc.). The interesting part is that after receiving the 30th call or so, it suddenly drops from there to 5 MB of free memory and boom, the process is killed. The oom-killer log in /var/log/messages confirms that this was done by the OS due to lack of memory/free swap.
Now this is not totally Akka-related, but after 3 days of hopeless wrestling I finally decided I should seek some advice from you guys.
Thanks for any leads.
I have observed that when lots of small objects are created that should be garbage collected immediately, the Java process can get killed, perhaps because the memory limit is reached before the temporary objects are reclaimed by the GC.
Try running it with the concurrent mark-and-sweep garbage collector:
java -XX:+UseConcMarkSweepGC
My general observation is that the JVM uses a lot of memory beyond the Java heap. I don't know exactly what for, but can only speculate that it might be using the normal C heap for compilation, compiled-code storage, other permgen stuff, or whatnot. Either way, I have found it difficult to control that usage.
Unless you're very pressed for disk space, you may want to simply create a swap file of a GB or two so that the JVM has somewhere to overflow. In my experience, the memory it uses outside the Java heap isn't referenced very often anyway and can safely lie swapped out without causing much I/O.
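To make the "memory beyond the Java heap" point concrete, here is a small hedged sketch: direct (off-heap) buffers are one example of allocations that grow the process size that top and the oom-killer see while barely moving the Java heap.
import java.nio.ByteBuffer;

// Hedged illustration: native memory that never shows up in the Java heap.
public class OffHeapDemo {
    public static void main(String[] args) {
        ByteBuffer[] buffers = new ByteBuffer[16];
        for (int i = 0; i < buffers.length; i++) {
            // Each call reserves 8 MB of native memory, outside the -Xmx limit.
            buffers[i] = ByteBuffer.allocateDirect(8 * 1024 * 1024);
            long heapUsed = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
            System.out.println("direct buffers: " + (i + 1) + ", heap used: " + heapUsed + " bytes");
        }
    }
}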
Is it due to basic misunderstandings of how memory is dynamically allocated and deallocated on the programmer's part? Is it due to complacency?
No. It's due to the sheer amount of accounting it takes to keep track of every memory allocation. Who is responsible for allocating the memory? Who is responsible for freeing it? Ensuring that you use the same API to allocate and free the memory, etc. Ensuring you catch every possible program flow and clean up in every situation (for example, ensuring you clean up after you catch an error or exception). The list goes on...
In a decent-sized project, one can lose track of allocated resources.
Sometimes a function is written expecting an uninitialized data structure as input, which it will then initialize. Someone passes in a data structure that is already initialized, and the memory that was previously allocated for it is leaked.
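The C-flavoured version of that mistake leaks the old allocation outright; in a garbage-collected language the closest analogue usually leaks a resource rather than raw memory. A hedged Java sketch of the same shape (the names here are invented for illustration):
import java.io.*;

// Hypothetical holder: init() assumes it has not been initialized yet.
class LogHolder {
    private Writer out;

    void init(File file) throws IOException {
        // If 'out' was already set, the previous Writer is silently abandoned
        // without being closed -- the equivalent of leaking the earlier allocation.
        out = new BufferedWriter(new FileWriter(file));
    }

    void write(String line) throws IOException {
        out.write(line);
        out.write('\n');
    }

    void close() throws IOException {
        out.close();
    }
}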
Memory leaks are caused by basic misunderstandings in the same sense that every bug is, and I would be shocked to find out that anyone writes bug-free code the first time, every time. Memory leaks just happen to be the kind of bug that rarely causes a crash or explicitly wrong behavior (other than using too much memory, of course), so unless memory leaks are explicitly tested for, a developer will likely never know they are present. Given that changes to a codebase always add bugs, and memory leaks are virtually invisible, memory leaks accumulate as a program ages and grows in size.
Even in languages that have automatic memory management, memory can be leaked because of cyclical references, depending on the garbage collection algorithm used.
I think it is due to the pressure of working in a job with deadlines and upper management pushing to get the project out the door. So you can imagine that, even with testing, QA, and peer code reviews, in such pressurized environments memory leaks can slip through the net.
Since your question did not mention a language: today there is automatic memory management that takes care of the memory accounting/tracking to ensure no leaks occur (think Java/.NET), though a few can still slip through the net. It is with the likes of C/C++, which use malloc/new, that leaks are invariably harder to check for, due to the sheer volume of memory being allocated.
Then again, tracking down those leaks can be hard, which throws another curveball into this answer: is it that it works on the dev's machine but the memory starts leaking like hell in production? Is it the configuration, the hardware, the software setup? Or worse, does the leak appear only in some random situation unique to the production environment? Is it the time/cost constraints that allowed the leaks to occur, or are memory profiling tools cost-prohibitive, or is there a lack of funding to help the dev team track the leaks down...
All in all, everyone on the dev team has a responsibility to ensure the code works and to know the rules of memory management (for example, for every malloc there should be a free, and for every new there should be a delete), but no blame should be placed on the dev team themselves, nor should fingers be pointed at management for 'piling the pressure on the dev team' either.
At the end of the day, it would be a false economy to rely on just the dev team and place 'complacency' on their shoulders.
Hope this helps,
Best regards,
Tom.
Bugs.
Even without bugs, it can be impossible to know in advance which function should deallocate memory. It's easy enough if the code structure is essentially functional (the main function calls sub-functions, which process data and then return a result), but it isn't trivial if several threads (or several different objects) share a piece of memory. Smart pointers can be used (in C++), but otherwise it's more or less impossible.
Leaks aren't the worst kind of bug. Their effect is generally just a cumulative degradation in performance (until you run out of memory), so they just aren't as high a priority.
Lack of structured scopes and clear ownership of allocated memory.