Should HeapDumpBeforeFullGC be used in production environment?

Should HeapDumpBeforeFullGC be used in production environment? - garbage-collection

After full gc happens, we may want to know how it happens. Without heap dump, I think it is hard to do, but in production environment, we usually can't get it in time. So I want to use HeapDumpBeforeFullGC in my application when it runs online.
My question is should HeapDumpBeforeFullGC be used in product environment? Will it bring some bad effects(if we don't consider disk usage)?
Or do we have other effective way to find what cause full gc in production environment?
Thanks!

If you consider a full GC a problem in production then yes, adding a heap dump may help. But it will make the pause times on a full GC even worse.
As alternatives you can turn on detailed GC logging which are often a good start to identify the general cause (insufficient heap size, leaks, allocation spikes, misconfiguration, swapping, ...). You can also use less invasive profilers (e.g. async-profiler or jmc) to spot excessive allocations

Related

GC taking 32% of runtime expected?

Currently working on optimizing a library for speed. I've already reduced execution time drastically, using V8 CPU and Memory Profiling through Webstorm. This was achieved mainly by changing the core method from recursive to iterative.
Now the self time distribution breaks down as
I'm assuming the first entry "node" is timing internal functions calls, which is great. The other entries also make sense. I'm new to Nodejs profiling, but 31.6% for GC seems high, so I've decided to investigate.
I've now created a heap dump through Webstorm, but unfortunately that doesn't give me much information.
These seem to be system internal memory references mainly. Stepping through the core iteration code logic again, there also don't seem to be a lot of places where memory is explicitly allocated (using this as a reference).
Question
Can the GC overhead be reduced?
Is this amount of allocation just expected here?
Is it possible to get better memory profiling information?
Setup Instructions
In case someone want's to try debugging this, I'm including setup instructions.
Download or clone object-scan and run
yarn install --frozen-lockfile
yarn run test-simple --verbose
Now create a file test.js in the project root containing this content and run node --trace_gc test.js or run it through Webstorm for advanced profiling.

In Javascript and in v8 (node) particularly an amount of time spent for garbage collection depends on amount of data stored in heap, but that's only one of many factors.
In v8 engine there are two main "types" of GC: minor (scavenge) and major (mark-sweep/mark-compact). You may see GC types that happen during your tests in console with --trace-gc enabled. And in different cases one type could "eat" more time than other an vice versa. So before optimizations you should determine which gc takes more time.
There are not a lot of options for optimizing major GC, cause it highly affected by amount of data that stays in memory for "long" (actually in this case long means that object survives scavenge GC) period. Such data is stored in so called "old space" in heap. And major GC works with this space and it should scan all that memory and mark objects that no longer have any references for further clearance.
In your case the amount of test data you're loading goes to old space. As a result it affects major GC during the whole test. And in this case major GC will not clear too much, because you're using your test object, but it still consume time for scanning entire old space. So you may consider preventing v8 from doing that by launching node with gc-specific flags like: --nouse-idle-notification --expose-gc --gc_interval=100500 (where 100500 is number of allocation, it can be take high value that will prevent running gc before the whole test will pass) that will allow trigger garbage collections manually. Test your code using this approach and see how major GC affects it, try tests with different amount of data you provide to function. If the impact is quiet high you may try to refactor your code trying to minimize long-lived variables, closures, etc.
If you'll discover that major GC doesn't have much impact on performance, then scavenge GC takes the most of time. Unlike major GC it operates with so called "new space" in heap. It's a space where all new objects are stored. If those objects survive scavenge, then they are moved to old space. New space has much smaller size ( you may control it by setting --max_semi_space_size, note: new space size = 2 * semi space size) than old space and more new objects and variables you allocate more scavenge GC runs will happen. If this GC heats performance too much you may consider refactor your code to make less new allocations. But if you'll reuse variables it may also slowdown the performance and those objects will go to old space and may become a problem described in "major GC" section.
Also v8 GC doesn't always work in the same thread that your program runs. It does some work in background too, but I don't know what Webstorm shows in your case. If it counts just total time spend in GC, may be it just doesn't have so much impact.
You may find more details on v8 GC in this blog post.
TL;DR:
Can the GC overhead be reduced?
Yes, but first you should discover what should be optimized by following steps above.
Is this amount of allocation just expected here?
That's could be just discovered by comparing different approaches. There's no some absolute number that could limit "good" amount from "bad", because it depends on lot's of factors, including the amount on entry data.
Is it possible to get better memory profiling information?
You may find some good tools here, but in general you may use Chrome dev tools which could provide a bit more details rather than Webstorm does.

How to reduce memory to minimum with global.gc() in nodejs?

I found the related problem here
But still don't got the real answer for this. :(
So, How to reduce to minimum memory with global.gc() in nodejs?
should I spam global.gc() function to reduce?

Instead of forcing the garbage collector to run, you should first identify if you actually have a memory leak by using various tools (e.g. node's built-in inspector, heapdump module, etc.) available for node that allow you to detect such leaks.
It's entirely possible that it appears there is a leak when there isn't, due to how V8's garbage collector works (it is generally lazy because GC is not exactly a cheap operation CPU usage-wise).
Also, you can limit the amount of memory used by V8 via the --max-old-space-size=xxxx command-line argument (where xxxx is the amount of memory in megabytes). This can also be helpful in more quickly determining whether you have a legitimate memory leak.

Linux memory usage history

I had a problem in which my server began failing some of its normal processes and checks because the server's memory was completely full and taken.
I looked in the logging history and found that what it killed were some Java processes.
I used the "top" command to see what processes were taking up the most memory right now(after the issue was fixed) and it was a Java process. So in essence, I can tell what processes are taking up the most memory right now.
What I want to know is if there is a way to see what processes were taking up the most memory at the time when the failures started happening? Perhaps Linux keeps track or a log of the memory usage at particular times? I really have no idea but it would be great if I could see that kind of detail.

#Andy has answered your question. However, I'd like to add that for future reference use a monitoring tool. Something like these. These will give you what happened during a crash since you obviously cannot monitor all your servers all the time. Hope it helps.

Are you saying the kernel OOM killer went off? What does the log in dmesg say? Note that you can constrain a JVM to use a fixed heap size, which means it will fail affirmatively when full instead of letting the kernel kill something else. But the general answer to your question is no: there's no way to reliably run anything at the time of an OOM failure, because the system is out of memory! At best, you can use a separate process to poll the process table and log process sizes to catch memory leak conditions, etc...

There is no history of memory usage in linux be default, but you can achieve it with some simple command-line tool like sar.
Regarding your problem with memory:
If it was OOM-killer that did some mess on machine, then you have one great option to ensure it won't happen again (of course after reducing JVM heap size).
By default linux kernel allocates more memory than it has really. This, in some cases, can lead to OOM-killer killing the most memory-consumptive process if there is no memory for kernel tasks.
This behavior is controlled by vm.overcommit sysctl parameter.
So, you can try setting it to vm.overcommit = 2 is sysctl.conf and then run sysctl -p.
This will forbid overcommiting and make possibility of OOM-killer doing nasty things very low. Also you can think about adding a little-bit of swap space (if you don't have it already) and setting vm.swappiness to some really low value (like 5, for example. default value is 60), so in normal workflow your application won't go into swap, but if you'll be really short on memory, it will start using it temporarily and you will be able to see it even with df
WARNING this can lead to processes receiving "Cannot allocate memory" error if you have your server overloaded by memory. In this case:
Try to restrict memory usage by applications
Move part of them to another machine

Detect and remove Memory Leak in Linux Application

We have a a very large project which is basically an application which uses Linux Application programming and runs on PowerPC processor. This project was initially developed by another company. We acquired the project from the company and now we are maintaining the project.
The application is reported to have a lot of memory leak issue. Since this is a large project, it is not possible to go to each source code file and find out the memory leak. We have used Valgrid, mpatrol and other memory leak detection tools. These tools did not help much and the memory leak has not decreased by a significant percentage.
In this situation, how to go about to reduce the memory leak by a significant amount.Is there a general method which people use in these case to reduce the memory leak other than the memory leak detection tools like mentioned above.

Usually Valgrind belongs to the best tools for this tasks. If it does not work correctly, there might only be a couple of things you can still do.
First question: What language is the application in? Valgrind is very good for C and C++, but will not help you with garbage collected or scripting language. So check the language first. There might be something similar for java, but I have not used that much java, so you would have to ask someone else.
Play around a lot with the settings of valgrind. There are several plugins, that can help with this. One example could be using --leak-check=full or similar options. There are also plugins for valgrind, that can enhance it detection capabilities.
You say, that the application was reported to have a memory leak. How was this detected? Did the application detect this by itself. If it was detected by the application on it's own without any external tools, this probably means someone has added their own memory tracker inside the application. Custom memory tracker, memory pools etc. mess up valgrind and any other leak detection system very bad. So in case any custom memory handling is present in the application, your only choice is to either deactivate it (if possible) or to hook into this custom mechanism. How this could be done depends on your application only.
Add your own memory tracker. For example in C++ it is possible to hook into new/delete calls and get them to track the memory. There are a couple of libraries you can use for this. You can also write your own new/delete replacement in about 500 LOC. If you decide to use this method, be sure to read a lot of tutorials on replacing new/delete, since there are several things that are unusual in the C++ world when attempting this task.
What makes you so sure, there is an memory leak in the application (i.e. how was this detected)? If a tool just reported huge numbers of allocated memory, this might not even mean, there is an actual memory leak. A memory leak means that the handles to the memory are lost and hence it becomes impossible to every reach and free that memory again. In case your application just get's a lot of memory and keeps it accessible, you probably have a completely different problem. For example you simply might use an algorithm with a bad space complexity at one point or the other, leading to many allocations. In this case you will not need a leak detector, but rather a memory profiler, which gives you more detailed overview of the memory footprint of the code parts. However I have never used a profiler for this kind of task before, so I cannot give you any more hints on this.

You could replace all memory allocation calls with calls to your own allocation methods, which should call original methods and at the same time count memory usage and where it was allocated. This will allow you to find the leaks and eliminate them by hand.
There might also be automated tools that allow you to do this - not sure, haven't used any. But this method works.

Perhaps you might also consider using Boehm's garbage collector (that is using GC_malloc instead of malloc etc... and not bother about free-ing data).

Do Small Memory Leaks Matter Anymore?

With RAM typically in the Gigabytes on all PC's now, should I be spending time hunting down all the small (non-growing) memory leaks that may be in my program? I'm talking about those holes that may be less than 64 bytes, or even a bunch that are just 4 bytes.
Some of these are very difficult to identify because they are not in my own code, but may be in third party code or in the development tool's code, and I may not even have direct access to the source. In those cases, it would involve lengthy communication with the vendors of these products.
I have seen the number one memory leak question here at SO: Are memory leaks ever ok? and the number one answer to that, as of now voted up 85 times, is: No.
But here I'm talking about small leaks that may take an inordinate amount of debugging, research and communication to track down.
And I'm only talking about a simple desktop app. I understand that apps running on servers must be as tight as possible.
So the question I am really asking is, if I know I have a program that leaks, say 40 bytes every time it is run, does that matter?
(source: beholdgenealogy.com)
Also see my followup question: What Operating Systems Will Free The Memory Leaks?
Postscript: I just purchased EurekaLog for my program development.
I found an excellent article by Alexander, the author of EurekaLog (who should know these things), about catching memory leaks. In that article, Alexander states the answer to my question very well and succinctly:
While any error in your application is always bad, there are types of errors, which can be not visible in certain environments. For example, memory or resources leaks errors are relatively harmless on client machines and can be deadly on servers.

This is completely a personal decision.
However, if:
So the question I am really asking is, if I know I have a program that leaks, say 40 bytes every time it is run, does that matter?
In this case, I'd say no. The memory will be reclaimed when the program terminates, so if it's only leaking 40 bytes one time during the operation of an executable, that's practically meaningless.
If, however, it's leaking 40 bytes repeatedly, each time you do some operation, that might be more meaningful. The longer running the application, the more significant that becomes.
I would say, though, that fixing memory leaks often is worthwhile, even if the leak is a "meaningless" leak. Memory leaks are typically indicators of some underlying problem, so understanding and correcting the leak will often make your program more reliable over time.

Leaks are bugs.
You probably have other bugs too.
When you ship a product, you ship it with known bugs. When you choose which (of the known) bugs to "fix" versus "ship with", you do so based on the cost and risk to fix versus the customer benefit.
Leaks are no different. If it's a small leak that happens during an infrequent operation in a non-server app (e.g. an app that runs for minutes or hours and then shuts down), it might be "ok" in the same way any other bug is ok.
Actually, leaks can be kinda different in one important way, which is that if you are shipping a library/API, you really should fix them, because the customer benefit is enormous (otherwise all your customer 'inherit' your leak, and will be phoning you just as you have to do to talk to 3rd party vendor now).

While I agree that every little leak adds up, I don't agree that it's always the best business decision to fix it.
What if you have a stateless legacy system and no coders who understand it? Now you are using it in a situation that has to scale... and it's 100X cheaper to spawn a new instance and swap them out before memory goes overboard.
Or let's say you have a batch processing system that runs 24x7 but for which there is no real user. If it's cheaper to monitor memory and tell the system to restart itself periodically, why hunt down the leak?
I think you should try real hard but be pragmatic about the business ramifications of the decision.

No, it does not matter, however, only if, as you pointed out, the memory leak must not be repetitive. Memory leaks that don't grow as a program progress is usually okay. Non-growing memory leaks will eventually be solved when a process terminate.
However, it is difficult to prove an observed memory leak is not growing; you have sufficient empirical data. In reality, many huge program (even written in Java/C#) have memory leaks, but most of them are non-growing leaks.
Seriously, we can't live without memory leaks, deadlocks, data races. Having these bugs itself are okay. Only when it kills your program, it matters.
But, I have to disagree with your opinion: "memory is cheap". That can't justify memory leaks. That's very dangerous.

Yes. Leaks matter. If your apps runs 24x7x365 and handles a few thousands transactions per second, a few bytes turns into gigabytes rapidly.

A memory leak really depends on several things:
How often the leak happens
How much memory is lost each time
How long is the program going to run
For example, if you lose 40 bytes every time a task happens, and that task happens when the program starts, then nobody cares. If you lose 40Mb every time the program starts, then it should be investigated. If you lose 40 bytes every frame in your video or game engine, then you should look into that, because you'll lose 1.2kB each second, and after an hour you would have lost almost 4Mb.
It also depends on how long the program is going to stick around for. For example, I have a small calculator app, that I open, run a calculation in, and then close again. If that app loses 4Mb in it's run, then it doesn't really matter, because the OS will reclaim that lost memory once I close it. If the hypothetical video/game engine mentioned earlier lost 4Mb an hour, and it ran a demo unit, for several hours a day at a stand at a convention, then I'd look into it.
An example of a famous memory leak is Firefox, which lost a lot of memory in it's earlier versions. If your browser leaked memory 10 years ago, then you probably wouldn't care. You shut down the computer every day, and you while running the browser you only had one page up at a time. Today I just let my laptop go to standby, and I never close Firefox. It is open for weeks at a time, and I have at least 10 tabs open at any given time. If memory leaks every time a tab is closed, then that is going to build up to a larger leak now than it did 10 years ago, and so it is more important.

Are memory leaks ever ok?
Sure, if it's a short-lived process.
Memory leaks over a long period of time are, as the 85-point answer implies, problematic. Take a simple desktop app, for example -- prior to versions 3.x, did you ever notice how you needed it reboot Firefox after a while to recover it from sluggishness?
As for the short term, no, it doesn't matter. Take CGI or PHP scripts for example, or the little Perl three-liner in your ~/bin directory. Nobody's going to call the memory police if you write a 30-line non-looping application in C with 5 lines of malloc() and not a single call to free().

I am in the same boat as you. I have small memory leaks that don't grow ever. Most of the leaks are caused by improperly tearing down COM objects. I have studied the leaks and come to realize the time and money to fix them is disproportional to the damage the leaks do. Windows cleans up most of the time so the true damage is only realized if the user runs his computer for years without rebooting.
I think it's acceptable to leave in the leaks. It sounds so taboo, but if the leaks never ever ever grow and they are small, it's pretty insignificant in the larger scheme of things.

I agree with the earlier responses that leaks do matter.
People may have tons of memory, but they are also running more and more programs, and unless your application is completely hogging up the processor, it needs to play nice with other programs, which also means not hogging up resources it doesn't need.
So, this small memory leak will add up and mean that the user will have other problems, and if they decide they are having memory issues, if they decide that running your app causes them problems then they will stop running it.
Besides, as has been pointed out, if you don't know what is causing the leak then you may have other problems you don't know about. It may be the tip of a bug iceberg.

It depends on the nature of your application. I work primarily with web sites and web applications. So by most definitions, my application "runs" once per request. Code that leaks a few bytes per request on a high volume site can be catastrophic. From experience, we had some code which leaked a few kb per request. Added up, that caused our web server worker processes to restart so often it caused minute-long outages throughout the day.
But web applications (and many other kinds) have an indefinite lifespan - they run continuously, forever. The shorter-lived your application, the better. If each session of your application has a finite and reasonably predictable end point, there's of course a reasonable amount of leakage you can tolerate. It's all about the ROI.

It all depends. Reasons not to worry: the process is short-lived, the leaks are small and/or infrequent, the cost of an out of memory exception is low (eg, a web server instance in a cluster needs restarting and a few fetches need retrying). So I agree that some leaks don't really matter in practical terms.
But on the other hand, if you do have cause to worry, or even feel a nagging sense of doubt that maybe you're not taking quality seriously enough, it's a small matter (in most cases) to run your software with a memory leak detector and fix the problems. There are many good leak detectors out there. And you might find that the leak is part of a more serious problem, such as not releasing other resources (like open files). You may even find that the harmless leak would turn quite dangerous in usage scenarios you haven't tested yet.

Yes, it matters. Every little leak adds up.
For one, if your leaky code is used in a context where it is repeatedly used, and it leaks a little bit each time, those little bits add up. Even if the leak is small, and infrequent, those things can add up to significant quantities over long periods of time.
Secondarily... if you're writing code that has memory leaks, then that code has problems. I'm not saying that good code doesn't from time to time have memory leaks, but the fact of their existence means that there are some serious problems going on. Many, many security holes are due to just this sort of oversight (unbounded string copy, anyone?).
Bottom line is, if you know about it, and don't do all you can to track it down and fix it, then you're causing problems.

Memory leaks are never OK in any program, however small it may be.
Slowly they will add up to fill up your entire memory. Suppose you have a calling system which leaks about 4 bytes of memory per call it handles. You can handle say, 100 calls a second (this is a very small number), so you end up leaking 400 bytes a second or 400x60x60(1440000B) an hour. So, even a small leak is not acceptable.
And if you dont know the source of the leak then it may be some real big issue and you end up having buggy software.
But, essentially it boils down to the questions like, for how much time the program runs and what number of times the leak happens. So, it may be ok it leaks a very small amount and is not repeated but still the leak may be a small part of a bigger problem.
So, I feel that memory leaks are never ok.

That's like asking if there was a crack in a dam is it ok? NO! NEVER! Avoid memory leaks as if your life depends on it because when your application grows and gets more use that leak is going to turn into a flood and sooner or later someone or something is going to drown in it. Just because you can have lots of memory doesn't mean that you can take shortcuts with your code. Once you find a leak do what you can to resolve it and if you can't solve it make sure you keep coming back to it until it's fixed.
If you can't resolve the leak then try to see if you can clean up after it. The bigger issues come when the leak is repetitive.
Last note: if you ever hand the software to someone else and that leak is still there it may be a long time before someone else finds and/or fixes it.

I wouldn't be so worried about the quantity but the frequency of memory which you leak, but if you leak even just a few bytes very very often, your malloc's data structures will grow and might make it dramatically slower to traverse them, to allocate new memory and free. Unless you hit the border where you have leaked more than a tiny fraction of your RAM, mainly your program will suffer under those performance problems and not the whole system. Does not apply to even remotely dlmalloc-based systems (FreeBSD, Linux, etc), there it's just don't care, all you loose there is memory (perhaps a few times more than the amount you think) and not performance.
A single allocation which is not reclaimed by your program is not a leak at all. If you write a small command line utility which takes a second to complete, you may not need to even reclaim any memory there. Upon termination, the OS reclaims RAM, file handles, should basically apply to any kind of system resource, but you cannot rely on some OSes as much as on others, but as long as it's just memory, even Windows 95 will manage it just right.
Oh and another thing follows from that, if you leak memory, don't bother cleaning up at the end of the program or after a long execution time, or you will just waste even more CPU time. Always fix the leaks as near to the timepoint where they are created as possible. Other reason: malloc implementations prefer to keep the RAM they got from the OS for future allocations instead of giving it back. Also you may suffer address space fragmentation.

If someone says memory leaks are ok in small amounts and as long as it doesn't crash the application, it is like saying, stealing is ok if in small amounts and as long as you are not caught :)

Memory leaks are very important in 32 bit applications because the application is limited to 2^32 bytes of memory which is approximately 4 GB. Once a 32 bit application attempts to allocate more than 2^32 bytes of memory the application may crash or hang.
Memory leaks are not as important in 64 bit applications because the application is limited to 2^64 bytes of memory which is approximately 16 EB; so with a 64 bit application you are more-so limited by hardware, but the OS will still likely impose an artificial limit at some time.
Bottom line is that having a memory leak in your code is bad programming; so fix it and be a better programmer.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string