Memory leak in SBCL's REPL

I'm somewhat baffled by the following behaviour of the SBCL garbage collector in the REPL. Define two functions:
(defun test-gc ()
  (let ((x (make-array 50000000)))
    (elt x 0)))
(defun add-one (x) (+ 1 x))
Then run
(add-one (test-gc))
I would expect that nothing references the original array anymore. Yet, as (room) reports, the memory is not freed. I would understand it if I had run (test-gc) directly: then some reference could have been stuck somewhere in SLIME or in
(list * ** ***)
But what is the case here? Thanks, Andrei.
Update: Some time ago I filed a bug, which was recently confirmed. See:
https://bugs.launchpad.net/sbcl/+bug/936304

Just because nothing references the objects anymore doesn't mean that the memory will be reclaimed. The garbage collector will be run some time in the future, and often the only guarantee that you get is that it will be run before you get an out of memory error.
Another thing that may happen here is that you are looking at the Lisp process memory usage. When memory is GC'ed it is generally not returned to the operating system. Instead, the memory is simply marked as free on the heap, and can be used in future memory allocations.

SBCL only...
(gc :full t)
This will forcibly kick off a garbage collection across all generations. I noticed SBCL was holding onto a ton of memory a few days ago and used this to drop the memory down to the "true" usage.
I then wrote an ensure-gc macro to wrap my garbagey computation & experimentation stuff in. I'll paste it in when I get home if I remember... it's nothing fancy.

Related

How to keep very big elements on memory without exhausting the garbage collector?

In Haskell, I created a Vector of 1000000 IntMaps. I then used Gloss to render a picture in a way that accesses random intmaps from that vector.
That is, I had to keep every single one of them in memory. The rendering function itself is very lightweight, so the performance was supposed to be good.
Yet, the program was running at 4fps. Upon profiling, I noticed 95% of the time was spent on GC. Fair enough:
The GC is crazily scanning my vector, even though it never changes.
Is there any way to tell GHC "this big value is needed and will not change - do not try to collect anything inside it"?
Edit: the program below is sufficient to replicate the issue.
import qualified Data.IntMap as Map
import qualified Data.Vector as Vec
import Graphics.Gloss
import Graphics.Gloss.Interface.IO.Animate
import System.Random

main = do
    let size = 10000000
    let gen i = Map.fromList $ zip [mod i 10..0] [0..mod i 10]
    let vec = Vec.fromList $ map gen [0..size]
    let draw t = do
            rnd <- randomIO :: IO Int
            let empty = Map.null $ vec Vec.! mod rnd size
            let rad = if empty then 10 else 50
            return $ translate (20 * cos t) (20 * sin t) (circle rad)
    animateIO (InWindow "hi" (256,256) (1,1)) white draw
This accesses a random map in a huge vector and draws a rotating circle whose radius depends on whether the map is empty.
Despite that logic being very simple, the program struggles at around 1 FPS here.

gloss is the culprit here.
First, a little background on GHC's garbage collector. GHC uses (by default) a generational, copying garbage collector. This means that the heap consists of several memory areas called generations. Objects are allocated into the youngest generation. When a generation becomes full, it is scanned for live objects and the live objects are copied into the next older generation, and then the generation that was scanned is marked as empty. When the oldest generation becomes full, the live objects are instead copied into a new version of the oldest generation.
An important fact to take away from this is that the GC only ever examines live objects. Dead objects are never touched at all. This is great when collecting generations that are mostly garbage, as often happens in the youngest generation. It's not good if long-lived data undergoes many GCs, as it will be copied repeatedly. (It can also be counterintuitive to those used to malloc/free-style memory management, where allocation and deallocation are both quite expensive, but leaving objects allocated for a long time has no direct cost.)
Now, the "generational hypothesis" is that most objects are either short-lived or long-lived. The long-lived objects will quickly end up in the oldest generation since they are alive at every collection. Meanwhile, most of the short-lived objects that are allocated will never survive the youngest generation; only those that happen to be alive when it is collected will be promoted to the next generation. Similarly, most of those short-lived objects that do get promoted will not survive to the third generation. As a result, the oldest generation that holds the long-lived objects should fill up very slowly, and its expensive collections that have to copy all the long-lived objects should occur rarely.
Now, all of this is actually true in your program, except for one problem:
let displayFun backendRef = do
        -- extract the current time from the state
        timeS <- animateSR `getsIORef` AN.stateAnimateTime
        -- call the user action to get the animation frame
        picture <- frameOp (double2Float timeS)
        renderS <- readIORef renderSR
        portS <- viewStateViewPort <$> readIORef viewSR
        windowSize <- getWindowDimensions backendRef
        -- render the frame
        displayPicture
                windowSize
                backColor
                renderS
                (viewPortScale portS)
                (applyViewPortToPicture portS picture)
        -- perform GC every frame to try and avoid long pauses
        performGC
gloss tells the GC to collect the oldest generation every frame!
This might be a good idea if those collections are expected to take less time than the delay between frames anyways, but it's clearly not a good idea for your program. If you remove that performGC call from gloss, then your program runs quite quickly. Presumably if you let it run for long enough, then the oldest generation will eventually fill up and you might get a delay of a few tenths of a second as the GC copies all your long-lived data, but that's much better than paying that cost every frame.
All that said, there is a ticket #9052 about adding a stable generation, which would also suit your needs nicely. See there for more details.

I would try compiling with -with-rtsopts and then playing with the heap size (-H) and/or allocation area size (-A) options. Those greatly influence how the GC works.
More info here: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime-control.html
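To see what effect a given -H or -A setting has, it can help to have the program report its own GC counts. The following is a minimal sketch, assuming base 4.10 or later (for GHC.Stats) and that the program is started with +RTS -T so that statistics are collected; the workload is a made-up placeholder, not the program from the question.

```haskell
import Control.Exception (evaluate)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

main :: IO ()
main = do
  -- Statistics are only available when the RTS was started with -T.
  enabled <- getRTSStatsEnabled
  -- Placeholder workload that allocates some short-lived data.
  _ <- evaluate (sum (map (* 2) [1 .. 5000000 :: Int]))
  if enabled
    then do
      stats <- getRTSStats
      putStrLn ("total GCs: " ++ show (gcs stats))
      putStrLn ("major GCs: " ++ show (major_gcs stats))
    else putStrLn "Run with +RTS -T to enable GC statistics."
```

Compiled with -rtsopts, this could be run as, for example, ./prog +RTS -T -A64m and again with a different -A value to compare the counts.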

To add to Reid's answer, I've found performMinorGC (added in https://ghc.haskell.org/trac/ghc/ticket/8257) to be the best of both worlds here.
Without any explicit GC scheduling, I still get frequent collection-related frame drops as the nursery becomes exhausted. But performGC indeed becomes performance-killing as soon as there is any significant long-lived memory usage.
performMinorGC does what we want, ignoring long-term memory and cleaning up the garbage from each frame predictably - especially if you tune -H and -A to ensure that the per-frame garbage fits in the nursery.
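As a rough illustration of that per-frame pattern, here is a minimal sketch using only base. The renderFrame function and the list standing in for long-lived data are hypothetical placeholders, not part of gloss; the only real API relied on is performMinorGC from System.Mem.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM_)
import System.Mem (performMinorGC)

-- Hypothetical stand-in for drawing one frame: it allocates some
-- short-lived garbage and returns a small result.
renderFrame :: [Int] -> Int -> Int
renderFrame longLived frame = sum (map (+ frame) (take 100 longLived))

main :: IO ()
main = do
  -- Stand-in for the long-lived data, forced once up front.
  let longLived = [1 .. 100000] :: [Int]
  _ <- evaluate (length longLived)
  forM_ [1 .. 60] $ \frame -> do
    _ <- evaluate (renderFrame longLived frame)
    -- Request a collection of the youngest generation only, so the
    -- long-lived data is not copied every frame.
    performMinorGC
  putStrLn "rendered 60 frames"
```

Whether this actually smooths out frame times depends on the per-frame garbage fitting in the nursery, so it is worth measuring with the -A setting you intend to ship.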

When does the garbage collector run when calling Haskell exports from C?

When exporting a Haskell function to be called from C, when does Haskell's garbage get collected? If C owns main then there is no way to predict the next call into Haskell. This question is especially pertinent when running single-threaded Haskell or without parallel GC.

When you initialize the ghc runtime, you can pass rts flags to it via argc and argv like so:
RtsConfig conf = defaultRtsConfig;
conf.rts_opts_enabled = RtsOptsAll;
hs_init_ghc(&argc, &argv, conf);
This lets you set options to, for example, fix a smaller maximum heap size or use a compacting collector on the oldest generation to further reduce memory use. Further, note there is an idle GC whose interval can be set (or disabled), and if you link the threaded runtime, that should run whether or not you ever yield back to a Haskell call.
Edit: I haven't actually performed experiments to verify the following, but if we look at the source of hs_init_ghc we see that it initializes signal handlers, which should include the timer handlers that respond to SIGVTALRM, and indeed it also starts the timer, which calls (on POSIX) timer_create, which should raise those signals at regular intervals. In turn, this should periodically "wake up" the RTS whether or not anything is happening, which should mean that it will run the idle GC whether or not the system yields back to Haskell from C. But again, I have only read the code and commentary, not tested this myself.

JRuby - How to start the garbage collector?

I fired up my JRuby irb console and typed:
irb(main):037:0* GC.enable
(irb):37 warning: GC.enable does nothing on JRuby
=> true
irb(main):038:0> GC.start
=> nil
irb(main):039:0>
How can I manually enable or start the JVM garbage collector during a program?
I ask because I have a program which needs to generate about 500 MBytes of test data and save it in MySQL. The program uses about 5 levels of nested loops, and it crashes with a JVM heap memory exception after generating about 100 MBytes of test data because there is no more heap memory. I would like to let the garbage collector run after every run of the outer loop so that all the orphaned objects created in the inner loops can be cleaned up.

The exact answer to your question would be:
require 'java'
java_import 'java.lang.System'
# ...
System.gc()
Bear in mind, though, that System.gc() is only a request: the JVM usually does run the GC in response, but it may or may not do so, depending on the JVM implementation. It can also be quite a hit on performance.
A better answer is obviously to ensure that at the end of the nested loop, no reference is held on the test data you are generating, so that they can indeed be reclaimed by the GC later on. Example:
class Foo; end
sleep(5)
ary = []
100_000.times { 100_000.times{ ary << Foo.new }; puts 'Done'; ary = [] }
If you run this with jruby -J-verbose:gc foo.rb, you should see the GC regularly reclaiming the objects; this is also quite clear using JVisualVM (the sleep in the example is to give some time to connect to the JRuby process in JVisualVM).
Lastly you can increase heap memory by adding the following flag: -J-Xmx256m; see the JRuby wiki for more details.
Edit: Coincidentally, here is a mindmap on GC tuning recently presented by Mario Camou at Madrid DevOps re-posted by Nick Sieger.

It's not possible, because the GC is run automatically by the JVM. Make sure that you create objects only when required. Avoid creating class-level objects, and try to find out which objects take the most memory and create them only when required.

Space leaks in Haskell

I have read many times that lazy evaluation in Haskell may sometimes lead to space leaks. What kind of code can lead to space leaks? How can they be detected? And what precautions can a programmer take to avoid them?

You will probably get many answers; this is the one I encountered when trying to write a 'real-world' application.
I was using multithreading and some MVars to pass data around (MVar is something like locked shared memory). My typical pattern was:
a <- takeMVar mvar
putMVar mvar (a + 1)
And then, just sometimes, when a proper condition happened I did something like:
a <- takeMVar mvar
when (a > 10) ....
The problem is that the content of mvar was essentially the unevaluated thunk (0 + 1 + 1 + 1 + ...), which was quite memory-intensive for counts like 100k. This type of problem was quite pervasive in my code; unfortunately, in multithreaded applications it's very easy to get into such problems.
Detecting: I ran the program in a mode that produces memory-consumption data, started and stopped different threads, and checked whether the memory footprint stayed stable.
Anatomy of a thunk leak (with instructions how to debug it)
An example: Thunk memory leak as a result of map function
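One common way to avoid the MVar thunk build-up described in this answer is to force the new value before storing it. The sketch below is an illustration under that assumption, not the answerer's actual fix; the names incrementLazy and incrementStrict are made up for the example.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (replicateM_)

-- Lazy version, as in the answer: stores the unevaluated thunk (a + 1),
-- so repeated calls build a chain of pending additions.
incrementLazy :: MVar Int -> IO ()
incrementLazy mvar = do
  a <- takeMVar mvar
  putMVar mvar (a + 1)

-- Strict version: ($!) evaluates the sum before it is put back,
-- so the MVar always holds a plain Int.
incrementStrict :: MVar Int -> IO ()
incrementStrict mvar = do
  a <- takeMVar mvar
  putMVar mvar $! a + 1

main :: IO ()
main = do
  mvar <- newMVar 0
  replicateM_ 100000 (incrementStrict mvar)
  readMVar mvar >>= print
```

Neither version is exception-safe as written; in real code modifyMVar_ combined with evaluate or ($!) is probably the safer shape.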

I've run into this problem when doing recursion over large data structures. The built up thunks get to be too much and then you get a space leak.
In Haskell, you need to be constantly aware of the possibility of running into a space leak. Since iteration doesn't exist, basically any recursive function has the potential to generate a space leak.
To avoid this problem, memoize recursive functions, or rewrite them tail-recursively with a strict accumulator.
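Note that in Haskell a tail-recursive rewrite only helps if the accumulator is also evaluated at each step. The following sketch contrasts the two; the function names are illustrative, and with optimization enabled GHC's strictness analysis may well make the lazy version behave like the strict one.

```haskell
import Data.List (foldl')

-- Tail-recursive but lazy: the accumulator can grow into a chain of
-- thunks (((0 + 1) + 2) + ...) that is only evaluated at the very end.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- The same loop with the accumulator forced at each step via seq.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc []       = acc
    go acc (x : xs) = let acc' = acc + x in acc' `seq` go acc' xs

-- The library equivalent of the strict loop.
sumFold :: [Int] -> Int
sumFold = foldl' (+) 0

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])
  print (sumFold [1 .. 1000000])
```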

Why is Garbage Collection Required for Tail Call Optimization?

Why is garbage collection required for tail call optimization? Is it because if you allocate memory in a function which you then want to do a tail call on, there'd be no way to do the tail call and regain that memory? (So the stack would have to be saved so that, after the tail call, the memory could be reclaimed.)

Like most myths, there may be a grain of truth to this one. While GC isn't required for tail call optimization, it certainly helps in a few cases. Let's say you have something like this in C++:
int foo(int arg) {
    // Base case.
    vector<double> bar(10);
    // Populate bar, do other stuff.
    return foo(someNumber);
}
In this case, return foo(someNumber); looks like a tail call, but because you're using RAII memory management, and you have to free bar, this line would translate to a lower level as follows (in informal pseudocode):
ret = foo(someNumber);
free(bar);
return ret;
If you have GC, it is not necessary for the compiler to insert instructions to free bar. Therefore, this function can be optimized to a tail call.

Where did you hear that?
Even C compilers without any kind of garbage collector are able to optimize tail recursive calls to their iterative equivalent.

Garbage collection is not required for tail-call optimization.
Any variables allocated on the call stack will be reused in the recursive call, so there's no memory leak there.
Any local variables allocated on the heap and not freed before a tail call will leak memory whether or not tail-call optimization is used. Local variables allocated on the heap and freed before the tail call will not leak memory regardless of whether or not tail-call optimization is used.

It's true, garbage collection is not really required for tail-call optimization.
However, let's say you have 1 GB of RAM and you want to filter a 900MB list of integers to keep only the positive ones. Assume about half are positive, half are negative.
In a language with GC, you just write the function. GC will occur a bunch of times, and you'll end up with a 450 MB list. The code will look like this:
list *result = filter(make900MBlist(), funcptr);
The list returned by make900MBlist will be incrementally GC'd, as the parts that filter has already gone through are no longer referenced by anything.
In a language without GC, to preserve tail-recursion, you'd have to do something like this:
list *srclist = make900MBlist();
list *result = filter(srclist, funcptr);
freelist(srclist);
This will end up having to use 900MB + 450MB before finally freeing the srclist, so the program will run out of memory and fail.
If you write your own filter_reclaim that frees the input list as it is no longer needed:
list *result = filter_reclaim(make900MBlist(), funcptr);
It will no longer be tail-recursive, and you'll likely overflow your stack.
