ANTLR4 memory cleanup - memory-leaks

Is it possible to desallocate/reset/compress the memory allocated by ANTLR?
I already use ParserATNSimulator.clearDFA(), but some huge objects are still allocated (e.g., ArrayPredictionContext, SingletonPredictionContext).
My program alternates parsing and computation phases. Between two parsing phases, I would like to reduce the memory footprint of ANTLR.

If I replace the cache of the parser and lexer, there is no more path between these data structures and some static fields. Therefore the garbage collector can collect them.
lexer.setInterpreter(new LexerATNSimulator(lexer, lexer.getATN(), lexer.getInterpreter().decisionToDFA, new PredictionContextCache()));
parser.setInterpreter(new ParserATNSimulator(parser, parser.getATN(), parser.getInterpreter().decisionToDFA, new PredictionContextCache()));

Interesting. The answer is no I would say at this time. I will add an issue at https://github.com/antlr/antlr4.

Related

What memory leaks can occur outside the view of GHC's heap profiler

I have a program that exhibits the behavior of a memory leak. It gradually takes up all of the systems memory until it fills all swap space and then the operating system kills it. This happens once every several days.
I have extensively profiled the heap in a manner of ways (-hy, -hm, -hc) and tried limiting heap size (-M128M) tweaked the number of generations (-G1) but no matter what I do the heap size appears constant-ish and low always (measured in kB not MB or GB). Yet when I observe the program in htop, its resident memory steadily climbs.
What this indicates to me is that the memory leak is coming from somewhere besides the GHC heap. My program makes use of dependencies, specifically Haskell's yaml library which wraps the C library libyaml, it is possible that the leak is in the number of foreign pointers it has to objects allocated by libyaml.
My question is threefold:
What places besides the GHC heap can memory leak from in a Haskell program?
What tools can I use to track these down?
What changes to my source code need to be made to avoid these types of leaks, as they seem to differ from the more commonly experienced space leaks in Haskell?
This certainly sounds like foreign pointers aren't being finalized properly. There are several possible reasons for this:
The underlying C library doesn't free memory properly.
The Haskell library doesn't set up finalization properly.
The ForeignPtr objects aren't being freed.
I think there's actually a decent chance that it's option 3. If the RTS consistently finds enough memory in the first GC generation, then it just won't bother running a major collection. Fortunately, this is the easiest to diagnose. Just have your program run System.Memory.performGC every so often. If that fixes it, you've found the bug and can tweak just how often you want to do that.
Another possible issue is that you could have foreign pointers lying around in long-lived thunks or other closures. Make sure you don't.
One particularly strong possibility when working with a wrapped C library is that the wrapper functions will return ByteStrings whose underlying arrays were allocated by C code. So any ByteStrings you get back from yaml could potentially be off-heap.

what is sequential store buffer structure in gc specifically?

I have read garbage collection book, it mentioned a sort of data structure ,sequential store buffer , could anyone help me to explain how it works? or the principle? or where i can find the thesis about it ?
For generational collectors, different regions of the heap get collected at different times (minor for young gen., major for old gen.). To ensure consistency of collection a remembered set is typically used that records links from objects in the old generation to the young generation.
There are different ways of recording the remembered set, as described in the GC book you mention. A common way is the use of a card table, which is how the G1 collector does it.
An alternative is the sequential store buffer. This is an area of memory that is treated roughly like a stack, i.e. there is a pointer to where the next piece of data can be stored. Once the data is saved the pointer is bumped by the size of the data. This is very efficient (and is also the way space is allocated in the young generation). For a GC algorithm that uses a write-barrier (most) this is a good way of reducing the load created by the write-barrier. It is also very efficient on pipelined architectures with branch prediction.

1GB Vector, will Vector.Unboxed give trouble, will Vector.Storable give trouble?

We need to store a large 1GB of contiguous bytes in memory for long periods of time (weeks to months), and are trying to choose a Vector/Array library. I had two concerns that I can't find the answer to.
Vector.Unboxed seems to store the underlying bytes on the heap, which can be moved around at will by the GC.... Periodically moving 1GB of data would be something I would like to avoid.
Vector.Storable solves this problem by storing the underlying bytes in the c heap. But everything I've read seems to indicate that this is really only to be used for communicating with other languages (primarily c). Is there some reason that I should avoid using Vector.Storable for internal Haskell usage.
I'm open to a third option if it makes sense!
My first thought was the mmap package, which allows you to "memory-map" a file into memory, using the virtual memory system to manage paging. I don't know if this is appropriate for your use case (in particular, I don't know if you're loading or computing this 1GB of data), but it may be worth looking at.
In particular, I think this prevents the GC moving the data around (since it's not on the Haskell heap, it's managed by the OS virtual memory subsystem). On the other hand, this interface handles only raw bytes; you couldn't have, say, an array of Customer objects or something.

Stop and copy garbage collector in two phases

When implementing a stop and copy garbage collector as a pair, I need two memory banks (the old one and a free new one). One memory bank consists of the-cars and the-cdrs. So basicly when I make a new addres, it is a pointer to the-cars and the-cdrs.
When allocating new memory and I see that I don't have enough space, I start a GC. What this one does is:
switch the memory banks
move: read car and cdr from the old bank, copy to the new bank and put a pointer in the old bank to the new for later.
scan: loops over the memory and copies everything from old to new.
Now the question is: Why do I need to scan first and move after. Why can't I do both together?
It sounds like you are going through the really awesome garbage collection assignment where you implement your own collectors (mark and sweep, stop and copy, generational).
General answer to your question: two-pass algorithms typically use less memory than one-pass algorithms, by trading time for space.
More specific answer: in a stop-and-copy collector, you do it in two passes by (1) first copying the live data over to the new semispace, and (2) adjusting internal references in the live data to refer to elements in the new semispace, mapping old memory to new memory.
You must realize that the information necessary to do the mapping isn't magically available: you need memory to keep track how to redirect the moved memory. And remember: your collector itself is a program, and it must use a bounded, small amount of memory! Keeping a hash table in your collector to do the bookkeeping, for example, would be verboten: it's not playing by the rules. So one thing you need to keep track of is making sure the collector is playing with a limited amount of memory. So that explains why a stop-and-copy collector will reuse the old semispace and write those redirect records there.
With that constraint in mind: it's important to realize that we need to be careful of how we're traversing the live set. Which approach we choose may or may not require additional memory, in some very subtle and surprising ways. In particular, recursion in the general case is not free! Technically, in the first pass we should be using the new semispace not only as the target of our copying, but as a funky representation of the control stack that we use to implement the recursive process that walks through the live data.
Concretely, if we're doing a one-pass approach like this to copy the live set:
;; copy-live-set: number -> void
;; copies the live set starting from memory-location.
Pseudocode:
to copy-live-set starting at memory-location:
copy the block at memory-location over to the new semispace, and
record a redirection record in the old semispace
for each internal-reference in the block:
recursively call copy-live-set at the internal-reference if
it hasn't been copied already
remap the internal-reference to that new memory location
then you may be surprised to know that we've messed up with memory. The above will break the promise that the collector is making to the runtime environment because the recursion here is not iterative! It will consume control stack space. During the live set traversal, it will consume control stack space proportional to the depth of the structures we're walking across. Ooops.
If you try an alternative approach for walking through the live set, you should eventually see that there's a good way to traverse the whole live set while still guaranteeing bounded, small control stack usage. Hint: consider how graph traversal algorithms can be written as a simple while loop, with an explicit container that holds what to visit next till we exhaust the container. If you squint just right, the intermediate values in the new semispace look awfully like that container.
Once you discover how to traverse the live set in constant control stack space, you'll see that you'll need those two passes to do the complete copy-and-rewrite-internal-references thing. Worrying about these details is messy, but it's important in seeing how garbage collectors actually work. A real collector needs to do something like this, to be concerned about control stack usage, to ensure it uses bounded memory during the collection.
Summary: a two-pass algorithm is a solution that helps us with memory at the cost of some time. But we don't pay much in terms of performance: though we pass through the live set twice, the process is still linear in the size of the live set.
History: see Cheney's Algorithm, and note the title of the seminal paper's emphasis: "A Nonrecursive List Compacting Algorithm". That single highlighted word "Nonrecursive" is the key to what motivates the two-pass approach: it's trying to avoid consuming the control stack. Cheney's paper is an extension of the paper by Fenichel and Yochelson "A LISP Garbage-Collector for Virtual-Memory Computer Systems", in which the authors there proposed basically the recursive, stack-using approach first. To improve the situation, Fenichel and Yochelson then proposed using the non-recursive (but complicated!) Schorr-Waite DFS algorithm to do the traversal. Cheney's approach is an improvement because the traversal is simpler.

Does the GHC garbage collector have any special optimisations for large objects?

Does the GHC garbage collector handle "large" objects specially? Or does it treat them exactly the same as any other object?
Some GC engines put large objects in a separate area, which gets scanned less regularly and possibly has a different collection algorithm (e.g., compacting instead of copying, or maybe even using freelists rather than attempting to defragment). Does GHC do anything like this?
Yes. The GHC heap is not kept in one contiguous stretch of memory; rather, it is organized into blocks.
When an allocated object’s size is above a specific threshold (block_size*8/10, where block_size is 4k, so roughly 3.2k), the block holding the object is marked as large (BF_LARGE). Now, when garbage collection occurs, rather than copy large objects from this block to a new one, the block itself is added to the new generation's set of blocks; this involves fiddling with a linked list (a large object list, to be precise).
Since this means that it may take a while for us to reclaim dead space inside a large block, it does mean that large objects can suffer from fragmentation, as seen in bug 7831. However, this doesn't usually occur until individual allocations hit half of the megablock size, 1M.

Resources