Some points about implementing a garbage collector

I'm trying to implement a simple programming language. I want its users not to have to manage memory, so I decided to implement a garbage collector. The simplest way I can think of, after reading some material, is this:
There are two kinds of heap zones. The first is for storing big objects (bigger than 85,000 bytes), the other is for small objects. In the following I use BZ for the first and SZ for the second.
The BZ uses the mark-and-sweep algorithm, because moving a big object is expensive. I don't compact, so there will be fragmentation.
The SZ uses generations with mark-compact. There are three generations: 0, 1, and 2. Allocation requests go directly to generation 0, and when generation 0 is full I garbage-collect it and promote the survivors to generation 1. Generations 1 and 2 are likewise collected when they fill up.
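A minimal sketch of how such an SZ could work, assuming simple bump allocation per generation (the names Generation, szAlloc and promote are illustrative, not a fixed design):
```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Generation {
    uint8_t* base;   // start of this generation's region
    uint8_t* top;    // next free byte (bump pointer)
    uint8_t* limit;  // end of the region
};

// Try to allocate in generation 0; nullptr means gen 0 is full,
// which is the signal to collect it and promote the survivors.
void* szAlloc(Generation& gen0, size_t size) {
    if (gen0.top + size > gen0.limit)
        return nullptr;            // gen 0 full -> run a minor GC
    void* obj = gen0.top;
    gen0.top += size;              // bump the pointer
    return obj;
}

// Promotion during a minor GC: copy one survivor into the next generation.
void* promote(Generation& older, const void* obj, size_t size) {
    if (older.top + size > older.limit)
        return nullptr;            // older generation full -> collect it too
    void* dst = older.top;
    std::memcpy(dst, obj, size);
    older.top += size;
    return dst;                    // caller must fix up references to obj
}
```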
When the virtual machine starts, it allocates a big chunk of memory from the OS to use as the heap. The BZ and every generation in the SZ occupy a fixed portion of it, and when an allocation request can't be satisfied, the virtual machine reports an out-of-memory (OOM) error. This has a problem: when the virtual machine starts, even a program that needs only a little memory still claims a lot of it. A better way would be for the virtual machine to take a small amount of memory from the OS at first, and get more from the OS as the program needs it. My plan is to allocate a larger region for generation 2 of the SZ and copy everything in generation 2 into the new region, and to do the same for the BZ.
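One common way to get that grow-on-demand behaviour without the copy step is to reserve a large range of address space up front and commit pages only as they are needed, so each zone grows in place. A POSIX sketch, assuming page-aligned sizes (on Windows the same idea is VirtualAlloc with MEM_RESERVE and then MEM_COMMIT); the Zone type and function names are illustrative:
```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

struct Zone {
    uint8_t* base;     // start of the reserved range
    size_t committed;  // bytes currently usable
    size_t reserved;   // total reserved address space
};

bool zoneInit(Zone& z, size_t reserve) {
    // Reserve address space only; no usable memory is committed yet.
    void* p = mmap(nullptr, reserve, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    z = { static_cast<uint8_t*>(p), 0, reserve };
    return true;
}

// Grow the zone by `extra` bytes (must be page-aligned). Existing
// objects never move, so no copying is needed.
bool zoneGrow(Zone& z, size_t extra) {
    if (z.committed + extra > z.reserved) return false;  // genuine OOM
    if (mprotect(z.base + z.committed, extra,
                 PROT_READ | PROT_WRITE) != 0) return false;
    z.committed += extra;
    return true;
}
```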
The other problem occurs when the BZ is full and the SZ is empty: it would be silly not to be able to satisfy a big-object allocation request even though we in fact have enough free heap space for it in the SZ. How should I deal with this problem?

I am trying to understand your methodology. Since you didn't describe your strategy completely, I have made some assumptions.
NOTE: the following is my hypothetical analysis and may not be practical, so please skip this answer if you don't have time.
You are trying to use a generational GC with some changes; objects are classified into 2 types:
(1) big objects (BZ), and
(2) small objects (SZ).
The SZ performs generational GC with compaction (mark-compact).
From the above we know that SZG2 holds the long-lived objects. I expect GC in SZG2 to be less frequent than in SZG1 or SZG0: long-lived objects generally tend to keep living, so each collection reclaims little, and SZG2 grows as time passes, so a collection has to traverse many elements and takes a long time. Frequent GC on SZG2 is therefore less productive (a long GC spike, i.e. a noticeable delay for the user) than on SZG1 or SZG0.
Similarly, the BZ may have a large memory requirement, since big objects occupy more space. So, to address your query:
"The other problem occurs when the BZ is full and SZ is empty, I would be silly not be able to satisfy a big object allocation request even though we in fact have enough free heap size for the big object in SZ. How to deal with this problem?"
Since you said that "when the program needs more memory the virtual machine will get more from the OS",
I have a small idea. It may not be productive, it may not be possible to implement, and it depends completely on your implementation of the GCHeap structure.
Let your virtual machine lay out its memory as follows (I borrowed the idea from the classic memory-segment layout of a process, where the heap and the stack grow towards each other); this is possible at a low level.
The GCHeap structure has to be defined in such a way that SZG0 and BZ expand towards each other; call this layout A, with layout B a variant of it for additional heaps. To implement the GCHeap structure of layouts A and B, we need a proper convention for the direction of growth of the zones SZG[0-2] and BZ.
So if you want to divide your application's heap into multiple heaps, you can pile layout B heaps on top of a layout A heap to decrease fragmentation (when I say fragmentation, I mean the situation you described: "the BZ is full and the SZ is empty, and we cannot satisfy a big-object allocation request even though we in fact have enough free heap space for it in the SZ").
So the effective structure will be a stack of heaps, from the top of the address space down:
B
|
B
|
B
|
B
|
A
Now it depends entirely on heuristics whether to treat the GCHeap data structure as multiple GCHeap structures (GCHeapA, GCHeapB, ...) or as a single heap, based on your requirements.
If you don't want multiple heaps, you can use layout A with one small correction: set the base address of SZG2 to the top of the heap.
The key reasoning behind layout A is as follows:
We know that SZG0 gets GC'ed frequently, so it tends to have more free space than SZG1 and SZG2, since its dead objects are removed and the survivors are moved up to SZG1 and SZG2. So if BZ allocation is heavy, the BZ can grow towards SZG0.
In layout A the base addresses of SZG1 and SZG2 are contiguous, because SZG2 is the most prone to out-of-memory errors: old-generation objects tend to live longer and GC'ing doesn't sweep much there (NOTE: this is just my assumption and opinion), so SZG2 is kept at the outer edge.
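A sketch of what layout A could look like in code, assuming one contiguous heap with SZG2 and SZG1 fixed at the bottom, SZG0 bump-allocating upward, and the BZ allocating downward from the top (all names are illustrative):
```cpp
#include <cstddef>
#include <cstdint>

struct GCHeap {
    uint8_t* base;      // bottom of the heap (SZG2, then SZG1 above it)
    uint8_t* szTop;     // SZG0 allocation pointer, grows upward
    uint8_t* bzBottom;  // BZ allocation frontier, grows downward
    uint8_t* end;       // top of the heap
};

// Small-object allocation in SZG0, growing up toward the BZ.
void* szg0Alloc(GCHeap& h, size_t size) {
    if (h.szTop + size > h.bzBottom) return nullptr;  // zones met: collect
    void* obj = h.szTop;
    h.szTop += size;
    return obj;
}

// Big-object allocation in the BZ, growing down toward SZG0.
void* bzAlloc(GCHeap& h, size_t size) {
    if (h.bzBottom - size < h.szTop) return nullptr;  // zones met: collect
    h.bzBottom -= size;
    return h.bzBottom;
}
```
The free space in the middle is not pre-assigned to either zone, which is exactly what removes the quoted problem: a big object can use space the SZ never claimed, and vice versa.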

Related

What is the sequential store buffer structure in GC, specifically?

I have read the garbage collection book; it mentions a data structure called the sequential store buffer. Could anyone explain how it works, or the principle behind it, or point me to a paper about it?
For generational collectors, different regions of the heap get collected at different times (minor collections for the young generation, major for the old). To ensure consistency of collection, a remembered set is typically used that records links from objects in the old generation to the young generation.
There are different ways of recording the remembered set, as described in the GC book you mention. A common way is the use of a card table, which is how the G1 collector does it.
An alternative is the sequential store buffer. This is an area of memory that is treated roughly like a stack, i.e. there is a pointer to where the next piece of data can be stored. Once the data is saved the pointer is bumped by the size of the data. This is very efficient (and is also the way space is allocated in the young generation). For a GC algorithm that uses a write-barrier (most) this is a good way of reducing the load created by the write-barrier. It is also very efficient on pipelined architectures with branch prediction.
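A toy sketch of such a buffer, assuming the collector drains it into the remembered set when it fills up (the drain itself is elided; all names are illustrative):
```cpp
#include <cstddef>

struct SSB {
    void** buf;    // start of the buffer
    void** next;   // where the next entry goes
    void** limit;  // end of the buffer
};

// Drain the buffer into the remembered set (details elided), then reset it.
void flushSSB(SSB& ssb) {
    // ... fold the entries in [buf, next) into the remembered set ...
    ssb.next = ssb.buf;
}

// Called by the write barrier after a pointer store into `slot`.
inline void ssbRecord(SSB& ssb, void** slot) {
    *ssb.next++ = slot;        // the fast path: one store and one bump
    if (ssb.next == ssb.limit)
        flushSSB(ssb);         // rare slow path
}
```
The fast path is branch-predictor friendly: the overflow check almost always falls through, which is why this barrier stays cheap on pipelined architectures.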

How is time measured for objects allocated in the heap?

When I read "The Garbage Collection Handbook", chapter 9 implies that "object lifetimes are better measured by the number of bytes of heap space allocated between their birth and death." I don't really understand this sentence. Why can lifetime be measured by allocated bytes? I tried to google for it, but got no answer.
Can anyone explain it to me? Thanks!
By measuring object lifetimes in terms of bytes allocated between instantiation and death, it is easier for the GC algorithm to adapt to program behaviour.
If the rate of object allocation is very slow, a simple time measurement would show long pauses between collections, which would appear to be good. However, if the byte-allocation measure of object lifetimes is high, objects may be getting promoted to a survivor space or the old generation too quickly. By measuring byte allocation, the collector can size the heap more effectively, for example by expanding the young generation to increase the number of objects that become garbage before a minor collection occurs. Just using time as the measure would not make the need for such resizing obvious.
As the book points out, with multi-threaded applications it is hard to measure byte allocation for individual threads so collectors tend to measure lifetimes in terms of how many collections an object survives. This is a simpler number to monitor and requires less space to record.
"Time" is only a scale that allows events to be ordered. There are many possible units, even in the real world. Inside the computer, for the purposes of garbage collection, no real-world time unit is needed; all the garbage collector usually wants to know is which object is older than which.
For this purpose, just assigning an ascending number to each allocated object would be sufficient, but that would mean maintaining an additional counter. In contrast, the number of allocated bytes comes for free. It is important that we only ever accumulate allocated bytes, never subtracting deallocated ones, so that the number always grows.
In generational memory management, this number doesn't even need to be updated on every allocation: objects are allocated contiguously in a dedicated space, so their addresses represent their relative age within that region, and the start of the region corresponds to the last garbage collection. Only when the garbage collector runs and moves the surviving objects does it have to merge this information into an absolute age, if needed.
Implementations like the HotSpot JVM simplify this further. For each surviving object, the JVM maintains a small counter holding the number of garbage collection cycles it has survived. After surviving a configurable number of cycles, the object is promoted to the old generation, and beyond that point its age becomes irrelevant.
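A sketch of such an allocation clock, assuming a header word per object (in a real generational collector the birth stamp comes for free from the object's address, as described above; the names here are illustrative):
```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// "Time" = total bytes ever allocated. Only ever increases.
static uint64_t bytesAllocated = 0;

struct Header {
    uint64_t birth;  // clock value when the object was created
};

void* allocate(size_t size) {
    bytesAllocated += size;  // advance the clock
    Header* h = static_cast<Header*>(::operator new(size + sizeof(Header)));
    h->birth = bytesAllocated;
    return h + 1;            // payload starts after the header
}

// Age in bytes: how much allocation this object has lived through.
uint64_t ageOf(const void* obj) {
    const Header* h = static_cast<const Header*>(obj) - 1;
    return bytesAllocated - h->birth;
}
```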

How does Nodejs/V8 know how big a native object is?

Specifically, in node-opencv, opencv Matrix objects are represented as a javascript object wrapping a c++ opencv Matrix.
However, if you don't .release() them manually, the V8 engine does not seem to know how big they are, and the NodeJS memory footprint can grow far beyond any limits you try to set on the command line. That is, V8 only seems to run the GC when it approaches the set memory limits, but because it does not see the objects as large, that does not happen until it's too late.
Is there something we can add to the objects which will allow V8 to see them as large objects?
Illustrating this: you can create and 'forget' large 1 MB buffers all day on a NodeJS instance set to limit its memory to 256 MB.
But if you do the same with 1 MB opencv Matrices, NodeJS will quickly use much more than the 256 MB limit, unless you either run the GC manually or release the Matrices manually.
(Caveat: a C++ opencv matrix is a reference to memory; i.e. more than one Matrix object can point to the same data. But it would be a start to have V8 see ALL references to the same memory as being the size of that memory for the purposes of GC; that is safer than seeing them all as very small.)
Circumstances: on an RPi3 we have a limited memory footprint, and processing live video (using about 4 MB of Mat objects per frame) can soon exhaust all memory.
Also, the environment I'm working in (a Node-Red node) is designed for 'public' use, so it is difficult to ensure that all users completely understand the need to manually .release() images; hence this question about how to bring this large data under the GC's control.
You can inform v8 about your external memory usage with AdjustAmountOfExternalAllocatedMemory(int64_t delta). There are wrappers for this function in n-api and NAN.
By the way, "large objects" has a special meaning in V8: objects large enough to be created in large-object space, which are never moved. External memory is off-heap memory, and I think that is what you're referring to.
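A sketch of how that accounting can look with NAN; the MatWrap class and its size bookkeeping are hypothetical, but Nan::AdjustExternalMemory is the real NAN wrapper around V8's Isolate::AdjustAmountOfExternalAllocatedMemory:
```cpp
#include <nan.h>

class MatWrap : public Nan::ObjectWrap {  // hypothetical wrapper class
 public:
  explicit MatWrap(size_t dataBytes) : dataBytes_(dataBytes) {
    // Credit the native allocation: V8 now counts these bytes as GC pressure.
    Nan::AdjustExternalMemory(static_cast<int>(dataBytes_));
  }
  ~MatWrap() {
    // Debit it again when the JS wrapper is finalized.
    Nan::AdjustExternalMemory(-static_cast<int>(dataBytes_));
  }
 private:
  size_t dataBytes_;  // bytes of pixel data reported to V8
};
```
With shared cv::Mat data this over-counts, since every wrapper credits the full buffer; as the caveat in the question notes, over-counting is the safer direction for GC purposes.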

1GB Vector, will Vector.Unboxed give trouble, will Vector.Storable give trouble?

We need to store a large (1 GB) block of contiguous bytes in memory for long periods of time (weeks to months), and are trying to choose a Vector/Array library. I have two concerns that I can't find answers to.
Vector.Unboxed seems to store the underlying bytes on the heap, where they can be moved around at will by the GC. Periodically moving 1 GB of data is something I would like to avoid.
Vector.Storable solves this problem by storing the underlying bytes in the C heap. But everything I've read seems to indicate that this is really only meant for communicating with other languages (primarily C). Is there some reason I should avoid using Vector.Storable for internal Haskell usage?
I'm open to a third option if it makes sense!
My first thought was the mmap package, which allows you to "memory-map" a file into memory, using the virtual memory system to manage paging. I don't know if this is appropriate for your use case (in particular, I don't know if you're loading or computing this 1GB of data), but it may be worth looking at.
In particular, I think this prevents the GC moving the data around (since it's not on the Haskell heap, it's managed by the OS virtual memory subsystem). On the other hand, this interface handles only raw bytes; you couldn't have, say, an array of Customer objects or something.
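For reference, the OS-level mechanism the mmap package wraps looks roughly like this at the C level (a sketch assuming a pre-existing data file; this is not the package's actual implementation):
```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("data.bin", O_RDONLY);  // hypothetical 1 GB file
    if (fd < 0) return 1;
    struct stat st;
    if (fstat(fd, &st) != 0) return 1;
    // The mapping lives outside any GC-managed heap, so nothing moves it;
    // the OS pages the bytes in and out on demand.
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;
    // ... read raw bytes through p ...
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```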

Memory of type "root-set" reallocation error (Erlang)

I have been running a crypto-intensive application that generates pseudo-random strings with a special structure and mathematical requirements. It has generated around 1.7 million voucher numbers per node over the last 8 days. The generation process was CPU-intensive, with very low memory requirements.
Mnesia running on OTP-14B02 was the storage database, and the generation was done within each virtual machine. I had 3 nodes in the cluster, with all Mnesia tables of type disc_only_copies. Suddenly, as activity on the Solaris boxes increased (other users logged on remotely and started web servers, FTP sessions, and other tasks), my bash shell started reporting a fork: not enough space error.
My Erlang VMs also went down with the error below:
Crash dump was written to: erl_crash.dump
temp_alloc: Cannot reallocate 8388608 bytes of memory (of type "root_set").
Usually we get memory allocation errors, not memory reallocation errors, and normally memory of type "heap" is the problem. This time the memory type reported is "root_set".
Qn 1. What is this "root_set" memory?
Qn 2. Does it have to do with CPU-intensive activity? (I ask because when I start the task, the machine's response to mouse or keyboard interrupts becomes very slow, meaning either the CPU is too busy or there is some other problem I cannot yet explain.)
Qn 3. Can such an error be avoided, and how?
The fork: not enough space message suggests this is a problem with the operating system setup, but:
Q1 - The Root Set
The root set is what the garbage collector uses as a starting point when it searches for data that is live in the heap. It usually starts from the registers of the VM and from the stack, if the stack has references to heap data that still needs to stay live. There may be other roots in Erlang I am not aware of, but these are the basics you start from.
That it is a reallocation error of exactly 8 megabytes could mean one of two things: either you don't have 8 megabytes free in the heap, or the heap is fragmented beyond recognition, so that while 8 megabytes are free in total, there is no contiguous block of that size.
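To make "starting point" concrete, here is a toy sketch of marking from the roots; the types and names are illustrative, not the Erlang VM's actual structures:
```cpp
#include <vector>

struct Object {
    bool marked = false;
    std::vector<Object*> fields;  // outgoing references to other objects
};

void mark(Object* obj) {
    if (obj == nullptr || obj->marked) return;
    obj->marked = true;
    for (Object* child : obj->fields)
        mark(child);  // everything reachable from a root stays live
}

// The root set: the VM's registers and stack slots (plus any other globals).
void markFromRoots(const std::vector<Object*>& registers,
                   const std::vector<Object*>& stack) {
    for (Object* r : registers) mark(r);
    for (Object* s : stack) mark(s);
}
```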
Q2 - CPU activity impact
The problem has nothing to do with the CPU per se: you are running out of memory. A large root set could indicate that you have some very deep recursion going on where you keep around a lot of pointers to data. You may be able to rewrite the code so that it is tail-calling and uses less memory while operating.
You should be more worried about the slow response times from the keyboard and mouse; that could indicate something is not right. Does vmstat 1, sysstat, htop, dstat or similar show anything odd while the process is running? You should also try to figure out whether the kernel or the C libc is doing something odd here because memory is constrained.
Q3 - How to fix
I don't know how to fix this without knowing more about what the application is doing. Since you have a crash dump, your first instinct should be to open it in the crash dump viewer and look at it. The goal is to find a process using a lot of memory, or one that has a deep stack. From there you can try to limit the amount of memory that process uses, either by rewriting the code so it gives memory up earlier, by tuning the garbage collection setup for the process (see the spawn options in the Erlang man pages), or by adding more memory to the system.
