JRuby - How to start the garbage collector? - garbage-collection

I fired up my JRuby irb console and typed:
irb(main):037:0* GC.enable
(irb):37 warning: GC.enable does nothing on JRuby
=> true
irb(main):038:0> GC.start
=> nil
irb(main):039:0>
How can I manually enable or start the JVM garbage collector during a program?
I ask because I have a program which needs to generate about 500 MBytes of test data and save it in MySQL. The program uses about 5 levels of nested loops, and it crashes with a JVM heap memory exception after generating about 100 MBytes of test data because there is no more heap memory. I would like to let the garbage collector run after every iteration of the outer loop so that all the orphaned objects created in the inner loops can be cleaned up.

The exact answer to your question would be:
require 'java'
java_import 'java.lang.System'
# ...
System.gc()
Bear in mind, though, that System.gc() is only a request: the JVM usually does run a collection, but it is not obliged to, and the behavior depends heavily on the JVM implementation. It can also be quite a hit on performance.
A better answer is to ensure that at the end of the nested loop no reference is held to the test data you are generating, so that it can be reclaimed by the GC later on. Example:
class Foo; end
sleep(5)
ary = []
100_000.times { 100_000.times{ ary << Foo.new }; puts 'Done'; ary = [] }
If you run this with jruby -J-verbose:gc foo.rb, you should see the GC regularly reclaiming the objects. This is also quite clear in JVisualVM (the sleep in the example gives you time to connect JVisualVM to the JRuby process).
Lastly, you can increase the maximum heap size by adding the following flag: -J-Xmx256m; see the JRuby wiki for more details.
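Here is the same "release the batch after each outer iteration" idea as a Python analogue, under a few assumptions. The in-memory sqlite3 database stands in for MySQL. generate_row is a hypothetical placeholder for the real test-data generator. Each outer iteration builds one batch, writes it, and then drops the only reference to it, so the batch is unreachable before the next one is built:

```python
import sqlite3

def generate_row(i, j):
    # Hypothetical test-data generator; replace with the real one.
    return (i, j, f"payload-{i}-{j}")

def fill_database(conn, outer=5, inner=1000):
    conn.execute("CREATE TABLE IF NOT EXISTS test_data (i INTEGER, j INTEGER, payload TEXT)")
    for i in range(outer):
        # Build one batch in the inner loop.
        batch = [generate_row(i, j) for j in range(inner)]
        conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", batch)
        conn.commit()
        # Drop the only reference to the batch so its rows become garbage
        # before the next batch is built; no explicit GC call is involved.
        del batch

conn = sqlite3.connect(":memory:")
fill_database(conn)
print(conn.execute("SELECT COUNT(*) FROM test_data").fetchone()[0])  # 5000
```

Memory use then stays bounded by one batch rather than growing with the total amount of data written.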
Edit: Coincidentally, here is a mindmap on GC tuning recently presented by Mario Camou at Madrid DevOps re-posted by Nick Sieger.

It's not possible, because the GC is run automatically by the JVM. Make sure that you're creating objects only when they're required. Avoid creating class-level objects, and try to find out which objects are taking the most memory and create them only when they're required.

Related

How can I know who call System.gc() in spark streaming program?

The GC time in my Spark Streaming program is too long. In the GC log, I found that something calls System.gc() in the program. I do not call System.gc() in my own code, so the caller must be one of the APIs I use.
I added -XX:+DisableExplicitGC to the JVM options, which fixed the problem. However, I want to know who calls System.gc().
I tried the following methods:
jstack. But since the GC does not happen often, it is difficult to take a thread dump at the moment the method is called.
In JProfiler, I added a trigger that takes a thread dump when java.lang.System.gc() is invoked. But it doesn't seem to work.
How can I know who call System.gc() in spark streaming program?
You will not catch System.gc with jstack, because during stop-the-world pauses JVM does not accept connections from Dynamic Attach tools, including jstack, jmap, jcmd and similar.
It's possible to trace System.gc callers with async-profiler:
Start profiling beforehand:
$ profiler.sh start -e java.lang.System.gc <pid>
After one or more System.gc happens, stop profiling and print the stack traces:
$ profiler.sh stop -o traces <pid>
Example output:
--- Execution profile ---
Total samples : 6
Frame buffer usage : 0.0007%
--- 4 calls (66.67%), 4 samples
[ 0] java.lang.System.gc
[ 1] java.nio.Bits.reserveMemory
[ 2] java.nio.DirectByteBuffer.<init>
[ 3] java.nio.ByteBuffer.allocateDirect
[ 4] Allocate.main
--- 2 calls (33.33%), 2 samples
[ 0] java.lang.System.gc
[ 1] sun.misc.GC$Daemon.run
In the above example, System.gc is called 6 times from two places. Both are typical situations when JDK internally forces Garbage Collection.
The first one is from java.nio.Bits.reserveMemory. When there is not enough free memory to allocate a new direct ByteBuffer (because of -XX:MaxDirectMemorySize limit), JDK forces full GC to reclaim unreachable direct ByteBuffers.
The second one is from the GC Daemon thread, which is triggered periodically by the Java RMI runtime. For example, if you use JMX remote, periodic GC is automatically enabled once per hour. This can be tuned with the -Dsun.rmi.dgc.client.gcInterval system property.

How does garbage collection take place in Python?

How does garbage collection work in Python? Some people say that it happens automatically, but what is the actual process?
Introduction to Python memory management
Python's memory allocation and deallocation is automatic. The user does not have to preallocate or deallocate memory by hand, as one has to when using dynamic memory allocation in languages such as C or C++. Python uses two strategies for memory management: reference counting and garbage collection.
Prior to Python version 2.0, the Python interpreter only used reference counting for memory management. Reference counting works by counting the number of times an object is referenced by other objects in the system. When references to an object are removed, the reference count for an object is decremented. When the reference count becomes zero the object is deallocated.
Reference counting is extremely efficient but it does have some caveats. One such caveat is that it cannot handle reference cycles. A reference cycle is when there is no way to reach an object but its reference count is still greater than zero. The easiest way to create a reference cycle is to create an object which refers to itself as in the example below:
def make_cycle():
    l = []
    l.append(l)  # the list now contains a reference to itself
make_cycle()
Because make_cycle() creates an object l which refers to itself, the object l will not automatically be freed when the function returns. This will cause the memory that l is using to be held onto until the Python garbage collector is invoked.
Automatic garbage collection of cycles
Because reference cycles take computational work to discover, garbage collection must be a scheduled activity. Python schedules garbage collection based upon a threshold of object allocations and object deallocations. When the number of allocations minus the number of deallocations is greater than the threshold number, the garbage collector is run. One can inspect the threshold for new objects (known in Python as generation 0 objects) by loading the gc module and asking for the garbage collection thresholds:
import gc
print("Garbage collection thresholds: %r" % (gc.get_threshold(),))
Garbage collection thresholds: (700, 10, 10)
Here we can see that the default threshold on the above system is 700. This means that when the number of allocations minus the number of deallocations exceeds 700, the automatic garbage collector will run.
Automatic garbage collection is not triggered by your Python device running out of memory; instead, your application will raise exceptions, which must be handled or your application crashes. This is aggravated by the fact that automatic garbage collection places high weight on the NUMBER of objects, not on how large they are. Thus any portion of your code which frees up large blocks of memory is a good candidate for running manual garbage collection.
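The gc module can also show the counters that are compared against these thresholds, and it lets you change the thresholds. The sketch below is a small Python 3 example. The value 10_000 is an arbitrary illustration, not a recommendation. The exact meaning of the counters can differ between CPython versions:

```python
import gc

# Current thresholds for generations 0, 1 and 2.
old = gc.get_threshold()
print("thresholds:", old)

# Current counters. For generation 0 this is roughly the number of
# allocations minus deallocations of GC-tracked objects since the last collection.
print("counts:", gc.get_count())

# Raising the generation-0 threshold makes automatic collections rarer;
# lowering it makes them more frequent. Restore the old values afterwards.
gc.set_threshold(10_000, old[1], old[2])
print("changed to:", gc.get_threshold())
gc.set_threshold(*old)
```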
Manual garbage collection
For some programs, especially long-running server applications or embedded applications running on a Digi device, automatic garbage collection may not be sufficient. Although an application should be written to be as free of reference cycles as possible, it is a good idea to have a strategy for dealing with them. Invoking the garbage collector manually at opportune times during program execution can be a good way to handle memory being consumed by reference cycles.
The garbage collection can be invoked manually in the following way:
import gc
gc.collect()
gc.collect() returns the number of objects it has collected and deallocated. You can print this information in the following way:
import gc
collected = gc.collect()
print("Garbage collector: collected %d objects." % collected)
If we create a few cycles, we can see manual collection work:
import sys, gc
def make_cycle():
    l = {}
    l[0] = l  # the dict now refers to itself
def main():
    collected = gc.collect()
    print("Garbage collector: collected %d objects." % collected)
    print("Creating cycles...")
    for i in range(10):
        make_cycle()
    collected = gc.collect()
    print("Garbage collector: collected %d objects." % collected)
if __name__ == "__main__":
    ret = main()
    sys.exit(ret)
In general there are two recommended strategies for performing manual garbage collection: time-based and event-based garbage collection. Time-based garbage collection is simple: the garbage collector is called on a fixed time interval. Event-based garbage collection calls the garbage collector on an event. For example, when a user disconnects from the application or when the application is known to enter an idle state.
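Both strategies are sketched below in Python 3. ManualCollector, the interval and the event names "idle" and "user_disconnected" are hypothetical choices for illustration. The clock is injectable so the time-based check can be tested:

```python
import gc
import time

class ManualCollector:
    """Sketch of the two manual strategies: time-based and event-based."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock            # injectable clock, handy for testing
        self.last_run = clock()

    def maybe_collect(self):
        """Time-based: collect only if the interval has elapsed.

        Call this regularly from the main loop. Returns the number of
        unreachable objects found, or None if it was not time yet.
        """
        now = self.clock()
        if now - self.last_run < self.interval:
            return None
        self.last_run = now
        return gc.collect()

    def on_event(self, name):
        """Event-based: collect when the application reports a quiet moment."""
        if name in ("idle", "user_disconnected"):   # hypothetical event names
            return gc.collect()
        return None

collector = ManualCollector(interval_seconds=60)
collector.maybe_collect()      # call from the main loop
collector.on_event("idle")     # call when the application goes idle
```

The time-based variant is easier to reason about. The event-based one avoids collecting while the application is busy.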
Reference:
Python garbage collection
Understand the internals of python garbage collection and how Instagram engineers achieved a performance improvement of 10% by tweaking it.
do-you-know-how-python-cleanses-itself

When does the garbage collector run when calling Haskell exports from C?

When exporting a Haskell function to be called from C, when does Haskell's garbage get collected? If C owns main, then there is no way to predict the next call into Haskell. This question is especially pertinent when running single-threaded Haskell or without parallel GC.
When you initialize the GHC runtime, you can pass RTS flags to it via argc and argv like so:
RtsConfig conf = defaultRtsConfig;
conf.rts_opts_enabled = RtsOptsAll;
hs_init_ghc(&argc, &argv, conf);
This lets you set options to, for example, fix a smaller maximum heap size or use a compaction algorithm on the nursery to further reduce allocation. Further, note that there is an idle GC whose interval can be set (or disabled), and if you link the threaded runtime, it should run whether or not you ever yield back to a Haskell call.
Edit: I haven't actually experimented to verify the following, but if we look at the source of hs_init_ghc we see that it initializes signal handlers, which should include the timer handlers that respond to SIGVTALRM. Indeed, it also starts the timer, which (on POSIX) calls timer_create, which should raise those signals at regular intervals. In turn, this should periodically "wake up" the RTS whether or not anything is happening, which should mean that it will run idle GC whether or not the system yields back to Haskell from C. But again, I have only read the code and commentary, not tested this myself.

D programming without the garbage collector

I've been looking at D today, and on the surface it looks quite amazing. I like how it includes many higher-level constructs directly in the language, so silly hacks or terse methods don't have to be used. One thing that really worries me is the GC. I know this is a big issue and have read many discussions about it.
My own simple tests, which grew out of a question here, show that the GC is extremely slow: over 10 times slower than straight C++ doing the same thing. (Obviously the test does not translate directly to the real world, but the performance hit is extreme and would slow down real-world applications that behave similarly, i.e. that allocate many small objects quickly.)
I'm looking into writing a real-time, low-latency audio application, and it is possible that the GC will ruin the performance of the application to the point of making it nearly useless. In a sense, if it has any issues it will ruin the real-time audio aspect, which is much more crucial since, unlike graphics, audio runs at a much higher frame rate (44000+ vs 30-60). (Because of the low latency requirement, this is more crucial than for a standard audio player, which can buffer significant amounts of data.)
Disabling the GC improved the results to within about 20% of the C++ code. This is significant. I'll give the code at the end for analysis.
My questions are:
How difficult is it to replace D's GC with a standard smart-pointer implementation so that libraries that rely on the GC can still be used? If I remove the GC completely I'll lose a lot of grunt work, as D already has limited libraries compared to C++.
Does GC.disable only halt garbage collection temporarily (preventing the GC thread from running), and does GC.enable pick back up where it left off? If so, I could disable the GC during moments of high CPU usage to prevent latency issues.
Is there any way to consistently enforce a pattern of not using the GC? (This is because I've not programmed in D before, and when I start writing my classes that do not use the GC I would like to be sure I don't forget to implement their own cleanup.)
Is it possible to replace the GC in D easily? (not that I want to but it might be fun to play around with different methods of GC one day... this is similar to 1 I suppose)
What I'd like to do is trade memory for speed. I do not need the GC to run every few seconds. In fact, if I can properly implement my own memory management for my data structures, chances are it will not need to run very often at all. I might need to run it only when memory becomes scarce. From what I've read, though, the longer you wait to call it, the slower it will be. Since there will generally be times in my application where I can get away with calling it without issues, this will help alleviate some of the pressure (but then again, there might be hours when I won't be able to call it).
I am not worried about memory constraints as much. I'd prefer to "waste" memory over speed(up to a point, of course). First and foremost is the latency issues.
From what I've read, I can, at the very least, go the route of C/C++ as long as I don't use any libraries or language constructs that rely on the GC. The problem is, I do not know which ones do. I've seen string, new, etc. mentioned, but does that mean I can't use the built-in strings if I don't enable the GC?
I've read in some bug reports that the GC might be really buggy; could that explain its performance problems?
Also, D uses a bit more memory; in fact, the D program runs out of memory before the C++ program does. I guess it is about 15% more in this case. I suppose that is for the GC.
I realize the following code is not representative of an average program, but what it says is that when programs instantiate a lot of objects (say, at startup) they will be much slower (10 times is a large factor). If the GC could be "paused" at startup, then it wouldn't necessarily be an issue.
What would really be nice is if I could somehow have the compiler automatically GC a local object if I do not specifically deallocate it. This would almost give the best of both worlds.
e.g.,
{
Foo f = new Foo();
....
dispose f; // Causes f to be disposed of immediately and treats f outside the GC
// If left out then f is passed to the GC.
// I suppose this might actually end up creating two kinds of Foo
// behind the scenes.
Foo g = new manualGC!Foo(); // Maybe something like this will keep GC's hands off
// g and allow it to be manually disposed of.
}
In fact, it might be nice to actually be able to associate different types of GC's with different types of data with each GC being completely self contained. This way I could tailor the performance of the GC to my types.
Code:
module main;
import std.stdio, std.conv, core.memory;
import core.stdc.time;
class Foo{
int x;
this(int _x){x=_x;}
}
void main(string[] args)
{
clock_t start, end;
double cpu_time_used;
//GC.disable();
start = clock();
//int n = to!int(args[1]);
int n = 10000000;
Foo[] m = new Foo[n];
foreach(i; 0..n)
//for(int i = 0; i<n; i++)
{
m[i] = new Foo(i);
}
end = clock();
cpu_time_used = (end - start);
cpu_time_used = cpu_time_used / CLOCKS_PER_SEC; // clock ticks -> seconds
writeln(cpu_time_used);
getchar();
}
C++ code
#include <cstdlib>
#include <iostream>
#include <time.h>
#include <math.h>
#include <stdio.h>
using namespace std;
class Foo{
public:
int x;
Foo(int _x);
};
Foo::Foo(int _x){
x = _x;
}
int main(int argc, char** argv) {
int n = 120000000;
clock_t start, end;
double cpu_time_used;
start = clock();
Foo** gx = new Foo*[n];
for(int i=0;i<n;i++){
gx[i] = new Foo(i);
}
end = clock();
cpu_time_used = (end - start);
cpu_time_used = cpu_time_used / CLOCKS_PER_SEC; // clock ticks -> seconds
cout << cpu_time_used;
std::cin.get();
return 0;
}
D can use pretty much any C library; just declare the functions you need. D can also use C++ libraries, but D does not understand certain C++ constructs. So D can use almost as many libraries as C++. They just aren't native D libs.
From D's Library reference.
core.memory:
static nothrow void disable();
Disables automatic garbage collections performed to minimize the process footprint. Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition. This function is reentrant, but enable must be called once for each call to disable.
static pure nothrow void free(void* p);
Deallocates the memory referenced by p. If p is null, no action occurs. If p references memory not originally allocated by this garbage collector, or if it points to the interior of a memory block, no action will be taken. The block will not be finalized regardless of whether the FINALIZE attribute is set. If finalization is desired, use delete instead.
static pure nothrow void* malloc(size_t sz, uint ba = 0);
Requests an aligned block of managed memory from the garbage collector. This memory may be deleted at will with a call to free, or it may be discarded and cleaned up automatically during a collection run. If allocation fails, this function will call onOutOfMemory which is expected to throw an OutOfMemoryError.
So yes. Read more here: http://dlang.org/garbage.html
And here: http://dlang.org/memory.html
If you really need classes, look at this: http://dlang.org/memory.html#newdelete
delete has been deprecated, but I believe you can still free() it.
Don't use classes; use structs. Structs are allocated on the stack by default, classes on the heap. Unless you need polymorphism or other things classes support, they are overhead for what you are doing. You can use malloc and free if you want to.
More or less... fill out the function definitions here: https://github.com/D-Programming-Language/druntime/blob/master/src/gcstub/gc.d . There's a GC proxy system set up to allow you to customize the GC. So it's not like it is something that the designers do not want you to do.
Little GC knowledge here:
The garbage collector is not guaranteed to run the destructor for all unreferenced objects. Furthermore, the order in which the garbage collector calls destructors for unreferenced objects is not specified. This means that when the garbage collector calls a destructor for an object of a class that has members that are references to garbage-collected objects, those references may no longer be valid. This means that destructors cannot reference sub-objects. This rule does not apply to auto objects or objects deleted with the DeleteExpression, as the destructor is not being run by the garbage collector, meaning all references are valid.
import core.stdc.stdlib; (std.c.stdlib in older D versions) should have malloc and free.
import core.memory; this has GC.malloc, GC.free, GC.addRoot and GC.addRange (to register external memory with the GC)...
Strings require the GC because they are dynamic arrays of immutable chars (immutable(char)[]). Dynamic arrays require the GC; static arrays do not.
If you want manual management, go ahead.
import core.memory;
char* one = cast(char*) GC.malloc(char.sizeof * 8); // 8 bytes from the GC heap
GC.free(one); // release it explicitly instead of waiting for a collection
Why create a wrapper class for an int? You are doing nothing more than slowing things down and wasting memory.
class Foo { int n; this(int _n){ n = _n; } }
writeln(Foo.sizeof); // size of a class *reference*: 8 bytes on a 64-bit target, not the object itself
writeln(int.sizeof); // 4 bytes, half of that
// Version with class objects: one GC allocation per element
Foo[] m;             // = new Foo[n]; took 8 sec
m.length = n;        // 7 sec; minor optimization, at least on my machine
foreach (i; 0 .. n)
    m[i] = new Foo(i);
// Version with plain ints: no per-element allocation
int[] ints;
ints.length = n;     // default-initialized to 0
foreach (i; 0 .. n)
    ints[i] = i;     // 0.145 sec
If you really need to, write the time-sensitive function in C and call it from D.
Heck, if time is really that big of a deal, use D's inline assembly to optimize everything.
I suggest you read this article: http://3d.benjamin-thaut.de/?p=20
There you will find a version of the standard library that does own memory management and completely avoids garbage collection.
D's GC simply isn't as sophisticated as others like Java's. It's open-source so anyone can try to improve it.
There is an experimental concurrent GC named CDGC and there is a current GSoC project to remove the global lock: http://www.google-melange.com/gsoc/project/google/gsoc2012/avtuunainen/17001
Make sure to use LDC or GDC for compilation to get better optimized code.
The XomB project also uses a custom runtime but it's D version 1 I think.
http://wiki.xomb.org/index.php?title=Main_Page
You can also just allocate all memory blocks you need then use a memory pool to get blocks without the GC.
And by the way, it's not as slow as you mentioned. And GC.disable() doesn't really disable it.
We might look at the problem from a bit different view. Suboptimal performance of allocating many little objects, which you mention as a rationale for the question, has little to do with GC alone. Rather, it's a matter of balance between general-purpose (but suboptimal) and highly-performant (but task-specialised) memory management tools. The idea is: presence of GC doesn't prevent you from writing a real-time app, you just have to use more specific tools (say, object pools) for special cases.
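The object-pool idea does not depend on the language. Below is a minimal sketch in Python rather than D, so it only illustrates the pattern, not D's allocator. ObjectPool and clear_buffer are illustrative names, not from any library. All objects are allocated up front, and released ones are handed out again instead of allocating new ones:

```python
class ObjectPool:
    """Hands out preallocated objects and takes them back for reuse."""

    def __init__(self, factory, reset, size):
        self._factory = factory
        self._reset = reset
        # Allocate everything up front, outside the time-critical path.
        self._free = [factory() for _ in range(size)]

    def acquire(self):
        # Reuse a pooled object when available; fall back to allocating.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        # Clear the object's state and return it to the pool.
        self._reset(obj)
        self._free.append(obj)

def clear_buffer(buf):
    buf[:] = bytes(len(buf))   # zero the buffer in place, keeping its size

# Example: pooled fixed-size audio buffers.
pool = ObjectPool(factory=lambda: bytearray(512), reset=clear_buffer, size=4)
buf = pool.acquire()
buf[0] = 255
pool.release(buf)
```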
Since this hasn't been closed yet, recent versions of D have the std.container library which contains an Array data structure that is significantly more efficient with respect to memory than the built-in arrays. I can't confirm that the other data structures in the library are also efficient, but it may be worth looking into if you need to be more memory conscious without having to resort to manually creating data structures that don't require garbage collection.
D is constantly evolving. Most of the answers here are 9+ years old, so I figured I'd answer these questions again for anyone curious what the current situation is.
(...) replace D's GC with a standard smart pointers implementation so that libraries that rely on the GC can still be used. (...)
Replacing the GC itself with smart pointers is not something I've looked into (i.e. where new creates a smart pointer). There are several D libraries that add smart pointers. You can interface with any C library. Interfacing with C++ and even Objective-C is also supported to some degree, so that should cover you pretty well.
Does GC.disable only halt the garbage collection temporarily (preventing the GC thread from running) and GC.enable pick back up where it left off. (...)
"Collections may continue to occur in instances where the implementation deems necessary for correct program behaviour, such as during an out of memory condition."
[source]
So mostly, yes. You can also manually invoke collection during down-time.
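For comparison, Python exposes the same pattern through gc.disable(), gc.enable() and gc.collect(). The sketch below pauses automatic collection around a critical section and then collects during down-time. It only illustrates the idea using Python's collector, not D's. gc_paused and process_audio_block are hypothetical names:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Suspend automatic cyclic GC for a latency-critical section.

    Reference counting still frees most objects immediately; only the
    cycle collector is paused. The previous state is restored on exit.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

def process_audio_block(samples):
    # Hypothetical time-critical work.
    return [s * 0.5 for s in samples]

with gc_paused():
    out = process_audio_block([1.0, 2.0, 3.0])

# Down-time: run a collection at a moment of our choosing.
gc.collect()
```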
Is there any way to enforce a pattern to not use GC consistently. (...) when I start writing my classes that do not use the GC I would like to (...)
Classes are allocated on the GC heap and are reference types. Structs should be used instead. However, keep in mind that structs are value types, so by default they're copied when passed or assigned. You can @disable the copy constructor if you don't like this behaviour, but then your struct won't be POD.
What you're probably looking for is @nogc, which is a function attribute that stops a function from using the GC. You can't mark a struct type as @nogc, but you can mark each of its methods as @nogc. Just keep in mind that @nogc code can't call GC code. There's also nothrow.
If you intend never to use the GC, you ought to look into BetterC (the -betterC compiler switch). It's a D language mode that removes D's runtime, standard library (Phobos), GC and all GC-reliant features (namely associative arrays and exceptions) in favour of using C's runtime and the C Standard Library.
Is it possible to replace the GC in D (...)
Yes it is: https://dlang.org/spec/garbage.html#gc_registry
And you can configure the pre-existing GC to better suit your needs if you don't want to make your own GC.

Examples of garbage collection bottlenecks

Someone once told me a good example, but I can't remember it. I spent the last 20 minutes searching Google trying to learn more.
What are examples of bad/not great code that causes a performance hit due to garbage collection ?
From an old Sun tech tip: sometimes it helps to explicitly null out references in order to make the objects eligible for garbage collection earlier:
public class Stack {
private static final int MAXLEN = 10;
private Object stk[] = new Object[MAXLEN];
private int stkp = -1;
public void push(Object p) {stk[++stkp] = p;}
public Object pop() {return stk[stkp--];}
}
Rewriting the pop method this way helps ensure that garbage collection gets done in a timely fashion:
public Object pop() {
Object p = stk[stkp];
stk[stkp--] = null;
return p;
}
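The same effect can be checked in Python with weakref. The sketch below ports the two pop variants. The class names are illustrative. The printed result assumes CPython, where reference counting frees an object as soon as its last reference disappears:

```python
import weakref

class LeakyStack:
    def __init__(self, maxlen=10):
        self.stk = [None] * maxlen
        self.stkp = -1

    def push(self, p):
        self.stkp += 1
        self.stk[self.stkp] = p

    def pop(self):
        p = self.stk[self.stkp]
        self.stkp -= 1               # the slot still references p
        return p

class Stack(LeakyStack):
    def pop(self):
        p = self.stk[self.stkp]
        self.stk[self.stkp] = None   # clear the obsolete reference
        self.stkp -= 1
        return p

class Payload:
    pass

for cls in (LeakyStack, Stack):
    s = cls()
    s.push(Payload())
    ref = weakref.ref(s.stk[0])
    s.pop()                          # the caller discards the popped object
    print(cls.__name__, "payload still alive:", ref() is not None)
# LeakyStack payload still alive: True
# Stack payload still alive: False
```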
What are examples of bad/not great code that causes a performance hit due to garbage collection ?
The following will be inefficient when using a generational garbage collector:
Mutating references in the heap because write barriers are significantly more expensive than pointer writes. Consider replacing heap allocation and references with an array of value types and an integer index into the array, respectively.
Creating long-lived temporaries. When they survive the nursery generation they must be marked, copied and all pointers to them updated. If it is possible to coalesce updates in order to reuse an old version of a collection, do so.
Complicated heap topologies. Again, consider replacing many references with indices.
Deep thread stacks. Try to keep stacks shallow to make it easier for the GC to collate the global roots.
However, I would not call these "bad" because there is nothing objectively wrong with them. They are only inefficient when used with this kind of garbage collector. With manual memory management, none of the issues arise (although many are replaced with equivalent issues, e.g. performance of malloc vs pool allocators). With other kinds of GC some of these issues disappear, e.g. some GCs don't have a write barrier, mark-region GCs should handle long-lived temporaries better, not all VMs need thread stacks.
When you have a loop that creates new object instances: if the number of iterations is very high, you produce a lot of garbage, causing the garbage collector to run more frequently and so decreasing performance.
One example would be object references that are kept in member variables or static variables. Here is an example:
class Something {
static HugeInstance instance = new HugeInstance();
}
The problem is that the garbage collector has no way of knowing when this instance is no longer needed. So it's usually better to keep things in local variables and have small functions.
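The sketch below shows the same contrast in Python. A class attribute keeps its object reachable for as long as the class exists. A local variable's object can be freed when the function returns. HugeInstance and Something mirror the Java names above. The 1 MB size is arbitrary. The local case assumes CPython's immediate reference-counting release:

```python
import weakref

class HugeInstance:
    def __init__(self):
        self.data = bytearray(1_000_000)   # ~1 MB, arbitrary size

class Something:
    # Class-level reference: stays reachable as long as the class does.
    instance = HugeInstance()

def use_locally():
    # Local reference: released when the function returns.
    local = HugeInstance()
    return weakref.ref(local)

static_ref = weakref.ref(Something.instance)
local_ref = use_locally()

print(static_ref() is not None)   # still reachable via the class
print(local_ref() is None)        # freed on return (CPython)
```

Clearing the class attribute (Something.instance = None) is what finally makes the static object collectable.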
String foo = new String("a" + "b" + "c");
I understand Java is better about this now, but in the early days that would involve the creation and destruction of 3 or 4 string objects.
I can give you an example that will work with the .Net CLR GC:
If you override the Finalize method of a class and do not call the base class Finalize method, such as:
protected override void Finalize(){
Console.WriteLine("I'm done");
//base.Finalize(); => you should call it!
}
When you resurrect an object by accident
protected override void Finalize(){
Application.ObjHolder = this;
}
class Application{
static public object ObjHolder;
}
When you use an object that has a finalizer, it takes two garbage collections to get rid of its data, and with the code above the object is not released properly.
frequent memory allocations
lack of memory reuse (when dealing with large memory chunks)
keeping objects longer than needed (keeping references on obsolete objects)
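The "memory reuse" point can be sketched in Python. The example reads a large stream through one preallocated buffer with readinto() instead of allocating a new bytes object for every chunk. checksum_stream and the 64 KiB chunk size are illustrative choices:

```python
import io

CHUNK = 64 * 1024   # arbitrary chunk size

def checksum_stream(stream, chunk_size=CHUNK):
    """Sum all bytes of a binary stream, reusing one buffer.

    readinto() fills the existing bytearray instead of returning a new
    bytes object per chunk, so large reads do not allocate fresh chunks.
    """
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    total = 0
    while True:
        n = stream.readinto(buf)
        if not n:
            break
        total += sum(view[:n])
    return total

data = bytes(range(256)) * 1000
print(checksum_stream(io.BytesIO(data)))
```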
In most modern collectors, any use of finalization will slow the collector down. And not just for the objects that have finalizers.
Your custom service does not have a load limiter on it, so:
A lot of requests come in at the same time for some reason (say, everyone logs on in the morning).
The service takes longer to process each request, as it now has 100s of threads (1 per request).
Yet more partly processed requests build up due to the longer processing time.
Each partly processed request has created lots of objects that live until the end of processing that request.
The garbage collector spends lots of time trying to free memory, but it can't, due to the above.
Yet more partly processed requests build up due to the longer processing time... (including time in GC)
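A load limiter can be as simple as a bounded semaphore taken before any per-request work starts. The Python sketch below rests on some assumptions. The limit of 4, the sleep, and the list built per request are stand-ins for a real service's settings and work. InFlightTracker exists only to show that no more than MAX_IN_FLIGHT requests are partly processed at once:

```python
import threading
import time

class InFlightTracker:
    """Counts requests currently being processed and the peak seen."""
    def __init__(self):
        self.lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self):
        with self.lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def leave(self):
        with self.lock:
            self.current -= 1

MAX_IN_FLIGHT = 4   # assumed limit; a real service would need tuning
limiter = threading.BoundedSemaphore(MAX_IN_FLIGHT)
tracker = InFlightTracker()

def handle_request(request_id):
    # Wait for a free slot first, so at most MAX_IN_FLIGHT requests are
    # partly processed (and holding their per-request objects) at once.
    with limiter:
        tracker.enter()
        try:
            work = [request_id] * 1000   # stand-in for per-request objects
            time.sleep(0.01)             # stand-in for real processing
        finally:
            tracker.leave()

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent requests:", tracker.peak)
```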
I encountered a nice example while doing some parallel cell-based simulation in Python. Cells are initialized, pickled and sent to worker processes to be run. If you have too many cells at any one time, the master node runs out of RAM. The trick is to create a limited number of cells, pack them and send them off to cluster nodes before making some more, remembering to set the objects already sent off to None. This allows you to perform large simulations using the total RAM of the cluster in addition to its computing power.
The application here was cell based fire simulation, only the cells actively burning were kept as objects at any one time.
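The batching trick might look roughly like the sketch below. make_cell and send_to_worker are hypothetical stand-ins. The real simulation sent pickled cells to cluster nodes, whereas this sketch only records the payload sizes:

```python
import pickle

def make_cell(i):
    # Hypothetical cell state; a real simulation would hold much more.
    return {"id": i, "burning": i % 3 == 0}

def send_to_worker(payload, sent):
    # Stand-in for shipping pickled bytes to a cluster node.
    sent.append(len(payload))

def dispatch_in_batches(n_cells, batch_size):
    sent = []
    for start in range(0, n_cells, batch_size):
        # Create only a limited number of cells at a time...
        batch = [make_cell(i) for i in range(start, min(start + batch_size, n_cells))]
        send_to_worker(pickle.dumps(batch), sent)
        # ...and drop the master's reference once they are sent off,
        # so the memory can be reclaimed before the next batch is built.
        batch = None
    return sent

print(len(dispatch_in_batches(n_cells=10_000, batch_size=1_000)))  # 10 batches
```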
