Java - ThreadFactory memory leak?

This code:
while (true) {
    new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            return null;
        }
    };
}
makes the JVM run out of memory very quickly.
Why?

I have tried to run this code with JRE 1.7.0_60 x86_64 on Windows 7 with default options, and here are the results:
The author's code, run as is, doesn't seem to perform any allocation at all, most likely because the JIT detects that the references are never used;
A modified version of the code that prints the created ThreadFactory instances to System.out produces a "saw-tooth" heap usage pattern,
which means that both allocation and garbage collection take place.
Back to your question: I think you either omitted some significant part of the code, set -Xmx to an extremely low value, or something else entirely is going on. The code you posted is fine, though.
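For reference, the modified version I tested looked roughly like this (my reconstruction; printing the factory keeps the reference live, so the JIT cannot eliminate the allocation):
while (true) {
    ThreadFactory factory = new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            return null;
        }
    };
    System.out.println(factory); // the side effect keeps the allocation alive
}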

Related

C++/CLI marshal_context native strings corrupted

The following C++/CLI code is compiled into a DLL and called by a C# application:
void Foo(String^ strManaged) {
    marshal_context^ context = gcnew marshal_context();
    FooUnmanaged(context->marshal_as<const char*>(strManaged));
}
FooUnmanaged() reads the const char*, runs some processing which takes about a second, and then reads the const char* again, for example:
void FooUnmanaged(const char* str) {
    // 1
    Log(str);
    // Process things unrelated to 'str'
    // ...
    // 2
    Log(str);
}
On occasion, the contents of str change between the first and the second read inside FooUnmanaged(), as if that memory had been reused for some other purpose. This happens regardless of the processing done in FooUnmanaged(), as long as it takes a noticeable amount of time (long enough, I would guess, for the GC to have a chance to run).
This does not happen if Foo is written either this way
void Foo(String^ strManaged) {
    marshal_context^ context = gcnew marshal_context();
    FooUnmanaged(context->marshal_as<const char*>(strManaged));
    delete context; // added
}
or that way
void Foo(String^ strManaged) {
    marshal_context context; // created on the stack
    FooUnmanaged(context.marshal_as<const char*>(strManaged));
}
Is the original code incorrect? Why does it not correctly reserve the memory of the const char* for the lifetime of context? Or can the lifetime of context be shorter than I think it is (the scope of Foo())?
Answer by @HansPassant:
Yes, that's a lifetime issue. .NET uses an aggressive collector, and it has no idea that the native code relies on the context. The first snippet requires GC::KeepAlive(context); at the end. The last snippet is how it was meant to be used: stack semantics emulate RAII, and the auto-generated Dispose() call keeps the context alive in a similar fashion, while also avoiding the temporary memory leak. If FooUnmanaged() stores the passed pointer, then you can't use marshal_context.
This is confirmed by this article:
The lifetime of [local variables] can depend on the way the program was built. In debug builds, a local variable lasts for as long as the method is on the stack. In release builds, the JIT is able to look at the program structure to work out the last point within the execution that a variable can be used by the method and will discard it when it is no longer required.
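Incidentally, the JVM behaves the same way. As an illustrative Java-side analog of GC::KeepAlive, a sketch of mine rather than anything from this discussion, Java 9 added Reference.reachabilityFence:
import java.lang.ref.Reference;

class NativeBuffer {
    private final long ptr = allocate();          // imaginary native allocation

    long pointer() { return ptr; }

    private static long allocate() { return 0L; } // stand-in for a real JNI call
}

class Demo {
    static void use(long rawPtr) { /* native work on rawPtr */ }

    public static void main(String[] args) {
        NativeBuffer buf = new NativeBuffer();
        long rawPtr = buf.pointer();
        // 'buf' is never used below this point, so the JIT may consider it
        // unreachable while native code is still working with 'rawPtr'.
        use(rawPtr);
        Reference.reachabilityFence(buf); // keeps 'buf' alive until here
    }
}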

Strange memory usage when using CacheEntryProcessor and modifying cache entry

I wonder if anybody can explain what is going wrong with my code.
I have an IgniteCache of Long -> Object[], which is a kind of batching mechanism.
The cache is on-heap, partitioned, and has one backup configured.
I want to modify some of the objects within the cache entry's value array.
So I wrote an implementation of CacheEntryProcessor:
@Override
public Object process(MutableEntry<Long, Object[]> entry, Object... arguments)
        throws EntryProcessorException {
    boolean updated = false;
    int key = (int) arguments[0];
    Set<Long> someIds = Ignition.ignite().cluster().nodeLocalMap().get(key);
    Object[] values = entry.getValue();
    for (int i = 0; i < values.length; i++) {
        Person p = (Person) values[i];
        if (someIds.contains(p.getId())) {
            p.modify();
            updated = true;
        }
    }
    if (updated) {
        entry.setValue(values); // write the modified array back so the cache sees the change
    }
    return null;
}
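For reference, I invoke it roughly like this (a sketch; the cache name, processor class name, and key-set helper are placeholders rather than the exact code):
IgniteCache<Long, Object[]> cache = Ignition.ignite().cache("batchCache");
Set<Long> keys = getBatchKeysToUpdate(); // placeholder helper returning the batch keys
int nodeLocalMapKey = 42;                // the key used to look up someIds on each node
cache.invokeAll(keys, new PersonUpdateProcessor(), nodeLocalMapKey);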
When the cluster is loaded with data, each node consumes around 20GB of heap.
When I run the processor with cache.invokeAll on a multi-node cluster, I see crazy memory behavior: while the processor runs, memory usage climbs to 48GB or higher, eventually leading to the node being dropped from the cluster because GC took too long.
However, if I remove the entry.setValue(values) line, which stores the modified array back into the cache, everything is fine, apart from the fact that the update will not be replicated, since the cache is not aware of the change; the update is only visible on the primary node :(
Can anybody tell me how to make it work? What is wrong with this approach?
First of all, I would not recommend allocating such large heaps. This will very likely cause long GC pauses even if everything is working properly: the JVM does not clean up memory until usage reaches a certain threshold, and by the time it does, there is too much garbage to collect. Try switching to off-heap memory or starting more Ignite nodes.
The fact that more garbage is generated when you update the entry makes perfect sense: each update replaces the old value with a new one, and the old one becomes garbage.
If none of this helps, grab a heap dump and check what is occupying the memory.
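For example, with the standard JDK tools (<pid> being the process ID of the Ignite node):
jmap -dump:live,format=b,file=ignite-heap.hprof <pid>
The resulting dump can then be opened in a tool such as Eclipse MAT or VisualVM to see which objects dominate the heap.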

How to get Java stacks when JVM can't reach a safepoint

We recently had a situation where one of our production JVMs would randomly freeze. The Java process was burning CPU, but all visible activity would cease: no log output, nothing written to the GC log, no response to any network request, etc. The process would persist in this state until restarted.
It turned out that the org.mozilla.javascript.DToA class, when invoked on certain inputs, will get confused and call BigInteger.pow with enormous values (e.g. 5^2147483647), which triggers the JVM freeze. My guess is that some large loop, perhaps in java.math.BigInteger.multiplyToLen, was JIT'ed without a safepoint check inside the loop. The next time the JVM needed to pause for garbage collection, it would freeze, because the thread running the BigInteger code wouldn't be reaching a safepoint for a very long time.
My question: in the future, how can I diagnose a safepoint problem like this? kill -3 didn't produce any output; I presume it relies on safepoints to generate accurate stacks. Is there any production-safe tool which can extract stacks from a running JVM without waiting for a safepoint? (In this case, I got lucky and managed to grab a set of stack traces just after BigInteger.pow was invoked, but before it worked its way up to a sufficiently large input to completely wedge the JVM. Without that stroke of luck, I'm not sure how we would ever have diagnosed the problem.)
Edit: the following code illustrates the problem.
import java.math.BigInteger;

public class Tests {
    public static void main(String[] args) {
        // Spawn a background thread to compute an enormous number.
        new Thread() { @Override public void run() {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException ex) {
            }
            BigInteger.valueOf(5).pow(100000000);
        }}.start();
        // Loop, allocating memory and periodically logging progress, to illustrate GC pause times.
        byte[] b;
        for (int outer = 0; ; outer++) {
            long startMs = System.currentTimeMillis();
            for (int inner = 0; inner < 100000; inner++) {
                b = new byte[1000];
            }
            System.out.println("Iteration " + outer + " took " + (System.currentTimeMillis() - startMs) + " ms");
        }
    }
}
This launches a background thread which waits 5 seconds and then starts an enormous BigInteger computation. In the foreground, it then repeatedly allocates a series of 100,000 1K blocks, logging the elapsed time for each 100MB series. During the 5 second period, each 100MB series runs in about 20 milliseconds on my MacBook Pro. Once the BigInteger computation begins, we begin to see long pauses interleaved. In one test, the pauses were successively 175ms, 997ms, 2927ms, 4222ms, and 22617ms (at which point I aborted the test). This is consistent with BigInteger.pow() invoking a series of ever-larger multiply operations, each taking successively longer to reach a safepoint.
Your problem interested me very much. You were right about the JIT. First I tried to play with GC types, but that had no effect. Then I tried disabling the JIT, and everything worked fine:
java -Djava.compiler=NONE Tests
Then I printed out the JIT compilations:
java -XX:+PrintCompilation Tests
And noticed that the problem starts after certain compilations in the BigInteger class. I tried excluding methods from compilation one by one and finally found the cause:
java -XX:CompileCommand=exclude,java/math/BigInteger,multiplyToLen -XX:+PrintCompilation Tests
For large arrays this method can run for a long time, and the problem may really be in safepoints: for some reason they are not inserted, even though they should be present even in compiled code. It looks like a bug. The next step would be to analyze the assembly code; I have not done that yet.
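For that next step, the generated code can be dumped with the diagnostic flags below (this assumes the hsdis disassembler library is installed):
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Tests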
It's not a bug, it's a performance feature. The JVM eliminates safepoint checks from counted loops, making them run faster. It expects that either
you care about STW pauses and don't have extra-long loops,
or you have extra-long loops, but are fine with safepoints being eventual.
If that doesn't suit you, it can be switched off with this flag: -XX:+UseCountedLoopSafepoints
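Applied to the test program above:
java -XX:+UseCountedLoopSafepoints Tests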
And answering the title question: you can still stop and explore the program with gdb, but the stack traces won't be as nice.
Perhaps that's what the "-F" option of jstack is good for:
OPTIONS
-F
Force a stack dump when 'jstack [-l] pid' does not respond.
I always wondered when and why that could help.

Possible memory leakage with the following MOSS 2007 code?

Does the code below leak memory? If so, any recommendations for optimising it?
SPWeb web = (SPWeb)properties.Feature.Parent; // comes from the event receiver
//... lots of other code
// the below is the focal point.
foreach (SPWeb childWeb in web.Webs)
{
    try
    {
        // lots of heavy processing with the childWebs
    }
    finally
    {
        if (childWeb != null)
        {
            childWeb.Dispose();
        }
    }
}
The code you have posted should be fine. However, depending on what you do with childWeb inside the try block, it might cause memory leaks. Can you post the entire code? Do you suspect a memory leak?
According to Disposing Objects, your code matches the Good Coding Practice for SPWeb.Webs.
As mentioned on that page, I would recommend downloading and using SPDisposeCheck as both verification of correct code and identification of potential memory leaks.

examples of garbage collection bottlenecks

I remember someone telling me a good one, but I cannot recall it. I spent the last 20 minutes with Google trying to learn more.
What are examples of bad or not-great code that causes a performance hit due to garbage collection?
From an old Sun tech tip: sometimes it helps to explicitly null out references in order to make objects eligible for garbage collection earlier:
public class Stack {
    private static final int MAXLEN = 10;
    private Object stk[] = new Object[MAXLEN];
    private int stkp = -1;

    public void push(Object p) { stk[++stkp] = p; }

    public Object pop() { return stk[stkp--]; }
}
Rewriting the pop method in this way helps ensure that garbage collection gets done in a timely fashion:
public Object pop() {
    Object p = stk[stkp];
    stk[stkp--] = null; // drop the stack's reference so the object can be collected
    return p;
}
What are examples of bad or not-great code that causes a performance hit due to garbage collection?
The following will be inefficient when using a generational garbage collector:
Mutating references in the heap, because write barriers are significantly more expensive than plain pointer writes. Consider replacing heap allocation and references with an array of value types and an integer index into the array, respectively (see the sketch below this list).
Creating long-lived temporaries. When they survive the nursery generation they must be marked, copied, and all pointers to them updated. If it is possible to coalesce updates in order to reuse an old version of a collection, do so.
Complicated heap topologies. Again, consider replacing many references with indices.
Deep thread stacks. Try to keep stacks shallow to make it easier for the GC to collate the global roots.
However, I would not call these "bad" because there is nothing objectively wrong with them. They are only inefficient when used with this kind of garbage collector. With manual memory management, none of the issues arise (although many are replaced with equivalent issues, e.g. performance of malloc vs pool allocators). With other kinds of GC some of these issues disappear, e.g. some GCs don't have a write barrier, mark-region GCs should handle long-lived temporaries better, not all VMs need thread stacks.
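As a concrete illustration of the first point, here is a minimal sketch (mine, not the answerer's) of trading a reference-linked structure for flat arrays and integer indices:
// Reference-heavy version: every write to 'next' is a reference store,
// which a generational GC guards with a write barrier.
class Node {
    Node next;
    int value;
}

// Index-based version: the same links become plain int writes into flat
// arrays, with no write barriers and far fewer objects for the GC to trace.
class NodePool {
    final int[] next;   // index of the following node, or -1 for none
    final int[] value;

    NodePool(int capacity) {
        next = new int[capacity];
        value = new int[capacity];
    }
}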
When you have some loop involving the creation of new object instances: if the number of iterations is very high, you produce a lot of garbage, causing the garbage collector to run more frequently and so decreasing performance.
One example would be object references that are kept in member variables or static variables. Here is an example:
class Something {
    static HugeInstance instance = new HugeInstance();
}
The problem is that the garbage collector has no way of knowing when this instance is no longer needed. So it is usually better to keep things in local variables and write small functions, as shown below.
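A minimal sketch of the local-variable alternative (HugeInstance and its use() method are stand-ins, as above):
class Something {
    void doWork() {
        HugeInstance instance = new HugeInstance(); // lives only for this call
        instance.use();
        // 'instance' becomes unreachable as soon as it is no longer used here,
        // instead of surviving for the lifetime of the class.
    }
}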
String foo = new String("a" + "b" + "c");
I understand Java is better about this now (the compiler folds the constant expression into "abc", so only the explicit new String creates an object), but in the early days that line would involve the creation and destruction of three or four string objects.
I can give you an example that applies to the .NET CLR GC:
If you override a class's finalizer and do not call the base class's finalizer, such as:
protected override void Finalize() {
    Console.WriteLine("Im done");
    //base.Finalize(); // you should call it!
}
When you resurrect an object by accident:
protected override void Finalize() {
    Application.ObjHolder = this;
}

class Application {
    static public object ObjHolder;
}
When you use an object with a finalizer, it takes two GC collections to get rid of the data, and with either of the snippets above it will never be deleted.
frequent memory allocations
lack of memory reuse (when dealing with large memory chunks)
keeping objects longer than needed (holding references to obsolete objects)
In most modern collectors, any use of finalization will slow the collector down. And not just for the objects that have finalizers.
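In Java terms, for instance (a sketch): any class like the one below forces the collector to queue instances for finalization on one cycle and reclaim their memory only on a later one:
class Handle {
    @Override
    protected void finalize() throws Throwable {
        // Runs on the finalizer thread after the first GC cycle that finds the
        // object unreachable; the memory is reclaimed only by a later cycle.
        System.out.println("closing handle");
        super.finalize(); // the Java counterpart of calling base.Finalize()
    }
}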
Your custom service does not have a load limiter on it, so:
A lot of requests come in at the same time for some reason (say, everyone logs on in the morning).
The service takes longer to process each request, as it now has hundreds of threads (one per request).
Yet more part-processed requests build up due to the longer processing time.
Each part-processed request has created lots of objects that live until the end of processing that request.
The garbage collector spends lots of time trying to free memory, but it can't, due to the above.
Yet more part-processed requests build up due to the longer processing time... (including time spent in GC).
I encountered a nice example while doing some parallel cell-based simulation in Python. Cells are initialized and sent to worker processes after pickling for running. If you have too many cells alive at any one time, the master node runs out of RAM. The trick is to create a limited number of cells, pack them, and send them off to the cluster nodes before making more, remembering to set the objects already sent to None. This allows you to perform large simulations using the total RAM of the cluster in addition to its computing power.
The application here was cell-based fire simulation; only the cells actively burning were kept as objects at any one time.
