I am currently using WildFly 10.1 in production and just discovered that we have a lot of GC pause time. Analysis of the GC log revealed that 95% of the GC runs are triggered by System.gc() calls. Our application code does not contain any such calls.
Is this a WildFly feature?
Or can someone point me in the right direction to figure out whether these System.gc() invocations make sense?
Of course, I am aware that there are a number of measures to optimize GC behavior. I am just asking myself why there are so many System.gc() calls.
System.gc() is called by Java RMI, or more specifically by the sun.misc.GC class - source code
The default interval is 1 hour. You can change it with these parameters:
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000
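In WildFly, a convenient place to set them (a sketch, assuming a standalone installation; adjust the path and values to your setup) is bin/standalone.conf:

# bin/standalone.conf - append to the existing JAVA_OPTS
JAVA_OPTS="$JAVA_OPTS -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000"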
Be careful with -XX:+DisableExplicitGC: it turns these calls into no-ops, but RMI's distributed garbage collection relies on them, so memory can pile up and make your application slower and slower over time.
See also: What is the default Full GC interval in Java 8
If you want to find the callers of System.gc(), the most reliable method is to attach a debugger and set a method-entry breakpoint on it.
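For example (a sketch, assuming the JVM was started with a JDWP agent on port 8000; the port and the choice of jdb are just an illustration, any Java debugger works):

# start the JVM with a debug agent
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000
# attach jdb and break on entry to System.gc()
jdb -connect com.sun.jdi.SocketAttach:hostname=localhost,port=8000
stop in java.lang.System.gc
# when the breakpoint hits, 'where' prints the calling stack
where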
We are using spring-data-redis with the Spring cache abstraction and Lettuce as our Redis client.
Additionally, we use multi-threading and asynchronous execution for some methods.
An example workflow would look like this:
Main method A (main thread) --> calls method B (@Async), which is a proxy method so that the logic can run asynchronously in another thread. --> Method B calls method C, which is @Cacheable. The @Cacheable annotation handles reading/writing to our Redis cache.
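For illustration, the shape of that workflow looks roughly like this (a sketch; class, method and cache names are made up, and it assumes @EnableAsync and @EnableCaching are configured; methodC sits in a separate bean so the caching proxy is not bypassed by a self-invocation):

import java.util.concurrent.CompletableFuture;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

// "Method B": runs asynchronously via Spring's async proxy
@Service
class ServiceB {
    private final ServiceC serviceC;

    ServiceB(ServiceC serviceC) {
        this.serviceC = serviceC;
    }

    @Async
    public CompletableFuture<String> methodB(String id) {
        // delegates to the cacheable method on another bean
        return CompletableFuture.completedFuture(serviceC.methodC(id));
    }
}

// "Method C": reads/writes the Redis cache via the cache abstraction
@Service
class ServiceC {
    @Cacheable("items")
    public String methodC(String id) {
        // ... expensive lookup whose result we want cached ...
        return "value-for-" + id;
    }
}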
What's the problem?
Lettuce is Netty-based, and Netty relies on direct memory. Due to the @Async nature of our program, we have multiple threads using the LettuceConnection (and therefore Netty) at the same time.
By design, all threads use the same (?) Netty instance, which shares the direct memory. Because the MaxDirectMemorySize is apparently too small, we get an OutOfDirectMemoryError when too many threads access Netty at once.
Example:
io.lettuce.core.RedisException: io.netty.handler.codec.EncoderException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 8388352 byte(s) of direct memory (used: 4746467, max: 10485760)
What have we found so far?
We use the Cloud Foundry Java buildpack (https://docs.cloudfoundry.org/buildpacks/java/) and the MaxDirectMemorySize is calculated by the Java buildpack memory calculator (https://github.com/cloudfoundry/java-buildpack-memory-calculator).
This leads to MaxDirectMemorySize=10M. Given that 4 GB of memory is actually available, the calculated MaxDirectMemorySize is probably far too conservative. This might be part of the problem.
Potential solutions to the problem
increase the MaxDirectMemorySize of the JVM --> but we are not sure that is sufficient
configure Netty not to use direct memory (noPreferDirect=true) --> Netty will then use the heap, but we are unsure whether this would slow down our application too much if Netty is hungry for memory
no idea if this would be an option or would even make the problem worse: configure Lettuce with shareNativeConnection=false --> which would lead to multiple connections to Redis
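For reference, this is roughly how the three options map to configuration (a sketch; the 256M value, host/port and bean setup are placeholders, not recommendations):

// Option 1: raise the direct-memory cap (JVM option, value is a placeholder)
//   -XX:MaxDirectMemorySize=256M
// Option 2: tell Netty to prefer heap buffers (system property)
//   -Dio.netty.noPreferDirect=true
// Option 3: stop sharing the native Lettuce connection
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
class RedisConfig {
    @Bean
    LettuceConnectionFactory redisConnectionFactory() {
        LettuceConnectionFactory factory =
                new LettuceConnectionFactory(new RedisStandaloneConfiguration("localhost", 6379));
        factory.setShareNativeConnection(false); // option 3: one dedicated connection per operation
        return factory;
    }
}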
Our Question is: How do we solve this the correct way?
I'll happily provide more information on how we set up the configuration of our application (application.yml, LettuceConnection etc.), if any of these would help to fix the problem.
Thanks to the folks over at https://gitter.im/lettuce-io/Lobby we got some clues on how to approach these issues.
As suspected, the 10M MaxDirectMemorySize is too conservative considering the total available memory.
The recommendation was to increase this value. Since we don't actually know how much memory Netty needs to run stably, we came up with the following steps.
First: We will disable Netty's preference for direct memory by setting noPreferDirect=true. Netty will then use heap buffers.
Second: We will then monitor how much heap memory Netty consumes during operation. From this we can infer an average memory consumption for Netty.
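For step two, one way to watch Netty's buffer usage (a minimal sketch, assuming Netty's default pooled allocator is in use; the 30-second interval is arbitrary) is to poll the allocator metrics:

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

class NettyMemoryLogger {
    static void start() {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            PooledByteBufAllocatorMetric m = PooledByteBufAllocator.DEFAULT.metric();
            // log how much heap vs. direct memory Netty's pooled allocator currently holds
            System.out.printf("netty heap=%d bytes, direct=%d bytes%n",
                    m.usedHeapMemory(), m.usedDirectMemory());
        }, 0, 30, TimeUnit.SECONDS);
    }
}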
Third: We will take the average memory consumption and set it as the "new" MaxDirectMemorySize via the JVM option -XX:MaxDirectMemorySize. Then we'll re-enable Netty's use of direct memory by setting noPreferDirect=false.
Fourth: Monitor log entries and exceptions and see if we still have a problem or if this did the trick.
[UPDATE]
We started with the steps above but realized that setting noPreferDirect=true does not completely keep Netty from using direct memory. For some use cases (NIO processes) Netty still uses direct memory.
So we had to increase the MaxDirectMemorySize.
For now we set the following JAVA_OPTS: -Dio.netty.noPreferDirect=true -XX:MaxDirectMemorySize=100M. This will probably fix our issue.
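For anyone with a similar setup: with the Cloud Foundry Java buildpack these options can be passed via the app's environment, e.g. in manifest.yml (a sketch; whether JAVA_OPTS is picked up this way may depend on your buildpack version):

env:
  JAVA_OPTS: "-Dio.netty.noPreferDirect=true -XX:MaxDirectMemorySize=100M"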
I need to test some Node frameworks, or at least their routing part. That means measuring from the moment the request arrives at the Node process until a route has been decided and a function/class with the business logic is about to be called, i.e. just before calling it. I have looked long and hard for a suitable approach, but concluded that it must be done directly in the code and not with an external benchmark tool, because I fear measuring the wrong attributes. I tried artillery and ab, but they measure a lot more than I want to measure: RTT, bad OS scheduling, random tasks executing in the OS, and so on. My initial benchmark of my custom routing code using process.hrtime() shows approx. 0.220 ms (220 microseconds) execution time, but the external measurement shows 0.700 ms (700 microseconds), which is not an acceptable difference since it is 3.18x additional time. Sometimes the execution time jumps to 1.x seconds due to GC or system tasks. Now I wonder what a reproducible approach would look like. Maybe like this:
Use Docker with Scientific Linux to get a somewhat controlled environment.
A minimal Docker container install, a Node-only container, no extras.
Store timing results in global scope until the test is done and then save them to disk.
Terminate all applications with high/moderate disk I/O and/or CPU usage on the host OS.
Measure time as explained above and cross my fingers.
Any other recommendations to take into consideration?
I'm using ArangoDB 3.0.5 with arangodb-java-driver 3.0.1. ArangoDB is running on a 3.5 GHz i7 with 24 GB RAM and an SSD.
Loading some simple vertex data from Apache Flink seems to be going very slowly, at roughly 1000 vertices/sec. The task manager shows that the ArangoDB process is CPU bound.
My connector calls startBatchMode, iterates through 500 calls to graphCreateVertex (with waitForSync set to false), and then calls executeBatch.
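In rough outline, the connector does something like this per chunk (a sketch; the graph and collection names are placeholders, and the exact method signatures of the 3.0.1 driver may differ slightly):

import java.util.List;
import com.arangodb.ArangoDriver;
import com.arangodb.ArangoException;

class VertexBatchWriter {
    static <T> void writeChunk(ArangoDriver driver, List<T> chunkOf500) throws ArangoException {
        driver.startBatchMode();
        for (T vertex : chunkOf500) {
            // waitForSync = false
            driver.graphCreateVertex("myGraph", "vertices", vertex, false);
        }
        driver.executeBatch();
    }
}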
The system resources view in the management interface shows roughly 15000 (per sec?) while the load is running, with CPU time pinned at 1 for user time. I'm new to ArangoDB and am not sure how to profile what is going on. Any help much appreciated!
Rob
Your performance result is the expected behavior. The point of batch mode is that all of your 500 calls are sent in one request and executed on the server in only one thread.
To get better performance, you can use more than one thread in your client to create your vertices. More requests in parallel allow the server to use more than one thread.
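A minimal sketch of the parallel approach (the thread count and chunking are placeholders, and insertChunk stands for whatever driver call you end up using - graphCreateVertex, createDocument or importDocuments as discussed below):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ParallelVertexLoader<T> {
    private final ExecutorService pool = Executors.newFixedThreadPool(8); // placeholder thread count

    // 'chunks' are pre-partitioned batches of vertices; 'insertChunk' wraps the actual driver call
    void load(List<List<T>> chunks, Consumer<List<T>> insertChunk) throws InterruptedException {
        for (List<T> chunk : chunks) {
            pool.submit(() -> insertChunk.accept(chunk)); // each chunk is sent by its own thread
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}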
You can also use createDocument instead of graphCreateVertex. This skips the consistency checks on the graph and is a lot faster.
If you don't need these checks, you can also use importDocuments instead of batch mode + createDocument, which is even faster.
I've inherited a Groovy application. The Groovy part is rather small, maybe 500 LOC, mostly used to prime and start Java threads.
Now the sysadmin people come to me with tales of woe, tales of the dreaded OOME.
With a Java app I would take a look at the GC log, but here there is no such thing.
How do I get a GC log for Groovy? Is it possible?
I've googled around for quite a bit to no avail.
Anybody with any ideas?
With a Java app I would take a look at the GC log, but here there is no such thing.
Groovy runs on the JVM, so it accepts the same JVM options as Java applications do. You can pass them via the JAVA_OPTS environment variable.
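For example, to get a GC log (a sketch; the first form is for Java 8 and earlier, the second for Java 9+, and the file path and script name are placeholders):

# Java 8 and earlier
export JAVA_OPTS="-Xloggc:/tmp/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# Java 9 and later (unified logging)
export JAVA_OPTS="-Xlog:gc*:file=/tmp/gc.log"
groovy MyScript.groovy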
I have been asked to debug, and improve, a complex multithreaded app, written by someone I don't have access to, that uses concurrent queues (both GCD and NSOperationQueue). I don't have access to a plan of the multithreaded architecture, that's to say a high-level design document of what is supposed to happen when. I need to create such a plan in order to understand how the app works and what it's doing.
When running the code and debugging, I can see in Xcode's Debug Navigator the various threads that are running. Is there a way of identifying where in the source-code a particular thread was spawned? And is there a way of determining to which NSOperationQueue an NSOperation belongs?
For example, I can see in the Debug Navigator (or by using LLDB's "thread backtrace" command) a thread's stacktrace, but the 'earliest' user code I can view is the overridden (NSOperation*) start method - stepping back earlier in the stack than that just shows the assembly instructions for the framework that invokes that method (e.g. __block_global_6, _dispatch_call_block_and_release and so on).
I've investigated and tried various debugging methods, but without success. The nearest I got was the idea of method swizzling, but I don't think that will work for, say, queued NSOperation threads. Forgive my vagueness, please: I'm aware that, having looked as hard as I have, I'm probably asking the wrong question, and probably therefore haven't formed the question quite clearly in my own mind, but I'm asking the community for help!
Thanks
The best I can think of is to put breakpoints on dispatch_async, -[NSOperation init], -[NSOperationQueue addOperation:] and so on. You could configure those breakpoints to log their stacktrace, possibly some other info (like the block's address for dispatch_async, or the address of the queue and operation for addOperation:), and then continue running. You could then look though the logs when you're curious where a particular block came from and see what was invoked and from where. (It would still take some detective work.)
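For example, from the LLDB console (a sketch; "breakpoint command add 1" assumes the dispatch_async breakpoint is breakpoint number 1):

(lldb) breakpoint set --name dispatch_async
(lldb) breakpoint set --name "-[NSOperationQueue addOperation:]"
(lldb) breakpoint command add 1
> bt
> continue
> DONE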
You could also accomplish something similar with dtrace if the breakpoints method is too slow.