Drools with high GC time

In my case, I want to use Drools to process real-time streaming messages (from Kafka) as facts.
As preparatory work, I tested Drools (version 7.5.0.Final) with the following code and these JVM parameters (-Xms1500m -Xmx1500m -Xmn500m, JDK 1.8):
[code sample : simulate continuous facts with while method][1]
And the test result is (the app ran for more than 24 hours):
[jconsole monitor screenshot]
From the monitor I can see that GC activity is far too high. I suspect the facts stay referenced in working memory, so they cannot be freed until a major GC.
Is there any way to free facts explicitly? Or how can I reduce the number of (major) GC collections?

If your facts are events, you can take advantage of Drools' expiration policies for events to automatically remove your facts from the session.
If not, your session will get bigger and bigger. Drools needs to keep all facts in memory in order to work. You can manually retract facts from a session in one of two ways:
From the right-hand side of a rule by using delete(fact).
From your application, by using kSession.delete(factHandle), as sketched below.
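A minimal sketch of that second option, assuming a standard KIE classpath session and a hypothetical MyEvent fact class (imports from org.kie.api omitted):

KieServices ks = KieServices.Factory.get();
KieSession kSession = ks.getKieClasspathContainer().newKieSession();

MyEvent event = new MyEvent();               // hypothetical fact coming from the Kafka stream
FactHandle handle = kSession.insert(event);  // keep the handle returned by insert()
kSession.fireAllRules();

// When the fact is no longer needed, delete it so the object becomes unreachable
// and can be reclaimed by the garbage collector.
kSession.delete(handle);

// Alternatively, if the facts are events, declaring them with @role(event) and
// @expires in the DRL lets Drools remove them from the session automatically.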
Hope it helps,

Related

OutOfDirectMemoryError using Spring-Data-Redis with Lettuce in a multi-threading context

We are using spring-data-redis with the spring-cache abstraction and lettuce as our redis-client.
Additionally we use multi-threading and async execution on some methods.
An example workflow would look like this:
Main method A (main thread) --> calls method B (@Async), which is a proxy method that runs the logic asynchronously in another thread. --> Method B calls method C, which is @Cacheable. The @Cacheable annotation handles reading from and writing to our Redis cache.
What's the problem?
Lettuce is Netty-based, and Netty relies on direct memory. Due to the @Async nature of our program, we have multiple threads using the LettuceConnection (and therefore Netty) at the same time.
By design, all threads use the same (?) Netty instance, which shares the direct memory. Because of an apparently too small MaxDirectMemorySize, we get an OutOfDirectMemoryError when too many threads are accessing Netty.
Example:
io.lettuce.core.RedisException: io.netty.handler.codec.EncoderException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 8388352 byte(s) of direct memory (used: 4746467, max: 10485760)
What have we found so far?
We use the Cloud Foundry Java buildpack (https://docs.cloudfoundry.org/buildpacks/java/) and calculate the MaxDirectMemorySize using the java-buildpack-memory-calculator (https://github.com/cloudfoundry/java-buildpack-memory-calculator), which leads to MaxDirectMemorySize=10M. With 4 GB of memory actually available, the calculated MaxDirectMemorySize is probably far too conservative. This might be part of the problem.
Potential solutions to the problem
increase the MaxDirectMemorySize of the JVM --> but we are not sure that is sufficient
configure Netty not to use the DirectMemory (noPreferDirect=true) --> Netty will then use the heap, but we are unsure whether this would slow down our application too much if Netty is memory-hungry
configure Lettuce with shareNativeConnection=false --> this would lead to multiple connections to Redis, and we have no idea whether it is a real option or would even make the problem worse (see the configuration sketch after this list)
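For reference, a rough sketch of how the second and third options could be wired up in plain Java configuration. This is only an illustration under our assumptions (imports from spring-data-redis omitted): io.netty.noPreferDirect is the system-property form of the -D flag and must be set before any Netty class is loaded, setShareNativeConnection(false) is the spring-data-redis setter behind the third option, and host/port are placeholders.

// Set as early as possible, e.g. the first line of main(), otherwise Netty ignores it.
System.setProperty("io.netty.noPreferDirect", "true");

// shareNativeConnection=false gives every operation its own connection
// instead of multiplexing everything over one shared native connection.
RedisStandaloneConfiguration redisConfig = new RedisStandaloneConfiguration("localhost", 6379);
LettuceConnectionFactory factory = new LettuceConnectionFactory(redisConfig);
factory.setShareNativeConnection(false);
factory.afterPropertiesSet();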
Our question is: how do we solve this the correct way?
I'll happily provide more information on how we set up the configuration of our application (application.yml, LettuceConnection etc.), if any of these would help to fix the problem.
Thanks to the folks over at https://gitter.im/lettuce-io/Lobby, we got some clues on how to approach these issues.
As suspected, the 10M MaxDirectMemorySize is too conservative considering the total available memory.
The recommendation was to increase this value. Since we don't actually know how much memory Netty needs to run stably, we came up with the following steps.
First: We will disable Netty's preference for direct memory by setting noPreferDirect=true. Netty will then use heap buffers.
Second: We will then monitor how much heap memory Netty consumes during operation (see the monitoring sketch after these steps). Doing this, we'll be able to infer an average memory consumption for Netty.
Third: We will take the average memory consumption value and set this as the "new" MaxDirectMemorySize by setting it in the JVM option -XX:MaxDirectMemorySize. Then we'll re-enable Netty to use the DirectMemory by setting noPreferDirect=false.
Fourth: Monitor log-entries and exceptions and see if we still have a problem or if this did the trick.
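One possible way to do the monitoring from the second step, assuming Lettuce is using Netty's default pooled allocator (the usual case); this is a sketch rather than a definitive recipe and relies on the allocator metrics Netty 4.1 exposes (imports from io.netty.buffer and java.util.concurrent omitted):

// Periodically log how much heap and direct memory Netty's pooled allocator holds.
PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(
        () -> System.out.printf("netty heap=%d bytes, direct=%d bytes%n",
                metric.usedHeapMemory(), metric.usedDirectMemory()),
        0, 30, TimeUnit.SECONDS);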
[UPDATE]
We started with the steps above but realized that setting noPreferDirect=true does not completely stop Netty from using direct memory; for some use cases (NIO processes) Netty still uses it.
So we had to increase the MaxDirectMemorySize anyway.
For now we have set the following JAVA_OPTS: -Dio.netty.noPreferDirect=true -XX:MaxDirectMemorySize=100M, which will probably fix our issue.

Is reproducible benchmarking possible?

I need to test some Node frameworks, or at least their routing part: from the moment a request arrives at the Node process for processing until a route has been decided and a function/class with the business logic is about to be called, i.e. just before calling it. I have looked long and hard for a suitable approach, but concluded that it must be done directly in the code and not with an external benchmark tool, because I fear measuring the wrong attributes. I tried artillery and ab, but they measure many more attributes than I want to measure, such as RTT, bad OS scheduling, random tasks executing in the OS and so on. My initial benchmarks of my custom routing code using process.hrtime() show approx. 0.220 ms (220 microseconds) execution time, but the external measurement shows 0.700 ms (700 microseconds), which is not an acceptable difference, since it is 3.18 times as long. Sometimes execution time jumps to 1.x seconds due to GC or system tasks. Now I wonder what a reproducible approach would look like? Maybe like this:
Use Docker with Scientific Linux to get a somewhat controlled environment.
A minimal Docker container install, a Node-only container, no extras.
Store time results in global scope until the test is done and then save them to disk.
Terminate all applications with high/moderate disk I/O and/or CPU usage on the host OS.
Measure time as explained before and cross my fingers.
Any other recommendations to take into consideration?

JVM Major Garbage Collections Not Running for months

My application's garbage collector used to run a major collection frequently, maybe once a day, but it suddenly stopped doing so. Now heap usage has reached 90% and I have had to restart the application a few times.
This is a production environment, and all I am allowed to do is read the logs and view the JVM's state via the provided UI.
Another observation I made: over the last 3 months, during the first 2 months there were no minor garbage collections but a lot of major ones; for the last month there have been no major collections but many minor ones.
Perhaps it never does a major collection because you are restarting the application before it gets a chance.
You should be getting many minor collections if the young space is a reasonable size.
If you were only getting major collections most likely your JVM wasn't tuned correctly. I would try to remove as many GC tuning parameters as possible and only add each one if you know it helps. Having too many tuning parameters set is a good way to get strange behaviour.
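If you can run even a small snippet against that JVM (or attach a JMX client), the standard GarbageCollectorMXBeans report per-collector counts, which tells you directly whether major collections are happening at all. A minimal sketch using the java.lang.management API; the collector names depend on which GC is in use (for example, "PS MarkSweep" is the old-generation collector of the default parallel GC on JDK 8):

for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
    // Accumulated number of collections and total time spent, per collector.
    System.out.printf("%s: count=%d, time=%d ms%n",
            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
}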

Performance issue while using Parallel.foreach() with MaximumDegreeOfParallelism set as ProcessorCount

I wanted to process records from a database concurrently and in minimum time. So I thought of using a Parallel.ForEach() loop to process the records, with MaxDegreeOfParallelism set to Environment.ProcessorCount.
ParallelOptions po = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.ForEach(listUsers, po, (user) =>
{
    // Parallel processing
    ProcessEachUser(user);
});
But to my surprise, the CPU utilization was not even close to 20%. When I dug into the issue and read the MSDN article on this (http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx), I tried the specific value -1 for MaxDegreeOfParallelism. As the article says, this value removes the limit on the number of concurrently running operations, and the performance of my program improved considerably.
But that still did not meet my requirement for the maximum time taken to process all the records in the database. So I analyzed further and found that the thread pool has two settings, MinThreads and MaxThreads. By default the values of MinThreads and MaxThreads are 10 and 1000 respectively: at the start only 10 threads are created, and this number keeps increasing up to a maximum of 1000 with every new user, unless a previous thread has finished its execution.
So I set the initial value of MinThreads to 900 instead of 10 using
System.Threading.ThreadPool.SetMinThreads(900, 900);
so that right from the start a minimum of 900 threads are created, and I thought this would improve performance significantly. It did create 900 threads, but it also greatly increased the number of failures when processing each user, so I did not achieve much with this approach. I then changed the value of MinThreads to just 100 and found that performance was much better.
But I wanted to improve further, as my time constraint was still not met: processing all the records still exceeded the time limit. You might think I was already using all the best options to get maximum performance from parallel processing, and I thought so too.
But to meet the time limit I decided to take a shot in the dark. I created two different executable files (slaves) in place of only one and assigned each of them half of the users from the DB. Both executables did the same thing and ran concurrently. I created another master program to start these two slaves at the same time.
To my surprise, it reduced the time taken to process all the records nearly by half.
Now my question is simply that I do not understand the logic behind the master/slave setup giving better performance than a single EXE, with all the logic being the same in both the slaves and the previous EXE. I would highly appreciate it if someone could explain this in detail.
But to my surprise, the CPU utilization was not even close to 20%.
…
It uses HTTP requests to some Web APIs hosted on other networks.
This means that CPU utilization is entirely the wrong thing to look at. When using the network, it's your network connection that's going to be the limiting factor, or possibly some network-related limit, certainly not CPU.
Now I created two different executable files … To my surprise, it reduced the time taken to process all the records nearly to the half.
This points to an artificial, per-process limit, most likely ServicePointManager.DefaultConnectionLimit. Try setting it to a larger value than the default at the start of your program and see if that helps.

Velocity CTP2 Serious Memory Bug

When you create an instance of the CacheFactory and then don't use it anymore, the memory that was used during the creation of the object is not released. This will have a substantial effect on any web app or scenario where a CacheFactory might be created multiple times. The symptom is unusually high memory use on the process, and in IIS this will most likely result in your app having to recycle more often, since it will overrun its allocated memory more quickly.
The following code will show an increase of about 500 MB (yes, I mean megabytes) of memory usage!
To duplicate put the following code into your app:
Dim CacheFactory1 As CacheFactory = New CacheFactory()
For i As Int32 = 1 To 1 * (10 ^ 4)
    CacheFactory1 = New CacheFactory()
    CacheFactory1 = Nothing
Next
There are only two workarounds for this:
The Velocity team fixes the bug (and I'm sure they will).
Use the same CacheFactory object via a static member in your app and reference it every time you want to use the cache (this works but isn't optimal in my opinion).
I also have a CachingScope that can be used to wrap your caching methods and will post it on CodePlex soon. You wrap it around your caching methods just like a TransactionScope, and it manages the locking and the connection for you.
So where is the question? You should file the bug, and not post this here, as the Velocity team is more than likely monitoring Microsoft Connect for bugs.
I've built a scope provider to resolve this issue. You can get the code here:
http://www.codeplex.com/CacheScope
