SocketException between our Load Balancers and Tomcat during Garbage Collection - Linux

We have noticed the following problem: whenever our Tomcat JVM performs a full GC, requests to open a connection between the LB and Tomcat fail. This is very problematic, since those requests never get the chance to reach the application server.
The problem occurred even when we pointed one Tomcat directly at the other, with no LB in between.
Is there any setting in the JVM / Tomcat / Linux that would make the HTTP connection wait until the GC ends, so that the application JVM can receive the request?
We are using Java 6, Tomcat 7, and Ubuntu Linux.
Thanks,
Yosi

Have you looked into using the concurrent garbage collector via the '-XX:+UseConcMarkSweepGC' option? It performs most of its collection work in the background, so there aren't nearly as many (if any) "stop the world" full GCs.

You may need to enable concurrent garbage collection as described in http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
-XX:+UseConcMarkSweepGC
Also try other GC configs.
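For Tomcat 7, the usual place to set these flags is a bin/setenv.sh file (bin/setenv.bat on Windows), which Tomcat's startup script picks up automatically. A minimal sketch, adding GC logging as well so you can check whether the connection failures actually line up with stop-the-world pauses (the log path is illustrative):

CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/tomcat/gc.log"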

Related

Diagnosing Sporadic Lockups in Website Running on IIS

Goal
Determine the cause of the sporadic lockups of our web application running on IIS.
Problem
An application we are running on IIS sporadically locks up throughout the day. When it locks up, it locks up across all workers and all load-balanced instances.
Environment and Application
The application runs on 4 different Windows Server 2016 machines, load balanced by HAProxy using a round-robin scheme. The IIS application pools hosting this website are configured with 4 workers each, and the application they host is a 32-bit application. The IIS instances are not using a shared configuration file, but the application pools for this application are all configured the same.
This application is the only application in the IIS application pool. The application is an ASP.NET web API and is using .NET 4.6.1. The application is not creating threads of its own.
Theory
My theory for why this is happening is that requests are coming in that take ~5-30 minutes to complete. Every machine gets tied up servicing these requests, so they all look "locked up". The company rolled its own logging mechanism, and from it I can tell we have requests taking ~5-30 minutes to complete. The team responsible for the application has cleaned up many of these, but I am still seeing ~5 minute requests in the log.
I do not have access to the machines personally, so our systems team has taken memory dumps of the application when this happens. In the dumps I generally see ~50 threads running, all of them in our code. These threads are spread all over the application and do not seem to be stopped on any common piece of code. When the application is running correctly, the dumps show 3-4 threads running. I have also looked at performance counters like ASP.NET\Requests Queued, but it never seems to show any requests queued. During these times the CPU, memory, disk, and network usage look normal. Using windbg, none of the threads seem to have a high CPU time other than the finalizer thread, which as far as I know should live the entire time.
Conclusion
I am looking for a means to prove or disprove my theory as to why we are locking up as well as any metrics or tools I should look at.
So this issue came down to our application running a query that stitched a table with 2,000,000 records in it together with another table. Memory would become so fragmented that the garbage collector was spending more time trying to find places to put objects, and moving them around, than it was running our code. This is why the application appeared to still be working and why there were no exceptions. Oddly, IIS would time out the requests but would continue processing the threads.
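For anyone trying to confirm a theory like this before digging into dumps: the CLR exposes a performance counter showing the fraction of time spent in GC, which would have pointed straight at this. A sketch of sampling it from the command line (the w3wp instance name is an assumption and depends on how your worker processes are named):

typeperf "\.NET CLR Memory(w3wp)\% Time in GC" -si 5

A healthy application typically spends only a small fraction of its time in GC; sustained high values during a lockup would support the fragmentation theory.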

NodeJS Performance Issue

I'm running an API server using NodeJS 6.10.3 LTS on Ubuntu 14.04 (trusty). I've noticed that my API server tops out at ~600 reqs/min running on a c4.large EC2 instance; by "tops out" I mean the CPU goes up to 100%. Note: I know that I'm not fully utilizing the instance (I'm not using the cluster module), but that's OK for now.
I took a .cpuprofile dump of my API server for 10 seconds, and noticed that every second, for ~300ms, the profiler shows my NodeJS code is sitting (idle).
Does anyone know what that (idle) implies? Is it a GC issue? Or is it an internal (to V8) lock that I'm triggering? Any help or pointers to tools to help debug this would be nice. I'm working on anonymizing some of the stack traces in the cpuprofile so I can share them.
The packages I'm using are mainly ExpressJS 4, the Couchbase NodeJS SDK, and Socket.IO. The code paths mostly read requests and push to Couchbase, then query Couchbase via the Views API and push some aggregated data onto a Socket.IO channel. So it's all pretty async-friendly I/O work. I've made sure that I'm not calling any synchronous functions. There is no pattern of function calls before the (idle) in the CPU profile.
It could also just be I/O wait, meaning none of the sockets have data ready to read yet, so the time is spent idle. If you are using a load-testing library, you should check that the requests are evenly distributed within each second.
Take a look at https://www.npmjs.com/package/gc-stats to check GC data. There are also flags to increase the heap space and to change when GC runs, if the problem turns out to be GC-related.
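If you'd rather not add a dependency first, Node can also report GC activity directly via V8 flags. A quick sketch (server.js is a stand-in for your actual entry point):

node --trace_gc server.js
node --max_old_space_size=2048 server.js

The first prints a line for every GC event with its pause time, so you can see whether the ~300ms idle gaps coincide with collections; the second raises the old-generation heap limit (in MB) if the pauses turn out to be full GCs under memory pressure.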

Web Api Requests Queueing up forever on IIS (in state: ExecuteRequestHandler)

I'm currently experiencing some hangs in our production environment, and after some investigation I'm seeing a lot of requests queued up in the worker process of the application pool. The common thread is that every request that stays queued for a long time is a Web API request; I'm using both MVC and Web API in the app.
The requests stay queued for about 3 hours, and when the application pool is recycled they immediately start queueing up again.
They are all in the ExecuteRequestHandler state.
Any ideas on where I should continue digging?
Your requests can be stalled for a number of reasons:
they are waiting on an I/O operation, e.g. a database or web service call
they are looping or performing operations on a large data set
CPU-intensive operations
some combination of the above
In order to find out what your requests are doing, start by getting the URLs of the requests that are taking a long time.
You can do this from the command line as follows:
c:\windows\system32\inetsrv\appcmd list requests
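If there are a lot of requests, appcmd can filter to just the long-running ones; for example, to list only requests that have been executing for more than 30 seconds:

c:\windows\system32\inetsrv\appcmd list requests /elapsed:30000

The /elapsed value is in milliseconds; 30000 here is just an illustrative threshold.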
If it's not obvious from the URLs and from looking at the code, you need to take a process dump of w3wp.exe on the server. Once you have a process dump, you will need to load it into windbg to analyze what's taking up all the CPU cycles. Covering windbg fully is a big topic, but briefly, here's what you need to do (a sketch of the session follows below):
load the SOS dll (the managed debugging extension)
call the !runaway command to get a list of long-running threads
dive into a long-running thread by selecting it and calling the !clrstack command
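As a rough sketch, a session for this on .NET 4.x might look like the following (thread number 23 is hypothetical; pick one from the !runaway output):

$$ Load SOS, the managed debugging extension, for .NET 4.x
$$ (on .NET 2.0/3.5 it would be .loadby sos mscorwks)
.loadby sos clr
$$ Rank threads by accumulated user-mode CPU time
!runaway
$$ Switch to a suspect thread from the !runaway list
~23s
$$ Dump that thread's managed call stack
!clrstack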
There are many blogs on using windbg. Here is one example. A great resource on analyzing these types of issues is Tess Ferrandez's blog.
This is a really long shot without first-hand access to your system, but try checking the handler mappings in the IIS Manager GUI for your Web API. Compare them with the IIS settings of your DEV environment or any other environment where it works.
If that isn't the issue, then compare all the other IIS settings for that app.
Good luck.

JBoss 6.1 application running very slow

My application is running on JBoss 6.1, and after a few days it starts running very slowly. This is the situation I face every week. To deal with it, I kill the Java process, clear the temp and work folders, and restart JBoss. Is there any other way to clean up the memory / manage the application? Kindly give me suggestions for both the Linux and Windows platforms.
Kindly help, anyone.
Thanks & Regards,
Sharath
Based on the RAM size of your system, you can increase the following parameters in run.conf (for Linux) or run.conf.bat (for Windows):
-Xms, -Xmx, and -XX:MaxPermSize.
-Xms512M -Xmx1024M -XX:MaxPermSize=128M
The -Xmx flag specifies the maximum memory allocation pool for the Java Virtual Machine (JVM), while -Xms specifies the initial memory allocation pool.
-XX:MaxPermSize sets the size of the Permanent Generation.
The Permanent Generation is where class files are kept; these are the result of compiled classes and JSP pages. If this space is full, it triggers a full garbage collection. If the full garbage collection cannot clean out old unreferenced classes and there is no room left to expand the permanent space, an Out-of-Memory error (OOME) is thrown and the JVM will crash.
Hope you are aware of these three flags.
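For example, on Linux these settings typically end up on the JAVA_OPTS line in run.conf; the values below are illustrative and should be sized to your machine's RAM:

JAVA_OPTS="-Xms512m -Xmx1024m -XX:MaxPermSize=256m $JAVA_OPTS"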

JBoss Threads - JMX vs Java Visual VM - reporting different results - Why?

When I use Java VisualVM to monitor my JBoss application, it shows:
Live Threads: 155
Daemon Threads: 135
When I use the JMX Web Console of JBoss, it shows:
Current Busy Threads: 40
Current Thread Count: 60
Why is there so much discrepancy between what Java VisualVM reports and what the JMX Web Console shows? (How are Live Threads different from Busy Threads?)
A live thread is one that exists and is not Terminated. (See Thread.State)
A busy thread is one that is actually working or, more precisely, Runnable.
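For reference, VisualVM's counts come from the JVM-wide ThreadMXBean; here is a minimal sketch of querying those same numbers from inside the JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCounts {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // "Live" counts every non-terminated thread, daemon or not;
        // this is the number VisualVM labels "Live Threads".
        System.out.println("Live threads:   " + threads.getThreadCount());
        System.out.println("Daemon threads: " + threads.getDaemonThreadCount());
    }
}

The Web Console's "busy"/"current" figures, by contrast, typically describe only the web connector's request-processing thread pool rather than the whole JVM, so the two consoles are measuring different populations to begin with.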
JBoss's Web Console tends to report fewer threads because it is very non-invasive. In other words, it does not have to spawn additional threads just to render you a web page; it's already running a web server and has already allocated threads to handle web requests before you went into the JMX Console.
VisualVM, on the other hand, starts up several threads to support JMX remoting (usually RMI), which comes with a little extra baggage. You might see extra threads like:
RMI TCP Connection(867)
RMI TCP Connection(868)
RMI TCP Connection(869)
JMX server connection timeout
Having said that, the discrepancy you are reporting is way out of line and makes me think that you're not looking at the same JVM.
The JMX Console is obvious :), so I would guess your VisualVM is connected elsewhere. See if you can correlate similar thread names (using the jboss.system:type=ServerInfo MBean's listThreadDump operation), or browse the MBeans in VisualVM and inspect the JBoss MBeans. MBeans like the following are good ones to look at, because they indicate a binding to a socket and so could not have the same values if they were not the same JVM process:
jboss.web:name=HttpRequest1,type=RequestProcessor,worker=http-0.0.0.0-18080
Of course, the other tell is that if you start VisualVM first, leave it running, and then go to the JMX Console and don't see as many threads, you're definitely looking at a different VM.
Cheers.
//Nicholas
