Garbage collection monitoring - when to alert?


Spring Boot exposes the following four metrics related to GC. I am configuring Nagios to send two alerts (Warning and Critical) when they reach certain values. What should those thresholds be? I have a 16 GB heap.
gc.ps_marksweep.count
gc.ps_marksweep.time
gc.ps_scavenge.count
gc.ps_scavenge.time
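
Because these count and time values are cumulative since JVM start, a fixed threshold on the raw number will eventually fire on any long-running process. One common approach is to alert on the change between two polls instead, for example the share of wall-clock time spent in GC. The sketch below is only an illustration of that idea: the `GcSample` type, the function names, and the 5% / 10% thresholds are hypothetical placeholders rather than recommended values, and it assumes the time metrics are reported in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class GcSample:
    """One poll of a cumulative GC metric pair (hypothetical helper type)."""
    wall_clock_ms: int  # time the sample was taken, in milliseconds
    gc_count: int       # e.g. the value of gc.ps_marksweep.count
    gc_time_ms: int     # e.g. the value of gc.ps_marksweep.time


def gc_overhead_percent(prev: GcSample, curr: GcSample) -> float:
    """Percentage of wall-clock time spent in GC between two polls."""
    elapsed = curr.wall_clock_ms - prev.wall_clock_ms
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    spent = curr.gc_time_ms - prev.gc_time_ms
    if spent < 0:
        # The counter went backwards, so the JVM was probably restarted;
        # treat the current value as the time spent since the restart.
        spent = curr.gc_time_ms
    return 100.0 * spent / elapsed


def classify(overhead_percent: float, warn: float = 5.0, crit: float = 10.0):
    """Map an overhead value to a Nagios plugin exit code and label.

    The default thresholds are assumptions for illustration only and
    should be tuned against the application's normal behaviour.
    """
    if overhead_percent >= crit:
        return 2, "CRITICAL"
    if overhead_percent >= warn:
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    previous = GcSample(wall_clock_ms=0, gc_count=10, gc_time_ms=1000)
    current = GcSample(wall_clock_ms=60_000, gc_count=12, gc_time_ms=4000)
    overhead = gc_overhead_percent(previous, current)
    code, label = classify(overhead)
    print(f"{label} - GC overhead {overhead:.1f}% over the last interval")
```

The same calculation can be run separately for the marksweep (old generation) and scavenge (young generation) pairs, since a rise in full collections usually deserves a tighter threshold than a rise in young collections.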

Related

Hazelcast increased Context Switching

Hazelcast in embedded mode is increasing context switching by approximately 46%. Is this expected?
Is there any way to control or configure this?

Except for the 2 health monitor threads, the rest are Hazelcast internal threads, and it is not recommended to change anything there unless they are having a drastically negative effect on performance.
What is shown in the attached picture is the total time taken by the threads. 8 threads is not a lot of context switching. You will need to provide more information on how this impacts your application.
If you do not want health monitoring and diagnostics, you can disable them.
Check out https://docs.hazelcast.org/docs/3.12.5/manual/html-single/index.html#threading-model for info on other threads.

100% Memory usage on Azure App Service Plan with two Apps - working set used 10gb+

I've got an app service plan with 14gb of memory - it should be plenty for my application's needs. There are two application services running on it, each identical - the private memory consumption of these hovers around 1gb but can spike to 4gb during periods of high usage. One app has a heavier usage pattern than the other.
Lately, during periods of high usage, I've noticed that the heavily used service can become unresponsive, and memory usage stays at 100% in the App Service Plan.
The high traffic service is using 4gb of private memory and starting to massively slow down. When I head over to the /scm.../ProcessExplorer/ page, I can see that the low traffic service has 1gb private memory used and 10gb of 'Working Set'.
As I understand it, on a single machine at least, the working set should be freed up when that memory is needed on another process. Does this happen naturally when two App Services share a single Plan?
It looks to me like the working set on the low-traffic instance is not being freed up to supply the needs of the high-traffic App Service.
If this is indeed the case, the simple fix is to move them to separate App Service Plans, each with 7gb of memory. However this seems like it might potentially be just shifting the problem around - has anyone else noticed similar issues with multiple Apps on a single App Service Plan? As far as I understand it, these shouldn't interfere with one another to the extent that they all need to be separated. Or have I got the wrong diagnosis?

In some high memory-consumption scenarios, your app might truly require more computing resources. In that case, consider scaling to a higher service tier so the application gets all the resources it needs. Other times, a bug in the code might cause a memory leak. A coding practice might also increase memory consumption. Getting insight into what's triggering high memory consumption is a two-part process. First, create a process dump, and then analyze the process dump. Crash Diagnoser from the Azure Site Extension Gallery can efficiently perform both these steps. For more information, refer to Capture and analyze a dump file for intermittent high memory for Web Apps.

In the end we solved this one via mitigation, rather than getting to the root cause.
We found a mitigation strategy for our previous memory issues several months ago, which was simply to restart the server each night using a PowerShell script. This seems to prevent the memory from building up over time, and only costs us a few seconds of downtime. Our system doesn't have much overnight traffic, as our users are all based in the same geographic location.
However, we recently found that the overnight restart was reporting 'success' but actually failing each night due to expired credentials. This meant that the memory issues described in the question I posted were actually exacerbated by server uptimes of several weeks. Restoring the overnight restart resolved the memory issues we were seeing, and we no longer see our system using 10gb+.
We'll investigate the memory issues if they rear their heads again. KetanChawda-MSFT's suggestion of using memory dumps to analyse the memory usage will be employed for this investigation when it's needed.

Reason for high response time of a transaction during Load test

I am running a one-hour load test and observed that one transaction has a much higher response time than expected. Why is this happening? What could the reasons be, given that GC, thread, and system resource (CPU and memory) utilization are normal?
How can I analyze it?

Numerous. The most obvious ones would be:
Slow database query - use a DB Monitoring Tool to see what happens on database level
An issue with your application code (memory leak, large object, "heavy" function) - re-run your test with Profiler Tool telemetry and collect as much information as you can on the JVM heap, threads, objects, etc. A thread dump can shed light on where your application is stuck
It can even be a networking issue - response time includes metrics like Connect Time and Latency (time to first byte), so you can see higher response times due to low network capacity or even a faulty router
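
To see which of these causes is more likely, it can help to split each sample's response time into its parts. The sketch below assumes JMeter-style fields, where Latency (time to first byte) already includes Connect Time and the elapsed time covers the whole response; the function name and the dictionary keys are hypothetical, and other load testing tools may define these metrics differently.

```python
def breakdown(elapsed_ms: int, latency_ms: int, connect_ms: int) -> dict:
    """Split one sample's response time into connect, wait, and download.

    Assumes connect <= latency <= elapsed, i.e. latency is measured from
    the start of the request and includes the connection setup.
    """
    if not (0 <= connect_ms <= latency_ms <= elapsed_ms):
        raise ValueError("expected 0 <= connect <= latency <= elapsed")
    return {
        # Time spent establishing the connection (network side).
        "connect_ms": connect_ms,
        # Time between connection setup and first byte (mostly server side).
        "wait_first_byte_ms": latency_ms - connect_ms,
        # Time spent receiving the rest of the response body.
        "download_ms": elapsed_ms - latency_ms,
    }


if __name__ == "__main__":
    parts = breakdown(elapsed_ms=1200, latency_ms=900, connect_ms=50)
    for name, value in parts.items():
        print(f"{name}: {value}")
```

A large connect or download share points toward the network, while a large wait before the first byte points toward the application or the database.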

Is it possible to log system memory and CPU usage using IIS logs?

I have a requirement to monitor what the CPU usage and memory usage of the system were when a particular request came in.
Is it possible to do so using IIS logs or any other method/tool?
We don't want the usage of the IIS process; we want the usage of the whole system at that time.

You can use Windows Performance Monitor to record CPU and memory usage (using data collector sets). Then, you can check in your IIS logs at what time the request in question came in and look up the recorded data in the Performance Monitor data collector set.
I don't think there is a tool which automatically combines the IIS log with system performance data. There are tools which include IIS monitoring, but those usually won't break reports down to a single request. If you want to do some further research, you can use my list of 40 Windows Server performance monitoring tools as a starting point.
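
The manual lookup described above can also be scripted once both sources have been parsed. This is a minimal sketch, assuming the performance data has already been loaded into a time-sorted list of `(timestamp, cpu_percent, available_mb)` tuples; the function name and the tuple layout are hypothetical, and parsing of the actual log files is left out. IIS W3C logs are typically written in UTC, so the two sets of timestamps may need to be normalized to the same time zone first.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_sample(samples, request_time):
    """Return the performance sample recorded closest to request_time.

    samples: list of (datetime, cpu_percent, available_mb), sorted by time.
    """
    if not samples:
        raise ValueError("no performance samples available")
    times = [sample[0] for sample in samples]
    index = bisect_left(times, request_time)
    if index == 0:
        return samples[0]
    if index == len(samples):
        return samples[-1]
    before, after = samples[index - 1], samples[index]
    # Pick whichever neighbour is closer in time to the request.
    if request_time - before[0] <= after[0] - request_time:
        return before
    return after


if __name__ == "__main__":
    perf = [
        (datetime(2020, 1, 1, 10, 0, 0), 35.0, 4096),
        (datetime(2020, 1, 1, 10, 0, 15), 92.0, 1024),
        (datetime(2020, 1, 1, 10, 0, 30), 40.0, 3900),
    ]
    request = datetime(2020, 1, 1, 10, 0, 20)
    print(nearest_sample(perf, request))
```

The shorter the sampling interval of the data collector set, the closer the matched sample will be to the moment the request arrived.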

IIS CPU is at 95% usage with very few users - on production

I have a web site and I am using IIS as my web server. I noticed that on the production server, the CPU reaches 95% usage pretty fast with very few users. I don't see this behaviour on my development server. I am using Visual Studio to develop, with IIS as my local web server as well.

How much traffic do you have on production compared to the development server? How do their parameters compare? Before starting a deep analysis of the application itself, I would identify all the infrastructure and environmental differences. Sometimes such problems happen because of some other software, like antivirus software running in the background...
Nevertheless, because it sounds more like an application problem, I would first check Event Viewer for errors. Then I would start by monitoring a few Performance Counters to correlate the % Processor Time counter with Current Connections, Available Memory, # of Exceps Thrown / sec, % Time in GC and so on. This kind of behavior usually has a cause from the following list:
excessive looping due to some logic error, like calling the same service again and again, trying to load or parse a malformed file etc. This can be analyzed with dump analysis (see below).
high CPU usage due to the Garbage Collector - when memory usage is extensive (or there is even a memory leak), the GC may start to consume more and more CPU fighting the memory shortage. You will see this with memory-related performance counters.
a considerable number of exceptions thrown (for example due to some environmental problems like network unavailability or production data differences) can also consume a lot of CPU. Event Viewer and exception-related performance counters (as the exceptions can be handled silently by your application) should be an indicator here.
To further analyze your application, I suggest making a full memory dump during high CPU usage. You can do that with the Debug Diag tool. Please refer to this IIS troubleshooting guide for details.
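
The counter correlation step mentioned earlier can be done by eye in Performance Monitor, but a rough number sometimes makes the comparison easier. The sketch below computes a plain Pearson correlation between two exported counter series; it assumes the series were sampled at the same instants and are already loaded as lists of numbers, and the function name and sample values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no information about co-movement.
        return 0.0
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up samples standing in for exported counter values.
    processor_time = [20.0, 40.0, 60.0, 80.0]
    time_in_gc = [5.0, 10.0, 15.0, 20.0]
    current_connections = [10.0, 10.0, 10.0, 10.0]
    print("CPU vs % Time in GC:", pearson(processor_time, time_in_gc))
    print("CPU vs connections:", pearson(processor_time, current_connections))
```

A value close to 1 for % Time in GC with a value near 0 for Current Connections would suggest looking at memory pressure before traffic, although correlation alone does not establish the cause.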
