Why do long-running threads cause CPU overhead? - iis

Recently in our production environment, one of our websites running on IIS hit 100% CPU usage.
We figured out that the issue was related to long-running MySQL queries. I know that the ASP.NET engine uses a thread pool to process HTTP requests, and when I took a dump of the worker process, the thread count was about 100. I can't really understand why long-running DB queries cause CPU overhead. If anyone can explain in detail, I'd be glad.

Related

ASP.NET Core 2.2 experiencing high CPU usage

I have hosted an ASP.NET Core 2.2 web service on Azure (S2 plan). The problem is that my application sometimes hits high CPU usage (almost 99%). What I have done so far: checked the process explorer on Azure. I see a lot of processes there that are consuming CPU. Does anyone know whether it's okay for these processes to consume CPU?
Currently, I have no idea where they come from. Maybe it's normal to have them here.
A little about my application:
Currently, there is not much traffic: 500-600 requests a day. Most of the requests communicate with MS SQL by querying records, adding them, etc.
I am also using MS WebSocket, but the high CPU happens when no WebSocket client is connected to the web service, so I hardly believe that it's the cause. I tried to use Apache ab for load testing, but there is no pattern where load testing a given request reliably produces high CPU. Sometimes it happens during load testing, sometimes it doesn't.
I just updated the screenshot of processes; I see that lots of threads are being locked/used during the time when FluentMigrator starts running its logging.
Update*
I will remove the FluentMigrator logging middleware from the Configure method and see how the situation develops.
UPDATE**
So I removed the FluentMigrator logging. Since then I haven't noticed any CPU usage over 90%.
But I am still confused. My CPU usage is spinning. Is this a healthy CPU usage graph or not?
Also, I tried to run a load test on the WebSocket server.
I made a script that calls some WebSocket functions every 100ms from 6-7 clients. So every 100ms there are 7 calls to the WebSocket server from different clients, and every function itself runs some queries/inserts (approximately 3-4 queries per WebSocket function).
What I noticed: on Azure S1 (20 DTU), after 2 minutes I run out of SQL pool connections. If I increase the DTU to 100, it handles 7 clients properly without any 'no connection pool' errors.
So the first question: is this CPU spinning normal?
Second: should I be getting a 'no SQL connection free' error message with this kind of load test on DTU 10 Azure SQL? I am afraid that by creating a scoped service in the singleton WebSocket service I am leaking connections.
This topic gets too long, maybe I should move it to a new topic?
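To make the connection-leak worry concrete, here is a minimal sketch in Python (not the actual ASP.NET or Azure SQL code). It assumes a fixed-size pool: `TinyPool`, its size of 100, and both `run_*` helpers are hypothetical names invented for illustration. The load figures (7 clients, a call every 100ms, roughly 3 queries per call) come from the description above. If connections are checked out and never returned, the pool runs dry after as many queries as it has slots, however fast the database is; if each query returns its connection, the same load is served.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Hypothetical fixed-size pool standing in for a real SQL connection pool."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self):
        # Raises queue.Empty when every connection is checked out.
        return self._free.get_nowait()

    def release(self, conn):
        self._free.put(conn)

    @contextmanager
    def connection(self):
        # Scoped usage: the connection goes back to the pool even on error.
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)


def run_leaky(pool, queries):
    """Acquire a connection per query and never release it."""
    served = 0
    for _ in range(queries):
        try:
            pool.acquire()
        except queue.Empty:
            break  # the 'no free connection' situation
        served += 1
    return served


def run_scoped(pool, queries):
    """Acquire a connection per query and release it when the query is done."""
    served = 0
    for _ in range(queries):
        with pool.connection():
            served += 1
    return served


# 7 clients * 10 calls per second * ~3 queries per call = one second of load.
queries_per_second = 7 * 10 * 3
leaky_served = run_leaky(TinyPool(size=100), queries_per_second)
scoped_served = run_scoped(TinyPool(size=100), queries_per_second)
print(leaky_served, scoped_served)
```

If the real service behaves like `run_leaky`, raising the DTU would only delay the error. If it behaves like `run_scoped` and still runs out, the queries are probably holding connections longer than the arrival rate allows.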
-
At this stage I would say you need to profile your application and figure out which areas of your code are CPU intensive. In the past I have used dotTrace, which highlights the most expensive methods in a call tree.
Once you know which areas of your code base are the least efficient, you can begin to refactor them so that they are more efficient. This could be as simple as changing some small operations, adding caching for queries, or using distributed locking, for example.
I believe the reason the other DLLs are showing CPU usage is that your code is calling methods within those DLLs.

Diagnosing Sporadic Lockups in Website Running on IIS

Goal
Determine the cause of the sporadic lock ups of our web application running on IIS.
Problem
An application we are running on IIS sporadically locks up throughout the day. When it locks up, it locks up on all workers and on all load-balanced instances.
Environment and Application
The application is running on 4 different Windows Server 2016 machines. The machines are load balanced with HAProxy using a round-robin scheme. The IIS application pools this website is hosted in are configured to have 4 workers each, and the application they host is a 32-bit application. The IIS instances are not using a shared configuration file, but the application pools for this application are all configured the same.
This application is the only application in the IIS application pool. The application is an ASP.NET web API and is using .NET 4.6.1. The application is not creating threads of its own.
Theory
My theory for why this is happening is that we have requests that are coming in that are taking ~5-30 minutes to complete. Every machine gets tied up servicing these requests so they look "locked up". The company rolled their own logging mechanism and from that I can tell we have requests that are taking ~5-30 minutes to complete. The team responsible for the application has cleaned up many of these but I am still seeing ~5 minute requests in the log.
I do not have access to the machines personally, so our systems team has taken memory dumps of the application when this happens. In the dumps I generally see ~50 threads running, all of them in our code. These threads are all over our application and do not seem to be stopped on any common piece of code. When the application is running correctly, the dumps have 3-4 threads running. I have also looked at performance counters like ASP.NET\Requests Queued, but there never seem to be any requests queued. During these times the CPU, memory, disk and network usage look normal. In WinDbg, none of the threads seems to have a high CPU time other than the finalizer thread, which as far as I know should live the entire time.
Conclusion
I am looking for a means to prove or disprove my theory as to why we are locking up as well as any metrics or tools I should look at.
So this issue came down to our application using a query to stitch a table with 2,000,000 records in it to another table. Memory would become so fragmented that the garbage collector was spending more time trying to find places to put objects and moving them around than it was running our code. This is why it appeared that our application was still working and why there were no exceptions. Oddly, IIS would time out the requests but would continue processing the threads.

High CPU Utilisation Cassandra (Native Transport Request)

We are running Cassandra version 2.0.9 in production. It's a 4-node cluster. For the past few days we have been experiencing high spikes in CPU utilisation, as you can see in the picture below.
This is the jconsole output.
When we looked into the threads that are eating a lot of CPU, we came across Native Transport Request threads. These are eating a lot of CPU (around 12%), which is huge.
Thread stack trace.
Threads info.
Thread CPU%.
What could the problem be, and how should we go about debugging it?
Why are the majority of NTR requests stuck in BCrypt.java? Is this the problem?
The cluster was behaving normally a few days ago, but now 3 of the 4 nodes are always at high CPU utilisation.
You have authentication enabled, which stores a bcrypt hash, not the password. So each request needs to be checked. This will end up being a CPU issue if you are continually creating new connections instead of reusing an authenticated session. Sessions are long-lived objects and should be persistent by default (https://github.com/datastax/php-driver/tree/master/features#persistent-sessions), but if you are using CGI or something else that constantly creates new processes you will still have issues. Maybe try php-fpm?
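The cost difference between reconnecting and reusing a session can be sketched as follows. This is an illustration only, not Cassandra or driver code: `Server`, `Session`, the password, and the iteration count are hypothetical, and `hashlib.pbkdf2_hmac` from the Python standard library stands in for bcrypt as a deliberately slow password hash. The only point is how often the expensive check runs.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # arbitrary work factor for the stand-in hash


class Session:
    """Hypothetical authenticated session; queries need no further hashing."""

    def __init__(self, server):
        self.server = server

    def query(self, statement):
        self.server.queries += 1


class Server:
    """Hypothetical server that stores a slow hash instead of the password."""

    def __init__(self, password):
        self.salt = os.urandom(16)
        self.stored = self._hash(password)
        self.hash_calls = 0  # counts authentication checks only
        self.queries = 0

    def _hash(self, password):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), self.salt, ITERATIONS)

    def connect(self, password):
        # Every new connection pays for one expensive hash computation.
        self.hash_calls += 1
        if not hmac.compare_digest(self._hash(password), self.stored):
            raise PermissionError("bad credentials")
        return Session(self)


def connection_per_request(server, requests):
    # What a CGI-style process model tends to do: connect, query, exit.
    for _ in range(requests):
        server.connect("secret").query("SELECT ...")


def shared_session(server, requests):
    # Authenticate once, then reuse the session for every request.
    session = server.connect("secret")
    for _ in range(requests):
        session.query("SELECT ...")


per_request = Server("secret")
connection_per_request(per_request, 20)

shared = Server("secret")
shared_session(shared, 20)

print(per_request.hash_calls, shared.hash_calls)
```

Both servers answer the same 20 queries, but the first one runs the slow hash 20 times and the second one runs it once. With a real bcrypt work factor, that difference would show up as CPU time in the threads handling authentication.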

Web Api Requests Queueing up forever on IIS (in state: ExecuteRequestHandler)

I'm currently experiencing some hangs in the production environment, and after some investigation I'm seeing a lot of requests queued up in the worker process of the application pool. The common thing is that every request that is queued for a long time is a Web API request. I'm using both MVC and Web API in the app.
The requests are being queued for about 3 hours. When the application pool is recycled, they immediately start queueing up again.
They are all in the ExecuteRequestHandler state.
Any ideas on where I should continue digging?
Your requests can be stalled for a number of reasons:
they are waiting on I/O operation e.g database, web service call
they are looping or performing operations on a large data set
cpu intensive operations
some combination of the above
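The first and third cases in the list above look the same from the outside (a request that takes a long time) but differ in how much CPU they consume. A small Python sketch, assuming `time.sleep` as a stand-in for a blocking database or web service call and a busy loop as a stand-in for CPU-intensive work; the function names and the 0.2 second duration are made up for illustration:

```python
import time


def measure(work):
    """Return (wall-clock seconds, CPU seconds used by this process) for work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def wait_on_io():
    # Stand-in for a blocking I/O call: the thread is parked, not running.
    time.sleep(0.2)


def burn_cpu():
    # Stand-in for a CPU-intensive operation of similar wall-clock length.
    deadline = time.perf_counter() + 0.2
    counter = 0
    while time.perf_counter() < deadline:
        counter += 1


io_wall, io_cpu = measure(wait_on_io)
busy_wall, busy_cpu = measure(burn_cpu)

print(f"waiting on I/O: wall={io_wall:.2f}s cpu={io_cpu:.2f}s")
print(f"CPU-bound work: wall={busy_wall:.2f}s cpu={busy_cpu:.2f}s")
```

Both calls take about the same wall-clock time, but only the busy loop accumulates CPU time. A stalled request that shows little CPU time on its thread is more likely waiting on something than computing.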
In order to find out what your requests are doing, start by getting the URLs of the requests that are taking a long time.
You can do this on the command line as follows:
c:\windows\system32\inetsrv\appcmd list requests
If it's not obvious from the URLs and from looking at the code, you need to take a process dump of w3wp.exe on the server. Once you have a process dump, you will need to load it into WinDbg in order to analyze what's taking up all the CPU cycles. Covering WinDbg is a pretty big topic, but here's briefly what you need to do:
load the SOS DLL (managed debugging extension)
call the !runaway command to get a list of long-running threads
dive into a long-running thread by selecting it and calling the !clrstack command
There are many blogs on using windbg. Here is one example. A great resource on analyzing these types of issues is Tess Ferrandez's blog.
This is a really long shot without first-hand access to your system, but try checking the handler mappings for your Web API in the IIS Manager GUI. Compare them with the IIS settings of your DEV environment or any other environment where it works.
If this isn't the issue, then compare all the other IIS settings for that app.
Good luck.

Node.JS web worker memory leak?

I've got a Node.JS application that spawns a number of Web Workers.
I'm seeing what looks like a slow memory leak, but I don't think it's my code. Even if I comment out the code entirely, and I just have a web worker that accepts messages and returns nothing, the memory leak still occurs!
The problem seems to be that I'm sending large messages. Often they are 1MB of JSON or more. Eventually the Workers balloon up from 6MB up to 25MB, and I'm not sure it will stop there.
Is this a known problem with Node.JS web workers? Is there a workaround?
The workers are managed with a pool abstraction. Should I just kill them off and spawn new ones from time to time?
EDIT: I'm thinking maybe it's the particular pool library I used, backgrounder. No obvious culprits in the code, though.
