ASP.NET Core 2.2 experiencing high CPU usage - azure

I have an ASP.NET Core 2.2 web service hosted on Azure (S2 plan). The problem is that my application sometimes hits very high CPU usage (almost 99%). So far I have checked the process explorer on Azure, where I see a lot of processes consuming CPU. Does anyone know whether it is normal for these processes to consume CPU?
Currently I have no idea where they come from. Maybe it is normal to have them here.
Briefly, about my application:
Currently there is not much traffic: 500-600 requests a day. Most requests talk to MS SQL, querying records, adding them, and so on.
I am also using Microsoft WebSockets, but the high CPU happens even when no WebSocket client is connected to the web service, so I doubt that is the cause. I tried load testing with Apache ab, but there is no pattern: a load test of a given request does not reliably produce high CPU. Sometimes it happens during load testing and sometimes it doesn't.
I have just updated the screenshot of processes. I see that lots of threads are locked or in use at the moment FluentMigrator starts running its logging.
I will remove the FluentMigrator logging middleware from the Configure method and see how things go.
So I removed the FluentMigrator logging. Since then I haven't noticed any CPU usage over 90%.
But I am still confused. My CPU usage is spinning. Is this a healthy CPU usage graph or not?
I also tried a load test on the WebSocket server.
I wrote a script that calls some WebSocket functions every 100 ms from 6-7 clients. So every 100 ms there are 7 calls to the WebSocket server from different clients, and each function queries or inserts some data (approximately 3-4 queries per WebSocket function).
What I noticed is that on Azure S1 (DTU 20) I run out of SQL pool connections after 2 minutes. If I increase the DTU to 100, it handles 7 clients properly without any 'no connection pool' errors.
So the first question: is this CPU spinning normal?
Second: should I be getting a 'no SQL connection free' error with this kind of load test on DTU 10 Azure SQL? I am afraid that I am leaking connections when I create a scoped service inside the singleton WebSocket service.
This topic is getting too long. Maybe I should move it to a new topic?

At this stage I would say you need to profile your application and figure out which areas of your code are CPU intensive. In the past I have used dotTrace, which highlights the most expensive methods in a call tree.
Once you know which areas of your code base are the least efficient, you can begin to refactor them. This could be as simple as changing some small operations, adding caching for queries, or using distributed locking, for example.
I believe the other DLLs are showing CPU usage because your code is calling methods that live in those DLLs.


Is it a good practice to use multithreading to handle requests in bulk in a micro services architecture systems?

I have to design a microservice that completes an order by querying a SQL database multiple times (say 7 calls) and making multiple third-party HTTP calls (say 8 calls), in a sequential and interleaved manner. By sequential I mean that each database or third-party call must finish before the next one starts, because its result is used by the later third-party calls or database searches.
I) CPU: 4 cores(per instance)
II) RAM: 4 GB(per instance)
III) It can be auto-scaled up to a maximum of 4 pods or instances.
IV) Deployment: OpenShift (own cloud architecture)
V) Framework: Spring Boot
My Solution:
I've created a fixed thread pool of 5 threads using Java's thread pool executor. The size of the blocking queue is not configured. Apart from these 5 threads, another 20 fixed-pool threads create orders of other types, so in total 25 threads run per instance. When multiple requests are sent to this microservice, I keep submitting jobs, and the JVM schedules and completes them.
I'm not able to achieve the expected throughput. With this approach the microservice manages only 3 to 5 tps (orders per second), which is very low. Sometimes Tomcat also gets choked, and we have to restart the services to make the system responsive again.
I've also observed that even while the thread pool executor is processing orders very slowly, if I call the orders API through JMeter at the same time, those requests, which land directly on the controller layer, are processed faster than the requests handled by the thread pool executor.
My Questions
I) What changes should I make at the architectural level to raise throughput to 50 to 100 tps?
II) What changes should be made so that, if traffic on this service increases in the future, the service can either be auto-scaled or an increase in hardware resources can be justified easily?
III) Is this how tech giants (Amazon, PayPal) solve scaling problems like these, using multithreading to optimise the performance of their code?
You can assume that third parties are responding as expected and that query optimisation is already done with proper indexing.
Tomcat already has a very robust thread pooling algorithm. Making your own thread pool is likely causing deadlocks and slowing things down. The Java threading model is non-trivial, and you are likely causing more problems than you are solving. This is further evidenced by the fact that you get better performance relying on Tomcat's scheduling when you hit the controller directly.
High-volume services generally solve problems like this by scaling wide and keeping things as stateless as possible. This lets you allocate many small servers, which solves the problem much more efficiently than a single large server.
Debugging multi-threaded executions is not for the faint of heart. I would highly recommend you simplify things as much as possible. The most important bit about threading is to avoid mutable state. Mutable state is the bane of shared executions: moving memory around and forcing reads through to main memory can be very expensive, often costing far more than the savings due to threading.
Finally, the way you are describing your application, it's all I/O bound anyway. Why are you bothering with threading when it's likely I/O that's slowing it down?
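To put rough numbers on why a small pool caps throughput when the work is I/O bound: by Little's law, sustained throughput is roughly the number of orders in flight divided by the time each order takes. The sketch below is only back-of-the-envelope arithmetic. The class and method names are made up for illustration, and the 1.25 s per order is an assumed figure chosen to be consistent with the 5 threads and 3 to 5 tps reported in the question, not a measurement.

```java
/** Back-of-the-envelope capacity arithmetic (hypothetical helper, not part of any framework). */
final class ThroughputSizing {

    /** Best-case orders per second when each order occupies one thread for latencySeconds. */
    static double maxThroughput(int threads, double latencySeconds) {
        return threads / latencySeconds;
    }

    /** Number of orders that must be in flight at once to sustain targetTps. */
    static int requiredConcurrency(double targetTps, double latencySeconds) {
        return (int) Math.ceil(targetTps * latencySeconds);
    }

    public static void main(String[] args) {
        double assumedLatency = 1.25; // seconds per order: an assumption, measure your own
        System.out.println(maxThroughput(5, assumedLatency));         // 4.0
        System.out.println(requiredConcurrency(50, assumedLatency));  // 63
        System.out.println(requiredConcurrency(100, assumedLatency)); // 125
    }
}
```

At that assumed latency, 50 to 100 tps needs roughly 63 to 125 orders in flight at once across all instances, or about 16 to 32 per instance with 4 pods. A 5-thread pool that blocks on each of the 15 calls cannot get there, so the options are to shorten each order or to let more orders wait on I/O concurrently, for example on Tomcat's own request threads.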

Diagnosing Sporadic Lockups in Website Running on IIS

Determine the cause of the sporadic lock ups of our web application running on IIS.
An application we are running on IIS sporadically locks up throughout the day. When it locks up, it locks up on all workers and on all load-balanced instances.
Environment and Application
The application is running on 4 different Windows Server 2016 machines. The machines are load balanced with HAProxy using a round-robin scheme. The IIS application pools that host this website are configured with 4 workers each, and the application is a 32-bit application. The IIS instances do not use a shared configuration file, but the application pools for this application are all configured the same way.
This application is the only application in the IIS application pool. The application is an ASP.NET web API and is using .NET 4.6.1. The application is not creating threads of its own.
My theory is that we have incoming requests that take ~5-30 minutes to complete. Every machine gets tied up servicing these requests, so they look "locked up". The company rolled its own logging mechanism, and that is where I can see the ~5-30 minute requests. The team responsible for the application has cleaned up many of these, but I am still seeing ~5 minute requests in the log.
I do not have access to the machines personally, so our systems team has taken memory dumps of the application when this happens. In the dumps I generally see ~50 threads running, all of them in our code. These threads are all over our application and do not seem to be stopped on any common piece of code. When the application is running correctly, the dumps have 3-4 threads running. I have also looked at performance counters such as ASP.NET\Requests Queued, but it never seems to have any requests queued. During these times the CPU, memory, disk and network usage look normal. In WinDbg none of the threads seem to have a high CPU time other than the finalizer thread, which as far as I know should live the entire time.
I am looking for a means to prove or disprove my theory as to why we are locking up as well as any metrics or tools I should look at.
So this issue came down to our application using query in stitch on a table with 2,000,000 records in it to another table. Memory would become so fragmented that the garbage collector was spending more time trying to find places to put objects and moving them around than it was running our code. This is why it appeared that our application was still working and why there were no exceptions. Oddly, IIS would time out the requests but would continue processing the threads.

I'm not sure how to correctly configure my server setup

This is a multi-part question. My end goal is to establish the best way to set up my server, which will host a website as well as a service used by an iOS (and eventually an Android) app. Both the app service and the website are going to be written in Node.js: I need high concurrency and scaling for the app server, and I figured that while I'm at it I may as well do the website in Node too, because from my understanding it wouldn't perform much differently from something like Apache.
The website has a lower priority than the app service, and the app service should receive significantly higher traffic than the website (though in the long run this may change). Money isn't my greatest priority here, but it is a limiting factor. I feel that a service with 99.9% uptime (100% uptime appears to be virtually impossible in the long run) is more important than saving money at the cost of more downtime.
First, I understand that running one Node process per CPU core is the best way to fully utilise a multi-core CPU, and after some research I now understand that running more than one per core is inefficient because the CPU has to context switch between the processes. Why, then, does every example I see of Node.js's built-in cluster module have the master process create as many workers as there are cores? That gives 9 processes on an 8-core machine (1 master process and 8 worker processes). Is it because the master is usually only there to restart workers that crash or exit, and so does so little that it doesn't matter that it shares a core with another Node process?
If that is the case, I am planning to have the workers provide the app service and have the master manage the workers and also host a web page with statistics about the server's state and other relevant information (number of clients connected, worker restart count, error logs, etc.). Is this a bad idea? Would it be better to run this web page on a separate worker and leave the master to manage the workers only?
So overall I want the following elements: a service to handle requests from the app (my main source of traffic); a website (fairly simple: a couple of pages and a registration form); an SQL database to store user information; a web page, probably hosted locally on the server machine, that only I can access and that shows information about the server (users connected, worker restarts, server logs, and so on); and apparently nginx would be a good idea, since I'm running multiple Node processes accepting connections from the app. After doing research I've also found that it would probably be best to host on a VPS initially. At first, when the app service's traffic will most likely be fairly low, I was thinking I could run all of those elements on one VPS. Or would it be best to run them on separate VPSs, except for the website and the server status page, which could share one? That way, if there is a hardware failure, not everything goes down, and I could run 2 instances of the app service on 2 different VPSs so that if one goes down the other is still functioning. Would this just be overkill? I doubt I would need multiple app service instances to support the traffic for a while, but it would help reduce the apparent downtime for users.
Maybe this all depends on what I value more and have the time to do: a more complex server setup that costs more and is maybe a little unnecessary but guarantees a consistent and reliable service, or a cheaper and simpler setup that may succumb to downtime due to coding errors and server hardware issues.
It's also worth noting that I've never had any real experience with production-level servers, so in some ways I've jumped in at the deep end with this. I feel like I've come a long way in the past half a year and am getting a fairly good grasp of what I need to do. I could just do with some advice from someone experienced who has an idea of what roadblocks I may come across along the way and whether I'm causing myself unnecessary problems with this kind of setup.
Any advice is greatly appreciated, thanks for taking the time to read my question.

Transaction Per Second not increasing with more threads or servers using WCF

I have a Windows service which wakes up every day at a particular time and finds around 100k transactions that it needs to process. It spawns 25 threads, which look at the bucket of transactions that need to be processed and make calls to a WCF service.
This WCF service does some internal processing and makes a synchronous call to an external service (which we have mocked with an emulator for the sake of volume testing). With this setup, in shorter runs of around 10k transactions, we were able to achieve a TPS of around 10.
I scaled this setup to three load-balanced servers running our WCF services and two other servers running the emulator, and we increased the number of threads in the Windows service to 75. With this new setup we expected an increase in performance, but the TPS is still at 10.
I have Performance Monitor running on all five machines. The three load-balanced servers running the WCF service each show around 25 "Outstanding Calls" constantly in the "ServiceModelService" category for the WCF service. But the two servers running the emulator show only around 9 "Outstanding Calls" constantly for the mocked-out service. The same emulator was showing around 20 "Outstanding Calls" when it was running on a single server.
My questions are:
Why is there no increase in TPS in the three load balanced machines setup?
Where is the bottleneck in this system?
The target is to get to a TPS of around 30 with the three load-balanced servers running the WCF service.
Note: I have increased the maxconnection limit to 200 in the web config of the WCF service and in the config of the Windows service, which increased the TPS from around 6 to the current value of 10.
Edit: More information. If each of the load-balanced servers has 25 outstanding calls, shouldn't the mocked external service have 3*25=75 outstanding calls?
By maxconnection limit I meant:
<add address="*" maxconnection="200" />
You probably can't give us enough information to diagnose the problem accurately. However, what you describe gives me enough to recommend some places to look.
First, you probably shouldn't be spawning 25 (or more) threads in your Windows service. Rather, you can have a single thread looking at the "bucket of transactions," and make asynchronous calls to the WCF service. You can control the number of concurrent transactions using a Semaphore. Something like:
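    Semaphore transactionSemaphore = new Semaphore(25, 25);
    while (transactionCount > 0)
    {
        transactionSemaphore.WaitOne(); // wait for a free spot
        var transaction = getTransaction(); // assumed to also decrement transactionCount
        // start the asynchronous WCF call for this transaction here (call not shown)
    }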
And the async completed event (see the asynchronous calls mentioned above) releases the semaphore:
    void AsyncCompletedEvent(...)
    {
        // do after-call processing
        // and then release the semaphore
        transactionSemaphore.Release();
    }
When transactionCount gets to 0, you have to wait for all outstanding calls to complete. You do that by repeatedly waiting on the semaphore: 25 times. That is:
    for (int i = 0; i < 25; ++i)
        transactionSemaphore.WaitOne();
If your main thread has gobbled up the semaphore, then you know that there can't be any outstanding calls.
You can extend that concurrent transaction count to 50 or 75, or whatever value you like. Your hardware being able to handle it, of course.
The difference here is that asynchronous service calls use I/O completion ports rather than individual threads. Allocating a thread that just sits there and waits is very wasteful. With the I/O completion port, the only time a thread gets involved is when the call completes--in the async completed method. Those threads are allocated automatically by the thread pool.
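For anyone who wants to see the whole pattern run end to end, here is a minimal self-contained sketch. It is in Java rather than C#, and it is an illustration under assumptions, not the WCF code: the class name ThrottledCalls is invented, and a delayed CompletableFuture stands in for the asynchronous service call. It issues calls from a single thread, keeps at most `limit` of them in flight with a semaphore, releases a slot in the completion callback, and then drains the semaphore to wait for the stragglers.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledCalls {

    /**
     * Issues `total` simulated async calls with at most `limit` in flight,
     * and returns the highest number of calls observed in flight at once.
     */
    static int run(int total, int limit, long callMillis) {
        Semaphore slots = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Stand-in for the remote service: each call completes after callMillis,
        // and no thread is blocked while the delay is pending.
        Executor delayed = CompletableFuture.delayedExecutor(callMillis, TimeUnit.MILLISECONDS);

        for (int i = 0; i < total; i++) {
            slots.acquireUninterruptibly();          // wait for a free spot
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            CompletableFuture.runAsync(() -> { }, delayed)
                .whenComplete((result, error) -> {   // the "async completed" callback
                    inFlight.decrementAndGet();
                    slots.release();                 // free the spot, success or failure
                });
        }
        slots.acquireUninterruptibly(limit);         // drain: all outstanding calls are done
        return peak.get();
    }

    public static void main(String[] args) {
        // 20 calls, at most 5 at a time, 50 ms each
        System.out.println("peak in flight: " + run(20, 5, 50));
    }
}
```

The peak never exceeds the limit because a slot is taken before the in-flight counter goes up and given back only after it goes down. Raising the limit to 50 or 75 only changes the semaphore size, not the number of threads.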
If the Windows service is constantly showing 25 outstanding calls, then the total of outstanding calls across all of the WCF servers had better not be more than that. If the WCF services are showing more outstanding transactions than the Windows service is showing, then you have a problem. Come to think of it, if the Windows service is showing more outstanding calls than the load-balanced servers do, you also have a problem. If the two don't match, then somebody's dropping something: either the Windows service thinks that it has outstanding calls that the WCF services think are filled, or vice versa.
It's hard to say where the bottleneck is. If you're experiencing high CPU usage on the Windows service machine, that's probably your bottleneck. You say that the WCF services call yet another external service. That external service could be the bottleneck. Depending on how you mocked it, the mock could be the bottleneck. Have you determined how much time that external service takes? That the services running the mocked service seem to have lower throughput than the WCF service that's talking to the real service makes me think there's a problem with performance of your mock.
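One way to read those counters is through Little's law: calls in flight equal throughput times the time per call. The Java sketch below just does that division. The class and method names are invented for illustration, and it assumes that the counters reflect steady state and that all 25 client threads were busy in the single-server run (the question gives the thread count there, not a measured in-flight figure).

```java
/** Little's law rearranged (hypothetical helper for reading the counters in the question). */
final class OutstandingCalls {

    /** Average seconds per call = calls in flight / calls completed per second. */
    static double impliedLatencySeconds(int outstandingCalls, double tps) {
        return outstandingCalls / tps;
    }

    public static void main(String[] args) {
        System.out.println(impliedLatencySeconds(25, 10.0));     // single-server run: 2.5
        System.out.println(impliedLatencySeconds(3 * 25, 10.0)); // three-server run: 7.5
    }
}
```

If those assumptions hold, tripling the calls in flight left throughput at 10 TPS and roughly tripled the time each call takes (2.5 s to 7.5 s). That is the pattern you would expect when something downstream of the WCF servers is capped at about 10 calls per second, though it does not by itself show where the cap is.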
I suppose it's possible that your WCF services aren't properly cleaning up resources, and they are spending an inordinate amount of time in garbage collection. Have you verified that you're using the server garbage collector? I think that's the default for WCF services, but you need to check.
Given the information you've provided, I consider those the most likely possible bottlenecks.
One other thing. It's incredibly wasteful to have a Windows service that does nothing but sit there and once a day "wakes up" to process some transactions. You should make that a console application and schedule a task to run it once per day. You can use the Task Scheduler GUI, or you can schedule it with the schtasks command. See Programs are not cats.
Another benefit of making your program a console app rather than a service is that it's a whole lot easier to debug a console app.

How does one configure "Autoscale" to deal with Web instances which have long wait times due to external processes?

I am using MVC3, ASP.NET4.5, C#, Razor, EF6.1, SQL Azure
I have been doing some load testing using JMeter, and I have found some surprising results.
I have a test of 30 concurrent users, ramping up over 10 secs. The test plan is fairly simple:
Navigate to page
Do query
Navigate back
I am using "small" "standard" instances.
I have noticed that web instances may be waiting on external processes, such as database queries, so the web CPU can be low while the instance is still a bottleneck. The CPU could be idling at 40% while waiting for a result set from the DB. So this could also be a reason why the extra instance is not triggered. This is a real issue: how do you trigger an extra instance based on longer wait times? At the moment the only way round it is to have 2 instances up permanently, or to scale proactively against a schedule.
Use async calls and you won't have to worry about scaling up. While a call is waiting it does not hold a thread, which frees up resources to handle other users.
If you still see lengthened response times after that, it's probably the external process that's choking and needs to be scaled up.
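As a rough illustration of the principle (in Java rather than C#, with invented names, and a timer standing in for the database), the sketch below starts 30 simulated queries that each take 200 ms and waits for all of them. Because nothing blocks while a query is pending, the whole batch finishes in roughly the time of one query instead of 30 times that. It is a toy under those assumptions, not ASP.NET code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

final class AsyncWaitDemo {

    /** Simulated database query: the result arrives after waitMillis, with no thread blocked meanwhile. */
    static CompletableFuture<String> queryAsync(int user, long waitMillis) {
        Executor later = CompletableFuture.delayedExecutor(waitMillis, TimeUnit.MILLISECONDS);
        return CompletableFuture.supplyAsync(() -> "rows for user " + user, later);
    }

    /** Starts one query per user, waits for all of them, and returns elapsed wall-clock milliseconds. */
    static long serveAll(int users, long waitMillis) {
        long start = System.nanoTime();
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int u = 0; u < users; u++) {
            pending.add(queryAsync(u, waitMillis));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // 30 users, 200 ms simulated query each: about 6000 ms if served one after another
        System.out.println("elapsed ms: " + serveAll(30, 200));
    }
}
```

The exact elapsed time depends on the machine, so no figure is claimed here. The point is only that waiting overlaps when the wait does not pin a thread. If the database itself slows down under that concurrency, that is the second case above.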
