IIS Connection Pool interrogation/leak tracking - iis

Per this helpful article I have confirmed I have a connection pool leak in some application on my IIS 6 server running W2k3.
The tough part is that I'm serving 300 websites written by 700 developers from this server in 6 application pools, 50% of which are .NET 1.1 which doesn't even show connections in the CLR Data performance counter. I could watch connections grow on my end if everything were .NET 2.0+, but I'm even out of luck on that slim monitoring tool.
My 300 websites connect to probably 100+ databases spread out between Oracle, SQLServer and outliers, so I cannot watch the connections from the database end either.
Right now my best and only plan is to do a loose binary search for my worst offenders. I will kill application pools and slowly remove applications from them until I find which individual applications result in the most connections dropping when I kill their pool. But since this is a production box and I like continued employment, this could take weeks as a tracing method.
Does anyone know of a way to interrogate the IIS connection pools to learn their origin or owner? Is there an MSMQ trigger I might be able to attach when they are created? Anything silly I'm overlooking?
Kevin
(I'll include the error code to facilitate others finding your answers through search:
Exception: System.InvalidOperationException
Message: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.)

Try starting with this first article from Bill Vaughn.

Todd Denlinger wrote a fantastic class (http://www.codeproject.com/KB/database/connectionmonitor.aspx) that watches SQL Server connections and reports on those that have not been properly disposed within a period of time. Wire it into your site, and it will let you know when there is a leak.
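To illustrate the general idea behind that kind of monitor, here is a minimal sketch. It is not Todd Denlinger's class and it is Python rather than .NET; the names ConnectionLeakTracker, opened, closed and suspected_leaks are made up for this example. It assumes you can hook the places where your code opens and closes connections, and it records the creation call stack so a stale connection can be traced back to its owner.

```python
import threading
import time
import traceback


class ConnectionLeakTracker:
    """Hypothetical helper: remembers where each open connection was
    created and flags the ones that have stayed open too long."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self._open = {}  # id(connection) -> (opened_at, creation stack)
        self._lock = threading.Lock()

    def opened(self, connection):
        # Capture the call stack so a leak can be traced to the code that opened it.
        stack = "".join(traceback.format_stack(limit=8))
        with self._lock:
            self._open[id(connection)] = (time.monotonic(), stack)

    def closed(self, connection):
        # Properly closed connections are forgotten.
        with self._lock:
            self._open.pop(id(connection), None)

    def suspected_leaks(self, now=None):
        """Return (age_seconds, creation_stack) for connections open too long."""
        now = time.monotonic() if now is None else now
        with self._lock:
            entries = list(self._open.values())
        return [(now - opened_at, stack)
                for opened_at, stack in entries
                if now - opened_at > self.max_age_seconds]
```

Call opened() right after a connection is opened and closed() when it is closed or disposed, then log suspected_leaks() periodically. Each entry carries the stack of the code that opened the connection, which is the "origin or owner" information the question asks for.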

Related

Azure search - connection timeout

Within a one-hour period I've had a couple of random instances of the Azure Search service returning a connection timeout. It is being called from a .NET Core web application running as an Azure App Service.
App Insights has a dependency failure for the same time (a POST to /indexes('products')/docs/search.post.search?api-version=2019-05-06) with a response of "Faulted".
Any help or ideas on why this happened and how I can prevent it would be appreciated.
You could be attempting to retrieve too much data at once. Or you may have a throttling issue because of too much traffic. The reason for the timeout is not possible to determine without more context.
To avoid timeouts, address the root cause once you have identified it, for example by reducing the response size or limiting the number of requests.
Also, consider implementing a retry mechanism with exponential backoff. See this thread for information: Azure Search .net SDK- How to use "FindFailedActionsToRetry"?
As Dan mentioned, it is recommended to use retries since failures due to network or many other reasons can happen and this will help improve your app availability. However, if you are seeing failures happen repeatedly or need more information then please open a support issue so the support team can investigate it further.
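As a rough illustration of retrying with exponential backoff, here is a minimal Python sketch. It is not the Azure Search .NET SDK and does not use FindFailedActionsToRetry; retry_with_backoff, its parameters and the choice of retryable exceptions are all assumptions made for this example.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retryable=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call operation(); on a retryable error, wait and try again.

    The wait is capped at max_delay and doubles with each attempt
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

Wrap only the call that talks to the remote service, and keep max_attempts small so that a persistent outage still fails quickly enough to be noticed and reported.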

ASP.NET Core 2.2 experiencing high CPU usage

I have hosted an ASP.NET Core 2.2 web service on Azure (S2 plan). The problem is that my application sometimes hits high CPU usage (almost 99%). So far I have checked the process explorer on Azure, and I see a lot of processes there consuming CPU. Does anyone know whether it's okay for these processes to consume CPU?
Currently, I have no idea where they come from. Maybe it's normal to have them here.
Shortly about my application:
Currently, there is not much traffic: 500-600 requests a day. Most of the requests communicate with MS SQL to query records, add records, etc.
I am also using MS WebSocket, but the high CPU happens when no WebSocket client is connected to the web service, so I hardly believe that it's the cause. I tried to use Apache ab for load testing, but there is no pattern: a load test does not reliably produce high CPU. Sometimes it happens during load testing, sometimes it doesn't.
I just updated the screenshot of processes. I see that lots of threads are being locked/used at the time when FluentMigrator starts running its logging.
Update*
I will remove the FluentMigrator logging middleware from the Configure method and see how the situation develops.
UPDATE**
So I removed the FluentMigrator logging. Since then I haven't noticed any CPU usage over 90%.
But I am still confused. My CPU usage is spinning. Is this a healthy CPU usage graph or not?
Also, I tried to make a load test on the websocket server.
I made a script that calls some WebSocket functions every 100 ms from 6-7 clients. So every 100 ms there are about 7 calls to the WebSocket server from different clients, and each function runs approximately 3-4 queries (selects or inserts).
What I noticed: on Azure S1 DTU 20 I run out of SQL pool connections after 2 minutes. If I increase the DTU to 100, it handles 7 clients properly without any 'no connection pool' errors.
So the first question: is this CPU spinning normal?
Second: should I expect a 'no SQL connection free' error with this kind of load test on DTU 10 Azure SQL? I am afraid that I am leaking connections when creating a scoped service in the singleton WebSocket service.
This topic gets too long, maybe I should move it to a new topic?
-
At this stage I would say you need to profile your application and figure out what areas of your code are CPU intensive. In the past I have used dotTrace, this highlighted methods which are the most expensive with a call tree.
Once you know what areas of your code base are the least efficient, you can begin to refactor them so that they are more efficient. This could simply be changing some small operations, adding caching for queries or using distributed locking for example.
I believe the other DLLs show CPU usage because your code is calling methods within those DLLs.

Azure Web Site CPU High at random intervals of the day

I have had an Azure Web Site running for 6 months. On Friday 1st April 2016 at 09:50pm the CPU went very high, which hurt the performance of the web site. Stopping and restarting the web service solved the problem, but it came back at 13:00. Since then the CPU has stayed high, making the web site unusable.
I've tried all the monitoring tools (DaaS, Event Logs), checked for open connections, and ensured my software is closing or disposing objects correctly.
But the CPU is still high. The only way to resolve it is to restart the web service, but I don't want to keep doing this.
Has anyone else experienced a similar problem, and what was the solution?
The only thing in the event logs that looks like an issue is the odd "A network-related or instance-specific error occurred while establishing a connection to SQL Server", which could be because SQL Azure is not available.
Please help
Hmmm, high CPU means that your web site is executing code, perhaps a faulty loop on some infrequently used code path.
The brute-force way to identify what code is being executed would be to add tracing to your solution with System.Diagnostics.Trace.WriteLine("I am here") and then check the Azure Application Log.
Another way would be to attach the Visual Studio Debugger during high CPU and check what is being executed.
A third way would be to take a dump or minidump from the Kudu site and analyze it with WinDbg:
1) What thread is consuming CPU:
!runaway
2) What is this thread doing:
!clrstack
hth,
Aldo

I'm not sure how to correctly configure my server setup

This is kind of a multi-tiered question in which my end goal is to establish the best way to setup my server which will be hosting a website as well as a service (using Socket.io) for an iOS (and eventually an Android) app. Both the app service and the website are going to be written in node.js as I need high concurrency and scaling for the app server and I figured whilst I'm at it may as well do the website in node because it wouldn't be that much different in terms of performance than something different like Apache (from my understanding).
Also the website has a lower priority than the app service, the app service should receive significantly higher traffic than the website (but in the long run this may change). Money isn't my greatest priority here, but it is a limiting factor, I feel that having a service that has 99.9% uptime (as 100% uptime appears to be virtually impossible in the long run) is more important than saving money at the compromise of having more down time.
Firstly, I understand that having one node process per CPU core is the best way to fully utilise a multi-core CPU. After researching, I now understand that running more than one per core is inefficient because the CPU has to context switch between the multiple processes. How come, then, whenever I see code posted on how to use the built-in cluster module in node.js, the master process creates a number of workers equal to the number of cores? That would mean you would have 9 processes on an 8 core machine (1 master process and 8 worker processes). Is this because the master process is usually there just to restart worker processes if they crash or end, and therefore does so little that it doesn't matter that it shares a CPU core with another node process?
If this is the case then, I am planning to have the workers handle providing the app service and have the master worker handle the workers but also host a webpage which would provide statistical information on the server's state and all other relevant information (like number of clients connected, worker restart count, error logs etc). Is this a bad idea? Would it be better to have this webpage running on a separate worker and just leave the master worker to handle the workers?
So overall I wanted to have the following elements: a service to handle the requests from the app (my main point of traffic), a website (fairly simple, a couple of pages and a registration form), an SQL database to store user information, and a webpage (probably locally hosted on the server machine) which only I can access and which shows information about the server (users connected, worker restarts, server logs, other useful information etc). Apparently nginx would also be a good idea, since I'm handling multiple node processes accepting connections from the app.
After doing research I've also found that it would probably be best to host on a VPS initially. I was thinking that at first, when the amount of traffic the app service receives will most likely be fairly low, I could run all of those elements on one VPS. Or would it be best to have them running on separate VPSs, except for the website and the server status webpage, which I could run on the same one? I guess this way, if there is a hardware failure and something goes down, not everything does, and I could run 2 instances of the app service on 2 different VPSs so that if one goes down the other one is still functioning. Would this just be overkill? I doubt I would need multiple app service instances to support the traffic load for a while, but it would help reduce the apparent downtime for users.
Maybe this all depends on what I value more and have the time to do? A more complex server setup that costs more and maybe a little unnecessary but guarantees a consistent and reliable service, or a cheaper and simpler setup that may succumb to downtime due to coding errors and server hardware issues.
Also it's worth noting I've never had any real experience with production level servers so in some ways I've jumped in the deep end a little with this. I feel like I've come a long way in the past half a year and feel like I'm getting a fairly good grasp on what I need to do, I could just do with some advice from someone with experience that has an idea with what roadblocks I may come across along the way and whether I'm causing myself unnecessary problems with this kind of setup.
Any advice is greatly appreciated, thanks for taking the time to read my question.

Connecting to Azure Cache Service takes about 3.3 seconds

In our still-in-development project we have noticed sudden delays when accessing our ASP.NET Web API services. Using the awesome Mini Profiler we nailed it that these delays are caused when connections to the Azure Data Cache (Preview) services are dropped and they have to be reestablished. This process takes about 3.3 seconds. After reconnecting, getting an object from the cache takes 1.4 ms.
When I increased maxConnectionsToServer from 1 to 20, I noticed another thing. If I don't make requests to the Web API for 1 or 2 minutes (that's usually when the connections are dropped) and then start making calls, the next 20 requests are each delayed by 3.3 seconds, which is how connection pooling works, I guess (round-robining through the connections in the pool).
Both the Web API and Caching service are hosted in the East US region, we have disabled local cache, SSL is disabled, auto discover is enabled.
So, I'm wondering if something is wrong with our configuration or is this a thing because Azure Cache is still in preview?
Any information will be valued.
Thanks!
It sounds like your shared cache is being offloaded due to inactivity. One way to test this would be to add an In-Role Cache to an existing service (if available) and swap your cache usage to this new cache. In-Role cache is described here.
Once the cache is moved off of a shared offering, wait the requisite 1-2 minutes for idle time out and retry the connection, the delay should not be present.
Assuming you want to stick with the shared cache option after isolating the problem, the only current workaround that I am aware of is running a background task that will periodically ping the cache to keep it alive.
If you are running a full Web role you can launch a background task on application start up.
If you are deploying via Mobile Services, then you can run the "ping" via Scheduled Jobs. The only issue you may run into here is that the minimum interval for a scheduled job is 1 minute, which may not be aggressive enough to keep your cache alive 100% of the time.
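The keep-alive idea can be sketched roughly as follows. This is a language-neutral illustration in Python, not Azure or ASP.NET code; KeepAlive and its ping callback are hypothetical, and in a real application the callback would do a cheap read against the cache at an interval shorter than the idle timeout.

```python
import threading


class KeepAlive:
    """Hypothetical helper: calls ping() every interval_seconds on a
    background daemon thread until stop() is called."""

    def __init__(self, ping, interval_seconds):
        self._ping = ping
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() was called.
        while not self._stop.wait(self._interval):
            try:
                self._ping()
            except Exception:
                # A failed ping should not end the loop; the next one may succeed.
                pass
```

Start it once at application start-up and stop it on shutdown. Swallowing ping errors is a deliberate choice here so that a brief cache outage does not permanently end the keep-alive loop; in practice you would probably log them.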
Nothing that I see points to you doing anything wrong per se. It may be that Azure is genuinely having problems getting the cache connections up and running quickly. According to several best practices documents and MSDN posts, you want to increase your number of connections to caches to allow for a failover to an active connection, which you've effectively done with your configuration change.
Try making sure that your cache accessor is a static object (another MSDN recommendation) and this may be a long shot but consider using the Sliding Window option for object expiration and see if that not only tells the countdown for the object store to reset, but also prompts the cache service to reset the connection.

Resources