Is it possible to identify what is blocking an await? - azure

I've got an intermittent problem with our application (.NET Core, MongoDB database, hosted on Azure).
Occasionally requests appear to get stuck. A profiler trace using Application Insights shows this:
So the application is being held at an await for 17 seconds, before running the query for 63ms.
Is it possible, or is there a technique for finding out why it's hanging there and what's blocking it? (note, I'm not asking specifically why it's hanging at this point...but how to find out)

Related

ASP.NET WebApp in Azure using lots of CPU

We have a long running ASP.NET WebApp in Azure which has no real endpoints exposed – it serves a single functional purpose primarily reading and manipulating database data, effectively a batched, scheduled task, triggered by a timer every 30 seconds.
The app runs fine most of the time but we are seeing occasional issues where the CPU load for the app goes close to the maximum for the AppServicePlan, instantaneously rather than gradually, and stops executing any more timer triggers and we cannot find anything explicitly in the executing code to account for it (no signs of deadlocks etc. and all code paths have try/catch so there should be no unhandled exceptions). More often than not we see errors getting a connection to a database but it’s not clear if those are cause or symptoms.
Note, this is the only resource within the AppService Plan. The Azure SQL database is in the same region and whilst utilised by other apps is very lightly used by them and they also exhibit none of the issues seen by the problem app.
It feels like this is infrastructure related but we have been unable to find anything to explain what is happening so if anyone has any suggestions for where we should be looking they would be gratefully received. We have enabled basic Application Insights (not SDK) but other than seeing CPU load spike prior to loss of app response there is little information of interest given our limited knowledge of how to best utilise Insights.
Based on your description, I can suggest two things to try. First, instrument the code: write a log entry at the start and end of each scheduled batch run to record its status. If possible, also record request and response information along with the start and end times. This gives you a complete record of when each run happened and how it went.
Second, write a log entry before the program starts each database operation, and record whether the database connection succeeded. Ideally you would record which business operation is running when the CPU load spikes, and track its specific conditions, so that you can analyze what causes the database connection failures.
Because you cannot reproduce the problem, you can only guess at its cause. If the two steps above still don't reveal where the problem is, change the timer so that the program triggers once every 5 minutes instead of every 30 seconds.
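The logging described above can be sketched roughly as follows. The app in question is ASP.NET, so this Python version only illustrates the pattern; `logged_step`, `run_batch`, `open_connection` and `process` are hypothetical names, not anything from the original code.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("batch")


@contextmanager
def logged_step(name):
    """Log the start, the end with its duration, and any failure of one step."""
    started = time.monotonic()
    logger.info("%s: started", name)
    try:
        yield
    except Exception:
        # logger.exception also records the traceback of the active exception.
        logger.exception("%s: failed after %.3fs", name, time.monotonic() - started)
        raise
    else:
        logger.info("%s: finished in %.3fs", name, time.monotonic() - started)


def run_batch(open_connection, process):
    """One timer-triggered run (hypothetical): log the whole run and the
    database connect separately, so a failed connect is visible on its own."""
    with logged_step("batch run"):
        with logged_step("database connect"):
            connection = open_connection()
        process(connection)
```

With entries like these, a run that logs "started" but never "finished" or "failed" points at a hang inside that step rather than an exception.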

ASP.NET Core 2.2 experiencing high CPU usage

I have an ASP.NET Core 2.2 web service hosted on Azure (S2 plan). The problem is that my application sometimes hits high CPU usage (almost 99%). So far I have checked the process explorer on Azure, where I see a lot of processes consuming CPU. Does anyone know whether it's normal for these processes to consume CPU?
Currently, I have no idea where they come from. Maybe it's normal to have them here.
Briefly, about my application:
Currently, there is not much traffic: 500-600 requests a day. Most requests communicate with MS SQL to query records, add records, etc.
I am also using MS WebSocket, but the high CPU happens when no WebSocket client is connected to the web service, so I doubt that it's the cause. I tried load testing with Apache ab, but there is no pattern: a load test does not reliably produce high CPU. Sometimes it happens during load testing and sometimes it doesn't.
I have just updated the screenshot of processes. I see that lots of threads are being locked/used at the time FluentMigrator starts running its logging.
Update*
I will remove the FluentMigrator logging middleware from the Configure method and see how the situation develops.
UPDATE**
So I removed the FluentMigrator logging. Since then I haven't noticed any CPU usage over 90%.
But I am still confused. My CPU usage is spinning. Is this a healthy CPU usage graph or not?
I also tried a load test on the WebSocket server.
I made a script that calls some WebSocket functions every 100ms from 6-7 clients. So every 100ms there are 7 calls to the WebSocket server from different clients, and each function itself queries or inserts some data (approximately 3-4 queries per WebSocket function).
What I noticed is that on Azure S1 DTU 20, after 2 minutes I run out of SQL pool connections. If I increase DTU to 100, it handles 7 clients properly without any 'no connection pool' errors.
So the first question: is this CPU spinning normal?
Second: should I be getting a 'no SQL connection free' error message with this kind of load test on DTU 10 Azure SQL? I am afraid that by creating a scoped service in the singleton WebSocket service I am leaking connections.
This topic is getting too long; maybe I should move it to a new topic?
-
At this stage I would say you need to profile your application and figure out which areas of your code are CPU intensive. In the past I have used dotTrace, which highlighted the most expensive methods in a call tree.
Once you know which areas of your code base are the least efficient, you can begin to refactor them. This could be as simple as changing some small operations, adding caching for queries, or using distributed locking, for example.
I believe the reason the other DLLs are showing CPU usage is that your code is calling methods within those DLLs.

Random 503 errors in Azure Mobile Services

At certain times during the week while I'm testing my Mobile Services app I get a 503 error (Service Unavailable). It happens whether I try to call the app from localhost or live on my Azure Website. It hangs around for 10-15 minutes and then goes away on its own. It doesn't seem to be caused by anything in particular that I am doing (i.e. I have not updated any code). The 503 error occurs when I'm trying to call one of my custom APIs in my Mobile Services account. A few of the requests make it through (strangely enough) but the majority return a 503 error.
I've seen that someone had a very similar problem here (Why does Azure give me an intermittent Error 503. The service is unavailable?) without an acceptable resolution.
I am using the free version of Mobile Services but I should be nowhere near pushing the limits of what the free version can handle; I am the sole user of the app right now.
It will soon be time to make the service live and I'm shuddering at the thought of support calls that will come in during one of these funky states the service gets into. Any help in debugging the problem would be greatly appreciated.
EDIT:
I've narrowed this down to a database problem. I have one main query (sproc) that I use to feed data to the UI. I noticed that when I get the 503 errors the query takes about 13 seconds (when run in SSMS). When things are running "normally", the query takes less than a second.
This doesn't solve my problem though, in fact it makes it more perplexing because I am using the Business Edition of Windows Azure SQL Database and there shouldn't be a 13 second fluctuation in execution time!
This problem seems to happen randomly. Is there some kind of caching in SQL Server that could explain this? Maybe my query really does take 13 seconds to execute and the caching superficially speeds it up.
Could you try transitioning your database/server to one of the "editions"? They have resource governance to promote predictable performance. Web/Business suffer from a noisy neighbor problem. It sounds like that may be your issue, considering it is intermittent.
Here's a link to a page describing the editions. https://msdn.microsoft.com/en-us/library/azure/dn741340.aspx

How to investigate cherrypy crashing?

We have a CherryPy service that integrates with several backend web services. During load testing the CherryPy process regularly crashes after a while (45 minutes). We know the bottleneck is the backend web services we are using. Before the crash we see 500 and 503 errors when accessing the backend services, but I can't figure out why CherryPy itself would crash (the whole process was killed). Can you give me ideas on how to investigate where the problem is? Is it possible that the thread_pool (50) is queueing up too many requests?
In my early CherryPy days I had it crash once. I mean a Python process crash caused by a segfault. When I investigated it I found that I had mishandled MySQLdb connections, caching them in objects which were accessed by CherryPy threads interchangeably. Because a MySQLdb connection is not thread-safe, it should be accessed only from the thread it was created in. Also, because of the concurrency involved, the crashes seemed nondeterministic and only appeared in load testing. So load testing can work as a debugging tool here -- try Apache JMeter or Locust (Pythonic).
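One way to keep each connection in the thread that created it is thread-local storage. This is a minimal sketch, not the code from the incident above: `PerThreadConnection` is a hypothetical helper, and `connect` stands for any zero-argument factory you supply, such as a wrapper around your own `MySQLdb.connect(...)` call.

```python
import threading


class PerThreadConnection:
    """Hand each thread its own connection instead of sharing one cached object."""

    def __init__(self, connect):
        self._connect = connect          # zero-argument factory supplied by the caller
        self._local = threading.local()  # attributes set here are separate per thread

    def get(self):
        # Create the connection lazily, the first time this thread asks for it.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._connect()
            self._local.conn = conn
        return conn
```

A request handler would call `get()` each time it needs the connection rather than storing the result on a shared object, which is the caching mistake described above.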
When a process crashes you can instruct Linux to write a core dump, which will contain a stack trace (e.g. on the MySQLdb C-code side in my example). However alien the low-level C environment is to you (it is to me), the stack trace can help you find which library is causing the crash, or at least narrow the circle of suspects. Here is an article about it.
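From inside the Python process, two standard-library calls can help with this. The sketch below is a supplement to the core-dump approach, not a description of the linked article: `enable_crash_diagnostics` is a hypothetical helper, `faulthandler` prints the Python-level stack on a fatal signal, and `resource` raises the core-file size limit. Whether and where a core file actually gets written still depends on system settings such as `/proc/sys/kernel/core_pattern`.

```python
import faulthandler
import resource
import sys


def enable_crash_diagnostics(log_file=sys.stderr):
    """Request a Python traceback on fatal signals and lift the core-dump limit.

    log_file must be a real file with a file descriptor and must stay open
    for as long as the handler is enabled.
    """
    # Dump the Python stack of every thread on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL.
    faulthandler.enable(file=log_file, all_threads=True)

    # Raise the soft core-file limit to the hard limit (Unix only).
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard
```

Calling this once at service startup, before the CherryPy engine starts, is enough; the Python traceback shows which request code was running when the C-level crash happened.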
I also want to note that the problem is unlikely to be in CherryPy itself. It is actually very stable.

IIS 6 App Pools not responding to multiple requests / not running multi-threaded

I have a classic ASP application that has been stable for years and now we're having all kinds of problems with it. After moving the app between machines and wiping the original so we could have a fresh install of windows, we've come to the following "symptom". The app pools do not appear to allow for multiple simultaneous requests. Here's what we are seeing:
The app runs normally for most people, but when someone within one of the app pools accesses a long-running script (usually one with lots of DB access) all of the other users in the pool must wait for that script to complete. Once the script completes, everyone else's requests run. This initially made us suspect the DB connection string or something.
UNTIL... we noticed also that large file uploads into our system also cause the app pool to stop responding. What's interesting about this is that we're using the SAFileup COM+ object to do our uploads, which has a progress display in a pop-up window. When you go to upload the file, the progress display comes up, but then never refreshes to show upload progress. If you wait it out, however, the file will eventually upload and the other pending requests will process as normal.
Our app pools are in the default configuration, using the IWAM account to launch. I checked to ensure that the IWAM account has all the appropriate permissions. It does.
We've tried a variety of DB connection strings, none solved the problem (though I'm thinking it's not the DB connection string). Just in case someone thinks it is, here's our connection string: "Provider=SQLNCLI;Trusted_Connection=yes;Server=(local);Database=demo;". It couldn't be simpler. This string was previously not a problem.
I fussed with the web gardens thing and it does, indeed, make the system respond to multiple requests, but each worker process in the garden has its own session state, which causes our users to get booted when their request gets randomly assigned to a new worker process. Only having a single worker process in the garden was never an issue before anyway.
I've used SQL Profiler and sp_who2 to see if during the long-running scripts there are any deadlocks or blocks on the SQL Server. There are not.
The issues initially started after we had installed some patches from Microsoft. We wiped a machine clean and installed Win2k3 server, then SP2, and then didn't patch anymore after that. The problem remained, so it doesn't appear to have been a patch.
I'm pretty much at a loss now... does anyone have any experience with similar issues? If so, how were they fixed?
Check that you don't have ASP debugging enabled on the server. This will force the ASP script engine to run on a single thread.
Sounds like a limit on the number of concurrent incoming requests to IIS or the Windows Server.
Check out http://blogs.msdn.com/b/david.wang/archive/2006/04/12/howto-maximize-the-number-of-concurrent-connections-to-iis6.aspx and http://forums.iis.net/p/1152112/1880908.aspx#1880908 on how to tweak the settings.
