Azure Search - connection timeout

I've had a couple of random instances within a one-hour period of the Azure Search service returning a connection timeout. It is being called from a .NET Core web application running as an Azure App Service.
App Insights shows a dependency failure at the same time (a POST to /indexes('products')/docs/search.post.search?api-version=2019-05-06) with a response of "Faulted".
Any help or ideas on why this happened and how I can prevent it would be appreciated.

You could be attempting to retrieve too much data at once, or you may be getting throttled because of too much traffic. The exact reason for the timeout is impossible to determine without more context.
To avoid timeouts, reduce the response size, limit the number of requests, or otherwise address the root cause.
Also, consider implementing a retry mechanism with exponential backoff (sketched below). See this thread for more information: Azure Search .NET SDK - How to use "FindFailedActionsToRetry"?
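For illustration, a minimal sketch of such a retry in C# (the transient-error check and the wrapped operation are assumptions; adapt both to the failures you actually see):

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class SearchRetry
    {
        // Retries a transient failure with exponential backoff: 2s, 4s, 8s...
        // e.g. var page = await SearchRetry.WithRetryAsync(() => RunSearchAsync());
        public static async Task<T> WithRetryAsync<T>(Func<Task<T>> operation, int maxAttempts = 4)
        {
            for (var attempt = 1; ; attempt++)
            {
                try
                {
                    return await operation();
                }
                catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex))
                {
                    // Back off before the next attempt; add jitter in real code.
                    await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
                }
            }
        }

        // Illustrative transient check; widen it to cover the timeout and
        // throttling (429/503) responses your client actually surfaces.
        private static bool IsTransient(Exception ex) =>
            ex is TimeoutException || ex is HttpRequestException;
    }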

As Dan mentioned, it is recommended to use retries, since failures can happen due to the network or many other reasons, and retrying will help improve your app's availability. However, if you are seeing failures happen repeatedly, or need more information, then please open a support issue so the support team can investigate further.

Related

Limit Azure Function restart rate

I have already faced a similar problem a few times:
An Azure Function with a ServiceBusTrigger fails to connect to Service Bus for some reason (misconfiguration, infrastructure issues, it doesn't really matter), so the failure happens at the trigger level, and it leads to two issues:
It tries to restart all the time, increasing CPU consumption
It generates literally millions of exceptions in App Insights, which leads to exceeding the quota
Practically every configuration error means significantly increased bills and requires thorough monitoring after every deployment, which is an annoying and error-prone way to operate.
So, my question: is there a way to set a delay between restart attempts of, for example, one second? And in addition, is there a way to limit the number of restart attempts and then shut down the Function?
Establishing a connection to the broker to fetch messages is the responsibility of the Functions runtime's Scale Controller. That aspect is entirely abstracted away from customers and is not configurable. I suggest raising an issue with the Azure Functions team, likely under the Runtime repo.
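As a side note on the App Insights quota problem: the flood of exceptions can at least be capped with telemetry sampling in host.json. A minimal sketch for the v2+ runtime (the per-second limit is an arbitrary example to tune for your own quota):

    {
      "version": "2.0",
      "logging": {
        "applicationInsights": {
          "samplingSettings": {
            "isEnabled": true,
            "maxTelemetryItemsPerSecond": 5
          }
        }
      }
    }

This does not slow the restarts themselves, but it keeps a misconfigured trigger from burning through the ingestion quota while you fix it.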

App Services on Azure seem to be very slow

I am trying to track down when our frontend started to be this slow. Recently I created new app services within the same service plan.
So now I have six apps (2 frontend, 4 backend) running under the same App Service plan on the Basic pricing tier. Also, we use Kudu for deployments.
Could that be the reason? And how should I look for the reason?
This is an overview of that service plan:
I'd appreciate any ideas and suggestions.
@user122222 This is a high-CPU issue, not a slow-request issue, as others have pointed out.
An immediate action you can take is to scale up. If you are using a B1 instance in the Basic tier, try scaling up to a B3, which will provide you with more CPU cores and RAM, and see if that gives you relief. If so, you likely need to remain at that instance level. At this point it would also be worthwhile to analyze your number of requests: scale up when you are running many sites or resource-intensive sites, and scale out when you are receiving a high number of requests.
My money is on an issue in your code that is causing a deadlock or similar. Your CPU usage graph is stuck at 100% over many hours; even an overloaded App Service plan will see a few dips over the course of a few hours.
To troubleshoot the high CPU usage, start with the Diagnose and solve problems blade in your App Service plan. This is the same troubleshooting tool that a support engineer would use in a paid technical support case. Use it to troubleshoot high CPU, not slow requests, since based on your screenshot the CPU appears to be the culprit behind the slow requests.
This can tell you which app in the plan is causing the issue, and sometimes even which process in that app. Beyond this, I'd suggest creating and analyzing a memory dump of the problematic web app. More steps on how to do that here.
Please try to restart the worker instance.
https://learn.microsoft.com/en-us/rest/api/appservice/app-service-plans/reboot-worker#code-try-0
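For example, that REST endpoint can be invoked with the Azure CLI; a sketch, with every placeholder to be filled in with your own values, and the api-version being the one current at the time of writing:

    az rest --method post \
      --url "https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.Web/serverfarms/{planName}/workers/{workerName}/reboot?api-version=2022-03-01"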

ASP.NET WebApp in Azure using lots of CPU

We have a long-running ASP.NET WebApp in Azure which has no real endpoints exposed; it serves a single functional purpose, primarily reading and manipulating database data, effectively a batched, scheduled task triggered by a timer every 30 seconds.
The app runs fine most of the time, but we are seeing occasional issues where the CPU load for the app goes close to the maximum for the App Service plan, instantaneously rather than gradually, and the app stops executing any further timer triggers. We cannot find anything in the executing code to account for it (no signs of deadlocks etc., and all code paths have try/catch, so there should be no unhandled exceptions). More often than not we see errors getting a connection to a database, but it's not clear whether those are cause or symptom.
Note that this is the only resource within the App Service plan. The Azure SQL database is in the same region; whilst utilised by other apps it is very lightly used by them, and they exhibit none of the issues seen by the problem app.
It feels like this is infrastructure-related, but we have been unable to find anything to explain what is happening, so any suggestions for where we should be looking would be gratefully received. We have enabled basic Application Insights (not the SDK), but other than seeing the CPU load spike prior to the loss of app response, there is little information of interest given our limited knowledge of how best to utilise Insights.
Based on your description, I can think of two ways to troubleshoot your problem. First, track the running status of your program through code: put a log entry at the beginning and end of your batch scheduled task to record the status of each run (see the sketch below). If possible, record request/response information and start and end times. This gives a complete record of when your task ran and how it behaved.
Second, write a log entry before the program starts database operations, including whether the database connection succeeded. Ideally, record which business operation triggers the CPU load and track the specific conditions, in order to analyze what causes the database connection failures.
Because you cannot reproduce the problem, you can only guess at the cause. If the two points above still don't reveal where the problem is, adjust your timer so the program triggers once every 5 minutes instead of every 30 seconds.
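As a sketch of the first point, assuming a .NET 6+ hosted service (the class and RunBatchAsync are illustrative names, not from the original app):

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Hosting;
    using Microsoft.Extensions.Logging;

    // Illustrative sketch: log the start, end, and failure of every timer run so
    // the logs show exactly when (and after which run) the triggers stop firing.
    public class BatchWorker : BackgroundService
    {
        private readonly ILogger<BatchWorker> _logger;
        public BatchWorker(ILogger<BatchWorker> logger) => _logger = logger;

        protected override async Task ExecuteAsync(CancellationToken ct)
        {
            using var timer = new PeriodicTimer(TimeSpan.FromSeconds(30));
            while (await timer.WaitForNextTickAsync(ct))
            {
                var runId = Guid.NewGuid();
                _logger.LogInformation("Batch run {RunId} starting", runId);
                try
                {
                    await RunBatchAsync(ct); // the app's existing batch logic
                    _logger.LogInformation("Batch run {RunId} finished", runId);
                }
                catch (Exception ex)
                {
                    _logger.LogError(ex, "Batch run {RunId} failed", runId);
                }
            }
        }

        // Placeholder for the real work (database reads/writes).
        private Task RunBatchAsync(CancellationToken ct) => Task.CompletedTask;
    }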

Random 503 errors in Azure Mobile Services

At certain times during the week while I'm testing my Mobile Services app I get a 503 error (Service Unavailable). It happens whether I try to call the app from localhost or live on my Azure Website. It hangs around for 10-15 minutes and then goes away on its own. It doesn't seem to be caused by anything in particular that I am doing (i.e. I have not updated any code). The 503 error occurs when I'm trying to call one of my custom APIs in my Mobile Services account. A few of the requests make it through (strangely enough) but the majority return a 503 error.
I've seen that someone had a very similar problem here (Why does Azure give me an intermittent Error 503. The service is unavailable?) without an acceptable resolution.
I am using the free tier of Mobile Services, but I should be nowhere near pushing the limits of what the free tier can handle; I am the sole user of the app right now.
It will soon be time to make the service live and I'm shuddering at the thought of support calls that will come in during one of these funky states the service gets into. Any help in debugging the problem would be greatly appreciated.
EDIT:
I've narrowed this down to a database problem. I have one main query (sproc) that I use to feed data to the UI. I noticed that when I get the 503 errors the query takes about 13 seconds (when run in SSMS). When things are running "normally", the query takes less than a second.
This doesn't solve my problem though, in fact it makes it more perplexing because I am using the Business Edition of Windows Azure SQL Database and there shouldn't be a 13 second fluctuation in execution time!
This problem seems to happen randomly. Is there some kind of caching in SQL Server that could explain this? Maybe my query really does take 13 seconds to execute and the caching superficially speeds it up.
Could you try transitioning your database/server to one of the newer service tiers? They have resource governance to promote predictable performance, whereas Web/Business suffer from a noisy-neighbor problem. It sounds like that may be your issue, considering it is intermittent.
Here's a link to a page describing the editions. https://msdn.microsoft.com/en-us/library/azure/dn741340.aspx
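If you do migrate, the tier change can also be made with T-SQL rather than the portal; a sketch, with a placeholder database name and performance level:

    -- Moves the database to the Standard tier at the S1 performance level.
    -- Pick the tier/objective that matches your workload.
    ALTER DATABASE [MyMobileServicesDb]
    MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S1');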

IIS Connection Pool interrogation/leak tracking

Per this helpful article I have confirmed I have a connection pool leak in some application on my IIS 6 server running W2k3.
The tough part is that I'm serving 300 websites, written by 700 developers, from this server in 6 application pools; 50% of them are .NET 1.1, which doesn't even show connections in the CLR Data performance counter. I could watch connections grow on my end if everything were .NET 2.0+, but I'm out of luck even on that slim monitoring tool.
My 300 websites connect to probably 100+ databases spread out between Oracle, SQLServer and outliers, so I cannot watch the connections from the database end either.
Right now my best and only plan is to do a loose binary search for my worst offenders. I will kill application pools and slowly remove applications from them until I find which individual applications result in the most connections dropping when I kill their pool. But since this is a production box and I like continued employment, this could take weeks as a tracing method.
Does anyone know of a way to interrogate the IIS connection pools to learn their origin or owner? Is there an MSMQ trigger to which I might be able to attach when they are created? Anything silly I'm overlooking?
Kevin
(I'll include the error code to facilitate others finding your answers through search:
Exception: System.InvalidOperationException
Message: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.)
Try starting with this first article from Bill Vaughn.
Todd Denlinger wrote a fantastic class (http://www.codeproject.com/KB/database/connectionmonitor.aspx) which watches SQL Server connections and reports on ones that have not been properly disposed of within a period of time. Wire it into your site, and it will let you know when there is a leak.
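For reference, the usual root cause of these leaks is a connection that is opened but never disposed. The fix in each offending app is the pattern below; a generic sketch, not code from either article:

    using System.Data.SqlClient;

    public static class OrdersRepository
    {
        // The using blocks guarantee Dispose (and hence Close) runs even when an
        // exception is thrown, returning the connection to the pool instead of
        // holding it until the pool hits its max size and times out.
        public static int CountOrders(string connectionString)
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            using (SqlCommand command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection))
            {
                connection.Open();
                return (int)command.ExecuteScalar();
            }
        }
    }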
