Connecting to Azure Cache Service takes about 3.3 seconds

In our still-in-development project we have noticed sudden delays when accessing our ASP.NET Web API services. Using the awesome Mini Profiler, we traced these delays to connections to the Azure Data Cache (Preview) service being dropped and having to be reestablished. Reconnecting takes about 3.3 seconds. After reconnecting, getting an object from the cache takes 1.4 ms.
When I increased maxConnectionsToServer from 1 to 20, I noticed another thing. If I don't make requests to the Web API for 1 or 2 minutes (that's usually when the connections are dropped) and then start making calls, the next 20 requests are each delayed by 3.3 seconds. I guess that is how connection pooling works (cycling through the connections in the pool).
Both the Web API and the Caching service are hosted in the East US region. Local cache is disabled, SSL is disabled, and auto discover is enabled.
So I'm wondering: is something wrong with our configuration, or is this happening because Azure Cache is still in preview?
Any information will be valued.
Thanks!

It sounds like your shared cache is being offloaded due to inactivity. One way to test this would be to add an In-Role Cache to an existing service (if available) and swap your cache usage to this new cache. In-Role cache is described here.
Once the cache is moved off the shared offering, wait the requisite 1-2 minutes for the idle timeout and retry the connection. The delay should no longer be present.
Assuming you want to stick with the shared cache option after isolating the problem, the only current workaround that I am aware of is running a background task that will periodically ping the cache to keep it alive.
If you are running a full Web role you can launch a background task on application start up.
If you are deploying via Mobile Services, then you can run the "ping" via Scheduled Jobs. The only issue you may run into here is that the minimum interval for a scheduled job is 1 minute, which may not be aggressive enough to keep your cache alive 100% of the time.
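A minimal sketch of the keep-alive idea, written in Python only to show the shape of the loop; in a Web role the same thing would live in a background task started on application start up. The `start_keep_alive` name and the `ping` callable are hypothetical: `ping` stands in for any cheap cache read, and the interval has to be shorter than whatever idle timeout you actually observe.

```python
import threading
from typing import Callable


def start_keep_alive(ping: Callable[[], None], interval_seconds: float) -> threading.Event:
    """Call `ping` every `interval_seconds` on a daemon thread.

    Returns an Event; setting it stops the loop.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout and True once `stop` is set,
        # so this sleeps for the interval and exits promptly on shutdown.
        while not stop.wait(interval_seconds):
            try:
                ping()
            except Exception as exc:
                # A failed ping must not kill the keep-alive thread.
                print(f"keep-alive ping failed: {exc}")

    threading.Thread(target=loop, name="cache-keep-alive", daemon=True).start()
    return stop
```

Because the ping runs inside the same process as the cache client, it keeps that process's connections warm, which an external scheduled job hitting the cache from somewhere else would not do.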

Nothing that I see points to you doing anything wrong per se. It may be that Azure is genuinely having problems getting the cache connections up and running quickly. According to several best practices documents and MSDN posts, you want to increase your number of connections to the cache to allow for failover to an active connection, which you've effectively done with your configuration change.
Try making sure that your cache accessor is a static object (another MSDN recommendation). This may be a long shot, but also consider using the Sliding Window option for object expiration and see whether it not only resets the expiration countdown for the stored object but also prompts the cache service to reset the connection.
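To illustrate the "static accessor" advice: the point is that the expensive client is created once per process and shared, instead of once per request. Here is a rough Python sketch of that pattern; `SharedClient` and the `factory` argument are hypothetical names, and in .NET the usual equivalent is a static field or a `Lazy<T>` holding the cache factory.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedClient(Generic[T]):
    """Creates one client on first use and hands the same instance to every caller."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._client: Optional[T] = None

    def get(self) -> T:
        # Double-checked locking: once the client exists, callers skip the lock.
        if self._client is None:
            with self._lock:
                if self._client is None:
                    self._client = self._factory()
        return self._client
```

Keep one `SharedClient` at module level and call `get()` from request handlers, so the connection cost is paid once rather than on every request.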

Related

Multiple Redis connection exception (No Connection available to service) during App service swap slots

I have a .NET Core web app in production, deployed to Azure as an App Service on the Premium tier (P2v2, 4 instances). The app uses Azure Redis Cache (Premium tier) as its cache. I have two App Services (primary and secondary) behind Traffic Manager for load balancing.
Whenever I deploy to production using the swap slots feature, the response time of both App Services goes up to 20 seconds, the app is down for around 1 minute, and CPU utilization gets close to 90%. I also see multiple exceptions from the Redis client (for example: No connection is available to service this operation: EVAL; It was not possible to connect to the Redis server(s). To create a disconnected multiplexer, disable AbortOnConnectFail. ConnectTimeout; IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=452,Free=32315,Min=8,Max=32767), Local-CPU: n/a), and my HttpQueue length goes above 10.
What I can infer from the above image is that the worker threads are overloaded, but I don't know why this is happening.
I am using the .NET StackExchange.Redis client version 2.0.601; I recently updated from version 1.2.4.
Note:
I didn't use slot specific app setting.
It happens on every slot swap during deployment.
I didn't find any app service restart in the logs.
I would like to know whether any of you have faced this issue. If so, please suggest where the problem might be or how to debug it; it would also help if you could share anything you have tried.
I tried to find error logs on the Azure Redis Cache server but couldn't find any.
I am trying to figure out what is causing this issue, how to debug this kind of issue on Azure, and whether anybody has encountered the same problem and implemented a resolution.
Please let me know if you need any additional details.
Here is something that might be worth trying:
Cache metrics are reported using several reporting intervals, including Past hour, Today, Past week, and Custom. The Metric blade for each metrics chart displays the average, minimum, and maximum values for each metric in the chart, and some metrics display a total for the reporting interval.
Each metric includes two versions. One metric measures performance for the entire cache, and for caches that use clustering, a second version of the metric that includes (Shard 0-9) in the name measures performance for a single shard in a cache. For example if a cache has 4 shards, Cache Hits is the total amount of hits for the entire cache, and Cache Hits (Shard 3) is just the hits for that shard of the cache.
Try looking for the Error metric while monitoring.
https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-monitor#available-metrics-and-reporting-intervals
Additionally, retry on TimeoutException, RedisConnectionException, or SocketException, so that the client tries to connect again after any of these exceptions. You can read about all the best practices for Redis Cache usage in the docs below:
https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices
https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices#when-is-it-safe-to-retry
Hope it helps.
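As a rough illustration of the retry advice, here is a small Python sketch of retrying an operation with exponential backoff. The `retry` helper and its parameters are hypothetical, and `ConnectionError` / `TimeoutError` stand in for the .NET exception types named above. This assumes the operation is safe to repeat, which is what the "when is it safe to retry" section of the linked doc is about.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T],
          retry_on: Tuple[Type[BaseException], ...],
          attempts: int = 3,
          base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on the listed exception types.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Usage would look like `retry(lambda: cache_get("key"), (ConnectionError, TimeoutError))`, where `cache_get` is whatever read you are wrapping. Keep the attempt count small: during a slot swap, many requests retrying at once can add to the thread pressure you are already seeing.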

ASP.NET Core 2.2 experiencing high CPU usage

So I have hosted an ASP.NET Core 2.2 web service on Azure (S2 plan). The problem is that my application sometimes hits high CPU usage (almost 99%). So far I have checked the process explorer on Azure, and I see a lot of processes there that are consuming CPU. Does anyone know whether it is normal for these processes to consume CPU?
Currently, I have no idea where they come from. Maybe it's normal to have them here.
Briefly about my application:
Currently, there is not much traffic: 500-600 requests a day. Most requests communicate with MS SQL to query records, add records, etc.
I am also using MS WebSocket, but the high CPU happens when no WebSocket client is connected to the web service, so I hardly believe it is the cause. I tried load testing with Apache ab, but there is no pattern: high CPU does not reliably follow a load test run. Sometimes it happens during load testing, sometimes it doesn't.
I just updated the screenshot of processes. I see that lots of threads are being locked/used while FluentMigrator is running its logging.
Update*
I will remove the FluentMigrator logging middleware from the Configure method and see how the situation develops.
UPDATE**
So I removed the FluentMigrator logging. Since then I haven't noticed any CPU usage over 90%.
But I am still confused. My CPU usage is spinning. Is this a healthy CPU usage graph or not?
Also, I tried to make a load test on the websocket server.
I made a script that calls some WebSocket functions every 100 ms from 6-7 clients. So every 100 ms there are 7 calls to the WebSocket server from different clients, and each function runs some queries/inserts (approximately 3-4 queries per WebSocket function).
What I noticed: on Azure S1 (20 DTU), after 2 minutes I run out of SQL pool connections. If I increase to 100 DTU, it handles 7 clients properly without any 'no connection pool' errors.
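For scale, the load described above can be put into numbers with a quick back-of-the-envelope calculation. This is only arithmetic on the figures already given (7 clients, one call every 100 ms, 3-4 queries per call); the `query_rate` helper is a hypothetical name.

```python
def query_rate(clients: int, interval_ms: int, queries_per_call: int) -> int:
    """Queries per second when each client makes one call every `interval_ms`."""
    calls_per_second = clients * 1000 // interval_ms
    return calls_per_second * queries_per_call


low = query_rate(clients=7, interval_ms=100, queries_per_call=3)
high = query_rate(clients=7, interval_ms=100, queries_per_call=4)

print(f"{low}-{high} queries per second")                   # 210-280 queries per second
print(f"{low * 120}-{high * 120} queries over 2 minutes")   # 25200-33600 queries over 2 minutes
```

So the test issues roughly 210-280 queries per second, far more than the 500-600 requests a day mentioned earlier.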
So the first question: is this CPU spinning normal?
Second: should I expect a 'no SQL connection free' error with this kind of load test on DTU 10 Azure SQL? I am afraid that I am leaking connections when creating a scoped service in the singleton WebSocket service.
This topic is getting too long; maybe I should move it to a new topic?
-
At this stage I would say you need to profile your application and figure out which areas of your code are CPU intensive. In the past I have used dotTrace, which highlighted the most expensive methods in a call tree.
Once you know what areas of your code base are the least efficient, you can begin to refactor them so that they are more efficient. This could simply be changing some small operations, adding caching for queries or using distributed locking for example.
I believe the other DLLs are showing CPU usage because your code is calling methods within those DLLs.

IIS application initialization module and memory management

I am researching the IIS Application Initialization module. From what I can see, setting the application pool's Start Mode to AlwaysRunning starts a worker process automatically and keeps it running even when there are no requests.
My concern is memory management and CPU usage, specifically how they are handled given that the process always runs.
How does this compare to setting Start Mode to OnDemand and increasing the Idle Time minutes to a couple of days? That way, I guess, the process will sit idle for x days before it is terminated, be reinitialized on the next request, and then keep running for another couple of days. If I set the idle time to, let's say, 1.5 days, someone is bound to use the application at least once a day, so the process will stay alive and never be terminated.
Can anyone share experience regarding this topic?
Thanks
I have a multisite application that runs a few sites under separate app pools. All are set to OnDemand for Start Mode with an IdleTime of 1740 minutes. I also use Page Output Cache in the app, with different durations for different page types. NHibernate sits behind the scenes and the DB is MySQL.
The most active site has more than 100k visits per day and is almost never idle. When it starts after a recycle, it needs 30 seconds to 2 minutes to become fully operational, depending on the requests at that moment, and CPU usage ranges from 40% to 70%. Once the site is up, CPU usage is very low (0-4%) if there are no new entries in the DB, and memory usage is around 3 GB when everything is cached. CPU sometimes goes to 20% when new requests arrive for content that is not cached and a new entry is being saved.
Page Output Cache also works on a first-come, first-served basis, so it can cause a small problem while caching is in progress: the user must wait, and a little more CPU is needed to do the caching.
The biggest problem in my case is NHibernate with MySQL, but Page Output Cache resolved it for me once I decided to cache the page modules and content. I realized that it is better for an application to starve for memory than for CPU.
With 3.5k visitors at one moment and everything cached, I saw the same memory usage (3 GB) and overall server CPU of around 40%.
The other sites use around 1-1.5 GB of memory, and CPU is never more than 20% at start.
An application with the same app pool settings that uses MSSQL with EF is barely noticeable on the server. It is used by 10-60 users per minute, has little content apart from embed codes, and uses 1-5% CPU and never more than 8 MB of memory. After a recycle it is up in less than 10 seconds.
From my experience, I can tell you that it all depends on what the application serves, how it works :) and how much content you have.
If you use OnDemand with a long IdleTime, it will behave the same as AlwaysRunning, and the process simply sits unused while there are no requests. If you use OnDemand with a short IdleTime, you will need CPU to start the process more often.

First server call is taking more time than subsequent call in Windows Azure cloud application?

I am working on a Windows Azure cloud service. The first time I click the login button it takes 6 to 7 seconds, but when I click the same button again some time later it takes 2 seconds. I don't understand why this happens: the server-side code is the same in both cases, yet subsequent calls are much faster than the first call.
"First-hit" delay is very common with ASP.NET applications. There is the overhead of JIT compilation, and various "pools" (database connections, threads, etc) may not be initialized. If you have an ASP.NET Web Forms application, each .aspx page is compiled the first time it is accessed, not when the server starts up. Also the various caching mechanisms (server or client) that make subsequent requests faster are not initialized on that first hit. And on the very first hit, any code in Application_Start will be run, setting up routing tables and doing any other initialization.
There are various things you can do to prevent your users from seeing this delay. The simplest is to write some kind of automated process that hits every page and run it after deploying a new release. There are also modules for IIS that will run code ahead of the Application_Start, when the site is actually deployed. Search for "ASP.NET warmup" to find those.
You may also experience delays after a period of inactivity, if your ASP.NET App Pool is recycled - this resets a bunch of things and causes start-up code to be run again on the next request. You can ameliorate this effect by setting up something to ping a page on your site frequently so that if the app pool is recycled it is warmed up again automatically, instead of on the next actual user request. Using an uptime monitoring service will work for this, or a Scheduled Task within the Azure ecosystem itself.
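As a rough sketch of the "hit every page after deploying" idea, here is a small Python script. The `warm_up` and `fetch_status` names and the URLs are hypothetical; any HTTP client or uptime monitor would do the same job.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Issue a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def warm_up(urls: Iterable[str],
            fetch: Callable[[str], int] = fetch_status) -> Dict[str, float]:
    """Request each URL once and return the elapsed seconds per successful URL."""
    timings: Dict[str, float] = {}
    for url in urls:
        started = time.perf_counter()
        try:
            fetch(url)
        except Exception as exc:
            # One failing page should not stop the rest of the warm-up.
            print(f"warm-up request to {url} failed: {exc}")
            continue
        timings[url] = time.perf_counter() - started
    return timings
```

Run it after each deployment with the list of pages you care about, for example `warm_up(["https://example.invalid/", "https://example.invalid/login"])` with your own URLs. The first run's timings show how large the first-hit penalty is per page.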

Why Even Recycle an Application Pool?

Maybe someone can shed some light on this simple question:
I have a .NET web application that has been thoroughly vetted. It loads a cache per appdomain (process) whenever one starts and can not fully reply to requests until it completes this cache loading.
I have been examining the settings on my application pools and have started wondering why I was even recycling so often (once every 1,000,000 calls or 2 hours).
What would prevent me from setting auto-recycles to being once every 24 hours or even longer? Why not completely remove the option and just recycle if memory spins out of control for the appdomain?
If your application runs reliably for longer than the threshold set for app pool recycling, then by all means increase the threshold. There is no downside if your app is stable.
For us, we have recycling turned off altogether, and instead have a task that loads a test page every minute and runs an iisreset if it fails to load five times in a row.
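The logic of such a task can be sketched in a few lines. This is a hypothetical Python illustration of the consecutive-failure counter only, not the actual task we run; `Watchdog`, `check`, and `on_failure` are made-up names.

```python
from typing import Callable


class Watchdog:
    """Fires `on_failure` after `threshold` consecutive failed health checks."""

    def __init__(self, check: Callable[[], bool],
                 on_failure: Callable[[], None],
                 threshold: int = 5) -> None:
        self.check = check
        self.on_failure = on_failure
        self.threshold = threshold
        self.consecutive_failures = 0

    def tick(self) -> None:
        """Run one health check; intended to be called once per minute."""
        try:
            healthy = self.check()
        except Exception:
            # A check that throws (timeout, connection refused) counts as a failure.
            healthy = False

        if healthy:
            # Any success resets the streak, so isolated blips never trigger a reset.
            self.consecutive_failures = 0
            return

        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.on_failure()
            self.consecutive_failures = 0
```

Here `check` would load the test page and return whether it succeeded, and `on_failure` would run `iisreset`. Requiring five failures in a row keeps a single slow response from restarting the server.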
You should probably be looking at recycling from the point of view of reliability. Based on historical data, you should have an idea how much memory, CPU and so on your app uses, and the historical patterns and when trouble starts to occur. Knowing that, you can configure recycling to counter those issues. For example, if you know your app has an increasing memory usage pattern* that leads to the app running out of memory after a period of several days, you could configure it to recycle before that would have happened.
* Obviously, you would also want to resolve this bug if possible, but recycling can be used to increase reliability for the customer
The reason they do it is that an application can be "not working" even though its CPU and memory are fine (think deadlock). App pool recycling is a final failsafe measure that can keep flawed code from dying.
Also any code which has failed to implement IDisposable would run finalizers on the recycle which will possibly release held resources.

Resources