The documentation doesn't explain what the maintenanceWindow is. Since a node switch happens during patching, why does the window length matter at all? Could it affect how long the Redis cache is unavailable?
What is the maintenance window?
When scheduling updates for Azure Redis Cache, you can decide the day of the week, start UTC hour and maintenance window.
Setting a maintenance window allows you to minimize the impact on your application and users.
Only Redis server updates are made during the scheduled maintenance window. The maintenance window does not apply to Azure updates or updates to the VM operating system.
What is the impact of setting a shorter or longer window?
The default, and minimum, maintenance window for updates is five hours. The actual time required for maintenance depends on exactly what’s taking place.
You could always set it to a longer timespan, but we recommend selecting a timespan that would have the least impact on your business.
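To make the three settings concrete, here is a minimal sketch of the arithmetic only: it checks whether a given moment falls inside a weekly window defined by a day of the week, a start hour in UTC and a length. The function name `in_window` and its parameters are made up for illustration; this is not an Azure API and says nothing about how the service actually schedules patches.

```python
from datetime import datetime, timedelta, timezone


def in_window(moment: datetime, weekday: int, start_hour_utc: int,
              length: timedelta = timedelta(hours=5)) -> bool:
    """True if `moment` falls inside the weekly window.

    `weekday` follows datetime.weekday(): Monday is 0, Sunday is 6.
    The five-hour default mirrors the minimum quoted above.
    """
    moment = moment.astimezone(timezone.utc)
    # Step back to the configured weekday, then pin the start hour.
    days_back = (moment.weekday() - weekday) % 7
    start = (moment - timedelta(days=days_back)).replace(
        hour=start_hour_utc, minute=0, second=0, microsecond=0)
    if start > moment:
        # Same weekday but before the start hour: use last week's window.
        start -= timedelta(days=7)
    return start <= moment < start + length


# Hypothetical schedule: Sundays at 22:00 UTC, so the window ends Monday 03:00 UTC.
monday_1am = datetime(2024, 1, 8, 1, 0, tzinfo=timezone.utc)
monday_4am = datetime(2024, 1, 8, 4, 0, tzinfo=timezone.utc)
inside = in_window(monday_1am, weekday=6, start_hour_utc=22)
outside = in_window(monday_4am, weekday=6, start_hour_utc=22)
```

Note that a window starting late in the day spills into the next day, which is worth keeping in mind when picking the start hour.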
UPDATE: I've figured it out. See the end of this question.
I have an Azure App Service running four sites. One of the sites has two deployment slots in addition to the primary one. Recently I've been seeing really high CPU utilization for the App Service plan as a whole.
The dark orange line shows the CPU percentage. This is just after restarting all my sites, which brought it down to this level.
However, when I look at the CPU use reported by each site, it's really low.
The darker blue line shows the CPU time, which is basically nothing. I did this for all of my sites, and all the graphs look the same. Basically, it seems that none of my sites are causing the issue.
A couple of the sites have web jobs, so I took a look at the logs but everything is running fine there. The jobs run for a few seconds every few hours.
So my question is: how can I determine the source of this CPU utilization? Any pointers would be greatly appreciated.
UPDATE: Thanks to the replies below, I was able to get more detail on what was happening. I ended up getting what I needed from the SCM / Kudu tools. You can get there by going to your web app in Azure and choosing Advanced Tools from the side nav. From the Kudu dashboard, choose Process Explorer. The value in the Total CPU Time column is not directly useful on its own, because it's the total CPU time in seconds that the process has used since it started, which might have been minutes or days ago.
However, if you make a record of the value at intervals, you can look at the change over time, and one process might jump out at you. In my case, it was my WebJobs process. Every 60 seconds, this one process was consuming about 10 seconds of processor time, just within one environment.
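The interval comparison can be sketched in a few lines. This is only an illustration in Python: the `Sample` type, the process names and the readings are hypothetical values typed in by hand, not something Kudu exports in this form.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading copied by hand from the Total CPU Time column."""
    wall_clock_s: float   # when the reading was taken, in seconds
    total_cpu_s: float    # cumulative CPU seconds reported for the process


def cpu_share(earlier: Sample, later: Sample) -> float:
    """CPU seconds used per second of wall-clock time between two readings."""
    elapsed = later.wall_clock_s - earlier.wall_clock_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    return (later.total_cpu_s - earlier.total_cpu_s) / elapsed


# Hypothetical readings taken 60 seconds apart.
readings = {
    "webjobs": (Sample(0, 5230.0), Sample(60, 5240.0)),  # 10 CPU seconds used
    "w3wp": (Sample(0, 812.0), Sample(60, 812.5)),       # 0.5 CPU seconds used
}
shares = {name: cpu_share(a, b) for name, (a, b) in readings.items()}
busiest = max(shares, key=shares.get)
```

With these made-up numbers the WebJobs process accounts for 10 of every 60 seconds, while the other process is close to idle, which is the kind of gap that stands out once you look at differences rather than totals.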
The great thing about this Kudu dashboard is, if you can catch the problem while it is actually happening, you can hit the Start Profiling button and capture a diagnostic session. You can then open this up in Visual Studio and get some nice details about where the CPU time is being spent.
Just in case anyone else is seeing similar issues, I'll provide more details about my particular case. As I mentioned, my WebJobs exe was the culprit, and I found that all the CPU time was being spent in StackExchange.Redis.SocketManager, which manages connections to Azure Redis Cache. In my main web app, I create only one connection, as recommended. But since my web jobs only run every once in a while, I was creating a new connection to Azure Redis Cache each time one ran, which apparently can lead to issues. I changed my code to create the Redis Cache connection once when the WebJob process starts up and to reuse that connection whenever an individual WebJob runs.
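The shape of that change, sketched in Python rather than the C# of the real project: one holder object creates the connection lazily and every job borrows it. `SharedConnection`, `fake_connect` and `run_job` are hypothetical names, and `fake_connect` is a stand-in that only counts calls; it is not the StackExchange.Redis API.

```python
import threading
from typing import Callable, Optional


class SharedConnection:
    """Creates one connection lazily and hands the same object to every caller."""

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._connection: Optional[object] = None

    def get(self) -> object:
        # The lock keeps two jobs that start together from both connecting.
        with self._lock:
            if self._connection is None:
                self._connection = self._factory()
            return self._connection


# Stand-in for a real client constructor; it only records each call.
created = []


def fake_connect() -> object:
    conn = object()
    created.append(conn)
    return conn


shared = SharedConnection(fake_connect)   # built once, at process start-up


def run_job() -> object:
    """Hypothetical job body: borrows the connection instead of opening one."""
    return shared.get()


first = run_job()
second = run_job()
```

Both jobs receive the same object and the factory runs once, so the number of connections no longer grows with the number of job runs.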
Time will tell if this really fixes the issue, but I think it will. When the problem occurred, it always fit the same pattern: After a few days of running fine, my CPU would slowly ramp up over the course of about 12 hours. My thinking is that each time a WebJob ran, it created a connection object, which at first didn't produce trouble, but gradually as WebJobs ran every hour or two, cruft was building up until finally some critical threshold was met and the CPU usage would take off.
Hope this helps someone out there. Best wishes!
Maybe you should go to the web app's SCM site?
%yourAppName%.scm.azurewebsites.net
There is a page that shows all the processes currently running on your web app (something like Console > Process).
You can also go to the support page (from the right corner of the SCM site).
You can find more information about your performance there and take a memory dump (not needed for this problem, but useful for performance issues).
Based on your description, you could use the Crash Diagnoser extension to capture dump files from your Web Apps and WebJobs when CPU usage rises above a specified threshold, which should help isolate the issue. For more details, you could refer to this official blog.
I am developing an application using Azure Cloud Service and Web API. I would like to allow users who create a consultation session to change the price of that session. However, I would like to give all users 30 days to leave the session before the new price applies to the members currently signed up for it. My first thought is to use queue storage and set the visibility timeout to the 30-day limit, but it seems this could grow the queue really fast over time, especially since each message must sit for 30 days before it runs, not to mention the ordering issues. I am looking at the task scheduler as well, but session pricing changes are not recurring; they happen at random. Is the queue idea a good approach, or is there a better and more efficient way to accomplish this?
The stuff you are trying to do should be done with a relational database. You can use timestamps to record when prices for session changed. I wouldn't use a queue at all for this. A queue is more for passing messages in a distributed system. Your problem is just about tracking what prices changed on what sessions and when. That data should be modeled in a database.
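A minimal sketch of that model, using Python's built-in sqlite3 as a stand-in for whatever relational database you run. The table name, column names and the helper functions `set_price` and `price_at` are all made up for illustration. Each price change is stored with the moment it takes effect, so no queue or timer is needed: the current price is simply the latest row that is already effective.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=30)

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE session_price (
           session_id   INTEGER NOT NULL,
           price_cents  INTEGER NOT NULL,
           effective_at INTEGER NOT NULL  -- Unix timestamp, UTC
       )"""
)


def set_price(session_id: int, price_cents: int, requested_at: datetime,
              delay: timedelta = GRACE_PERIOD) -> None:
    """Record a price that takes effect `delay` after it was requested."""
    effective = int((requested_at + delay).timestamp())
    db.execute("INSERT INTO session_price VALUES (?, ?, ?)",
               (session_id, price_cents, effective))


def price_at(session_id: int, moment: datetime):
    """Return the latest price already in effect at `moment`, or None."""
    row = db.execute(
        """SELECT price_cents FROM session_price
           WHERE session_id = ? AND effective_at <= ?
           ORDER BY effective_at DESC LIMIT 1""",
        (session_id, int(moment.timestamp())),
    ).fetchone()
    return row[0] if row else None


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
set_price(1, 5000, start, delay=timedelta(0))    # initial price, effective at once
set_price(1, 7000, start + timedelta(days=10))   # change, effective on day 40
before = price_at(1, start + timedelta(days=39))
after = price_at(1, start + timedelta(days=40))
```

Members who want to leave during the grace period can be handled with an ordinary query against the same table, since the pending change and its effective date are visible as soon as the row is written.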
I think Azure Scheduler is a better fit for this scenario. Programmatically create a one-time job that is set to run once, 30 days later. When the scheduler triggers the job, have its action call back into one of your APIs or services to apply the price change and any other required updates, and remove the job from the scheduler as part of that action to keep the job list clean. The premium plan of an Azure Scheduler Job Collection gives you an unlimited number of jobs to run.
Hope this is exactly what you were looking for...
I would consider using Azure WebJobs. A WebJob basically gives you the ability to run a .NET console application within the context of an Azure Web App. It can be run on demand, continuously, or on a recurring schedule. If your processing requirements are low enough to allow for it, WebJobs can also run alongside your Web App on the same App Service plan, which saves you $$$ because they cost nothing extra that way.
You could schedule the WebJob to run once or twice per day and examine the situation and react as is appropriate. Since it's really just a .NET worker role you have ultimate flexibility.
I have a Node.js application that receives data via a WebSocket connection and pushes each message to an Azure Redis cache. It stores a persistent array of messages in a variable for downstream use, and at regular intervals syncs that array from the cache. Bit convoluted, but at a later point I want to separate out the half of the application that writes to the cache from the half of it that reads from it.
At around 02:00 GMT, based on the Azure portal stats, I appear to have started getting "cache misses" on that sync, which lasted for a couple of hours before I started getting "cache hits" again sometime around 05:00.
The cache misses correspond to a sudden increase in CPU usage, which peaks at around 05:00. And when I say peaks, I mean it hits 81%, vs a previous max of about 6%.
So sometime around 05:00, the CPU peaked, then dropped back to normal, and the "cache misses" went away. But looking at the cache memory usage, I dropped from about 37.4 MB used to about 3.85 MB used (which I suspect is the "empty" state), and the list that's being used by this application was emptied.
The only functions that the application is running against the cache are LPUSH and LRANGE, so there's nothing that has any capability to remove data. In case anybody was wondering, when the CPU ramped up the memory usage did not, so there's nothing to suggest that rogue additions of data cropped up.
It's only on the Basic plan, so I'm not expecting it to be invulnerable or anything, but even without the replication features of the Standard plan I had expected that it wouldn't be in a position to completely wipe itself - I was under the impression that Redis periodically writes itself to disk and restores from that when it recovers from an error.
All of which is my way of asking:
Does anybody have any idea what might have happened here?
If this is something that others have been able to accidentally trigger themselves, are there any gotchas I should be looking out for that I might have in other applications using the same cache that could have caused it to fail so catastrophically?
I would welcome a chorus of people telling me that the Standard plan won't suffer from this sort of issue, because I've already forked out for it and it would be nice to feel like that was the right call.
Many thanks in advance.
Here are my thoughts:
Azure Redis Cache stores information in memory. By default, it won't save a "backup" on disk. So you had information in memory, the server got restarted for some reason, and you lost your data.
PS: See this feedback, there is no option to persist information on disk using azure-redis cache yet http://feedback.azure.com/forums/169382-cache/suggestions/6022838-redis-cache-should-also-support-persistence
Make sure you don't use the Basic plan. The Basic plan doesn't come with an SLA, and in my experience it loses data quite often.
The Standard plan provides an SLA and uses 2 instances of Redis Cache. It's quite stable and it hasn't lost our data, although such a case is still possible.
Now, if you're going to use Azure Redis as a database rather than as a cache, you need to use the data persistence feature, which is already available in the Azure Redis Cache Premium Tier: https://azure.microsoft.com/en-us/documentation/articles/cache-premium-tier-intro (see Redis data persistence)
James, using the Standard instance should give you much improved availability.
With the Basic tier, any Azure Fabric update to the Master Node (or a hardware failure) will cause you to lose all data.
Azure Redis Cache does not support persistence (writing to disk/blob) yet, even in the Standard Tier. But the Standard tier does give you a replicated slave node that can take over if your Master goes down.
In our still-in-development project we have noticed sudden delays when accessing our ASP.NET Web API services. Using the awesome Mini Profiler we nailed it that these delays are caused when connections to the Azure Data Cache (Preview) services are dropped and they have to be reestablished. This process takes about 3.3 seconds. After reconnecting, getting an object from the cache takes 1.4 ms.
When I increased maxConnectionsToServer from 1 to 20, I noticed another thing. If I don't make requests to the Web API for 1 or 2 minutes (that's usually when the connections are dropped) and then start making calls, the next 20 requests are each delayed by 3.3 seconds, which I guess is how connection pooling works (round-robining through the connections in the pool).
Both the Web API and Caching service are hosted in the East US region, we have disabled local cache, SSL is disabled, auto discover is enabled.
So, I'm wondering if something is wrong with our configuration or is this a thing because Azure Cache is still in preview?
Any information will be valued.
Thanks!
It sounds like your shared cache is being offloaded due to inactivity. One way to test this would be to add an In-Role Cache to an existing service (if available) and swap your cache usage to this new cache. In-Role cache is described here.
Once the cache is moved off of a shared offering, wait the requisite 1-2 minutes for the idle timeout and retry the connection; the delay should no longer be present.
Assuming you want to stick with the shared cache option after isolating the problem, the only current workaround that I am aware of is running a background task that will periodically ping the cache to keep it alive.
If you are running a full Web role you can launch a background task on application start up.
If you are deploying via Mobile Services, then you can run the "ping" via Scheduled Jobs. The only issue you may run into here is that the minimum interval for a scheduled job is 1 minute, which may not be aggressive enough to keep your cache alive 100% of the time.
Nothing that I see points to you doing anything wrong per se. It may be that Azure is genuinely having problems getting the cache connections up and running quickly. According to several best practices documents and MSDN posts, you want to increase your number of connections to caches to allow for a failover to an active connection, which you've effectively done with your configuration change.
Try making sure that your cache accessor is a static object (another MSDN recommendation). This may be a long shot, but also consider using the Sliding Window option for object expiration, and see whether it not only resets the expiration countdown for the stored object but also prompts the cache service to reset the connection.
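As a rough illustration of the keep-alive idea, here is a background loop sketched in Python with only the standard library. The `keep_alive` function and `fake_ping` are hypothetical; a real ping would be a cheap read against the cache, and the interval would be chosen to stay under the idle timeout you observe.

```python
import threading
from typing import Callable


def keep_alive(ping: Callable[[], None], interval_s: float,
               stop: threading.Event) -> None:
    """Call `ping` every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            ping()
        except Exception as exc:  # a failed ping should not kill the loop
            print(f"keep-alive ping failed: {exc}")
        # wait() returns early once stop is set, so shutdown is prompt.
        stop.wait(interval_s)


# Hypothetical ping that only counts calls and ends the demo after three.
calls = []
stop = threading.Event()


def fake_ping() -> None:
    calls.append(1)
    if len(calls) == 3:
        stop.set()


worker = threading.Thread(target=keep_alive, args=(fake_ping, 0.01, stop),
                          daemon=True)
worker.start()
worker.join(timeout=5)
```

In a web role the thread would be started once at application start-up and stopped at shutdown; the very short interval here exists only so the demo finishes quickly.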
Maybe someone can shed some light on this simple question:
I have a .NET web application that has been thoroughly vetted. It loads a cache per appdomain (process) whenever one starts and can not fully reply to requests until it completes this cache loading.
I have been examining the settings on my application pools and have started wondering why I was even recycling so often (once every 1,000,000 calls or 2 hours).
What would prevent me from setting auto-recycles to being once every 24 hours or even longer? Why not completely remove the option and just recycle if memory spins out of control for the appdomain?
If your application runs reliably for longer than the threshold set for app pool recycling, then by all means increase the threshold. There is no downside if your app is stable.
For us, we have recycling turned off altogether, and instead have a task that loads a test page every minute and runs an iisreset if it fails to load five times in a row.
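The counting logic of such a watchdog is simple enough to sketch. This Python version is an assumption about how the task could be structured, not the actual task described above: `watchdog` takes a sequence of probe outcomes and a `reset` callback, both hypothetical, so the logic can be shown without making HTTP requests or running iisreset.

```python
from typing import Callable, Iterable


def watchdog(probe_results: Iterable[bool], reset: Callable[[], None],
             max_failures: int = 5) -> int:
    """Call `reset` each time `max_failures` probes fail in a row.

    Returns how many resets were triggered.
    """
    failures = 0
    resets = 0
    for ok in probe_results:
        if ok:
            failures = 0          # any success clears the streak
            continue
        failures += 1
        if failures >= max_failures:
            reset()
            resets += 1
            failures = 0          # start counting again after the reset
    return resets


# Made-up history: four failures, one success, then five failures in a row.
log = []
history = [False] * 4 + [True] + [False] * 5
reset_count = watchdog(history, lambda: log.append("reset"))
```

Requiring consecutive failures means a single slow page load does not restart the server; only the second run of failures in this history triggers a reset.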
You should probably be looking at recycling from the point of view of reliability. Based on historical data, you should have an idea how much memory, CPU and so on your app uses, and the historical patterns and when trouble starts to occur. Knowing that, you can configure recycling to counter those issues. For example, if you know your app has an increasing memory usage pattern* that leads to the app running out of memory after a period of several days, you could configure it to recycle before that would have happened.
* Obviously, you would also want to resolve this bug if possible, but recycling can be used to increase reliability for the customer
The reason they do it is that an application can be "not working" even though its CPU and memory are fine (think deadlock). App recycling is a final failsafe measure that can keep flawed code from staying dead.
Also, for any code that has failed to implement IDisposable, finalizers would run on the recycle, which may release held resources.