Send Message to Azure Web Site Instance - azure

we are evaluating Azure right now and I really like the azure web sites, especially because of the very easy and fast deployment, which is helpful in our current situation where we make a lot of tests.
We have some in-memory-caches for information that is accessed very often per request like text-strings for multi-language-support and configuration settings edited by the site administrator. I would like to have a system where each instance of the web site has a copy of this cached data, but i need to send flush-events for cache invalidation to all instances when some settings are changed by the administrator. I guess that the azure service bus is perfect for this with the publish-subscribe-model, but I dont want to pay 3€ per instance just for sending some messages around.
Is there an option to open an individuell endpoint per instance, where I can a wcf-service for example?

This is no good way to direct a request at a specific instance of a Windows Azure Web Site that I'm aware of. The load balancer for Web Sites is defaulted to use sticky sessions (which you can turn off), but there isn't a way to force the request going in to be directed to once instance of a web site over another.
You could look at the Service Bus as you mentioned with a Topic and several subscriptions, which is indeed an option, but as you point out it does cost something. I'm interested in where you got the calculation for the amount though. Brokered messaging is charged per message (with "empty requests also being included"). If you had an instance checking once a minute for a month it's only about 43,000 calls. You can get 1 million calls for a US Dollar. With the long polling that Service Bus has in the managed client you end up with fewer "empty" calls than standard polling.
Another option is to simply use a different polling mechanism. In this case you are just wanting an indicator that you should, or should not flush the cache. You could put a text file in BLOB storage that contains a cache current version value. This could be whatever you want, a number, a guid, doesn't matter. Each instance would then from time to time check this BLOB file. If the value in the file is different than what they last saw they refresh their cache. Then they hold on to the new cache version value for their next call. You can either set this up as a WebJob on a schedule or do your own background polling.
Finally, there is the Windows Azure Cache Service (preview) which is usable by Web Sites, but that would cost additional and, if you really are caching the exact data on all instances, wouldn't be as efficient. It would give you the ability to deal with the cache service directly though, independent of the instances that are using it, allowing you to reset and such as you needed, on demand, in one fell swoop.
Personally I'd suggest taking a look at the Service Bus again.

Related

how to prevent azure from scaling out additional instances until they are ready?

We are having issues with an Azure Application Service. One of our webservices (MVC) caches data from the database at startup (Application_Start) - this takes approximately 3 minutes. Until this is ready we can't handle requests.
This is known so we set it 'always on' and will aim to only restart it during off-peak times if necessary.
However, we expect heavy load to the server next month, and in our testing of the auto-scaling, we have found that when it adds additional instances, each of these instances goes through the same startup delay - but the traffic is split between the current running instance and the new one that's warming up so e.g. half of the requests start failing for that 3 minute period.
How can we configure Azure to delay using the new instance until it is ready? (or should we be using e.g. AWS instead?).
Some of the documentation points to using a custom Load Balancer Probe however it mainly talks about VM's whereas we are using PAAS.
Do try to reduce the data you need to load on app_start and try to lazy load data into Cache on first request. Some times even after doing all of this we do end up with large sets of data that is necessary on start.
There are two ways we can approach this.
One, assuming you are using in-memory caching and every instance of the app needs to hydrate its in-memory cache on App_Start. Try to use a external cache provider like Azure Cache for Redis, your new instance can just point to this external cache without having to reload the data.
Two, you can depend on Application Initialization Module which was introduced in IIS 7.5 (installed on Azure App Services' IIS). To use this feature, you need to add applicationInitialization section under web.server section of web.config. This will help you not make the instance available until the warm-up process is completed. More info on how to use ApplicationInitialization is available in this blog post
The best case would be to use the combination of both, applicationInitialization will point at a page in your application which checks if the external cache is available and hydrated, if yes, complete, else hydrate the external cache.
You can do this in Azure with other resource type than classic VM like an App Service. App Services scale up and down with instances that share the same memory pool and thread pool.
There is a lot of good information, in the link https://www.jan-v.nl/post/warming-up-your-app-service that was included in one of the comments.
Based on that information the functionality that you require is not available in the free tier.
I would approach the problem differently. Why does it take 3 mins to load the data from the database? Since it is only loaded on start it should be data that does not change often.
Could you:
Optimise the reading of data from the database?
Reduce the amount of data you read from the database?
Export the data to a file, and read it from a file?
My recommendation would be to use an Azure Load Balancer with a health probe

What does the Azure Web Apps architecture look like?

I've had a few outages of 10 to 15 minutes, because apparently Microsoft had a 'blip' on their storages. They told me that it is because of a shared file system between the instances (making it a single point of failure?)
I didn't understand it and asked how file share is involved, because I would assume a really dumb stateless IIS app that communicates with SQL Azure for its data.
I would assume the situation below:
This is their reply to my question (I didn't include the drawing)
The file shares are not necessarily for your web app to communicate to
another resources but they are on our end where the app content
resides on. That is what we meant when we suggested that about storage
being unavailable on our file servers. The reason the restarts would
be triggered for your app that is on both the instances is because the
resources are shared, the underlying storage would be the same for
both the instances. That’s the reason if it goes down on one, the
other would also follow eventually. If you really want the
availability of the app to be improved, you can always use a traffic
manager. However, there is no guarantee that even with traffic manager
in place, the app doesn’t go down but it improves overall availability
of your app. Also we have recently rolled out an update to production
that should take care of restarts caused by storage blips ideally, but
for this feature to be kicked it you need to make sure that there is
ample amount of memory needs to be available in the cases where this
feature needs to kick in. We have couple of options that you can have
set up in order to avoid any unexpected restarts of the app because of
a storage blip on our end:
You can evaluate if you want to move to a bigger instance so that
we might have enough memory for the overlap recycling feature to be
kicked in.
If you don’t want to move to a bigger instance, you can always use
local cache feature as outlined by us in our earlier email.
Because of the time differences the communication takes ages. Can anyone tell me what is wrong in my thinking?
The only thing that I think of is that when you've enabled two instances, they run on the same physical server. But that makes really little sense to me.
I have two instances one core, 1.75 GB memory.
My presumption for App Service Plans was that they were automatically split into availability sets (see below for a brief description) Largely based on Web Apps sales spiel which states
App Service provides availability and automatic scale on a global data centre infrastructure. Easily scale applications up or down on demand, and get high availability within and across different geographical regions.
Following on from David Ebbo's answer and comments, the underlying architecture of Web apps appears to be that the VM's themselves are separated into availability sets. However all of the instances use the same fileserver to share the underlying disk space. This file server being a significant single point of failure.
To mitigate this Azure have created the WEBSITE_LOCAL_CACHE_OPTION which will cache the contents of the file server onto the individual Web App instances. Using caching in lieu of solid, high availability engineering principles.
The problem here is that as a customer we have no visibility into this issue, we've no idea if there is a plan to fix it, or if or when it will ever be fixed since it seems unlikely that Azure is going to issue a document that admits to how badly this has been engineered, even if it is to say that it is fixed.
I also can't imagine that this issue would be any different between ASM and ARM. It seems exceptionally unlikely that there was originally a high availability solution at the backend that they scrapped when ARM came along. So it is very likely that cloud services would suffer the exact same issue.
The small upside is that now that we know this is an issue, one possible solution would be to deploy multiple web apps and have a traffic manager between them. Even if they are in the same region, different apps should have different backend file servers.
My first action would be to reply to that email, with a link to the Web Apps page, (and this question) with a copy of the quote and ask how to enable high availability within a geographic region.
After that you'll likely need to rearchitect your solution!
Availability sets
For virtual machines Azure will let you specify an availability set. An availability set will automatically split VMs into separate update and fault domains. Meaning that servers will end up in different server racks, and those server racks won't get updates at the same time. (it is a little more complex than that, but that's the basics!)
Azure Web Apps do used a shared file storage. The best way to think about it is that all the instances of your app map to the same network share that have your files. So if you modify the files by any mean (e.g. FTP, msdeploy, git, ...), all the instances instantly get the new files (since there is only one set of files).
And to answer your final question, each instance does run on a separate VM.

How does one know why an Azure WebSite instance(WebApp) was shutdown?

By looking at my Pingdom reports I have noted that my WebSite instance is getting recycled. Basically Pingdom is used to keep my site warm. When I look deeper into the Azure Logs ie /LogFiles/kudu/trace I notice a number of small xml files with "shutdown" or "startup" suffixes ie:
2015-07-29T20-05-05_abc123_002_Shutdown_0s.xml
While I suspect this might be to do with MS patching VMs, I am not sure. My application is not showing any raised exceptions, hence my suspicions that it is happening at the OS level. Is there a way to find out why my Instance is being shutdown?
I also admit I am using a one S2 instance scalable to three dependent on CPU usage. We may have to review this to use a 2-3 setup. Obviously this doubles the costs.
EDIT
I have looked at my Operation Logs and all I see is "UpdateWebsite" with status of "succeeded", however nothing for the times I saw the above files for. So it seems that the "instance" is being shutdown, but the event is not appearing in the "Operation Log". Why would this be? Had about 5 yesterday, yet the last "Operation Log" entry was 29/7.
An example of one of yesterday's shutdown xml file:
2015-08-05T13-26-18_abc123_002_Shutdown_1s.xml
You should see entries regarding backend maintenance in operation logs like this:
As for keeping your site alive, standard plans allows you to use the "Always On" feature which pretty much do what pingdom is doing to keep your website warm. Just enable it by using the configure tab of portal.
Configure web apps in Azure App Service
https://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
Every site on Azure runs 2 applications. 1 is yours and the other is the scm endpoint (a.k.a Kudu) these "shutdown" traces are for the kudu app, not for your site.
If you want similar traces for your site, you'll have to implement them yourself just like kudu does. If you don't have Always On enabled, Kudu get's shutdown after an hour of inactivity (as far as I remember).
Aside from that, like you mentioned Azure will shutdown your app during machine upgrade, though I don't think these shutdowns result in operational log events.
Are you seeing any side-effects? is this causing downtime?
When upgrades to the service are going on, your site might get moved to a different machine. We bring the site up on a new machine before shutting it down on the old one and letting connections drain, however this should not result in any perceivable downtime.

Azure WebSite Always On

I have a WebAPI application running on Azure WebSites. It is running in Basic mode and I have the option to make it "Always On". There seems to be conflicting information online about what this means exactly. I know the effect, but the "how" matters a lot here. In particular, does something automatically hit an endpoint in my application periodically? If so, can I control the endpoint it hits?
As I mentioned, it is a Web API application and the default route does non-trivial work and results in a notable amount of outbound traffic and it will also result in items being placed onto a work queue that will eventually be processed. I want the application always on (no cold start times) but I don't want some service making requests of application.
As soon as your Azure Website is marked as AlwaysOn, your site root will be hit within a few seconds. We also make sure your site is up and running on all the workers (if you have configured auto scale option or such). After that, if the worker process crashes, alwaysOn makes sure that it comes back up.
You cannot control the endpoint that it hits.

WF4 Affinity on Windows Azure and other NLB environments

I'm using Windows Azure and WF4 and my workflow service is hosted in a web-role (with N instances). My job now is find out how
to do an affinity, in a way that I can send messages to the right workflow instance. To explain this scenario, my workflow (attached) starts with a "StartWorkflow" receive activity, creates 3 "Person" and, in a parallel-for-each, waits for the confirmation of these 3 people ("ConfirmCreation" Receive Activity).
I then started to search how the affinity is made in others NLB environments (mainly looked for informations about how this works on Windows Server AppFabric), but I didn't find a precise answer. So how is it done in others NLB environments?
My next task is find out how I could implement a system to handle this affinity on Windows Azure and how much would this solution cost (in price, time and amount of work) to see if its viable or if it's better to work with only one web-role instance while we wait for the WF4 host for the Azure AppFabric. The only way I found was to persist the workflow instance. Is there other ways of doing this?
My third, but not last, task is to find out how WF4 handles multiple messages received at the same time. In my scenario, this means how it would handle if the 3 people confirmed at the same time and the confirmation messages are also received at the same time. Since the most logical answer for this problem seems to be to use a queue, I started looking for information about queues on WF4 and found people speaking about MSQM. But what is the native WF4 messages handler system? Is this handler really a queue or is it another system? How is this concurrency handled?
You shouldn't need any affinity. In fact that's kinda the whole point of durable Workflows. Whilst your workflow is waiting for this confirmation it should be persisted and unloaded from any one server.
As far as persistence goes for Windows Azure you would either need to hack the standard SQL persistence scripts so that they work on SQL Azure or write your own InstanceStore implementation that sits on top of Azure Storage. We have done the latter for a workflow we're running in Azure, but I'm unable to share the code. On a scale of 1 to 10 for effort, I'd rank it around an 8.
As far as multiple messages, what will happen is the messages will be received and delivered to the workflow instance one message at a time. Now, it's possible that every one of those messages goes to the same server or maybe each one goes to a diff. server. No matter how it happens, the workflow runtime will attempt to load the workflow from the instance store, see that it is currently locked and block/retry until the workflow becomes available to process the next message. So you don't have to worry about concurrent access to the same workflow instance as long as you configure everything correctly and the InstanceStore implementation is doing its job.
Here's a few other suggestions:
Make sure you use the PersistBeforeSend option on your SendReply actvities
Configure the following workflow service options
<workflowIdle timeToUnload="00:00:00" />
<sqlWorkflowInstanceStore ... instanceLockedExceptionAction="AggressiveRetry" />
Using the out of the box SQL instance store with SQL Azure is a bit of a problem at the moment with the Azure 1.3 SDK as each deployment, even if you made 0 code changes, results in a new service deployment meaning that already persisted workflows can't continue. That is a bug that will be solved but a PITA for now.
As Drew said your workflow instance should just move from server to server as needed, no need to pin it to a specific machine. And even if you could that would hurt scalability and reliability so something to be avoided.
Sending messages through MSMQ using the WCF NetMsmqBinding works just fine. Internally WF uses a completely different mechanism called bookmarks that allow a workflow to stop and resume. Each Receive activity, as well as others like Delay, will create a bookmark and wait for that to be resumed. You can only resume existing bookmarks. Even resuming a bookmark is not a direct action but put into an internal queue, not MSMQ, by the workflow scheduler and executed through a SynchronizationContext. You get no control over the scheduler but you can replace the SynchronizationContext when using the WorkflowApplication and so get some control over how and where activities are executed.

Resources