Refreshing Data on all Azure Instances

What is the best way to refresh the data in all my Azure instances?
I have several instances each running the same web role. On role startup data is read from blob storage into local store. Intermittently there will be a new update to data used by the site. I want to be able to notify each instance programmatically to get the updated data from blob storage into the instance's local storage so that each instance becomes refreshed.
One idea is to write a custom client program to call a web-service on the web role, passing in a role ID to update. If the endpoint is the Role ID then the instance refreshes itself. If not the client tries again until all instances report that they are refreshed. Is this a good approach or is there a built in method in Azure for doing this?
I have considered a separate thread which intermittently checks for a refresh flag, though I'm worried my instances will become out of sync.
There is not a huge amount of data so I could put it in the Azure cache. However, I am concerned about the network latency with using the cache. Does the Azure cache handle this well?
Another idea is to just reboot the instances one after another (with the refresh operation being performed on the role start up).

I think one possible way you could do this is to store a value (e.g. a timestamp) in a configuration setting - you can then programmatically update the configuration and use the RoleEnvironment.Changing event to monitor for the change on all your instances - http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.serviceruntime.roleenvironment.changing.aspx
If you do this, make sure you intercept the event in all your roles, parse the changes (looking for Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironmentConfigurationSettingChange) and set Cancel to false to prevent your instances from being rebooted. A rough sketch follows.
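A minimal sketch of that pattern, assuming a configuration setting named "DataVersion" and a RefreshLocalDataFromBlobStorage method that re-runs your existing startup load (both names are illustrative, not from the original question):

```csharp
using System.Linq;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        RoleEnvironment.Changing += (s, e) =>
        {
            // Leaving Cancel = false means "apply the change in place, don't recycle me".
            if (e.Changes.OfType<RoleEnvironmentConfigurationSettingChange>()
                         .Any(c => c.ConfigurationSettingName == "DataVersion"))
            {
                e.Cancel = false;
            }
        };

        RoleEnvironment.Changed += (s, e) =>
        {
            // The new setting value is available once Changed fires.
            if (e.Changes.OfType<RoleEnvironmentConfigurationSettingChange>()
                         .Any(c => c.ConfigurationSettingName == "DataVersion"))
            {
                RefreshLocalDataFromBlobStorage();
            }
        };

        return base.OnStart();
    }

    private void RefreshLocalDataFromBlobStorage()
    {
        // re-read the blobs into local storage, same as your existing startup code
    }
}
```

Updating the "DataVersion" value in the service configuration then triggers a refresh on every instance without a reboot.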

Adding to Stuart's answer, let me address your proposed techniques:
I don't think the client-calling-web-service technique is practical, as you've now added a web service as well as a client driver to call it, and you have no ability to contact individual web role instances - the load balancer hides the individual instances from you.
You can store data in the cache, but the only way to update your data "now" is to expire items in the cache. If you have all of your data in a single cache item with a well-known key, then it's easy to expire. If it's across multiple keys, then you have a more complex task, and you won't be able to expire the items atomically (unless you clear the entire cache). Plus, cache expires on its own anyway - you'd have to deal with re-loading if that occurs.
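If you did go the cache route, the single-well-known-key approach could look something like this sketch against the old In-Role / Managed Cache client (Microsoft.ApplicationServer.Caching); the key name and the SiteData type are assumptions for illustration:

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

public class SiteDataCache
{
    private static readonly DataCache Cache = new DataCacheFactory().GetDefaultCache();
    private const string Key = "sitedata"; // single well-known key, assumed name

    public static SiteData Get(Func<SiteData> loadFromBlob)
    {
        var data = Cache.Get(Key) as SiteData;
        if (data == null)
        {
            // Never loaded, expired, or explicitly invalidated - reload from blob storage.
            data = loadFromBlob();
            Cache.Put(Key, data, TimeSpan.FromHours(1));
        }
        return data;
    }

    // Call this when a new update lands in blob storage.
    public static void Invalidate()
    {
        Cache.Remove(Key);
    }
}

public class SiteData { /* your deserialized content */ }
```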
You can run a background thread on your role instances that watches a blob (maybe a single zip file) and copies the zip down to local storage when its id changes (you can store a unique id in the blob's metadata, for instance). Then you could just check this periodically (maybe every minute?) - see the sketch below.
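A rough sketch of that polling thread with the classic Microsoft.WindowsAzure.Storage SDK these roles would be on; the container/blob names and the "dataversion" metadata key are placeholders:

```csharp
using System;
using System.IO;
using System.Threading;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public class DataRefresher
{
    private string _lastVersion;

    public void Start(string connectionString, string localPath)
    {
        var blob = CloudStorageAccount.Parse(connectionString)
            .CreateCloudBlobClient()
            .GetContainerReference("sitedata")
            .GetBlockBlobReference("content.zip");

        new Thread(() =>
        {
            while (true)
            {
                blob.FetchAttributes(); // pulls properties and metadata only, no content
                string version;
                if (blob.Metadata.TryGetValue("dataversion", out version)
                    && version != _lastVersion)
                {
                    blob.DownloadToFile(Path.Combine(localPath, "content.zip"),
                                        FileMode.Create);
                    // unzip / swap the content into local storage here
                    _lastVersion = version;
                }
                Thread.Sleep(TimeSpan.FromMinutes(1));
            }
        }) { IsBackground = true }.Start();
    }
}
```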
The reboot idea has a good side and a bad side. On the good side, you're going to avoid side-effects caused by changing local content while your role instance is still running. The bad side is that the role instance will be offline for a few minutes during reboot.
Stuart's suggestion of using a configuration setting is a good one. Just be sure you can update your files without breaking your app. If this cannot be done safely, don't handle the Changing event - just let the role instance recycle.

Related

Synchronising in memory data between Azure Front Door back end instances

I have a web application in Azure. There are 2 instances, with Azure Front Door being used to route all traffic to the primary instance only, with the secondary one only ever used if the first is unavailable. We have some in-memory data that gets updated by users, and the database is subsequently updated with the changes. The data is kept in memory due to a genuine performance requirement. The issue we have is that the data is fetched from the database only on application start-up, meaning that if the primary instance becomes unavailable, the secondary one could very well have information that is out of date. I was hoping that Front Door could be configured to trigger an API when any sort of switch - from primary to secondary or vice versa - occurs. However I can't find any reference to this in the documentation.
We have a series of web jobs that run, one of which is triggered every minute. However, using this to keep the data fresh still doesn't guarantee that an instance will necessarily have the latest information.
Any thoughts on how to rectify this issue very much appreciated.
[EDIT] Both web apps talk to the same database instance.
Unfortunately Azure Front Door doesn't have any native support for firing events to something like an Event Hub but you could stream your logs to one. For example you could stream "FrontDoorAccessLog" to an Event Hub and have a script receive these events. When the "OriginName" value changes you could inform the failover app to update its state via an API.
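As a sketch of what that receiving script could look like with Azure.Messaging.EventHubs - the event hub name, the notify URL and the exact JSON shape of the FrontDoorAccessLog records (a "records" array with a "properties.originName" field) are assumptions to check against your own logs:

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs.Consumer;

class FailoverWatcher
{
    static async Task Main()
    {
        string lastOrigin = null;
        var http = new HttpClient();

        await using var consumer = new EventHubConsumerClient(
            EventHubConsumerClient.DefaultConsumerGroupName,
            Environment.GetEnvironmentVariable("EVENTHUB_CONNECTION"),
            "frontdoor-logs"); // hub name assumed

        await foreach (PartitionEvent ev in consumer.ReadEventsAsync())
        {
            using var doc = JsonDocument.Parse(ev.Data.EventBody.ToString());
            foreach (var record in doc.RootElement.GetProperty("records").EnumerateArray())
            {
                var origin = record.GetProperty("properties")
                                   .GetProperty("originName").GetString();
                if (lastOrigin != null && origin != lastOrigin)
                {
                    // Origin changed: tell the now-active app to refresh its in-memory data.
                    await http.PostAsync("https://secondary.example.com/api/refresh", null);
                }
                lastOrigin = origin;
            }
        }
    }
}
```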
Is there a reason why both Webapps have their own database if they have to be somewhat synchronized? Why not have both Webapps talking to the same DB?

Uploading data to Azure App Service's persistent storage (%HOME%)

We have a Windows-based App Service that requires a large dataset to run (files stored on Azure Blob Storage, around 30 GB in total). This data is static per app version, and therefore should be accessible to all instances across a given slot (a slot in our case represents a version).
Based on our initial research, it seems like Persistent Storage (%HOME%) would be the ideal place for this, since data stored there is shared across instances, but not across slots.
The next step now is to load the required data as part of our devops deployment pipeline, since the app service cannot operate without the underlying data. However, it seems like the %HOME% directory is only accessible by the app service itself, even though the underlying implementation is using Azure Storage.
At this point, we're considering having the app service download the data during its startup, but then we hit a snag: we have two instances. We could implement a mutex (using a blob lease), but this seems to us to be too complicated a solution for a simple need.
Any thoughts about how to best implement this?
The problems I see with loading the file on container startup are the following:
It's going to be really slow, and you might hit one of the built-in App Service timeouts.
Every time your container restarts, or you add another instance, it will re-download all the data, and it might cause issues with blocked writes because of file handle locks, which can make files or directories on %HOME% completely inaccessible for reading and modifying (I just had this happen to me).
For this I would rather suggest connecting the app to Azure Files over SMB and, for example, having a directory per version. This way you can connect to Azure Files and write the data during your build pipeline, and save an ENV variable or file that tells each slot which directory to get the current version's data from.
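A minimal sketch of the version-lookup side, assuming the share is mounted at a path exposed via a DATA_ROOT setting and the pipeline writes a DATA_VERSION setting per slot (both names are made up for illustration):

```csharp
using System;
using System.IO;

public static class VersionedData
{
    public static string GetDataDirectory()
    {
        var root = Environment.GetEnvironmentVariable("DATA_ROOT")       // e.g. the Azure Files mount path
                   ?? throw new InvalidOperationException("DATA_ROOT not set");
        var version = Environment.GetEnvironmentVariable("DATA_VERSION") // e.g. "1.4.2", written per slot
                   ?? throw new InvalidOperationException("DATA_VERSION not set");

        var path = Path.Combine(root, version);
        if (!Directory.Exists(path))
            throw new DirectoryNotFoundException($"No data uploaded for version {version}");
        return path;
    }
}
```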

Is there a way to purge all documents from a CosmosDB Container using the Azure Portal?

I'm developing an app that uses a CosmosDB container. Every so often, I want to purge one of the testing containers. Using the Azure Portal, I drop the container and create a new one with the same name and attributes.
Simple enough, but this feels unnecessary. Every so often I'll type something wrong. Is there a way to delete all documents in a container, without the need to recreate it, via the web Portal? It feels as if this might exist in a menu somewhere and I'm just unable to find it.
You can set the time to live (TTL) of the container to something like 1 second. It will take some time depending on the number of documents and the throughput of your Cosmos DB.
Deletion by TTL only uses leftover RU/s, so it will not affect your application if it is live.
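For reference, the programmatic equivalent of flipping the container's TTL with the Microsoft.Azure.Cosmos SDK looks roughly like this sketch (database/container names are placeholders; remember to set the TTL back afterwards):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class PurgeByTtl
{
    static async Task Main()
    {
        using var client = new CosmosClient(Environment.GetEnvironmentVariable("COSMOS_CONNECTION"));
        Container container = client.GetContainer("testdb", "testcontainer");

        // Read the current container definition, then expire everything after 1 second.
        ContainerProperties props = (await container.ReadContainerAsync()).Resource;
        props.DefaultTimeToLive = 1;
        await container.ReplaceContainerAsync(props);
        // Deletion happens in the background using spare RU/s; revert DefaultTimeToLive when done.
    }
}
```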

Multiple Instances of Azure Worker Roles for non-transaction integration tasks

We have an upcoming project where we'll need to integrate with 3rd parties over a variety of transports to get data from them.
Things like WCF Endpoints & Web API Rest Endpoints are fine.
However, in 2 scenarios we'll need to either pick up auto-generated emails containing XML from a POP3 account or pull the XML files from an external SFTP account.
I'm about to start prototyping these now, but I'm wondering: are there any standard practices, patterns or guidelines for dealing with these non-transactional systems in a multi-instance worker role environment? For example:
What happens if 2 workers connect to the POP account at the same time, or to the same FTP site at the same time?
What happens if 1 worker deletes the file from the FTP site while another is mid-download?
Controlling duplication shouldn't be an issue, as we'll be logging everything on the application side to a database, and everything should be uniquely identifiable, so we'll be able to add if-not-exists-create-else-skip logic to the workers. I'm just wondering whether there is anything else I should be considering to make things more resilient/idempotent.
Just thinking out loud: since the data is primarily files and emails, one possible thing you could do is, instead of processing them directly in your worker roles, first save them in blob storage. So there would be some worker role instances which periodically poll the POP3 server / SFTP site, pull the data from there and push it into blob storage. When the blob is written, the same instance can delete the data from the source as well. With this approach you don't have to worry about duplicate records, because the blob will simply be overwritten (assuming each message/file has a unique identifier and the name of the blob is that identifier).
Once the file is in your blob storage, you can write a message to a Windows Azure Queue with details about the blob (maybe the blob URL etc.). Then, using the 'Get' semantics of Windows Azure Queues, your worker role instances start fetching and processing these messages. Because of the Get semantics, once a message is fetched from the queue it becomes invisible to other callers (worker role instances in this case). This way you can take care of duplicate message processing.
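A sketch of that consuming loop with the classic Windows Azure Storage SDK; the queue name and the assumption that the message body carries the blob's unique identifier are illustrative:

```csharp
using System;
using System.Threading;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public class QueueProcessor
{
    public void Run(string connectionString)
    {
        CloudQueue queue = CloudStorageAccount.Parse(connectionString)
            .CreateCloudQueueClient()
            .GetQueueReference("incoming-files");

        while (true)
        {
            // The message becomes invisible to other instances for 5 minutes.
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(5));
            if (msg == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(10));
                continue;
            }

            string blobName = msg.AsString;   // e.g. the unique identifier / blob URL
            ProcessBlob(blobName);            // your if-not-exists-create-else-skip logic

            // Only delete once processing succeeded; otherwise the message reappears
            // after the visibility timeout and another instance can retry it.
            queue.DeleteMessage(msg);
        }
    }

    private void ProcessBlob(string blobName) { /* download + import */ }
}
```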
UPDATE
So I'm trying to combat against two competing instances pulling the same file at the same moment from the SFTP
For this, I'll pitch my favorite Master/Slave concept :). Essentially the idea is that each instance will try to acquire a lease on a single blob. The instance which acquires the lease becomes the master and the others slaves. The master would fetch the data from SFTP while the slaves wait. I've described this concept in my blog post, which you can read here: http://gauravmantri.com/2013/01/23/building-a-simple-task-scheduler-in-windows-azure/, though the context of the blog is somewhat different.
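A hedged sketch of the lease-based election with the classic storage SDK; the lock container/blob names are placeholders and the lease length is arbitrary:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public class SftpMasterElection
{
    public void TryRunAsMaster(string connectionString)
    {
        CloudBlockBlob lockBlob = CloudStorageAccount.Parse(connectionString)
            .CreateCloudBlobClient()
            .GetContainerReference("locks")
            .GetBlockBlobReference("sftp-master");

        if (!lockBlob.Exists())
            lockBlob.UploadText(string.Empty);   // the blob only exists to be leased

        string leaseId = null;
        try
        {
            // 60-second lease; only one instance at a time can hold it.
            leaseId = lockBlob.AcquireLease(TimeSpan.FromSeconds(60), null);
            PullFilesFromSftp();                 // master work goes here
        }
        catch (StorageException)
        {
            // Lease already held - this instance stays a slave for this cycle.
        }
        finally
        {
            if (leaseId != null)
                lockBlob.ReleaseLease(AccessCondition.GenerateLeaseCondition(leaseId));
        }
    }

    private void PullFilesFromSftp() { /* SFTP download + push to blob storage */ }
}
```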
Have a look at the recently released Cloud Design Patterns guide. You might be able to find the corresponding pattern and sample code for what you need.

Maintaining Node.js sessions between multiple instances on Azure

I have 3 instances of a Node.js worker role running on Windows Azure. I'm trying to maintain sessions between all the instances.
Azure Queue seems like the recommended approach, but how do I ensure all the instances receive the session, as the queue deletes the session once a single instance has de-queued it?
Azure table isn't really suitable for my application as the sessions are too frequent and need not be stored for more than 10 seconds.
A queue isn't a good mechanism for session state; it's for message-passing. Once one instance reads a queue message, it's no longer visible to the others while that instance is processing it. Also: what would you do with the message when done with it? Update it and then make it visible again? The issue is that you cannot choose which "session" to read. It's an almost-FIFO queue (messages that aren't processed properly can reappear). It's not like a key/value store.
To create an accessible session repository, you can take advantage of Azure's in-role (or dedicated role) caching, which is a distributed cache across your role instances. You can use Table Storage too - just simple key/value type of reads/writes. And Table Storage is included in the node.js Azure SDK.
That said: let's go the cache route here. Since your sessions are short-lived, and (I'm guessing) don't take up too much memory, you can start with an in-role cache (the cache shares the worker role RAM with your node code, taking a percentage of memory). The cache is also memcache-compatible, which is easy to access from a node application.
If you take a look at this answer, I show where caching is accessed. You'll need to set up the cache this way, but also set up the memcache server gateway by adding an internal endpoint called memcache_default. Then, point your memcache client class to the internal endpoint. Done.
The full instructions (and details around the memcache gateway vs. client shim, which you'd use when setting up a dedicated cache role) are here. You'll see that the instructions are slightly different if using a dedicated cache, as it's then recommended to use a client shim in your node app's worker role.
