In-memory caching in Azure function - azure

There is a need to cache objects to improve the perf of my Azure function. I tried .NET ObjectCache (System.Runtime.Caching) and it worked well in my testing (tested with up to a 10-minute cache retention period).
In order to take this solution forward, I have a few quick questions:
What is the recycling policy of Azure Functions? Is there a default? Can it be configured?
What are the cost implications?
Is my approach right or are there any better solutions?
If you can answer any of these, please help.
Thank you.

Javed,
An out-of-process solution such as Redis (or even using Table storage, depending on the workload) would be recommended.
As a rule of thumb, functions should be stateless, particularly if you're running in the dynamic runtime, where scaling operations (up and down) could happen at any time and your host is not guaranteed to stay up.
If you opt to use the classic hosting, you do have a little more flexibility, as you can enable the "always on" feature, but I'd still recommend the out-of-process approach. Running in the classic mode does have a cost implication as well, since you're no longer taking advantage of the consumption-based billing model offered by the dynamic hosting.
I hope this helps!

If you just need a smallish key-value cache, you could use the file system. D:\HOME (also found in the environment variable %HOME%) is shared across all instances. I'm not sure if the capacities are any different for Azure Functions, but for Sites and WebJobs, Free and Shared sites get 1GB of space, Basic sites get 10GB, and Standard sites get 50GB.
Alternatively, you could try running .NET ObjectCache in production. It may survive multiple calls to the same instance (file system or static in-memory property). Note that this will not be shared across instances, so only use it as a best-effort cache.
Note, both of these approaches pose problems for multi-tenant products as it could be an avenue for unintended cross-tenant data sharing or even more malicious activities like DNS cache poisoning. You'd want to implement authorization controls for these things just as if they came from a database.
As others have suggested, Functions ideally should be stateless and an out-of-process solution is probably best. I use DocumentDB because it has time-to-live functionality, which is ideal for a cache. Redis is likely to be more performant, especially if you don't need persistence across stop/restart.
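To make the "best-effort cache in front of an out-of-process store" idea concrete, here is a minimal sketch in Python (not .NET, and not any Azure API). The class name BestEffortCache, the get_or_load method and the loader callback are all hypothetical names made up for illustration; the loader stands in for whatever call reads from Redis, DocumentDB or Table storage. It is not thread-safe and assumes entries can be lost whenever the host recycles.

```python
import time
from typing import Any, Callable, Dict, Tuple


class BestEffortCache:
    """Per-process cache with a time-to-live; contents vanish if the host recycles."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh on this instance
        # Miss or expired: fall back to the out-of-process store (hypothetical loader).
        value = loader()
        self._items[key] = (now + self._ttl, value)
        return value
```

Because every miss goes back to the loader, losing the in-memory copy only costs latency, never correctness, which is the property you want when the instance can disappear at any time.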

Related

Azure Websites and stateful webApp

I have a naïve version of a PokerApp running as an Azure Website.
The server stores in its memory the state of the tables, (whose turn it is, blinds value, cards…) etc.
The problem here is that I don't know how much I can rely on the WebServer's memory to be "permanent". A simple restart of the server would cause that memory to be lost and therefore all the games in progress before the restart would get lost / cause trouble.
I've read about using TableStorage to keep session data and share it between instances, but in my case it's not just a string of text that I want to share but, let's say for example, a Lobby object which contains all info associated with the games.
This is very roughly the structure of the object I have in memory
After some of your comments, you can see the object that needs to be stored is quite big and is being updated almost constantly. I don't know how well serializing and deserializing is going to work for me here...
Should I consider an Azure VM, which I'm hoping is going to have persistent memory, instead of a Website?
Or is there a better approach to achieve something like this?
Thanks all for the answers and comments, you've made it clear that one can't rely on local memory when working on the cloud.
I'm going to do some refactoring and optimize the "state" object and then use a caching service.
Two questions come to mind though, and once you throw some light on these I promise I will shut up and accept @astaykov's great answer.
CONCURRENCY AT INSTANCE LEVEL - I have classic thread locks in my app to avoid concurrency problems, so I'm hoping there is something equivalent for those caching services you guys propose?
Also, I have a few timeouts per table (increase blinds, number of seconds the players have to act…). Let's say a user has just folded a hand, he's finished interacting with the state object so I update the cache. While that state object (to which the timers belong) is cached, my timers will stop ticking…
I know I'm not explaining myself very well here but I hope you guys see my point.
I'd suggest using the Azure Redis Cache.
Here is a nice sample of how to build an MVC app with Redis Cache in 15 minutes.
You can, of course, use the Azure Managed Cache. Or end up with Azure Tables. And Azure Tables can hold much more than just a string. But I believe the caching solutions would have lower latency in communication.
Either way, your objects have to be serializable. And yes - the objects will get serialized/deserialized on every access. You can do it manually, or let the framework do it for you. From what I've read, Newtonsoft.Json is quite a good and optimized JSON serializer/deserializer.
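As a rough illustration of the serialize-on-write / deserialize-on-read round trip, here is a minimal sketch in Python rather than .NET. The Lobby name comes from the question; the Table class and every field shown are hypothetical, since the real object structure was not posted. The resulting string is what you would hand to whichever cache client you pick.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Table:
    # Hypothetical fields; the real table state was not shown in the question.
    table_id: int
    big_blind: int
    turn_seat: int
    players: List[str] = field(default_factory=list)


@dataclass
class Lobby:
    tables: List[Table] = field(default_factory=list)


def to_cache_value(lobby: Lobby) -> str:
    """Flatten the whole object graph into a JSON string for the cache."""
    return json.dumps(asdict(lobby))


def from_cache_value(raw: str) -> Lobby:
    """Rebuild the object graph from the JSON string read back from the cache."""
    data = json.loads(raw)
    return Lobby(tables=[Table(**t) for t in data["tables"]])
```

If the whole lobby turns out to be too big to serialize on every change, one option is to cache each table under its own key so that only the table that changed gets rewritten.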
UPDATE
When you ask for a VM running in the cloud - a VM will be restarted sooner or later! Application Pool will recycle, a planned maintenance will occur, an unplanned maintenance will occur, a hard disk will fail, a memory module will fail, unforeseen disaster will happen.
Only one thing is for sure - if you want your data to survive server crashes, change the way you think and design software, and take data out of the (local) memory. Or just live with the fact that the application may lose state sometimes.
Second update - for the clocks
Well, you have to play with your imagination and experience. I would question whether your clocks work anyway in the context of the ASP.NET app (unless all of them are static properties of a static type, which would be a little hell). My approach would be to heavily extend my app to the client as well (JavaScript). There are a lot of great frameworks out there - SignalR, AngularJS, KnockoutJS, none of them to be underestimated! By extending your object model to the client, you can maintain the players' object lifetime on the client (keeping the clock ticking) and send updates from the client to the server for all those events. If you take a look at SignalR, you can keep real-time communication between multiple clients (say players) and the server. And the server side of SignalR scales out nicely with Azure Service Bus and even Redis.

Sudden Scaling of Simple Node.js App

My website is written in Node.js, has no database or external dependencies, but does have a lot of large media files (images and some video) totalling some 2 GB. The structure of the website is drawn from a couple of simple JSON files.
My problem is drastic and sudden scaling. Traffic to my site is usually easily handled by any small VPS instance, but occasionally traffic can get to hundreds of times its normal level for short periods. My problem is how to scale quickly, without downtime, and automatically. I know there are issues with autoscaling, but perhaps lacking a database will negate some of that.
What sort of scaling issues and options should I be looking at?
(For context, I am currently using a Digital Ocean VPS, but I can't find a clean way to scale it with no downtime. I am not wedded to my provider.)
Scalability is important, but scaling when you need to is also important. We all do not have the scaling needs of Facebook or Twitter : ) This might just be a case of resource management.
Test the problem
Without a database and using NodeJS, one of node's strengths is the number of concurrent connections it can handle. For simple IO load, it would seem you have made a good choice of framework. And, since your problem is a particular resource being bombarded, run some load testing on your server. Popular and free tools include:
Apache Bench
httperf
OpenLoad
And there are paid services like NeoLoad, LoadImpact (which is free at small levels), forecastweb, E-Load, etc.
With those results, Determine the Cause
Is it the size of the file being served? Is it the number of concurrent requests? What resources are being used, or maxed out, during a slowdown (ram, ports, file system, some other IO, CPU, bandwidth, etc...)?
Have a look at this question, which defines a few concepts for server load. To implement a solution, you will need to determine the cause of the slowdown. Is it: 1) some queues filling up? 2) a problem with TCP connections and ports? 3) too-slow allocation of resources? That will help shape your solution.
Plan for scaling.
The type of scaling your project needs may be only a portion of what another project needs. If you know the root cause in this case, it will increase your options.
Is the problem bandwidth? Perhaps using your web server as a router to multiple cloud instances of file serving would effectively increase the bandwidth your users see. Even just storing your files on a larger cloud that can guarantee the bandwidth you need may be enough.
Is the problem CPU, RAM, etc? You may need multiple instances of the same web app (or an increased allotment for your VPS). This is the "Elastic" portion of Amazon's Elastic Compute Cloud (EC2), and other models like it. Create a "golden image" and duplicate it when you see traffic start spiking, using built-in monitoring tools, turning it off when the rush is done. This can be programmatic or simply manual.
Is the problem concurrent requests? The bottleneck should not be NodeJS, up to thousands of concurrent requests anyway. Perhaps just check your implementation to ensure there is not a slowdown of the single node thread. Maybe node clustering or some worker threads would alleviate the bottleneck enough for your purposes.
Last Note: For serving static files I've heard nginx or even Apache Tomcat is a little better suited than NodeJS. Depending on your web app's complexity, you might be able to switch or benchmark fairly easily.
In case anyone is reading this rather specific question years later, I have gained some perspective on it. As Clay says, the ultimate answer is to spin up more servers, either manually or programmatically based on load.
However, in my case that would be massive overkill - I'm not running Twitter. The problem was a relatively simple mistake in architecture. My app was reading the JSON data files from disk with every page request, and the disk I/O was getting saturated. I changed to loading the data files into memory on startup, and reloading them when they change using fs.watch().
My modest VPS can now easily handle the sorts of traffic that would previously crash it. I've never seen traffic that would make me want to up-size it.
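For anyone who wants the shape of that fix, here is a minimal sketch. It is in Python rather than Node.js, and instead of fs.watch() it compares the file's modification time on each access, which is a different (simpler) technique than the one I actually used. The JsonFileCache class and its get method are names made up for illustration.

```python
import json
import os
from typing import Any, Optional


class JsonFileCache:
    """Keeps the parsed JSON in memory and re-reads it only when the file changes."""

    def __init__(self, path: str):
        self._path = path
        self._mtime_ns: Optional[int] = None  # modification time of the copy in memory
        self._data: Any = None

    def get(self) -> Any:
        # One stat call per access; the file is read and parsed only after a change.
        mtime_ns = os.stat(self._path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self._path, "r", encoding="utf-8") as fh:
                self._data = json.load(fh)
            self._mtime_ns = mtime_ns
        return self._data
```

A stat per request is still a filesystem call, but it is far cheaper than reading and parsing the whole file each time. A file watcher avoids even that, at the cost of a bit more moving machinery.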

App Fabric Cache Monitor

With the App Fabric Cache there are limits to the number of transactions per hour. Is there any way to monitor this? Firstly for testing, to find out how much cache we will need, and primarily so we don't hit the limit and have the site go down because we can't access the cache any more.
yes, you can use the Azure Application Manager as explained in this blogpost:
http://blogs.msdn.com/b/appfabric/archive/2011/06/20/introducing-windows-azure-appfabric-applications.aspx
there are some opensource tools available on codeplex as well:
http://www.codeplex.com/site/search?query=azure%20monitor&ac=3
i usually stick to the azure provided tools..
At the moment there is nothing in the management portal or in any of the APIs that you can use to determine how many transactions you've used against the cache in any given hour (or how many connections you're using, the other limiting factor when using app fabric cache).
If this is particularly important you might try building a wrapper around cache access and use a scalable counter which increments every time you perform a transaction.
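A minimal sketch of such a wrapper, in Python and purely illustrative: CountingCache, transactions_this_hour and the inner client's get/put methods are hypothetical names, not the AppFabric API. The counter here lives in one process only; with several instances you would have to push the increments to a shared store to get a true total.

```python
import threading
import time
from collections import defaultdict
from typing import Any, Callable, Dict


class CountingCache:
    """Wraps a cache client and counts transactions per clock hour."""

    def __init__(self, inner: Any, clock: Callable[[], float] = time.time):
        self._inner = inner  # hypothetical client exposing get(key) and put(key, value)
        self._clock = clock
        self._lock = threading.Lock()
        self._counts: Dict[int, int] = defaultdict(int)  # hour bucket -> transactions

    def _hour(self) -> int:
        return int(self._clock() // 3600)

    def _record(self) -> None:
        with self._lock:
            self._counts[self._hour()] += 1

    def get(self, key: str) -> Any:
        self._record()
        return self._inner.get(key)

    def put(self, key: str, value: Any) -> None:
        self._record()
        self._inner.put(key, value)

    def transactions_this_hour(self) -> int:
        with self._lock:
            return self._counts.get(self._hour(), 0)
```

Routing all cache access through one wrapper also gives you a single place to log a warning or back off when the count gets close to your quota.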

Windows Azure App Fabric Cache whole Azure Database Table

I'm working on an integration project where a third party will call our web service in Azure. For performance reasons I would like to store the data of 2 tables (more than 1000 records) in the App Fabric cache.
Could anyone please suggest if this is the right design pattern?
Depending on how much data this is (you don't mention how wide the tables are) you have a couple of options
You could certainly store it in the azure cache, this will cost though.
You might also want to consider storing the data in the http runtime cache which is free but not distributed.
Your choice would largely depend on the size of the data, how often it changes and what effect is caused if someone receives slightly out-of-date data.

mvc-mini-profiler - working with a load balanced web role (azure et al)

I believe that the mvc mini profiler is a bit of a 'God-send'
I have incorporated it in a new MVC project which is targeting the Azure platform.
My question is - how to handle profiling across server (role instance) barriers?
Is this even possible?
I don't understand why you would need to profile these apps any differently. You want to profile how your app behaves on the production server - go ahead and do it.
A single request will still be executed on a single instance, and you'll get the data from that same instance. If you want to profile services located on a different physical tier as well, that would require different approaches; involving communication through internal endpoints which I'm sure the mini profiler doesn't support out of the box. However, the modification shouldn't be that complicated.
However, if you want to profile physically separated tiers, I would go about it in a different way. Specifically, profile each tier independently, because that's how I would go about optimizing it. If you wrap the call to your other tier in a profiler statement, you can see where the problem lies and still be able to solve it.
By default the mvc-mini-profiler stores and delivers its results using HttpRuntime.Cache. This is going to cause some problems in a multi-instance environment.
If you are using multiple instances, then some ways you might be able to make this work are:
to change the Http Cache to an AppFabric Cache implementation (or some MemCached implementation)
to use an alternative Storage strategy for your profile results (the code includes SqlServerStorage as an example?)
Obviously, whichever strategy you choose will require more time/resources than just the single instance implementation.
