Is it worth caching data from Azure Table storage with the Azure Caching Preview?
Or is the table storage fast enough in large scale applications?
Thanks
The short answer is it depends. In the application I am currently working on there is some information that we use caching for to handle both the latency of retrieving data from Table Storage and to accommodate the desired number of transactions per second.
We started out serving the information from Table Storage and moved to caching only when our performance requirements dictated it. I'd recommend a similar approach: make it work, then make it fast.
In addition to what Robert said, you should also consider following points:
Windows Azure Table Storage allows to store up to 100TB in size (in chunks). At first glance, that size of data may seem overwhelming. However, Table Storage can be partitioned. Each partition of Table Storage can be moved to a separate server by the Azure controller thereby reducing the load on any single server and improving performance.
If you have very high load on your application, you cache with frequent inserts will approach the maximum cache size very quickly and then cache items eviction process starts. In most cases frequent inserts into cache and frequent cache items eviction processes end up with performance degradation instead of improvement. Then you would need to increase cache maximum size, which in turn will affect your application cost (sometimes this might be a blocker).
Last but not least, you can access Windows Azure Table Storage data using the OData protocol and LINQ queries with WCF Data Service .NET Libraries; you do not have that ability with Azure Cache.
Please bear in mind that those points may or may not be valid in your case. All depends on your system architecture; expected load etc.
I hope my answer will help you in making good system architecture decisions.
Related
We currently use Redis as our persistent cache for our web application but with it's limited memory and cost I'm starting to consider whether Table storage is a viable option.
The data we store is fairly basic json data with a clear 2 part key which we'd use for the partition and row key in table storage so I'm hoping that would mean fast querying.
I appreciate one is in memory and one is out so table storage will be a bit slower but as we scale I believe there is only one CPU serving data from a Redis cache whereas with Table storage we wouldn't have that issue as it would be down to the number of web servers we have running.
Does anyone have any experience of using Table storage in this way or comparisons between the 2.
I should add we use Redis in a very minimalist way get/set and nothing more, we evict our own data and failing that leave the eviction to Redis when it runs out of space.
This is a fairly broad/opinion-soliciting question. But from an objective perspective, these are the attributes you'll want to consider when deciding which to use:
Table Storage is a durable, key/value store. As such, content doesn't expire. You'll be responsible for clearing out data.
Table Storage scales to 500TB.
Redis is scalable horizontally across multiple nodes (or, scalable via Redis Service). In contrast, Table Storage will provide up to 2,000 transactions / sec on a partition, 20,000 transactions / sec across the storage account, and to scale beyond, you'd need to utilize multiple storage accounts.
Table Storage will have a significantly lower cost footprint than a VM or Redis service.
Redis provides features beyond Azure Storage tables (such as pub/sub, content eviction, etc).
Both Table Storage and Redis Cache are accessible via an endpoint, with many language-specific SDK wrappers around the API's.
I find some metrials about the azure redis and table, hope that it can help you.There is a video about Azure Redis that also including a demo to compare between table storage and redis about from 50th minute in the videos.
Perhaps it can be as reference. But detail performance it depends on your application, data records and so on.
The pricing of the table storage depends on the capacity of table storage, please refer to details. It is much cheaper than redis.
There are many differences you might care about, including price, performance, and feature set. And, persistence of data, and data consistency.
Because redis is an in-memory data store it is pretty expensive. This is so that you may get low latency. Check out Azure's planning FAQ here for a general understanding of redis performance in a throughput sense.
Azure Redis planning FAQ
Redis does have an optional persistence feature, that you can turn on, if you want your data persisted and restored when the servers have rare downtime. But it doesn't have a strong consistency guarantee.
Azure Table Storage is not a caching solution. It’s a persistent storage solution, and saves the data permanently on some kind of disk. Historically (disclaimer I have not look for the latest and greatest performance numbers) it has much higher read and write latency. It is also strictly a key-value store model (with two-part keys). Values can have properties but with many strict limitations, around size of objects you can store, length of properties, and so on. These limitations are inflexible and painful if your app runs up against them.
Redis has a larger feature set. It can do key-value but also has a bunch of other data structures like sets and lists, and many apps can find ways to benefit from that added flexibility.
See 'Introduction to Redis' (redis docs) .
CosmosDB could be yet another alternative to consider if you're leaning primarily towards Azure technologies. It is pretty expensive, but quite fast and feature-rich. While also being primarily intended to be a persistent store.
I've got an application that's outgrowing SQL Azure - at the price I'm willing to pay, at any rate - and I'm interested in investigating Azure DocumentDB. The preview clearly has distinct scalability limits (as described here, for instance), but I think I could probably get away with those for the preview period, provided I'm using it correctly.
So here's the question I've got. How do I need to design my application to take advantage of the built-in scalability of the Azure DocumentDB? For instance, I know that with Azure Table Storage - that cheap but awful highly limited alternative - you need to structure all your data in a two-step hierarchy: PartitionKey and RowKey. Provided you do that (which is nigh well impossible in a real-world application), ATS (as I understand it) moves partitions around behind the scenes, from machine to machine, so that you get near-infinite scalability. Awesome, and you never have to think about it.
Scaling out with SQL Server is obviously much more complicated - you need to design your own sharding system, deal with figuring out which server the shard in question sits on, and so forth. Possible, and done right quite scalable, but complex and painful.
So how does scalability work with DocumentDB? It promises arbitrary scalability, but how does the storage engine work behind the scenes? I see that it has "Databases", and each database can have some number of "Collections", and so forth. But how does its arbitrary scalability map to these other concepts? If I have a SQL table that contains hundreds of millions of rows, am I going to get the scalability I need if I put all this data into one collection? Or do I need to manually spread it across multiple collections, sharded somehow? Or across multiple DB's? Or is DocumentDB somehow smart enough to coalesce queries in a performant way from across multiple machines, without me having to think about any of it? Or...?
I've been looking around, and haven't yet found any guidance on how to approach this. Very interested in what other people have found or what MS recommends.
Update: As of April 2016, DocumentDB has introduced the concept of a partitioned collection which allows you scale-out and take advantage of server-side partitioning.
A single DocumentDB database can scale practically to an unlimited amount of document storage partitioned by collections (in other words, you can scale out by adding more collections).
Each collection provides 10 GB of storage, and an variable amount of throughput (based on performance level). A collection also provides the scope for document storage and query execution; and is also the transaction domain for all the documents contained within it.
Source: http://azure.microsoft.com/en-us/documentation/articles/documentdb-manage/
Here's a link to a blog post I wrote on scaling and partitioning data for a multi-tenant application on DocumentDB.
With the latest version of DocumentDB, things have changed. There is still the 10GB limit per collection but in the past, it was up to you to figure out how to split up your data into multiple collections to avoid hitting the 10 GB limit.
Instead, you can now, specify a partition key and DocumentDB now handles the partitioning for you e.g. If you have log data, you may want to partition the data on the date value in your JSON document, so that each day a new partition is created.
You can fan out queries like this - http://stuartmcleantech.blogspot.co.uk/2016/03/scalable-querying-multiple-azure.html
We want to implement caching in Azure for two main reasons:
Speed up repetive data access
Reduce stress on the database
Here are the characteristics of the data we are planning to cache:
Relatively small (1 - 100 kb)
Specific to each customer
Not private, but we don't really want random people navigating through our entire cache
XML or JSON
Consumed by C# (i.e. not linked to directly in the html)
Most weeks the data will not change, although some days the data could change several times
For this specific purpose Table storage appears better than Blob storage (we did just implement Blob storage for images, CSS, and JavaScript) and Windows Azure Caching appears better than Windows Azure Shared Cache (perhaps almost always better and the shared caching is mostly a legacy feature at this point).
The programming API of both appears straight forward. Compared to what we pay for cloud sites the cost of each seems to be negligible.
So far we are leaning toward Table Storage due to what we perceive to be the pros and cons of Azure Caching. As old .Net guys we are much more familiar with In-Memory Cache than NoSql style solutions:
Problems with Windows Azure Caching:
If the VM is moved to a different server (by Microsoft for load balancing or whatever reasons) is the in-memory cache moved intact?
We are guessing that whenever we publish changes to the cloud it wipes out the existing in-memory cache
While the users rarely make changes to the cached data when they do make changes it is likely that they may make multiple updates within seconds and we are not sure how this is going to work with cache located across multiple nodes running web roles especially with increased traffic. (this is probably a concern with table storage as well!)
Table storage appears like it will be easier to debug
Advantages of Windows Azure Caching
somewhat faster
Your familiarity with in-memory caching is the model that you need to understand to implement caching on Windows Azure. The 'NoSql style' is not caching, but storage. So table storage rather replaces SQL than it replaces caching. Table storage is for persistent, reliable storage — with all of the latency and other disadvantages of persistence that do not exist with in-memory cache.
Writing to cache is always secondary. When your users 'make changes to the cached data' you will always be writing out the data to disk (e.g. SQL), and then writing out the same data to the cache because you might as well, since you have the data on-hand (although secondary effects on written data may mean that you should invalidate or re-read the cached item).
The wiping out of data when a machine recycles should not be much of a concern, as the data is stored elsewhere. Every read from the cache should be followed by an 'if not found then read from database' kind of statement. You can warm-up the cache when a role starts to pre-populate items that you know that you are going to need.
Caching on Azure is distributed across the nodes and updating an existing item will always update on the node that it resides. Quick updates may be less of a problem than you think.
For in-memory caching use Windows Azure caching (you are right about shared caching being legacy) and, depending on your needs, look at other caching technologies like memcached. Caching and table storage are not comparable. Table storage is for long-term persistence. Don't unnecessarily hack table storage to do caching — making table storage temporary creates a whole bunch of things that you need to worry about yourself, like expiry and invalidation.
SQL Azure storage is a lot more expensive than Windows Azure Storage. Would implementing a no-sql solution like RavenDB allow me to store data on the cheaper Azure Storage?
Are there other things to consider, like backup, speed or security?
Thank you.
You have to consider that with SQL Azure you not only get the storage, but the database server too. If you implement RavenDB, you will will need a worker role to host it in and, in order to allow for failure of that worker role, another worker role (replica), which also doubles up the storage.
Bear in mind that with SQL Azure you get a highly available (3x replicated with failover) SQL solution that surfaces a familiar (ADO.NET) API. Make your choices based on aspects other than storage cost, such as operational effort and development effort. If you choose RavenDB it should be because of the potential cost savings in development effort (because of the closeness on the document API to the object graph) and operational cost, because RavenDB is 'administered' as part of the application. Cost of storage of actual bytes, particularly at scale, is a marginal consideration.
Adding a bit to #Simon's answer: When considering Table Storage and its low cost, also consider whether you can use it directly, instead of going with an installed-and-managed-by-you NoSQL database engine. As it stands, Table Storage offers a schemaless solution that lets you store essentially a property bag within a row, indexed by partitionkey+rowkey. Does that work for you? Could you work with a few extra tables to give you additional indexing? If so, your storage cost is going to be really low (and still durable, triple-replicated).
If you find yourself writing significant code to manage Table Storage, then it may be more efficient to invest in the Compute instances needed to run RavenDB. When considering this, also consider that you'll likely want larger VM sizes if you're moving significant data (as you get approx. 100Mbps per core). A database like MongoDB, working with memory-mapped files, really ramps up speed-wise with more RAM. Not sure if this is the same with RavenDB.
I'm working on Integration project where third party will call our web service in Azure. For performance reason I would like to store 2 table data (more than 1000 records) on to the app fabric cache.
Could anyone please suggest if this is the right design pattern?
Depending on how much data this is (you don't mention how wide the tables are) you have a couple of options
You could certainly store it in the azure cache, this will cost though.
You might also want to consider storing the data in the http runtime cache which is free but not distributed.
You choice would largely depend on the size of the data, how often it changes and what effect is caused if someone receives slightly out of date data.