Is there a single word for a cache that is read-through, but not a write-back cache?

I am looking for a single word (or a phrase), ideally a standard one, to describe a cache that is read-through but not write-back. Basically, it reads data from the underlying DB or data store, but never writes data back to the DB.

Read-only cache seems to capture what you're describing.
For example, one IBM database defines it this way.
When configured as a read-only cache, the data is owned by the backend database. This ownership means that the data stored in the cache cannot be modified by the application. In read-only configuration, applications can modify the data directly in the backend database and changes can be synchronized to the cache, transaction by transaction, automatically or on-demand. This configuration is ideal for applications that require fast access to data that changes occasionally, such as price lists, or reference or lookup data.
The Wikipedia article on caching also uses the phrase read-only a couple of times.
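To make the behaviour concrete, here is a minimal sketch of a read-through, read-only cache (the fetchFromDb loader is a hypothetical stand-in for the backend database): misses are filled from the backend, but nothing is ever written back to it.

```javascript
// Minimal read-through, read-only cache sketch.
// `fetchFromDb` is a hypothetical async loader for the backing store.
class ReadOnlyCache {
  constructor(fetchFromDb) {
    this.fetchFromDb = fetchFromDb;
    this.entries = new Map();
  }

  // Read-through: on a miss, load from the backend and remember the result.
  async get(key) {
    if (!this.entries.has(key)) {
      this.entries.set(key, await this.fetchFromDb(key));
    }
    return this.entries.get(key);
  }

  // Deliberately no set()/write-back path: the backend owns the data,
  // and the cache only mirrors it. Changes made directly in the backend
  // are picked up by invalidating the stale entry.
  invalidate(key) {
    this.entries.delete(key);
  }
}
```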

Related

Best persistent data storage system for an alternative to global variables?

I am building a Node.js application which uses a few global variables to track data such as online users and statuses, information about other servers, and ongoing events, but losing this information in the event of a server restart or crash is not ideal.
As these things are frequently read and modified, I figure it would not be a good idea to put that extra strain on my existing MySQL database. I have looked into Redis, but unfortunately my application is hosted on a Windows server, so I would have to use an old unsupported version of it, which isn't ideal.
I'm currently considering setting up a NoSQL database such as MongoDB, but I'm not sure whether this is an efficient solution and whether it would be too much for my relatively weak server to have an application and two different databases running.
What would be the best solution for persistent storage of data that needs to be frequently accessed and updated by an application?
Making my comments into an answer...
If it's a reasonable amount of data, you can just write JSON to a single data file. No database required. Just overwrite the file with a new block of JSON to save the new state. This is very fast, efficient and simple. I've used this before as a quick and easy way to regularly save snapshots of state that you want to be able to reload if your server restarts. Read the state into memory upon server start, then use it from memory, then regularly save a new snapshot to disk however often your application desires.
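A minimal sketch of that approach, assuming only Node's built-in fs module (the file name and the shape of the state object are illustrative):

```javascript
const fs = require('fs');

const STATE_FILE = 'state.json'; // illustrative path
let state;

// Read the last snapshot into memory on server start, if one exists.
try {
  state = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8'));
} catch (err) {
  state = { onlineUsers: {}, serverInfo: {}, events: [] }; // fresh state
}

// Regularly overwrite the file with a new snapshot of the state.
setInterval(() => {
  // Write to a temp file first, then rename, so a crash mid-write
  // can't leave a truncated snapshot behind.
  fs.writeFile(STATE_FILE + '.tmp', JSON.stringify(state), (err) => {
    if (!err) fs.rename(STATE_FILE + '.tmp', STATE_FILE, () => {});
  });
}, 60 * 1000); // snapshot once a minute; tune to your needs
```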
If some data changes a lot and some data doesn't change very much, you can break the data into multiple files so you're writing less data on the more frequent interval. Obviously, there is a threshold of amount of data or frequency of writes or complexity of data access where a database would be warranted, but you should at least consider the simpler option first and only add a new database when you think you really need it.
If you cluster your servers in the future, that would argue for a multi-user database (one with appropriate concurrency-management features) as your master keeper of state. But you're going to have other design issues to work through if you're trying to share multi-user state (like online status) across all clustered servers: you can no longer keep that state in memory on any one server unless every state change is broadcast to all servers so they can update their in-memory copies, or unless you make users sticky to a particular server (which complicates load balancing in clustering). That does somewhat call for a redis-like central store that all clustered servers can access.

node.js keep a small in-memory database

I have an API service in Node.js; basically what it does is get an id from the request, read the record with this id from the database, and return it in the response.
While there are many clients with different ids, usually only about 10-20 of them are used in a given timespan.
Is it a good idea to create an object with ids as keys that stores each resulting record along with a last_requested time, to emulate a small database with fast access? Whenever a record is requested I will update the last_requested field with new Date(). I'll also create a setInterval() to delete those keys which were not used for some time.
Records in the database do not change often, and when they do, I can restart the service (there are several instances running simultaneously via PM2, so they can be gracefully restarted).
If the required id is not found in this "database", a request to the real database will be performed and the result will be stored in the object under a new key.
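A minimal sketch of what the question describes (the fetchFromDb loader and the interval values are illustrative):

```javascript
// Object keyed by id, holding the record plus a last_requested time.
const cache = {};
const MAX_IDLE_MS = 10 * 60 * 1000; // evict entries idle for 10 minutes

async function getRecord(id, fetchFromDb) {
  if (!cache[id]) {
    // Not in the in-memory "database": hit the real database.
    cache[id] = { record: await fetchFromDb(id) };
  }
  cache[id].last_requested = new Date();
  return cache[id].record;
}

// Periodically delete keys that have not been used for some time.
setInterval(() => {
  const now = Date.now();
  for (const id of Object.keys(cache)) {
    if (now - cache[id].last_requested.getTime() > MAX_IDLE_MS) {
      delete cache[id];
    }
  }
}, 60 * 1000);
```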
You're talking about caching. And it's very useful, if
You have a lot of reads, but not a lot of writes. i.e. Lots of people request a record, and it changes rarely.
You have a lot of free memory, or not many records.
You have a good indication of when to invalidate the cache.
For trivial use cases (i.e. under 50 requests/second), you probably don't need an in-memory cache in front of the database. Moreover, database access is very fast if you use the tools the database gives you (like persistent connection pools, parameterized queries, the query cache, etc.).
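For example, with a MySQL backend (an assumption here; any driver with pooling works similarly), the mysql2 package gives you a persistent pool and parameterized queries:

```javascript
const mysql = require('mysql2/promise'); // assuming a MySQL backend

// Persistent connection pool: connections are reused across requests
// instead of being opened and torn down every time.
const pool = mysql.createPool({
  host: 'localhost',
  user: 'app',
  database: 'mydb',
  connectionLimit: 10,
});

async function getRecord(id) {
  // Parameterized query: the driver handles escaping, and the statement
  // can be prepared once and reused across calls.
  const [rows] = await pool.execute('SELECT * FROM records WHERE id = ?', [id]);
  return rows[0];
}
```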
It all depends on your specific use case. But I wouldn't do any of this until I actually start encountering performance problems and determine that the database is the bottleneck.
It's not just a good idea; caching is a necessity at different levels of a computational system. Caching starts at the CPU level (L1, L2, L3) and the OS level, and goes up to the application level, where it must be done by the developer.
Even if you have a well-structured database with good indexes, there is still overhead for TCP/IP communication between your app and the database. So if you are going to access some rows frequently, it's a must to have them in your app's process.
The good news is that a Node.js app is a single process resident in memory (unlike PHP or other scripting programs whose processes come and go). So you can load frequently required data and avoid the database access.
The best mechanism to store the records can be an LRU (least-recently-used) cache. There are several LRU cache packages available for Node.js:
https://github.com/adzerk/node-lru-native
https://github.com/isaacs/node-lru-cache
https://www.npmjs.com/package/simple-lru-cache
In an LRU cache you can define how much memory the cache can use, the expiry age of each item, and how many items it can store. Or you can write your own!
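If you do write your own, a JavaScript Map already iterates in insertion order, so a tiny LRU needs little more than a delete-and-reinsert on each access. A sketch:

```javascript
// Tiny LRU cache built on Map's insertion-order guarantee.
class LruCache {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```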

design pattern to expire documents on cloudant

So when a document is deleted, the metadata is actually preserved forever. For a hosted service like Cloudant, where storage costs money every month, I would instead like to completely purge the deleted documents.
I read somewhere about a design pattern where you use dbcopy in a view to put the docs into a 'current' db and then periodically delete the expired dbs. But I can't find the article, and I don't quite understand how the database naming would work. How would the Cloudant clients always know the 'current' database name?
Cloudant does not expose the _purge endpoint (the loose consistency guarantees between the clustered nodes make purging tricky).
The most common solution to this problem is to create a second database and use replication with a validate_doc_update function so that deleted documents with no existing entry in the target database are rejected. When replication is complete (or acceptably up to date, if using continuous replication), switch your application to use the new database and delete the old one. There is currently no way to rename databases, but you could use a virtual host which points to the "current" database.
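A sketch of such a validation function, stored in a design document on the target database (the error message is illustrative):

```javascript
// validate_doc_update in a design document on the target database.
// Incoming tombstones for documents the target has never seen are
// rejected, so deletions are simply not copied across.
function (newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._deleted && !oldDoc) {
    throw({ forbidden: 'deleted document with no existing entry' });
  }
}
```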
I'd caution that a workload which generates a high ratio of deleted:active documents is generally an anti-pattern in Cloudant. I would first consider whether you can change your document model to avoid it.
Deleted documents are kept forever in CouchDB, even after compaction, though each one is pretty small, as it contains only three fields:
{"_id": "234wer", "_rev": "123", "_deleted": true}
The reason for this is to make sure that all the replicated databases stay consistent. If a document that is replicated across several databases were deleted in only one location, there would be no way to tell the other replicated stores about the deletion.
There is _purge, but as explained in the wiki, it is only to be used in special cases.

Table Storage Service (Azure's implementation of NoSQL) vs Windows Azure Caching (unstructured in-memory cache)

We want to implement caching in Azure for two main reasons:
Speed up repetitive data access
Reduce stress on the database
Here are the characteristics of the data we are planning to cache:
Relatively small (1-100 KB)
Specific to each customer
Not private, but we don't really want random people navigating through our entire cache
XML or JSON
Consumed by C# (i.e. not linked to directly in the HTML)
Most weeks the data will not change, although some days the data could change several times
For this specific purpose, Table storage appears better than Blob storage (we just implemented Blob storage for images, CSS, and JavaScript), and Windows Azure Caching appears better than Windows Azure Shared Caching (perhaps almost always better; Shared Caching is mostly a legacy feature at this point).
The programming API of both appears straight forward. Compared to what we pay for cloud sites the cost of each seems to be negligible.
So far we are leaning toward Table storage due to what we perceive to be the pros and cons of Azure Caching. As old .NET guys, we are much more familiar with in-memory caching than NoSQL-style solutions:
Problems with Windows Azure Caching:
If the VM is moved to a different server (by Microsoft, for load balancing or other reasons), is the in-memory cache moved intact?
We are guessing that whenever we publish changes to the cloud, the existing in-memory cache gets wiped out.
While users rarely make changes to the cached data, when they do, they may well make multiple updates within seconds. We are not sure how this will work with a cache spread across multiple nodes running web roles, especially with increased traffic. (This is probably a concern with table storage as well!)
It appears that table storage will be easier to debug.
Advantages of Windows Azure Caching
Somewhat faster
The in-memory caching you are familiar with is exactly the model you need for implementing caching on Windows Azure. The 'NoSQL style' is not caching but storage, so table storage replaces SQL rather than replacing caching. Table storage is for persistent, reliable storage, with all of the latency and other disadvantages of persistence that an in-memory cache does not have.
Writing to the cache is always secondary. When your users 'make changes to the cached data', you will always write the data out to disk (e.g. SQL) first, and then write the same data to the cache because you might as well, since you have the data on hand (although secondary effects on the written data may mean that you should invalidate or re-read the cached item).
The wiping out of data when a machine recycles should not be much of a concern, as the data is stored elsewhere. Every read from the cache should be followed by an 'if not found, then read from the database' kind of statement. You can also warm up the cache when a role starts, pre-populating items that you know you are going to need.
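Taken together, the read and write paths described above amount to the cache-aside pattern. A generic sketch, shown in JavaScript for brevity (the Map-backed cache and the db stub are stand-ins for Azure Caching and SQL/table storage):

```javascript
// Cache-aside sketch. `cache` and `db` are stand-ins for a real
// cache service and a real persistent store.
const cache = new Map();
const db = {
  async get(key) { /* read from SQL/table storage here */ return null; },
  async set(key, value) { /* persist to SQL/table storage here */ },
};

async function read(key) {
  if (cache.has(key)) return cache.get(key); // cache hit
  // Cache miss: fall back to the authoritative store, then
  // populate the cache for subsequent reads.
  const value = await db.get(key);
  cache.set(key, value);
  return value;
}

async function write(key, value) {
  // Writing to cache is always secondary: persist first,
  // then update (or invalidate) the cached copy.
  await db.set(key, value);
  cache.set(key, value);
}
```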
Caching on Azure is distributed across the nodes, and updating an existing item always updates it on the node where it resides. Quick successive updates may be less of a problem than you think.
For in-memory caching, use Windows Azure Caching (you are right about Shared Caching being legacy) and, depending on your needs, look at other caching technologies like memcached. Caching and table storage are not comparable; table storage is for long-term persistence. Don't unnecessarily hack table storage into a cache: making table storage temporary creates a whole bunch of things that you need to worry about yourself, like expiry and invalidation.

SaaS, central database, database per user, or combination?

The problem at hand is as follows:
SaaS to keep maintenance records
95% of data would be specific to each user, i.e. no need for it to be accessed by other users
5% of data shared (and contributed by all users), like parts that are used in maintenance
SaaS to be delivered as a CouchApp, i.e. with a public-facing CouchDB
So I am torn between a database per user and a single database for all users.
A database per user seems to offer much easier backup and maintenance, smaller data sets, and easier access control. On the negative side, how could I handle the shared data?
Is it possible to have a database per user, plus one common database for the shared information (parts)? I could then replicate parts documents from all user databases to the central one, and from there back to all user databases. How would I handle conflicts in that case (or, even better, avoid them if possible)?
Or is there a much simpler approach? Or should I bite the bullet and go with just one central database?
It depends on the nature of the shared data, I guess. It seems natural to have filtered replication flowing from the user databases to the shared database and unfiltered replication from the shared database back to the user databases; I think that covers your requirements? It makes it so that each user only has to read/write from/to their own database, while you can still distribute the shared docs out.
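A sketch of that topology using CouchDB's _replicator database (the database names, the design document, and the 'type' field convention are all illustrative):

```javascript
// In _design/app of each user database: a filter that lets only
// shared docs (here, parts) through. The 'type' field is an assumed
// convention for this sketch.
{
  "_id": "_design/app",
  "filters": {
    "shared": "function (doc, req) { return doc.type === 'part'; }"
  }
}

// _replicator doc: filtered, continuous replication, user -> shared.
{
  "_id": "userdb-42-to-shared",
  "source": "userdb-42",
  "target": "shared",
  "filter": "app/shared",
  "continuous": true
}

// _replicator doc: unfiltered replication, shared -> user.
{
  "_id": "shared-to-userdb-42",
  "source": "shared",
  "target": "userdb-42",
  "continuous": true
}
```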
It may be easier to query from the shared database directly instead of replicating it back into the user databases, but that really depends on what kind of data would be in there.
