How do I share a cache across Node workers with Redis? - node.js

Forgive me if this is a really dumb question. I have been googling for the past hour and can't seem to find it answered anywhere.
Our application needs to query our CMS database every hour or so to update all of its non-user-specific CMS content. I would like to store that data in one place and give all the workers access to it, without each worker having to call the API every hour. I would also like this cache to persist if a node worker crashes. Since we're pretty new to node here, I predict we'll have some of those.
I will handle all the cache expiration logic. I just want a store that can be shared between users, can survive workers crashing and restarting, and is at the application level - not the user level. So user sessions are no good for this.
Is Redis even what I'm looking for? Sadly it may be too late to install mongo on our web layer for this release anyway. Pub/sub looks promising but really seems like it's made for messaging - not a shared cache. Maybe I am reading that wrong though.
Thank you so much stack overflow! I promise to be a good citizen now that I have registered.

Redis is a great solution for your problem. Not sure why you are considering pub/sub though. Doesn't sound like the workers need to be notified when the cache is updated, they just need to be able to read the latest value written to the cache. You can use a simple string value in redis for this stored under a consistent key.
In summary, you'd have a process that would update a redis key (say, cms-cache-stuff) every hour. Each worker which needs that data will just GET cms-cache-stuff from redis every time it needs that cached info.
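Something like this, using the node_redis client (fetchCmsContent here is a made-up stand-in for your CMS API call):

    var redis = require('redis');
    var client = redis.createClient();

    // Refresh process: fetch from the CMS and overwrite the key every hour
    setInterval(function () {
      fetchCmsContent(function (err, content) { // hypothetical CMS API call
        if (err) return console.error(err);
        client.set('cms-cache-stuff', JSON.stringify(content));
      });
    }, 60 * 60 * 1000);

    // In any worker: read the latest cached value whenever it's needed
    client.get('cms-cache-stuff', function (err, raw) {
      if (err) throw err;
      var content = JSON.parse(raw);
      // ... use content
    });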
This solution will survive both the cache refresh process crashing or workers crashing, since the key in redis will always have data in it (though that data will be stale if the refresh process doesn't come back up).
If for some wild reason you don't want the workers continually reading from redis (why not? it's plenty fast enough) you could still store the latest cached data in cms-cache-stuff and then publish a message through pub/sub to your workers letting them know the cache is updated, so they can read cms-cache-stuff again. This gives you durability and recovery, since crashed workers can just read cms-cache-stuff again at startup and then start listening on the pub/sub channel for additional updates.
Pub/sub alone is pretty useless for caching since it provides no durability. If a worker is crashed and not listening on the channel, the messages are simply discarded.
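If you did want that hybrid, a rough sketch (the channel name and in-memory handler are made up; note that a subscribed node_redis connection can't issue other commands, hence the separate clients):

    var redis = require('redis');
    var writer = redis.createClient();
    var reader = redis.createClient();
    var subscriber = redis.createClient(); // a subscribed client can't run GET

    var cachedContent; // each worker's in-memory copy

    function loadIntoMemory(err, raw) {
      if (!err && raw) cachedContent = JSON.parse(raw);
    }

    // Refresh process: write the cache, then announce the update
    function refresh(content) { // content freshly fetched from the CMS
      writer.set('cms-cache-stuff', JSON.stringify(content), function (err) {
        if (err) return console.error(err);
        writer.publish('cms-cache-updated', 'refreshed'); // hypothetical channel name
      });
    }

    // Each worker: read the key at startup (crash recovery), then listen
    reader.get('cms-cache-stuff', loadIntoMemory);
    subscriber.subscribe('cms-cache-updated');
    subscriber.on('message', function () {
      reader.get('cms-cache-stuff', loadIntoMemory);
    });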

Well, as I suspected, my problem was a super-basic noob mistake that's hard to even explain well enough to get the "duh" answer. I was using the connect-redis package, which is really designed for sessions, not a cache. Once someone pointed me to the node_redis client I was able to pretty easily get it set up and do what I wanted to do.
Thanks a lot - hopefully this helps some redis noob in the future!

Related

node: persist data after process termination

I'm using node-cache to cache data from a CLI application that observes changes in files and caches them to avoid reprocessing the data.
The problem is that this cache is destroyed on each command: each time the tool is called in the terminal a new instance is generated and the old one is destroyed, and presumably the data along with it.
I need to keep, for a specific TTL, two things in cache/memory, even if the process ends:
the processed data
the specific instance of fs.watcher, watching and executing caching operations
The question is: how do I do it? I've been searching the internet for days and trying alternatives, and I can't find a solution.
I need to keep ... things in cache/memory, even if the process ends
That's, pretty much by definition, not possible. When a process terminates, all its resources are freed up for use by something else (barring a memory-leak bug in the OS itself).
It sounds like you need to refactor your app into a service that can run in the background and a separate front-end that communicates with it.
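As a rough sketch of that split (the watched directory, port, and processFile step are all placeholders):

    // service.js - long-running process that owns the cache and the watcher
    var http = require('http');
    var fs = require('fs');
    var NodeCache = require('node-cache');

    var cache = new NodeCache({ stdTTL: 600 }); // TTL now outlives CLI invocations

    fs.watch('./data', function (event, filename) { // hypothetical watched dir
      if (filename) cache.set(filename, processFile(filename)); // hypothetical processing step
    });

    http.createServer(function (req, res) {
      var key = req.url.slice(1);
      res.end(JSON.stringify(cache.get(key) || null));
    }).listen(8714); // arbitrary local port

The CLI then becomes a thin client that requests http://localhost:8714/<key> instead of rebuilding the cache on every run.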

Scheduling function calls in a stateless Node.js application

I'm trying to figure out a design pattern for scheduling events in a stateless Node back-end with multiple instances running simultaneously.
Use case example:
Create a message object with a publish date/time and save it to a database
Optionally update the publishing time or delete the object
When the publish time is reached, the message content is sent to a 3rd party API endpoint
Right now my best idea is to use bee-queue or bull to queue delayed jobs. They should be able to store the state and ensure that each job is executed only once. However, I feel like this might introduce a single point of failure, especially when maintaining state in Redis for months and then hoping that future versions of the queue library still work.
Another option is a service worker that polls the database for upcoming events every n minutes, but this seems like a potential scaling issue down the line for multi-tenant SaaS.
Are there more robust design patterns for solving this?
Don't worry about redis breaking. It's pretty stable, and if necessary you can pin the version.
If there are jobs that will be executed in the future I would suggest a database, like Mongo or Redis, with a disk store. That way you will survive a reboot, you don't have to reinvent the wheel, and you already get a nice set of tools for scalability.
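For what it's worth, the delayed-job idea from the question might look roughly like this with bull (the queue name, jobId scheme, and sendToThirdPartyApi call are assumptions for the example):

    var Queue = require('bull');
    var publishQueue = new Queue('publish-message'); // state lives in Redis

    // When a message object is saved, schedule its delivery
    function schedule(message) {
      return publishQueue.add(message, {
        delay: message.publishAt - Date.now(), // fire at the publish time
        jobId: message.id,                     // stable id so it can be cancelled later
        attempts: 3                            // retry if the 3rd-party API fails
      });
    }

    // Any instance may run this; bull ensures each job is processed only once
    publishQueue.process(function (job) {
      return sendToThirdPartyApi(job.data); // hypothetical API call
    });

    // Updating or deleting: remove the old job by id, then re-add if needed
    function cancel(messageId) {
      return publishQueue.getJob(messageId).then(function (job) {
        return job && job.remove();
      });
    }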

Is node.js suitable for long polling which requires an open connection at all times?

I have an "online now" feature which requires me to set a field in my database, and which I have integrated into fetching notification updates. As such this is being done via long polling (since short polling isn't much better, and long polling results in fewer connections to the server).
I used to do this in PHP but, as those of you who know about PHP will understand, PHP will lose all its available connections quite quickly, even under fpm.
So I turned to node.js, which is supposed to be able to handle thousands, if not millions, of concurrent connections, but the more I look the more it seems node.js handles these via event-based programming. Of course event-based programming has massive benefits.
This is fine for chat apps and what not but what if I have an online now feature that I have integrated into long polling to mark that a user is still online?
Would node.js still get saturated quickly or is it actually able to handle these open connections still?
Long Polling with Node.js
Long Polling will eat up some of your connection pool, so be sure to set your ulimit high if using a Linux or Unix variety.
Ideally you'll maintain state in something like memcached or Redis; Redis is the preferred approach here. You'll subscribe to a pub/sub channel, and every time the user state updates you'll publish an event. This triggers a handler which causes your long-poll to respond with the updated status(es). This is typically preferred to scheduling and much cleaner, and as long as you're not looping or otherwise blocking node's thread of execution you shouldn't see any problems.
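A minimal sketch of that pattern (the channel name, port, and timeout are made up; assumes the node_redis client):

    var http = require('http');
    var redis = require('redis');

    var subscriber = redis.createClient();
    subscriber.subscribe('presence'); // hypothetical channel name

    var waiting = []; // responses held open by long-polls

    http.createServer(function (req, res) {
      waiting.push(res); // park the request instead of answering immediately
      res.setTimeout(30000, function () { // give up after 30s; client re-polls
        waiting = waiting.filter(function (r) { return r !== res; });
        if (!res.finished) res.end('[]');
      });
    }).listen(3000);

    // Publishing a 'presence' event anywhere (e.g. writer.publish('presence', json))
    // releases every parked response; clients then immediately issue a new poll.
    subscriber.on('message', function (channel, json) {
      var pending = waiting;
      waiting = [];
      pending.forEach(function (res) { if (!res.finished) res.end(json); });
    });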
Short Polling with PHP
As you're already using a PHP stack it might be preferable not to move away from that. PHP's paradigm (more so php-fpm's) starts a process per connection, and these processes are set to time out, so long polling isn't really an option.
Short polling on intervals can update the state on the front-end. Since you specified that you're using a cronjob, it might be cleaner to just hold the state in memory on the front-end and update it periodically.
This should work, though it may force you to scale earlier, as each user will be sending n more requests. Still, it might be the easiest approach, and you're not adding unnecessary complexity to your stack.
Websockets
Adding websockets for such a simple feature is likely overkill, and websockets themselves can only sustain a limited number of connections (depending on your host and configuration), so you're not really solving any of the issues that long polling presents. If you don't plan to use websockets for more than just maintaining user state then you're adding another technology to your stack to solve a simple problem.

Azure Redis Cache data loss?

I have a Node.js application that receives data via a Websocket connection and pushes each message to an Azure Redis cache. It stores a persistent array of messages in a variable for downstream use, and at regular intervals syncs that array from the cache. A bit convoluted, but at a later point I want to separate out the half of the application that writes to the cache from the half that reads from it.
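For reference, the write/sync pattern is roughly this (the key name and connection options are placeholders):

    var redis = require('redis');
    var client = redis.createClient(); // Azure connection options omitted

    var messages = []; // persistent in-process array for downstream use

    // Write half: push each incoming websocket message onto the list
    function onMessage(msg) {
      client.lpush('messages', JSON.stringify(msg)); // hypothetical key name
    }

    // Read half: at regular intervals, re-sync the array from the cache
    setInterval(function () {
      client.lrange('messages', 0, -1, function (err, items) {
        if (err) return console.error(err);
        messages = items.map(function (s) { return JSON.parse(s); });
      });
    }, 10000);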
At around 02:00 GMT, based on the Azure portal stats, I appear to have started getting "cache misses" on that sync, which last for a couple of hours before I started getting "cache hits" again sometime around 05:00.
The cache misses correspond to a sudden increase in CPU usage, which peaks at around 05:00. And when I say peaks, I mean it hits 81%, vs a previous max of about 6%.
So sometime around 05:00, the CPU peaks, then drops back to normal, the "cache misses" go away, but looking at the cache memory usage, I drop from about 37.4mb used to about 3.85mb used (which I suspect is the "empty" state), and the list that's being used by this application was emptied.
The only functions the application runs against the cache are LPUSH and LRANGE; there's nothing with any capability to remove data. And in case anybody was wondering, when the CPU ramped up the memory usage did not, so there's nothing to suggest that rogue additions of data cropped up.
It's only on the Basic plan, so I'm not expecting it to be invulnerable or anything, but even without the replication features of the Standard plan I had expected that it wouldn't be in a position to completely wipe itself - I was under the impression that Redis periodically writes itself to disk and restores from that when it recovers from an error.
All of which is my way of asking:
Does anybody have any idea what might have happened here?
If this is something that others have been able to accidentally trigger themselves, are there any gotchas I should be looking out for that I might have in other applications using the same cache that could have caused it to fail so catastrophically?
I would welcome a chorus of people telling me that the Standard plan won't suffer from this sort of issue, because I've already forked out for it and it would be nice to feel like that was the right call.
Many thanks in advance.
Here are my thoughts:
Azure Redis Cache stores information in memory. By default, it won't save a "backup" to disk, so you had information in memory, the server got restarted for some reason, and you lost your data.
PS: See this feedback item; there is no option yet to persist information to disk with Azure Redis Cache: http://feedback.azure.com/forums/169382-cache/suggestions/6022838-redis-cache-should-also-support-persistence
Make sure you don't use the Basic plan. The Basic plan doesn't provide an SLA, and in my experience it loses data quite often.
The Standard plan provides an SLA and utilizes 2 instances of Redis Cache. It's quite stable and it hasn't lost our data, although such a case is still possible.
Now, if you're going to use Azure Redis as a database rather than a cache, you need to use the data persistence feature, which is already available in the Azure Redis Cache Premium tier: https://azure.microsoft.com/en-us/documentation/articles/cache-premium-tier-intro (see Redis data persistence)
James, using the Standard instance should give you much improved availability.
With the Basic tier, any Azure Fabric update to the master node (or hardware failure) will cause you to lose all data.
Azure Redis Cache does not support persistence (writing to disk/blob) yet, even in the Standard tier. But the Standard tier does give you a replicated slave node that can take over if your master goes down.

Should I keep MongoDB connection alive if I'm adding a record every 10 seconds?

I've written a small Node.js script that tests part of my company's API latency - specifically, the latency of chat messages, every 10 seconds. I need to store this data and I think MongoDB is probably my only realistic option. It's a tiny bit of data that I need to store, only 1 number and a timestamp.
So that made me wonder... Should I leave the MongoDB connection alive, constantly, only having a rest when I happen to restart the script (or when it crashes), or should I connect and disconnect every 10 seconds? If it helps, the MongoDB will be on the same server and it will only be storing this data for now (I mean it's a fresh install).
Thanks
If you are going to store only a number and a timestamp, I would have a look at Redis. Redis is a key/value store, and since it is rather memory oriented it's really good for frequently updated real-time data, such as statistics.
I've mostly been using MongoDB lately, and it's really fast as well, but it's a slightly more feature-rich NoSQL db than Redis, and therefore maybe a bit slower in this scenario.
- Should I leave the MongoDB connection alive
Yes, that is fine and typical behavior. Open the connection once, and make queries against it for as long as you want. When your node.js process dies it will automatically be closed.
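A minimal sketch of that, assuming the 3.x mongodb driver (the db/collection names and the measureChatLatency probe are made up):

    var MongoClient = require('mongodb').MongoClient;

    // Connect once at startup and reuse the same connection for every insert
    MongoClient.connect('mongodb://localhost:27017', function (err, client) {
      if (err) throw err;
      var samples = client.db('latency').collection('samples'); // hypothetical names

      setInterval(function () {
        measureChatLatency(function (err, ms) { // hypothetical API probe
          if (err) return console.error(err);
          samples.insertOne({ latency: ms, at: new Date() }); // 1 number + timestamp
        });
      }, 10000);
    });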
