Amazon ElastiCache for Redis with Node.js server

I am using Redis in my Node.js application. I don't use it for caching, and I don't want to. I want my data in Redis to be persistent at all times, and every call to Redis should be written to disk. Is Amazon ElastiCache useful in this case? As I understand it, Amazon ElastiCache handles standby replication and automatic failover, which are very important to me. I am running my Node.js server on Amazon EC2. Any help or suggestions would be appreciated.

Currently, Amazon ElastiCache's way of keeping a persistent state is snapshotting: it uses the Backup and Restore feature to keep a copy in an S3 bucket, which you can use to reload your data if you lose it or to warm up a new instance.
The Backup and Restore feature runs BGSAVE in the background. Because this is a heavy operation on your instance, if you set it up to run periodically, it is recommended to run it on a read replica.
So, to answer your question: I do not think Amazon ElastiCache is a solution for your problem. It is meant for those looking for a cache layer to scale or speed up lookups for apps whose data lives in other storage engines.
Update: a manually set up alternative (taken from the comments):
If you are open to setting up your own Redis Cluster (redis.io/topics/cluster-spec), that will be your best bet: it takes care of automatic failover (AFO) and replication, with persistence available through the append-only file (AOF) or RDB snapshot files.
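To make the self-managed persistence options concrete, here is a minimal sketch of a helper that renders the relevant redis.conf directives. The directive names (appendonly, appendfsync, save, cluster-enabled, cluster-config-file, cluster-node-timeout) are standard redis.conf settings; the helper itself (buildRedisConf, PersistenceOptions) and the chosen values (port 7000, a 5000 ms node timeout, the 900-second snapshot rule) are hypothetical and purely illustrative.

```typescript
// Hypothetical helper: renders redis.conf persistence and cluster directives.
// Directive names are real redis.conf settings; the values are assumptions.
type FsyncPolicy = "always" | "everysec" | "no";

interface PersistenceOptions {
  port: number;
  aof: boolean;                        // enable the append-only file
  fsync: FsyncPolicy;                  // how often the AOF is fsync'ed to disk
  rdbSnapshots: Array<[number, number]>; // [seconds, minChanges] "save" rules
  cluster: boolean;                    // run this node as part of a Redis Cluster
  clusterNodeTimeoutMs: number;        // how long before a node is considered failing
}

function buildRedisConf(o: PersistenceOptions): string {
  const lines: string[] = [`port ${o.port}`];
  if (o.aof) {
    lines.push("appendonly yes", `appendfsync ${o.fsync}`);
  } else {
    lines.push("appendonly no");
  }
  // Each rule snapshots to an RDB file if at least minChanges writes
  // happened within the given number of seconds.
  for (const [seconds, changes] of o.rdbSnapshots) {
    lines.push(`save ${seconds} ${changes}`);
  }
  if (o.cluster) {
    lines.push(
      "cluster-enabled yes",
      `cluster-config-file nodes-${o.port}.conf`, // maintained by Redis itself
      `cluster-node-timeout ${o.clusterNodeTimeoutMs}`,
    );
  }
  return lines.join("\n") + "\n";
}

// Example: fsync on every write, plus a periodic RDB snapshot as a second copy.
const conf = buildRedisConf({
  port: 7000,
  aof: true,
  fsync: "always",
  rdbSnapshots: [[900, 1]],
  cluster: true,
  clusterNodeTimeoutMs: 5000,
});
console.log(conf);
```

`appendfsync always` is the setting closest to "every call writes to disk", but it is also the slowest; `everysec` is the usual compromise, risking roughly one second of writes on a crash.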

Related

Running multiple AWS services inside one EC2 instance for better speed?

Since I use separate hosts for the database and the Node.js server, the speed is not good. If everything were on one machine, local data exchange would be faster. How can I run these services (Node.js, Redis, MongoDB) on one AWS instance? Then again, in production it is not recommended to run the database together with the server. Is it possible to fine-tune AWS so that the speed between the databases and the server matches that of a single computer?
Please help, and don't spare your advice!

How to reload a TensorFlow model in a Google Cloud Run server?

I have a web server hosted on Cloud Run that loads a TensorFlow model from a cloud file store on start. To know which model to load, it looks up the latest reference in a PostgreSQL database.
Occasionally, a retrain script runs using Google Cloud Functions. It stores a new model in the file store and a new reference in the PostgreSQL database.
Currently, to use this new model I would need to redeploy the Cloud Run instance so that it grabs the new model on start. How can I automate using the newest model instead? Something elegant, robust, and scalable would be ideal, but if something hacky or clunky but functional is much easier, that would be preferred. This is a throwaway prototype, but it needs to be available and usable.
I have considered a few options, but I'm not sure how feasible any of them are:
Create some sort of Postgres trigger/notification that the Cloud Run server listens to. I guess this would require another thread, which adds complexity, and I'm unsure how multiple threads work with Cloud Run.
Similar, but use HTTP Pub/Sub: make an endpoint on the server that looks up and loads the latest model again, and publish to it when the retrainer finishes.
Deploy a new instance and remove the old one after the retrainer runs. Simple in some regards, but it seems riskier and might be hard to accomplish programmatically.
Your current pattern needs cache management, because you are caching a model. How can you invalidate the cache?
Restart the instance? Cloud Run doesn't let you control instances. The easiest way is to deploy a new revision, which forces the current instances to stop and new ones to start.
Set a TTL? It's an option: keep a model for XX hours, then reload it from the source. Problem: you could have glitches (some instances running the new model and others the old one, until the cache TTL expires on all instances).
Offer a cache invalidation mechanism? As said before, this is hard because Cloud Run doesn't let you communicate with all the instances directly. A push mechanism is therefore very hard and tricky to implement (not impossible, but I don't recommend wasting time on it). A pull mechanism is an option: check a "latest updated date" somewhere (a record in Firestore, a file in Cloud Storage, an entry in Cloud SQL, ...) and compare it with your model's updated date. If they match, great. If not, reload the latest model.
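The pull mechanism described above can be sketched as a small per-instance cache that, at most once per check interval, asks a shared source for the latest model version and reloads only when it has changed. This is an illustrative sketch, not a Cloud Run API: the class name PulledModelCache and the hooks fetchLatestVersion (for example, a query against the reference table in Cloud SQL) and loadModel (for example, reading the model file) are hypothetical.

```typescript
// Sketch of pull-based invalidation: each instance keeps its own model and,
// at most once per checkIntervalMs, compares its version with the shared
// "latest" marker. All names here are hypothetical.
class PulledModelCache<M> {
  private model: M | undefined = undefined;
  private version: string | undefined = undefined;
  private lastCheck = -Infinity;
  private inflight: Promise<M> | undefined = undefined;
  private readonly fetchLatestVersion: () => Promise<string>;
  private readonly loadModel: (version: string) => Promise<M>;
  private readonly checkIntervalMs: number;
  private readonly now: () => number;

  constructor(
    fetchLatestVersion: () => Promise<string>,
    loadModel: (version: string) => Promise<M>,
    checkIntervalMs: number,
    now: () => number = Date.now,
  ) {
    this.fetchLatestVersion = fetchLatestVersion;
    this.loadModel = loadModel;
    this.checkIntervalMs = checkIntervalMs;
    this.now = now;
  }

  async get(): Promise<M> {
    const cached = this.model;
    // Checked recently: serve the in-memory model without touching the source.
    if (cached !== undefined && this.now() - this.lastCheck < this.checkIntervalMs) {
      return cached;
    }
    // Share one refresh between concurrent requests so the model loads once.
    let pending = this.inflight;
    if (pending === undefined) {
      pending = this.refresh();
      this.inflight = pending;
      const clear = () => { this.inflight = undefined; };
      pending.then(clear, clear);
    }
    return pending;
  }

  private async refresh(): Promise<M> {
    const latest = await this.fetchLatestVersion();
    this.lastCheck = this.now();
    const cached = this.model;
    if (cached !== undefined && latest === this.version) {
      return cached; // version unchanged: keep the loaded model
    }
    const fresh = await this.loadModel(latest);
    this.model = fresh;
    this.version = latest;
    return fresh;
  }
}
```

Compared with a plain TTL, this still has a convergence window of up to checkIntervalMs across instances, but an instance only pays the cost of reloading the model when the version has actually changed.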
You have several solutions; which one fits depends on your needs.
But there is another solution, which I prefer: every time you have a new model, build a new container with the model already loaded in it (with Cloud Build) and deploy that new container on Cloud Run.
That solves your cache management issue, and all your new instances will have better cold-start latency. (It also gives you easier rollback, A/B testing or canary releases, version management and control, portability, local/other environment testing, ...)

Node.js API, Docker (Swarm), scalability and storage

I programmed an API with Node.js and Express like millions of others out there, and it will go live in a few weeks. The API currently runs on a single Docker host with one volume for persistent data containing images uploaded by users.
I'm now thinking about scalability and a high-availability setup, which is where the question of network volumes comes in. I've read a lot about NFS volumes and, potentially, the S3 driver for Docker Swarm.
From the information I gathered, I sorted out two possible solutions for the swarm setup:
Docker Volume Driver
I could connect each Docker host to either an S3 bucket or EFS storage via the compose file
Connection should work even if I move to another VPS provider
Better security if I put NFS storage on the private part of the network (no S3 or EFS)
API Multer S3 Middleware
No attached volume required since the S3 connection is done from within the container
Easier swarm and docker management
Things have to be reprogrammed and a few files need to be migrated
On a GET request, the files will be provided by AWS directly instead of the API
Please tell me your opinion on this. Am I getting this right, or am I missing something? Which route should I take? Is there anything to consider regarding latency or permissions when mounting from different hosts?
Tips on S3 and EFS are definitely welcome, since I have no experience with them yet.
I would not recommend saving to disk; instead, use the S3 API directly: create buckets and write to them from your app code.
If you're thinking of mounting a single S3 bucket as your drive, there are severe limitations. There's the 5 GB limit. Any time you modify contents in any way, the driver will re-upload the entire file. If there's any contention, it has to retry. Years ago, when I tried this, the FUSE drivers weren't stable enough to use as part of a production system; they'd crash and you'd have to remount. It was a nice idea, but it could only be used in an ad hoc way on the command line.
As for NFS: for the love of god, don't do this to yourself. You'd be taking on the responsibility of running it yourself.
EFS: I can't really comment. By the time it was available, most people had just learned to use S3, and S3 is cheaper.
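One way to follow the advice of writing to S3 from app code, rather than to a mounted volume, is to put storage behind a small interface so route handlers never touch local disk. This is a minimal sketch under stated assumptions: BlobStore, MemoryBlobStore, uploadKey and saveUpload are hypothetical names, and only an in-memory implementation is shown. A production implementation would wrap the AWS SDK's S3 client (for example, PutObjectCommand and GetObjectCommand from @aws-sdk/client-s3), which is not included here.

```typescript
// Hypothetical storage abstraction: handlers depend on BlobStore, never on
// a local path, so any Swarm replica can serve any upload.
interface StoredObject {
  body: Uint8Array;
  contentType: string;
}

interface BlobStore {
  put(key: string, body: Uint8Array, contentType: string): Promise<void>;
  get(key: string): Promise<StoredObject | undefined>;
}

// In-memory stand-in for local development and tests; an S3-backed class
// would implement the same two methods.
class MemoryBlobStore implements BlobStore {
  private readonly objects = new Map<string, StoredObject>();

  async put(key: string, body: Uint8Array, contentType: string): Promise<void> {
    // Copy the bytes so later mutation by the caller can't change stored data.
    this.objects.set(key, { body: body.slice(), contentType });
  }

  async get(key: string): Promise<StoredObject | undefined> {
    return this.objects.get(key);
  }
}

// Hypothetical key scheme: one prefix per user, with the file name reduced
// to a safe character set so it can't introduce extra path segments.
function uploadKey(userId: string, fileName: string): string {
  const safeName = fileName.replace(/[^A-Za-z0-9._-]/g, "_");
  return `uploads/${encodeURIComponent(userId)}/${safeName}`;
}

async function saveUpload(
  store: BlobStore,
  userId: string,
  fileName: string,
  body: Uint8Array,
  contentType: string,
): Promise<string> {
  const key = uploadKey(userId, fileName);
  await store.put(key, body, contentType);
  return key; // persist this key (e.g. in your database) to fetch the file later
}
```

Because the key, not a file path, is what gets stored, swapping the in-memory store for an S3-backed one does not change the route handlers.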

Best way to implement a server-side cache in Node.js

I'm trying to implement a server-side cache in Node.js. I've read about express-redis-cache, but how would this solution work with load-balanced Node servers? I might use something like the AWS Redis service, but using Redis on an external server seems to defeat the purpose, since it increases latency. Can you suggest the best approach?
PS: I have some .md and .json files from which I generate .html files and return them. Instead of regenerating them on every request, I want a cache that stores these generated .html files. I'll update the cached content only when my .md and .json files change.
I've read about express-redis-cache, but how would this solution work with load balanced node servers?
It wouldn't be a problem, because all your load-balanced Node servers would connect to the same Redis host, which is fine.
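The shared-cache setup from the question can be sketched as a cache-aside lookup whose key includes a version of the .md/.json sources, so an edit to those files naturally produces a cache miss. This is a minimal sketch: CacheStore, MapCacheStore, getPage, and the sourceVersion and render hooks are hypothetical. In production, every load-balanced server would point CacheStore at the same Redis host; here a Map stands in for it, and sourceVersion could, for example, combine the source files' modification times.

```typescript
// Cache-aside for generated HTML. All servers share one CacheStore (Redis in
// production, a Map here); the key embeds the source version, so edited
// .md/.json files lead to a fresh render instead of stale HTML.
interface CacheStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

class MapCacheStore implements CacheStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

async function getPage(
  store: CacheStore,
  page: string,
  sourceVersion: (page: string) => Promise<string>,
  render: (page: string) => Promise<string>,
): Promise<string> {
  const key = `html:${page}:${await sourceVersion(page)}`;
  const hit = await store.get(key);
  if (hit !== undefined) {
    return hit; // cached HTML for this exact source version
  }
  const html = await render(page); // e.g. build HTML from the .md and .json files
  await store.set(key, html);
  return html;
}
```

With this keying, entries for old versions are never read again; with Redis you could give each entry an expiry (for example, SET with the EX option) so they are eventually evicted.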
I might use something like AWS Redis Service, but it loses the whole purpose of using Redis on some external server as it increases latency
It depends on how you architect your app. If you are fully hosted on AWS, ElastiCache is designed for this: latency would be minimal, since the connection stays within the VPC, which is fast. If you need to connect to ElastiCache from an on-premises client, you still have options: a VPN (not ideal) or Direct Connect, which would be much faster than a VPN.
Having said that, if you are looking to cache .html files, you should probably look at CloudFront instead of a bespoke caching solution using Redis.

Cloud Foundry: how to use the filesystem

I am planning to use the Cloud Foundry PaaS (from VMware) to host my Node.js application. I have seen that it supports MongoDB and Redis in the service layer, as well as the Node.js framework. So far, so good.
Now I need to store my media files (images uploaded by users) on a filesystem. I have the metadata stored in MongoDB.
I have been searching the internet, but have not yet found good information.
You cannot do that for the following reasons:
There are multiple host machines running your application. They each have their own filesystems. Each running process in your application would see a different set of files.
The host machines on which your particular application is running can change moment-to-moment. Indeed, they will change every time you re-deploy your application. Every time a process is started on a new host machine, it will see an empty set of files. Every time a process is stopped on an old host machine, all the files would be permanently deleted.
You absolutely must solve this problem in another way.
Store the media files in MongoDB GridFS.
Store the media files in an object store such as Amazon S3 or Rackspace Cloud Files.
Filesystems in most cloud solutions are "ephemeral", so you cannot rely on the local filesystem. You will have to use solutions like S3 or a database for this purpose.
