Persist data for 24 hours - node.js

I need to build a microservice that scrapes a message once a day and persists it somewhere. It does not need to be accessible after 24 hours (it can be deleted). It doesn't really matter where or how, but I need to access it from an Express.js endpoint and return the message. Currently we use Redis and MongoDB for data persistence. It feels wrong to create a whole collection for one tiny service, and I'm not sure of an application of Redis that would fulfill this task. What's my best option? Open to any suggestions, thank you!

You can use YugabyteDB and set the table's time to live (TTL) to 24 hours; the data will then be deleted automatically.

Redis provides an expiration mechanism out of the box. You can associate a timeout with a key, and the key will be deleted automatically once the timeout has expired. The official Redis documentation covers this.
Redis also provides logical databases, if you want to keep these expiring keys separate from the rest of your application, so you do not need to spin up another machine. The official Redis documentation covers this as well.
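As a rough illustration of the expiration approach, here is a minimal sketch. It assumes a connected node-redis v4 client is passed in as `client`; the key name `daily:message` and the function names are made up for this example. The handler is written as a factory so it can be mounted on an Express route.

```javascript
// Sketch only: `client` is assumed to be a connected node-redis v4 client
// (created elsewhere with createClient() and connect()).
const DAILY_MESSAGE_KEY = 'daily:message'; // hypothetical key name
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Called by the scraper once a day. EX sets the key's time to live in
// seconds, so Redis deletes the key on its own after 24 hours.
async function saveDailyMessage(client, message) {
  await client.set(DAILY_MESSAGE_KEY, message, { EX: ONE_DAY_SECONDS });
}

// Express-style handler factory: responds with the stored message, or
// with 404 once the key has expired (GET returns null for a missing key).
function makeMessageHandler(client) {
  return async function messageHandler(req, res, next) {
    try {
      const message = await client.get(DAILY_MESSAGE_KEY);
      if (message === null) {
        res.status(404).json({ error: 'No message available' });
        return;
      }
      res.json({ message });
    } catch (err) {
      next(err); // pass Redis errors to the Express error handler
    }
  };
}
```

With an Express app this would be mounted as `app.get('/message', makeMessageHandler(client))`. Writing the key again the next day simply overwrites the value and resets the 24-hour timeout.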

Related

Redis stored procedures like functionality

I'm trying to implement a basic service that receives a msg and time in the future and once the time arrives, it prints the msg.
I want to implement it with Redis.
While investigating the capabilities of Redis, I found that keyspace notifications (https://redis.io/topics/notifications) on expired keys, combined with a subscription, give me what I want.
But there is a problem: if the service is down for any reason, I might lose those expiry triggers.
To resolve that issue, I thought of having a queue (in Redis as well) that stores expired keys, so that once the service is back up it can pull all the expired values. But for that, I need some kind of "stored procedure" that will handle the expiry routing.
Unfortunately, I couldn't find a way to do that.
So the question is: is it possible to implement with the current capabilities of Redis, and also, do I have alternatives?
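One commonly used alternative, which is not taken from this thread, avoids expiry events altogether: keep the messages in a sorted set scored by due time and poll for entries whose score has passed. The sketch below assumes a connected node-redis v4 client passed in as `client`; the key name and function names are hypothetical.

```javascript
// Sketch only: `client` is assumed to be a connected node-redis v4 client.
const SCHEDULE_KEY = 'scheduled:messages'; // hypothetical key name

// Store the message in a sorted set, scored by its due time
// (milliseconds since the epoch).
async function scheduleMessage(client, msg, dueAtMs) {
  await client.zAdd(SCHEDULE_KEY, { score: dueAtMs, value: msg });
}

// Fetch everything due up to `nowMs` and hand each message to `handle`.
// Entries that became due while the service was down are still in the
// set, so the first poll after a restart picks them up.
async function processDueMessages(client, nowMs, handle) {
  const due = await client.zRangeByScore(SCHEDULE_KEY, 0, nowMs);
  let handled = 0;
  for (const msg of due) {
    // zRem returns the number of members removed. Only the poller that
    // actually removed the entry handles it, so two pollers running at
    // once do not both print the same message.
    const removed = await client.zRem(SCHEDULE_KEY, msg);
    if (removed === 1) {
      handle(msg);
      handled += 1;
    }
  }
  return handled;
}
```

A service would call `processDueMessages(client, Date.now(), console.log)` on a timer. Two caveats of this sketch: sorted-set members are unique, so two identical message strings collapse into one entry unless an ID is added to the value, and a crash between `zRem` and `handle` drops that message.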

Caching posts using redis

I have a forum that contains groups, and new groups are created by users all the time. Currently I'm using node-cache with a TTL to cache groups and their content (posts, likes and comments).
The server worked great at the beginning, but performance decreased as more people started using the app, so I decided to use the Node.js Cluster module as the next step to improve performance.
node-cache will then cause a consistency problem: the same group could be cached in two workers, so if one copy changes, the other worker will not know (unless you notify it explicitly).
The first solution that came to my mind is using Redis to store the whole group and its content with the help of Redis data types (sets and hashes), but I don't know how efficient this would be.
The other solution is using Redis to map requests to the correct worker. In this case the cached data is distributed randomly across workers, so when a worker receives a request related to some group, it looks up the group owner (the worker that holds this group instance in memory) in Redis, asks that worker for the data using node-ipc, and then returns it to the user.
Is there any problem with the first solution?
The second solution does not guarantee fairness (all the popular groups could land in the same worker). Is there a solution for this?
Any suggestions?
Thanks in advance
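To make the first solution concrete, here is a minimal cache-aside sketch in which every worker reads and writes the same Redis cache, so no worker holds a private copy. It assumes a connected node-redis v4 client passed in as `client`. For brevity it stores each group as one JSON string rather than the sets and hashes mentioned above. `loadGroupFromDb`, the key format and the TTL are hypothetical.

```javascript
// Sketch only: `client` is assumed to be a connected node-redis v4 client
// used by every cluster worker; `loadGroupFromDb` is a hypothetical
// function that reads the group from the primary database.
const GROUP_TTL_SECONDS = 60; // assumed TTL, tune to the workload

const groupKey = (groupId) => `group:${groupId}`;

// Cache-aside read: try Redis first, fall back to the database on a
// miss, then store the result so other workers can reuse it.
async function getGroup(client, groupId, loadGroupFromDb) {
  const cached = await client.get(groupKey(groupId));
  if (cached !== null) return JSON.parse(cached);

  const group = await loadGroupFromDb(groupId);
  await client.set(groupKey(groupId), JSON.stringify(group), {
    EX: GROUP_TTL_SECONDS,
  });
  return group;
}

// After a write (new post, like, comment), drop the cached copy so the
// next read from any worker reloads fresh data.
async function invalidateGroup(client, groupId) {
  await client.del(groupKey(groupId));
}
```

Because the cache lives outside the workers, a change made through one worker is visible to all of them after `invalidateGroup` runs. The cost is a network round trip to Redis on every read instead of an in-process lookup.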

Solution for database updation without hitting db after some time intervals

Hitting the DB again and again at fixed time intervals is a big mess: with 100k users logged in, the DB will get 1 million requests every 10 seconds, which I can't afford. I have researched this issue a lot and need a perfect solution for it.
(Working in Node.js & PostgreSQL)
PostgreSQL 9.4+ provides logical decoding, which gives access to row-level changes. You can listen to the PostgreSQL write-ahead log and have your application receive data as a push from the database.
You may have to build a middleware that does this for you. I found a good write-up that talks about using logical decoding with Apache Kafka streams.
https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/

MongoDB Multiple database vs single database

I have a Node.js project that uses MongoDB as its main database.
Normally, I just use one database to hold all the information (users, organizations, messages, ...).
But now I need to store one more thing, log data, which grows very fast.
So I am considering storing the logs in a separate database to keep the current database safe and fast.
Does anyone have experience with this? Is it better than a single database?
The mods will certainly say this is not a real question. You have a few options, depending on your log data and how and how often you want to access it.
Capped collections if you don't need to store the logs for a long time
Something like Redis to delay writing to the log and keep the app responding fast
Use a replica set to distribute the database load.
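As an illustration of the capped-collection option, here is a minimal sketch. It assumes `db` is a `Db` handle from the official MongoDB Node.js driver; the collection name and the size limit are made-up values.

```javascript
// Sketch only: `db` is assumed to be a Db handle from the official
// MongoDB Node.js driver (for example client.db('myapp')).
const LOG_COLLECTION = 'app_logs';       // hypothetical collection name
const LOG_MAX_BYTES = 100 * 1024 * 1024; // assumed cap of 100 MiB

// Create the capped collection once, e.g. at deploy time. When the size
// limit is reached, MongoDB overwrites the oldest documents, so the log
// cannot grow without bound. Note: createCollection rejects if the
// collection already exists.
async function createLogCollection(db) {
  return db.createCollection(LOG_COLLECTION, {
    capped: true,
    size: LOG_MAX_BYTES,
  });
}

// Append one log entry; inserts into a capped collection look the same
// as inserts into a regular one.
async function writeLog(db, level, message) {
  await db
    .collection(LOG_COLLECTION)
    .insertOne({ level, message, at: new Date() });
}
```

The size cap is fixed when the collection is created, so it is worth estimating how many days of logs it should hold before picking a value.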

Is this MEAN stack design-pattern suitable at the 1,000-10,000 user scale?

Let's say that when a user logs into a webapp, he sees a list of information.
Let's say that list of information is served by one of two dynos (via heroku), but that the list of information originates from a single mongo database (i.e., the nodejs dynos are just passing the mongo information to a user when he logs into the webapp).
Question: Suppose I want to make it possible for a user to both modify and add to that list of information.
At a scale of 1,000-10,000 users, is the following strategy suitable:
User modifies/adds to data; HTTP POST sent to one of the two nodejs dynos with the updated data.
Dyno (whichever one it may be) takes modification/addition of data and makes a direct query into the mongo database to update the data.
Dyno sends confirmation back to the client that the update was successful.
Is this OK? Would I have to likely add more dynos (heroku)? I'm basically worried that if a bunch of users are trying to access a single database at once, it will be slow, or I'm somehow risking corrupting the entire database at the 1,000-10,000 person scale. Is this fear reasonable?
Short answer: yes, it's a reasonable fear. Longer answer: it depends.
MongoDB will queue the requests and handle them in the order it receives them. Depending on how much of the data is being served from memory, it may or may not be fast enough.
Node.js follows the same pattern: it queues the requests it cannot process yet and executes them when resources become available.
The only way to tell if performance is being hindered is by monitoring it, and seeing if resources consistently hit a threshold you're uncomfortable with passing. On the upside, during your discovery phase your clients will probably only notice a few milliseconds of delay.
The proper way to implement that is to spin up a new instance as the resources get consumed to handle the traffic.
Your database likely won't corrupt, but if your data is important (and why would you collect it if it isn't?), you should be creating a replica set. I would probably go with a replica set of data before I go with a second instance of node.
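The three-step flow from the question (POST to a dyno, update MongoDB, confirm to the client) might look roughly like the sketch below. It assumes `items` is a `Collection` from the official MongoDB Node.js driver and that the request body has already been parsed by `express.json()`. The payload shape (`id`, `text`) and the function name are hypothetical.

```javascript
// Sketch only: `items` is assumed to be a Collection from the official
// MongoDB Node.js driver; `req.body` is assumed to be parsed JSON.
function makeUpdateItemHandler(items) {
  return async function updateItemHandler(req, res, next) {
    try {
      const { id, text } = req.body || {}; // hypothetical payload shape
      if (id === undefined || text === undefined) {
        res.status(400).json({ error: 'id and text are required' });
        return;
      }
      // One updateOne call either modifies the existing entry or, with
      // upsert, adds a new one. A write to a single document is atomic,
      // so two dynos updating the same entry cannot leave it half-written.
      const result = await items.updateOne(
        { _id: id },
        { $set: { text, updatedAt: new Date() } },
        { upsert: true }
      );
      // Step 3: confirm to the client that the write was acknowledged.
      res.json({ ok: result.acknowledged });
    } catch (err) {
      next(err); // let the Express error handler report failures
    }
  };
}
```

It would be mounted as `app.post('/items', makeUpdateItemHandler(items))`. Both dynos can run the same handler against the same database; if two users edit the same entry at nearly the same time, the later write wins.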

Resources