How to share an object in multiple instances of nodejs?

I have a functionality where users post data containing a few user ids and some data related to those user ids, and I am saving it into a postgresql database. I want to save the returned user ids in some object.
I just want to check whether a user id is present in this object and only then call the database. This check happens very frequently, so I cannot hit the db every time just to check whether there is any data present for that user id.
The problem is, I have multiple nodejs instances running on different servers, so how can I have a common object?
I know I can use redis/riak for storing key-values on a server, but I don't want to increase complexity/learning just for a single case. (I have never used redis/riak before.)
Any suggestions?

If your data is in different node.js processes on different servers, then the ONLY option is to use the network to reach some common source that holds the value. There are lots of different ways to do that:
Put the value in a database and always read the value from the common database
Designate one of your node.js instances as the master and have all the other node.js instances ask the master for the value anytime they need it
Synchronize the value to each node.js process using networking so each node.js instance always has a current value in its own process
Use a shared file system (kind of like a poor man's database)
Since you already have a database, you probably want to just store the value in the database you already have and query it from there, rather than introduce another data store like redis for this one use. If possible, you can have each process cache the value for some interval of time to improve performance for frequent requests.
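A minimal sketch of that last suggestion, assuming node-postgres (pg) and a hypothetical user_data table with a user_id column; the TTL is illustrative:

    // Per-process cache of "does this user id have data?", refreshed from Postgres
    const { Pool } = require('pg');

    const pool = new Pool();              // connection settings come from env vars
    const cache = new Map();              // user_id -> expiry timestamp (ms)
    const TTL_MS = 60 * 1000;             // how long a cached answer is trusted

    async function hasUserData(userId) {
      const expires = cache.get(userId);
      if (expires && expires > Date.now()) {
        return true;                      // answered from the local cache, no DB hit
      }
      const { rowCount } = await pool.query(
        'SELECT 1 FROM user_data WHERE user_id = $1 LIMIT 1',
        [userId]
      );
      if (rowCount > 0) {
        cache.set(userId, Date.now() + TTL_MS);   // remember the hit for a while
        return true;
      }
      return false;
    }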

Related

Is writing multiple INSERTS versus UPDATE faster for temporary POSTGRES databases?

I am re-designing a project I built a year ago when I was just starting to learn how to code. I used the MEAN stack back then and want to convert it to a PERN stack now. My AWS knowledge has also grown a bit and I'd like to expand on these new skills.
The application receives real-time data from an api, which I clean up, write to a database, and broadcast to connected clients.
To better conceptualize this question I will refer to the following items:
api-m1: this receives the incoming data and passes it through my schema; I then send it to my socket-server.
socket-server: handles the WSS connection to the application's front-end clients. It also writes the data it gets from the scraper and api-m1 to a postgres database. I would like to turn this into clusters eventually, since I am using nodejs, and will incorporate Redis. Then I will run it behind an ALB using sticky sessions etc. for multiple EC2 instances.
RDS: postgres table which socket-server writes incoming scraper and api-m1 data to. RDS is used to fetch the most recent data stored along with user profile config data. NOTE: RDS main data table will have max 120-150 UID records with 6-7 columns
To help better visualize this, see the image below.
From a database perspective, what would be the quickest way to write my data to RDS, assuming we have, during peak times, 20-40 records/s from api-m1 plus another 20-40 records/s from the scraper? After each day I tear down the database using a lambda function and start again (as the data is only temporary and does not need to be saved for any prolonged period of time).
1. Should I INSERT each record using a SERIAL id, then from the frontend fetch the most recent rows based off of the uid?
2.a. Should I UPDATE each UID, so I'd have a fixed N rows of data which I just search and update? (I can see this bottlenecking with my Postgres client.)
2.b. Still use UPDATE but do BATCHED updates. What issues will I run into if I make multiple clusters, i.e. will I run into concurrency problems where table record XYZ has an older value overwrite a more recent value because I'm using BATCH UPDATE with Node clusters?
My concern is that UPDATEs are slower than INSERTs, but I don't need to make this as fast as possible; this section of the application isn't CPU heavy, and the real-time data isn't that intensive.
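For reference, option 1 above might look roughly like this with node-postgres; the readings table and its columns are assumed, and DISTINCT ON keeps only the newest row per uid:

    // Sketch of option 1: append-only INSERTs with a SERIAL id,
    // then fetch the latest row per uid for the frontend.
    const { Pool } = require('pg');
    const pool = new Pool();

    async function insertReading(uid, payload) {
      await pool.query(
        'INSERT INTO readings (uid, payload) VALUES ($1, $2)',
        [uid, payload]
      );
    }

    async function latestPerUid() {
      // DISTINCT ON (uid) returns one row per uid; ORDER BY ... id DESC makes it the newest
      const { rows } = await pool.query(
        'SELECT DISTINCT ON (uid) uid, payload FROM readings ORDER BY uid, id DESC'
      );
      return rows;
    }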
To make my comments an answer:
You don't seem to need SQL semantics for anything here, so I'd just toss RDS and use e.g. Redis (or DynamoDB, I guess) for that data store.
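A rough sketch of that direction with the node-redis v4 client; the key layout and field names are illustrative:

    // Keep only the latest record per UID in Redis instead of RDS.
    const { createClient } = require('redis');

    const redis = createClient({ url: process.env.REDIS_URL });

    async function saveLatest(uid, record) {
      // One hash per UID holding the most recent fields for that UID
      await redis.hSet(`latest:${uid}`, record);
      // Let stale UIDs expire instead of tearing the store down daily
      await redis.expire(`latest:${uid}`, 24 * 60 * 60);
    }

    async function getLatest(uid) {
      return redis.hGetAll(`latest:${uid}`);
    }

    (async () => {
      await redis.connect();
      await saveLatest('uid-123', { price: '42.1', updatedAt: Date.now().toString() });
      console.log(await getLatest('uid-123'));
    })();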

MongoDB Multiple database vs single database

I have a NodeJS project that uses mongodb as its main database.
Normally, I just use one database to contain all information (users, organizations, messages, ...).
But now I need to store one more thing - log data - which grows very, very fast.
So I'm considering storing logs in another database to keep the current database safe and fast.
Does anyone have experience with this? Is that better than a single database?
Not a real question, the mods will certainly say. You have a few options, depending on your log data and how/how often you want to access it:
Capped collections if you don't need to store the logs for a long time
Something like Redis to delay writing to the log and keep the app responding fast
Use a replica set to distribute the database load.
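As a sketch of the capped-collection option using the official MongoDB Node.js driver; the collection name and size limits are illustrative:

    // Create a capped collection for logs: once the cap is reached,
    // the oldest entries are discarded automatically.
    const { MongoClient } = require('mongodb');

    async function setupLogs() {
      const client = new MongoClient('mongodb://localhost:27017');
      await client.connect();
      const db = client.db('app');

      await db.createCollection('logs', {
        capped: true,
        size: 100 * 1024 * 1024,   // cap in bytes
        max: 500000                // cap in documents
      });

      await db.collection('logs').insertOne({ level: 'info', msg: 'started', at: new Date() });
      await client.close();
    }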

Preventing duplicate entries in Multi Instance Application Environment

I am writing an application to serve facebook APIs: share, like, etc. I am keeping all those shared objects from my application in a database, and I do not want to share the same object if it has already been shared.
Considering I will deploy the application on different servers, there could be a case where both instances try to insert the same object into the table.
How can I manage this concurrency problem without blocking the applications fully? I mean two threads trying to insert the same object must sync, but they should not block a 3rd thread that is inserting a totally different object.
If there's a way to derive the primary key of a data entry from the data itself, the database will resolve such concurrency issues by itself -- the 2nd insert will fail with a 'Primary Key constraint violation'. Perhaps the data supplied by the Facebook API already has some unique ID?
Or you can consider a distributed lock solution, for example based on Hazelcast or a similar data grid. This would allow record state to be shared by different JVMs, so it becomes possible to avoid unneeded INSERTs.
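The question is about JVM instances, but the uniqueness check lives in the database, so the same idea works from any client. A sketch with node-postgres, assuming the Facebook object id is used as the primary key; here ON CONFLICT DO NOTHING turns the constraint violation into a silent no-op instead of an error:

    // Let the database enforce uniqueness: the second INSERT of the same
    // object simply does nothing instead of creating a duplicate.
    const { Pool } = require('pg');
    const pool = new Pool();

    async function shareOnce(fbObjectId, payload) {
      const { rowCount } = await pool.query(
        `INSERT INTO shared_objects (fb_object_id, payload)
         VALUES ($1, $2)
         ON CONFLICT (fb_object_id) DO NOTHING`,
        [fbObjectId, payload]
      );
      // rowCount is 0 when another instance already inserted this object
      return rowCount === 1;
    }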

Is this MEAN stack design-pattern suitable at the 1,000-10,000 user scale?

Let's say that when a user logs into a webapp, he sees a list of information.
Let's say that list of information is served by one of two dynos (via heroku), but that the list of information originates from a single mongo database (i.e., the nodejs dynos are just passing the mongo information to a user when he logs into the webapp).
Question: Suppose I want to make it possible for a user to both modify and add to that list of information.
At a scale of 1,000-10,000 users, is the following strategy suitable:
User modifies/adds to data; HTTP POST sent to one of the two nodejs dynos with the updated data.
Dyno (whichever one it may be) takes modification/addition of data and makes a direct query into the mongo database to update the data.
Dyno sends confirmation back to the client that the update was successful.
Is this OK? Would I likely have to add more dynos (heroku)? I'm basically worried that if a bunch of users are trying to access a single database at once, it will be slow, or that I'm somehow risking corrupting the entire database at the 1,000-10,000 person scale. Is this fear reasonable?
Short answer: yes, it's a reasonable fear. Longer answer: it depends.
MongoDB will queue the requests and handle them in the order it receives them. Depending on how much of the data is being served from memory, it may or may not be fast enough.
NodeJS follows the same design pattern: it queues requests it can't process immediately and executes them when resources become available.
The only way to tell if performance is being hindered is by monitoring it, and seeing if resources consistently hit a threshold you're uncomfortable with passing. On the upside, during your discovery phase your clients will probably only notice a few milliseconds of delay.
The proper way to implement that is to spin up a new instance as the resources get consumed to handle the traffic.
Your database likely won't corrupt, but if your data is important (and why would you collect it if it isn't?), you should be creating a replica set. I would probably go with a replica set of data before I go with a second instance of node.
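For concreteness, the modify-and-confirm flow from the question might look roughly like this with Express and the official MongoDB driver; the route, collection, and field names are assumptions, and the majority write concern only pays off once a replica set is in place:

    // Any dyno can handle the POST; MongoDB serializes the writes it receives.
    const express = require('express');
    const { MongoClient, ObjectId } = require('mongodb');

    async function main() {
      const client = new MongoClient(process.env.MONGODB_URI);
      await client.connect();
      const items = client.db('app').collection('items');

      const app = express();
      app.use(express.json());

      app.post('/items/:id', async (req, res) => {
        const result = await items.updateOne(
          { _id: new ObjectId(req.params.id) },
          { $set: { value: req.body.value } },
          { writeConcern: { w: 'majority' } }   // safer once a replica set exists
        );
        res.json({ ok: true, modified: result.modifiedCount });   // confirmation back to the client
      });

      app.listen(process.env.PORT || 3000);
    }

    main();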

NodeJS, MongoDB: Database read/write strategy for performance

This is my first attempt at a web application with DB access, so I'm not sure what the accepted way of doing a DB write/read is.
In basic terms, my application will have one user updating a field in the DB (a number) and many other users reading it (through a REST api). The updating of the number will not be frequent (maybe once per minute), but the reads can be more than that, about 100/minute. I understand this is such a low rate of DB writes/reads that it wouldn't really matter if I read directly from the DB, but I want to know what strategies are typically used in web applications.
For example, is it better to maintain this number as a variable in memory and serve it for reads, so that I don't need to access the DB each time, and then only write to the DB (and re-fetch the value into memory) when there is an update to this field? Or is it better to read from the DB for each read request?
I apologize if the question is vague. I put NodeJS and MongoDB as tags because that's what I'm using in the app.
Thank you.
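A minimal sketch of the in-memory approach described above, using the official MongoDB driver; the collection and field names are illustrative. Note that with more than one Node process, each process would need its own way to refresh the cached value (or simply re-read on an interval):

    // Keep the number in memory, serve reads from the variable, and
    // touch the database only on writes (and once at startup).
    const { MongoClient } = require('mongodb');

    let cachedValue = null;

    async function start() {
      const client = new MongoClient(process.env.MONGODB_URI);
      await client.connect();
      const settings = client.db('app').collection('settings');

      const doc = await settings.findOne({ _id: 'counter' });   // load once at startup
      cachedValue = doc ? doc.value : 0;

      return {
        read: () => cachedValue,                                 // no DB round trip
        write: async (newValue) => {
          await settings.updateOne(
            { _id: 'counter' },
            { $set: { value: newValue } },
            { upsert: true }
          );
          cachedValue = newValue;   // refresh the cache only after the write succeeds
        }
      };
    }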
