How do I manage groups/rooms with node WebSockets? - node.js

TL;DR below.
I am currently developing a React/Redux SPA that is driven by real-time data. I've decided to use ws, instead of socket.io since socket.io feels a bit high level for what I'm doing, I'd rather manage sockets myself.
In saying that, I'm struggling to find a way to manage the separation of updates/messages per view/route. Since I'm using client-side routing it's per express route won't really work...
Messages between the server and client via WebSockets are JSON with actions like GET_ITEMS then a response of GET_ITEMS_SUCCESS with an array of 'items' and for errors: ..._ERROR etc. This is all fine, since it's just 1 to 1 transaction. Though the problem arises when broadcasting (1 to all) to all relevant clients when the server receives an update.
So, I assume it best practice to limit these broadcasts to the clients that are viewing/want the data. So when viewing, for example, the Item page, there is no point in broadcasting updates to the User data since that is only used on the User page.
I haven't been able to find any common practices when dealing with this sort of situation, just a few small outdated/barely used wrappers for ws that just add a few basic functions to leave/join but don't offer much flexibility with implementation.
What I think MIGHT work is to have an object/array for each 'group'/'room', which stores the clients that are currently listening to updates from a given section. So a user would send an action to INIT_LISTEN (& ``) with a param of category, e.g. ITEM for updates and other actions related to items.
TL;DR
What my question really boils down to is: How do I store a reference to a single socket? (ws client object? ws client ID?) Then, can I store this in an object/array to iterate through like below.
const ClientRooms = {
Items: {
{
...ws
}
/* ...rest of the client */
}
}
or
const ClientRooms = {
Items: [ "xyz" ] /* Array of ws ids */
}
I have a "ping--pong" heartbeat function to keep clients active and prevent silent connection failures/disconnections. I can't find if ws.terminate() still fires the ws close event so I can iterate 'group'/'room' the object/array to find and remove instances of that client.

Related

Socket.io offline behaviour server-side

I have run into an unforeseen problem with my socket.io setup.
I use socket.io to live load data from my database (mongoDB, nodejs, react).
To accomplish this, I use mongoDB's changestream to detect changes and then push them to the front-end via socket.io.
Now this works perfectly as long as the user is connected. And right now, when the user reconnects, it just reloads all data. While this is fine for most users, there is a small group with very bad network connection and thus the front-end is reloading data all the time. Which causes the front-end to be unresponsive for some time.
So, I am looking for a way to only send events that occurred during the front-end being offline. While the front-end can do this quite easily: https://socket.io/docs/v4/client-offline-behavior/
It doesn't seem possible to do this at the server side. Since socket.io (server side) immediately forgets sockets that have disconnected and thus cant buffer events.
So, I was wondering if there is a good way do this? Or would this need a full "wrapper" around socket.io that caches disconnected sockets?
Any help or advice would be appreciated!
I find it is a really interesting and painful problem ! ^^'
If you can give more variables, it may help people to give you a better answer
For instance
How many data are stored in database, how much a typical user will receive, and how many events are triggered on a time frame ?
How long should an event take to be visible ? I mean, if users receive an event with a 10s,30s,... delay, is it harmfull for the service they provide.
How your data is structured ? is it a simple json array with the same field, custom field, dynamic json object, etc..
How your react app is structured, do you put heavy logic when your data is update, etc..
I think you should put more controls in your front end code and update only when new datas.
Some paths to explore
1. Put more controls in your front end
As you stated, for the users with bad connection, the react client seems to update his state too quickly, when they reload data after the websocket is connected, again and again. Ui may freeze in this case, yes.
For this, I think of two approaches :
Before updating the state, check if react current state is the same as the data you receive from websocket connection. If the reconnection is quick enough and no new data arrived, it should be the same. So in this case do not update react state.
If too many events are triggered and after each reconnection new data arrived, you can buffer the datas from the websocket and display it only once per time frame. What i mean by time frame, is you can use functions like setInterval or requestAnimationFrame to trigger react update. A pseudo react code to illustrate this.
function App() {
const [events, setEvents] = useState({ datas: [] });
const bufferedEvents = useRef([]);
useEffect(() => {
websocket.on("connected", (newEvents) => {
bufferedEvents.current = bufferedEvents.current.concat(newEvents);
})
websocket.on("data", (newEvent) => {
bufferedEvents.current = bufferedEvents.current.concat(newEvent);
})
// In the setInterval function you take all the events receive at the connection + new events. to update the react state. You clean the bufferedEvents at the same time.
const intervalId=setInterval(() => {
const events = bufferedEvents.current;
bufferedEvents.current = [];
//update if new datas
if (events.length > 0) {
setEvents((prevState) => { return { datas: prevState.datas.concat(events) } });
}
// console.log(events)
}, 1000) // trigger data update every second. You could replace this approach with a requestAnimationFrame. You can adapt the time refresh as you need.
//Do not forget to clear the interval when the component is unmount
return ()=>{
clearInterval(intervalId)
}
}, []);
return (
<div>
<span>Total events : {events.datas.length}</span>
<br />
{
events.datas.map(event => {
return <div>{event.data}</div>
})
}
</div>
)
}
You can look at this article for details on using requestAnimation frame.
I think that modifying the front end is needed in all case, but still alone, not really good on performance.
2. Fetch only new data in your back end
For this approach, it really depends how your data is structured in the database.
If the data have some timestamp in it, I can think of a naive but simple cookie with a timestamp in it.
When user connects the first time, this cookie is null.
When they fetch the data, on the websocket connection, they receive all the datas. When datas arrived, you update the cookie timestamp with the most recent date in the data.
Websocket is disconnected, you open a new websocket with the cookie timestamp on it. With this information you can query all the datas more recent than the timestamp on the cookie.
Like this, you don't have to download the entirity of data, but only fresh ones.
Other approaches may be more helpfull but without more informations on your datas and more precise requirements, it is hard to say.
If you have a lot of data, I will personally check some pagination mechanism and maybe combine some classic http request for fetching the data, and websocket, sse, or long polling for live events.
You can put a comment if needed and I will update my response !
Cheers

Mongo Change Streams running multiple times (kind of): Node app running multiple instances

My Node app uses Mongo change streams, and the app runs 3+ instances in production (more eventually, so this will become more of an issue as it grows). So, when a change comes in the change stream functionality runs as many times as there are processes.
How to set things up so that the change stream only runs once?
Here's what I've got:
const options = { fullDocument: "updateLookup" };
const filter = [
{
$match: {
$and: [
{ "updateDescription.updatedFields.sites": { $exists: true } },
{ operationType: "update" }
]
}
}
];
const sitesStream = Client.watch(sitesFilter, options);
// Start listening to site stream
sitesStream.on("change", async change => {
console.log("in site change stream", change);
console.log(
"in site change stream, update desc",
change.updateDescription
);
// Do work...
console.log("site change stream done.");
return;
});
It can easily be done with only Mongodb query operators. You can add a modulo query on the ID field where the divisor is the number of your app instances (N). The remainder is then an element of {0, 1, 2, ..., N-1}. If your app instances are numbered in ascending order from zero to N-1 you can write the filter like this:
const filter = [
{
"$match": {
"$and": [
// Other filters
{ "_id": { "$mod": [<number of instances>, <this instance's id>]}}
]
}
}
];
Doing this with strong guarantees is difficult but not impossible. I wrote about the details of one solution here: https://www.alechenninger.com/2020/05/building-kafka-like-message-queue-with.html
The examples are in Java but the important part is the algorithm.
It comes down to a few techniques:
Each process attempts to obtain a lock
Each lock (or each change) has an associated fencing token
Processing each change must be idempotent
While processing the change, the token is used to ensure ordered, effectively-once updates.
More details in the blog post.
It sounds like you need a way to partition updates between instances. Have you looked into Apache Kafka? Basically what you would do is have a single application that writes the change data to a partitioned Kafka Topic and have your node application be a Kafka consumer. This would ensure only one application instance ever receives an update.
Depending on your partitioning strategy, you could even ensure that updates for the same record always go to the same node app (if your application needs to maintain its own state). Otherwise, you can spread out the updates in a round robin fashion.
The biggest benefit to using Kafka is that you can add and remove instances without having to adjust configurations. For example, you could start one instance and it would handle all updates. Then, as soon as you start another instance, they each start handling half of the load. You can continue this pattern for as many instances as there are partitions (and you can configure the topic to have 1000s of partitions if you want), that is the power of the Kafka consumer group. Scaling down works in the reverse.
While the Kafka option sounded interesting, it was a lot of infrastructure work on a platform I'm not familiar with, so I decided to go with something a little closer to home for me, sending an MQTT message to a little stand alone app, and letting the MQTT server monitor messages for uniqueness.
siteStream.on("change", async change => {
console.log("in site change stream);
const mqttClient = mqtt.connect("mqtt://localhost:1883");
const id = JSON.stringify(change._id._data);
// You'll want to push more than just the change stream id obviously...
mqttClient.on("connect", function() {
mqttClient.publish("myTopic", id);
mqttClient.end();
});
});
I'm still working out the final version of the MQTT server, but the method to evaluate uniqueness of messages will probably store an array of change stream IDs in application memory, as there is no need to persist them, and evaluate whether to proceed any further based on whether that change stream ID has been seen before.
var mqtt = require("mqtt");
var client = mqtt.connect("mqtt://localhost:1883");
var seen = [];
client.on("connect", function() {
client.subscribe("myTopic");
});
client.on("message", function(topic, message) {
context = message.toString().replace(/"/g, "");
if (seen.indexOf(context) < 0) {
seen.push(context);
// Do stuff
}
});
This doesn't include security, etc., but you get the idea.
Will that having a field in DB called status which will be updated using findAnUpdate based on the event received from change stream. So lets say you get 2 events at the same time from change stream. First event will update the status to start and the other will throw error if status is start. So the second event will not process any business logic.
I'm not claiming those are rock-solid production grade solutions, but I believe something like this could work
Solution 1
applying Read-Modify-Write:
Add version field to the document, all the created docs have version=0
Receive ChangeStream event
Read the document that needs to be updated
Perform the update on the model
Increment version
Update the document where both id and version match, otherwise discard the change
Yes, it creates 2 * n_application_replicas useless queries, so there is another option
Solution 2
Create collection of ResumeTokens in mongo which would store collection -> token mapping
In the changeStream handler code, after successful write, update ResumeToken in the collection
Create a feature toggle that will disable reading ChangeStream in your application
Configure only a single instance of your application to be a "reader"
In case of "reader" failure you might either enable reading on another node, or redeploy the "reader" node.
As a result: there might be an infinite amount of non-reader replicas and there won't be any useless queries

Would the values inside request be mixed up in callback?

I am new to Node.js, and I have been reading questions and answers related with this issue, but still not very sure if I fully understand the concept in my case.
Suggested Code
router.post('/test123', function(req, res) {
someAsyncFunction1(parameter1, function(result1) {
someAsyncFunction2(parameter2, function(result2) {
someAsyncFunction3(parameter3, function(result3) {
var theVariable1 = req.body.something1;
var theVariable2 = req.body.something2;
)}
)}
});
Question
I assume there will be multiple (can be 10+, 100+, or whatever) requests to one certain place (for example, ajax request to /test123, as shown above) at the same time with some variables (something1 and something2). According to this, it would be impossible that one user's theVariable1 and theVariable2 are mixed up with (i.e, overwritten by) the other user's req.body.something1 and req.body.something2. I am wondering if this is true when there are multiple callbacks (three like the above, or ten, just in case).
And, I also consider using res.locals to save some data from callbacks (instead of using theVariable1 and theVariable2, but is it good idea to do so given that the data will not be overwritten due to multiple simultaneous requests from clients?
Each request an Node.js/Express server gets generated a new req object.
So in the line router.post('/test123', function(req, res), the req object that's being passed in as an argument is unique to that HTTP connection.
You don't have to worry about multiple functions or callbacks. In a traditional application, if I have two objects cat and dog that I can pass to the listen function, I would get back meow and bark. Even though there's only one listen function. That's sort of how you can view an Express app. Even though you have all these get and post functions, every user's request is passed to them as a unique entity.

Sails 0.10 ModelIdentity.subscribe or .watch and for publishCreate()?

I'm studying the SailsCasts and I'm working on SailsJS beta version 0.10.
Everything works except when in 0.9.7, I use this:
//subscribe this socket to the User model classroom
User.subscribe(req.socket);
// subscribe this socket to the user instance rooms
User.subscribe(req.socket, users);
How to do this for 0.10?
Especially, the subscribe for publishUpdate and publishDestroy works fine.
For publishCreate, I need 'User.subscribe(req.socket)' and I have in console the warning:
debug: Deprecated: Model.subscribe(socket, null, ...)
debug: (see http://links.sailsjs.org/docs/config/pubsub)
debug: Please use instance rooms instead (or raw sails.sockets.*() methods.)
And then:
What are the differences between 'model.watch()' and 'model.subscribe()'?
Question 1
I'd prefer to comment (lack rep...), but have you given the docs a good read?
You can't call .subscribe like User.subscribe(req.socket). It requires a second param records. So your User.subscribe(req.socket, users); should work if users is a list of user model instances.
Question 2
I'm no expert (at all...) with node or sails, but the docs - watch claimmodel.watch() subscribes clients to publishCreate events for the model instances. I see no mention of publishUpdate,publishDestory, etc. I think it only watches creation events. .subscribe() takes a list of models (or a model) and subscribes the client to publishAdd, publishDestroy, publishRemove, publishUpdate events (by default) for that list of model instances. You can also specify what contexts you want to subscribe to.
So, it seems you actually want to use User.watch(req.socket) instead of .subscribe() if you only want to send the socket publishCreate events. If you need all of them, use something like User.subscribe(req.socket,users,[create,update,destroy,]).
If you want to be cool, you can set the autosubscribe property to contain the list of the contexts you care about and just use User.subscribe(req.socket,users) the docs - context.
Cheers
In order to subscribe to a model:
subscribe: function(req, res) {
Model.find().exec(function(err, records) {
YourModel.subscribe(req.socket, records);
YourModel.watch(req);
});
}
This way, you get a message each time a record is created, destroyed or updated.

Resolve MongoDB reference

I am currently building a chatting app with nodejs and mongoDB.
Basically I have two collections to maintain in the db.
user = {
_id: ObjectId("1234"),
account: "stan123"
}
thread = {
_user: ObjectId("1234"),
messages: [
{
body:"hi"
_user:ObjectId("1234")
},
{
body:"second msg"
_user:ObjectId("1234")
}
]
}
I am planning to pass the thread model with all resolved info (user) to the client side, so that I can construct my widget with it.
I searched for solutions for this.Some suggests to make extra calls from client side to get the data.
However, I am worried that when the amount of message grows, there will be considerable http calls that might hurt site speed.
I know some drivers can resolve DBRefs automatically and make the code clean.
However, according to
http://docs.mongodb.org/manual/applications/database-references/
I decided to just use id to maintain reference that make it's as simple as possible.
My plan is resolving all references on server side. Current approach is getting the length of message array first.
Then loop through the message array and make a second query to resolve user info separately.
In each query callback, do a messageToResolve++ and if(messageToResolve >= thread.messages.length)
If the condition meets, send the resolved model to client and end the response.
This is not a case I would consider embedded because it would be painful when you need to update user data.
(message is embedded because it exists only when thread exists)
I am not sure if it's a good way to do it.
Does anyone has a better solution?
Sorry if I didn't explain my problem and solution clear enough.
And thanks in advance.

Resources