Performance of real-time chat techniques - Node.js

I set myself the task of implementing a 1:1 chat app for my coursework. Among the various options, I chose SSE for real-time chat. Following the example projects, I was able to implement non-persistent chat between two clients. Every example stores the res objects in a JS object or array and iterates over them to send events to a particular user. But in a real chat app the number of users may grow dramatically, so I do not want to exhaust server resources.
I found some other ways to achieve the same functionality, but I am not sure about their performance.
SSE+setInterval
I used a Redis queue to hold offline messages for the user.
When the user establishes a connection, all unread chats are pushed to the client.
This happens immediately when the client connects to the server.
I ran into a problem here: I have no way of triggering messages in real time (when both users are online).
So I used setInterval with a 1-second interval for real-time communication, with a callback that checks whether the queue is empty and, if it is not, pops a message from the queue and sends it to the user as an event.
Will this approach affect performance? I am calling the function once per second for each connected user.
Long polling
In long polling, how can I find out whether there is a new message for the user and complete the request?
setInterval would still be needed on the server side here, so what about performance?
Websockets
With websockets we have a unique id to find the client in the pool of clients, so we can forward a message to a particular user when an event occurs.
Websockets still use a ping/pong mechanism to keep the connection alive, but resource utilization is very small: these are network calls with comparatively little data, handled asynchronously, so server resources are not wasted.
Questions
How do I trigger res.write only when a new message arrives for a particular user?
Does SSE+setInterval or long polling+setInterval degrade performance as the number of users increases?
Otherwise, is there a design pattern to achieve this functionality?

Simply use websockets.
It's fast, convenient and simple.
To send a message in real time when both users are logged in, find the second user by id in a users Array or Map and send the received message to their websocket.
If you have buffered messages for a disconnected user (in memory/database/Redis), check for them when the user connects and send them if they exist.
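
The lookup-and-buffer approach described above could be sketched roughly as follows. This is only an illustration under assumptions: ChatRouter, connect, disconnect and deliver are hypothetical names, the buffer lives in memory rather than in Redis, and a "connection" is any object with a send(text) method (which is how a socket from the ws package is used).

```javascript
// Hypothetical in-memory registry: userId -> live connection, plus a buffer
// for users who are offline. A `connection` is anything with a send(text)
// method; the names here are illustrative, not from any library.
class ChatRouter {
  constructor() {
    this.connections = new Map(); // userId -> connection
    this.pending = new Map();     // userId -> array of undelivered messages
  }

  // Call when a user connects: register the socket, then flush the backlog.
  connect(userId, connection) {
    this.connections.set(userId, connection);
    const backlog = this.pending.get(userId) || [];
    this.pending.delete(userId);
    for (const message of backlog) connection.send(JSON.stringify(message));
  }

  // Call when the socket closes.
  disconnect(userId) {
    this.connections.delete(userId);
  }

  // Deliver immediately if the recipient is online, otherwise buffer.
  // Returns true when the message was pushed to a live connection.
  deliver(fromId, toId, text) {
    const message = { from: fromId, to: toId, text };
    const connection = this.connections.get(toId);
    if (connection) {
      connection.send(JSON.stringify(message));
      return true;
    }
    if (!this.pending.has(toId)) this.pending.set(toId, []);
    this.pending.get(toId).push(message);
    return false;
  }
}
```

Because delivery is triggered by the incoming message itself, no per-user setInterval is needed. The same shape should also work for SSE if the connection object wraps res.write, for example { send: (s) => res.write(`data: ${s}\n\n`) }.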

Related

How to use socket.io properly with express app

I wonder how to use socket.io properly with my express app.
I have a REST API written in express/node.js and I want to use socket.io to add real-time features to my app. Suppose I want to do something that I can already do just by sending a request to my REST API. What should I do with socket.io? Should I send the request to the REST API and then send the result of the process to the socket.io client, or handle the whole process within the socket.io emitter and then send the result to the socket.io client?
Thanks in advance.

The question is not that clear, but from what I'm getting from it, you want to know what you would use socket.io for that you can't already do with your current API?
The short answer is, well, nothing really. Websockets are just the natural progression of APIs and of the need for a more 'real-time' interface between systems.
An older method (still used, and still relevant for the right use case) is long polling, where you keep checking back with the server for updated items and grab them if there are any. This works, but it can be expensive in terms of establishing a connection, performing a lookup, then closing the connection.
Websockets keep that connection open, allowing both the client and the server to communicate in real time. So, for example, let's say you make an update to your backend data and want users to get that update. Using long polling, you would rely on each client to ping back to the server, check if there is an update and, if so, grab it. This can cause lag between updates, with some users having updated data while others do not, etc.
Now take the same scenario with websockets: you make an update to the backend data and hit submit, which then emits to your socket server. The socket server takes the call, performs the task (grabs the updated data) and emits it to the users, so each connected user gets that update instantly.
Socket servers are typically used for things like real-time chats or polling, where packets are smaller, but they are also used for web games etc. The size of your payloads will determine how best to send data back and forth, because the larger the payload, the more resources and bandwidth it will take on the socket server, so it's something to consider.
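
One way to picture the "update the backend, then emit to connected users" flow is a single service function that the REST handler calls and that also emits the change. This is a hedged sketch, not the asker's code: updateItem, putItemHandler, the items Map and the 'item:updated' event name are all made up for illustration, and a plain Node EventEmitter stands in for the socket.io server object so that the snippet runs without any dependency.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the socket.io server object. With socket.io you would emit
// on the real server instance instead; EventEmitter keeps this runnable.
const realtime = new EventEmitter();

// Hypothetical in-memory "backend data".
const items = new Map();

// Single place where the update happens, shared by every entry point.
function updateItem(id, fields) {
  const updated = { ...(items.get(id) || { id }), ...fields };
  items.set(id, updated);
  realtime.emit('item:updated', updated); // push the change to listeners
  return updated;
}

// Shape of an Express-style handler (req, res). It only calls the service,
// so the REST response and the real-time push come from the same update.
function putItemHandler(req, res) {
  const updated = updateItem(req.params.id, req.body);
  res.json(updated);
}
```

With this split, the REST API stays the way to request a change, and the socket layer is only used to tell the other clients that it happened.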

How to build a scalable realtime chat messaging with Websocket?

I'm trying to build a realtime (private) chat between users of a video game with 25K+ concurrent connections. We currently run 32 nodes where users can connect through a load balancer. The problem I'm trying to solve is how to route messages to each user?
Currently, we are using socket.io & socket.io-redis, where each websocket joins a room with its user ID, and we emit each message they should receive to that room. The problem with this design is that we are reaching the limits of Redis Pub/Sub and of socket.io, which doesn't scale well (socket.io emits each message to all nodes, and each node checks whether the user is connected; this is not viable).
Our current stack is composed of Postgres, Redis & RabbitMQ. I have been thinking about this problem a lot and have come up with 3 different solutions :
Route all messages with RabbitMQ. When a user connects, we create an exchange with type fanout with the user ID and a queue per websocket connection (we have to handle multiple connections per user). When we want to emit to that user, we simply publish to that exchange. The problem with that approach is that we have to create a lot of queues, and I heard that this may not be very efficient.
Create a queue for each node in RabbitMQ. When a user connects, we save the node & socket ID in a Redis Set, so that when we have to send a message to that specific user, we first get the list of nodes and emit to each node's queue; the node then handles routing to the specific client in the app. The problem with that approach is that in the case of a node failure, we may still have a record saying a user is connected when this is no longer the case. To fix that, we would need to expire the user's Redis entry, but this is not a perfect fix. Also, if we later want to implement group chat, we would have to send duplicate messages in Rabbit, which is not ideal.
Go all in with Firebase Cloud Messaging. We have a mobile app, and we plan to use it for push notifications when the user isn't connected, but would it be a good fit even if the user is connected?
What do you think is the best fit for our use case? Do you have any other idea?

I found a better solution: create a binding for each user but use only one queue per node, then have each node route each message to the right user.
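
To make the routing idea concrete, here is a small in-memory model of "one queue per node, one binding per connected user". It only imitates how a direct exchange copies a message to the queues bound with a matching routing key; DirectExchange, the queue names and the user.<id> key format are assumptions for illustration, and none of this is the RabbitMQ client API.

```javascript
// In-memory model of a direct exchange: routing key -> set of queue names.
// This only imitates the routing idea; it is not the RabbitMQ client API.
class DirectExchange {
  constructor() {
    this.bindings = new Map(); // routingKey -> Set of queue names
    this.queues = new Map();   // queue name -> array of messages
  }
  declareQueue(name) {
    if (!this.queues.has(name)) this.queues.set(name, []);
  }
  // Called by a node when a user connects to it.
  bind(queue, routingKey) {
    this.declareQueue(queue);
    if (!this.bindings.has(routingKey)) this.bindings.set(routingKey, new Set());
    this.bindings.get(routingKey).add(queue);
  }
  // Called by a node when the user's last socket on it closes.
  unbind(queue, routingKey) {
    const bound = this.bindings.get(routingKey);
    if (!bound) return;
    bound.delete(queue);
    if (bound.size === 0) this.bindings.delete(routingKey);
  }
  // Copies the message to every queue bound with this key; returns how many.
  publish(routingKey, message) {
    const bound = this.bindings.get(routingKey) || new Set();
    for (const queue of bound) this.queues.get(queue).push({ routingKey, message });
    return bound.size;
  }
}

const exchange = new DirectExchange();

// User 42 has sockets on both nodes, user 7 only on node-2.
exchange.bind('node-1', 'user.42');
exchange.bind('node-2', 'user.42');
exchange.bind('node-2', 'user.7');

exchange.publish('user.42', 'hello'); // copied to node-1 and node-2
exchange.publish('user.7', 'hi');     // copied to node-2 only
exchange.publish('user.99', 'lost');  // no binding, so no queue receives it
```

Each node then consumes only its own queue and uses the routing key to pick the local sockets, so a message never travels to nodes where the user is not connected.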

Sending a notification to user using Node.js

I am looking for a solution to my problem. I have a Node.js server serving my web application, where users can log in. I want to handle the situation where user A performs a specific action and user B, who is associated with that action, gets a real-time notification. Is there a module that would help me, or is there some other solution?

What you are describing is "server push" where the server proactively notifies a user on their site of some activity or event. In the web browser world these days, there are basically two underlying technology options:
webSocket (or some use socket.io, a more feature rich library built on top of webSocket)
server sent events (SSE).
For webSocket or socket.io, the basic idea is that the web page connects back to the server with a webSocket or socket.io connection. That connection stays live (unlike a typical http connection that would connect, send a request, receive a response, then close the connection). So, with that live connection, the server is free to send the client (which is the web page in a user's browser), notifications at any time. The Javascript in the web page then listens for incoming data on the connection and, based on what data it receives, then uses Javascript to update the currently displayed web page to show something to the user.
For server sent events, you open an event source on the client-side and that also creates a lasting connection to the server, but this connection is one-way only (the server can send events to the client) and it's completely built on HTTP. This is a newer technology than webSocket, but is more limited in purpose.
In both of these cases, the server has to keep track of which connection belongs to which user so when something interesting happens on the server, it can know which connection to notify of the event.
Another solution occasionally used is client-side polling. In this case, the web page just regularly sends an ajax call to the server asking if there are any new events. Anything new yet? Anything new yet? Anything new yet? While this is conceptually a bit simpler, it's typically far less efficient unless the polling intervals are spaced far apart, say 10 or 15 minutes which limits the timeliness of any notifications. This is because most polling requests (particularly when done rapidly) return no data and are just wasted cycles on your server.

If you want to notify userB only when both of you are online at the time of the action, use websockets: they give you a two-way channel over which to pass the message to userB.
If you want to notify them at any time, regardless of online status, use a message queue.

EventStreams (SSE) - Broadcasting updates to clients. Is it possible?

I have a React web application and a REST API (Express.js).
I found that using an EventStream is a better choice if you do not want to use long polling or sockets (no need to send data client->server).
Use case:
The user opens a page with an empty table, to which other users can add data via POST /data.
The table is filled with initial data from the API via GET /data.
Then the page connects to an EventStream on /data/stream and listens for updates.
Someone adds a new row and the table needs to be updated...
Is it possible to broadcast this change (new row added) from the backend (the controller for adding rows) to all users who are connected to /data/stream?

It is generally not good practice to have a fetch for the initial data, then a separate live stream for updates. That's because there is a window where data can arrive on the server between the initial fetch and the live update stream.
Usually, that means you either miss messages or you get duplicates that are published to both. You can eliminate duplicates by tracking some kind of id or sequence number, but that means additional coding and computation.
SSE can be used for both the initial fetch and the live updates on a single stream, avoiding the aforementioned sync challenges.
The client creates an EventSource to initiate an SSE stream. The server responds with the data that is already there, and thereafter publishes any new data that arrives on the server.
If you want, the server can include an event-id with each message. Then if a client becomes disconnected, the SSE client will automatically reconnect with the last-event-id, and the data flow resumes from where it left off. On the client-side, the auto-reconnect and resume from last-event-id is automatic as it is spec-ed by the standard. The developer doesn't have to do anything.
SSE is kind of like a HTTP / REST / XHR request that stays open and continues to stream data, so you get the best of both worlds. The API is lightweight, easy to understand, and standards-based.
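
As a rough sketch of the single-stream idea above (existing data first, then resume from the last event id), the server side could keep rows in an append-only log and replay whatever the client has not seen. RowLog, formatEvent and framesSince are hypothetical names, and the log is in memory only; the id and data field names and the blank-line terminator are part of the SSE wire format.

```javascript
// Format one SSE frame. The `id:` and `data:` fields and the blank line
// that ends the frame are part of the event stream format.
function formatEvent(id, data) {
  return `id: ${id}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Hypothetical append-only log of rows; ids are increasing integers.
class RowLog {
  constructor() {
    this.rows = [];
    this.nextId = 1;
  }
  append(row) {
    const entry = { id: this.nextId++, row };
    this.rows.push(entry);
    return entry;
  }
  // Frames the client has not seen yet. lastEventId is the value of the
  // Last-Event-ID request header, or undefined on a first connection.
  framesSince(lastEventId) {
    const last = Number(lastEventId) || 0;
    return this.rows
      .filter((entry) => entry.id > last)
      .map((entry) => formatEvent(entry.id, entry.row));
  }
}
```

In an Express handler for the stream, one would presumably set Content-Type to text/event-stream, write log.framesSince(req.headers['last-event-id']) to the response, and then keep the response open for new rows. A first-time client gets everything, and a reconnecting client gets only what it missed.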

I will try to answer this myself :)
I never thought I could just use any pub/sub system on the backend. Every user who connects to the stream (/data/stream) gets subscribed, and the server simply publishes when it receives a new row from POST /data.
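
A minimal version of that pub/sub idea, kept in process and without any external broker, might look like the following. StreamHub, subscribe and publish are illustrative names only; a subscriber is any object with a write() method, such as the response object of an open /data/stream request.

```javascript
// Hypothetical hub: every open /data/stream response is a subscriber.
class StreamHub {
  constructor() {
    this.subscribers = new Set();
  }
  // `res` only needs write(); an http.ServerResponse qualifies.
  // Returns a function to call when the connection closes.
  subscribe(res) {
    this.subscribers.add(res);
    return () => this.subscribers.delete(res);
  }
  // Called from the POST /data controller after the row has been saved.
  // Returns the number of clients the row was written to.
  publish(row) {
    const frame = `data: ${JSON.stringify(row)}\n\n`;
    for (const res of this.subscribers) res.write(frame);
    return this.subscribers.size;
  }
}
```

The stream route would call hub.subscribe(res) and register the returned function on the request's 'close' event, and the POST /data controller would call hub.publish(newRow). With several server processes, the in-process Set would have to be backed by a shared pub/sub system.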

Chat / System Communication App (Nodejs + RabbitMQ)

So I currently have a chat system running on Node.js that passes messages via RabbitMQ. Each connected user has their own unique queue, which is subscribed to and listens only for messages meant for that user. The backend can also use this chat pipeline to communicate other system messages, like notifications, friend requests and other information driven by user events.
Currently the backend has to loop and publish each message one by one per user, even if the payload of the message is the same for, let's say, 1000 users. I would like to get away from that and be able to send the same message to multiple different users, but not to EVERY user who is connected
(example: notifying certain users that their friend has come online).
I considered implementing a rabbit queue system where all messages are pooled into the same queue and, instead of rabbit sending to all user queues, node takes these messages and emits each message to the appropriate user via socket connections (to whoever is online).
Proposed - infrastructure
This way the backend does not need to loop over 100s or 1000s of users and can send a single payload listing all the users the message should go to. I do plan to cluster the Node.js servers together.
I was also wondering, since I've never done this in a production environment, whether I will need to track each socket ID.
Potential pitfalls I've identified so far:
slower, since 1000s of messages can pile up in a single queue.
manually storing socket IDs to manually transmit to users.
offloading routing to Node.js instead of RabbitMQ.
Has anyone done anything like this before? If so, what are your recommendations? Is it better to scale with unique per-user queues, or to pool all grouped messages for all users into a smaller number of (larger) queues?

as a general rule, queue-per-user is an anti-pattern. there are some valid uses of this, but i've never seen it be a good idea for a chat app (in spite of all the demos that use this example)
RabbitMQ can be a great tool for facilitating the delivery of messages between systems, but it shouldn't be used to push messages to users.
"I considered implementing a rabbit queue system where all messages are pooled into the same queue and, instead of rabbit sending to all user queues, node takes these messages and emits each message to the appropriate user via socket connections (to whoever is online)."
this is heading down the right direction, but you have to remember that RabbitMQ is not a database (see previous link, again).
you can't randomly seek specific messages that are sitting in the queue and then leave them there. they are first in, first out.
in a chat app, i would have rabbitmq handling the message delivery between your systems, but not involved in delivery to the user.
your thoughts on using web sockets are going to be the direction you want to head for this. either that, or Server Sent Events.
if you need persistence of messages (history, search, last-viewed location, etc) then use a database for that. keep a timestamp or other marker of where the user left off, and push messages to them starting at that spot.
your concerns about tracking sockets for the users are definitely something to think about.
if you have multiple instances of your node server running sockets with different users connected, you'll need a way to know which users are connected to which node server.
this may be a good use case for rabbitmq - but not in a queue-per-user manner. rather, in a binding-per-user. you could have each node server create a queue to receive messages from the exchange where messages are published. the node server would then create a binding between the exchange and queue based on the user id that is logged in to that particular node server
this could lead to an overwhelming number of bindings in rmq, though.
you may need a more intelligent method of tracking which server has which users connected, or just ignore that entirely and broadcast every message to every node server. in that case, each server would publish an event through the websocket based on who the message should be delivered to.
if you're using a smart enough websocket library, it will only send the message to the people that need it. socket.io did this, i know, and i'm sure other websocket libraries are smart like this, as well.
...
I probably haven't given you a concrete answer to your situation, and I'm sure you have a lot more context to consider. hopefully this will get you down the right path, though.
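
The "keep a marker of where the user left off and push from that spot" suggestion in the answer above could be sketched like this. It is a hypothetical stand-in for a real database table: MessageStore, save, unread and acknowledge are made-up names, and a sequence number is used as the marker instead of a timestamp.

```javascript
// Hypothetical stand-in for a messages table and a per-user read marker.
class MessageStore {
  constructor() {
    this.messages = [];        // { seq, to, text }, seq strictly increasing
    this.lastSeen = new Map(); // userId -> seq of the last delivered message
    this.nextSeq = 1;
  }
  // Persist a message addressed to one user.
  save(to, text) {
    const message = { seq: this.nextSeq++, to, text };
    this.messages.push(message);
    return message;
  }
  // Messages for this user that are newer than their marker.
  unread(userId) {
    const marker = this.lastSeen.get(userId) || 0;
    return this.messages.filter((m) => m.to === userId && m.seq > marker);
  }
  // Advance the marker once the client has received everything up to `seq`.
  acknowledge(userId, seq) {
    const marker = this.lastSeen.get(userId) || 0;
    if (seq > marker) this.lastSeen.set(userId, seq);
  }
}
```

On connect, the socket server would send store.unread(userId) and acknowledge the highest seq once delivery succeeds. The messages stay in the store for history and search, which is something a queue cannot offer once a message has been consumed.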
