Node Websocket: Best way to send large array data to clients - node.js

I have a websocket server made with express in node.js.
The WebSocket documentation states that websocket.send() can take a string or an ArrayBuffer. I want to send a big array of objects (200k lines when formatted) over the websocket to clients. What is the best way to send data like this?
What I've tried: sending it directly after stringifying it. This works, but the delay is very long.
So is there any way to send such a massive array to clients while keeping the speed acceptable? I think ArrayBuffers might help, but I couldn't find any suggested examples.
Also, if this issue is code specific, let me know in the comments so that I can share code snippets as well.

Usually sockets are used to handle real-time messaging, sacrificing common HTTP features in order to be light and fast. Check:
How websockets can be faster than a simple HTTP request?
Transferring a large amount of data goes against this. Check:
Sending large files over socket
Advice 1
Use the socket for the real-time notification that something changed in the crypto ticker.
After that, when the client knows (thanks to the socket) that there is a new or updated crypto ticker (which is large), download it using plain HTTP instead of the socket, as @jfriend00 also recommends in the comments.
Also, if the data is large, you should use an approach that splits the data and sends it chunk by chunk using plain HTTP, not sockets.
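To make Advice 1 concrete, here is a minimal sketch of the notify-over-socket, download-over-HTTP pattern using express and the ws package. The /api/ticker route and the ticker:updated event name are made up for illustration:

const express = require('express');
const { WebSocketServer, WebSocket } = require('ws');

const app = express();
let ticker = []; // the large array of objects lives on the server

// plain HTTP endpoint: clients download the big payload here
app.get('/api/ticker', (req, res) => {
  res.json(ticker);
});

const server = app.listen(3000);
const wss = new WebSocketServer({ server });

// when the data changes, only a tiny notification crosses the socket
function onTickerUpdated(newTicker) {
  ticker = newTicker;
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) {
      client.send(JSON.stringify({ type: 'ticker:updated' }));
    }
  }
}

On the client, the websocket message handler just triggers a normal fetch('/api/ticker'), so the socket stays light and the browser's HTTP stack handles the heavy transfer (compression, caching and so on).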
Advice 2
As @jfriend00 said, and as the major file hosting services do, implement an algorithm that splits the data and sends it part by part to the client.
If each chunk or part is small, maybe you could use sockets, but this is a common, well-solved need, so use the known way: plain HTTP.
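And for Advice 2, the same hypothetical /api/ticker endpoint from the sketch above, but paginated so the client pulls the array chunk by chunk over plain HTTP:

// paginated variant of the endpoint above (the page size is arbitrary)
app.get('/api/ticker', (req, res) => {
  const page = Number(req.query.page) || 0;
  const size = 1000; // rows per chunk
  res.json({
    items: ticker.slice(page * size, (page + 1) * size),
    hasMore: (page + 1) * size < ticker.length,
  });
});

The client loops, incrementing page until hasMore is false, so each response stays small and the UI can render incrementally.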

Related

Node.js sockets.io need for synchronization?

I'm currently working with node.js, using the socket.io library, to implement a simple chat application. In this application, for irrelevant reasons, I want to set up a system in which a client can ask the server for a piece of information. The server then broadcasts this request to all other online sockets, which will respond with the answer if they have it. The server then finally returns the first response it receives to the original client socket that made the request.
Naturally, the server might receive multiple responses, while only one is needed. Therefore, as soon as one has been received, the others should be discarded. However, it feels like I should use some kind of synchronized data structure/code to make sure the check for "has an answer already been received" works as intended.
I've done some searching on this subject and I've seen several mentions of node.js using an event-driven model and not requiring any synchronized code/data structures, as there are no multiple threads. Is this true? Would my scenario not require any special attention to synchronization and just work? Or would I need to use some synchronization methods, and if so, which ones?
Code example:
// handler for responses coming back from other clients
socket.on('new_response', async data => {
  await processResponse(data)
});
Because I am working with encryption, I have to make use of async/await, which further complicates things. The processResponse function checks whether a response has already been received; if not, it processes it, otherwise it ignores it.
I would suggest something as simple as including a uniqueID in each broadcast to the clients asking if they have a piece of information. The clients then include that same uniqueID in any response they send.
With that, your server can receive answers from the clients and just keep track of which uniqueID values it has already received an answer for; if an answer has already been received for a given uniqueID, it just ignores the later clients that respond.
The uniqueID is server-side generated so it can literally just be an increasing number. You can store the numbers used so far in a server-side Set object so you can quickly look up if you've already received a response for that uniqueID.
Then, the only thing left to do is to age these uniqueIDs out of the Set at some point so they don't accumulate forever. A simple way to do that is to replace the Set object with a fresh one every 15 minutes or so, keeping one older generation around so you can check both of them.
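A minimal sketch of that bookkeeping, assuming a socket.io server; the request_info/new_response/answer event names are placeholders. Since node runs each handler to completion on its single thread, the has/add check on the Set cannot interleave with another new_response handler; the only thing to watch with your async/await code is to do the check and the add before the first await, not after it:

const { Server } = require('socket.io');
const io = new Server(3000);

let nextId = 0;           // server-generated uniqueID: a counter is enough
let current = new Set();  // uniqueIDs answered in this generation
let previous = new Set(); // older generation, kept around for lookups

// rotate the generations every 15 minutes so the Sets don't grow forever
setInterval(() => {
  previous = current;
  current = new Set();
}, 15 * 60 * 1000);

io.on('connection', socket => {
  // a client asks the other clients for a piece of information
  socket.on('request_info', question => {
    const uniqueID = nextId++;
    socket.broadcast.emit('request_info', { uniqueID, question, from: socket.id });
  });

  // another client answers: only the first answer per uniqueID is forwarded
  socket.on('new_response', ({ uniqueID, from, answer }) => {
    if (current.has(uniqueID) || previous.has(uniqueID)) return; // duplicate
    current.add(uniqueID); // mark before any await, while still synchronous
    io.to(from).emit('answer', answer); // deliver to the original requester
  });
});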

Sending messages between clients socket.io

I'm working on a chat application and using socket.io / node for that. Basically I came up with the following strategies:
1. Send the message from the client to the socket server, which then sends it to the receiving client. In the background, store the message in the DB to be retrieved later if the user wishes to see his old conversations.
The pro of this approach is that the user gets the message almost instantly, since we don't wait for the DB operation to complete. The con is that if the DB operation fails, and at exactly that time the client refreshes its page to fetch the messages, it won't get that message.
2. Send the message from the client to the server; the server stores it in the DB first and only then sends it to the receiving client.
The pro is that the message is delivered to the client only if it has been stored in the DB. The con is that it will be nowhere close to real time, since we'll be doing a DB operation in between, slowing down the message passing.
3. Send the message to the server, which stores it in a cache layer (Redis, for example) and then instantly broadcasts it to the receiving client. In the background, keep fetching records from Redis and updating the DB. If the client refreshes the page, we first look into the DB and then the Redis layer.
The pro is that we make the communication faster and also make sure messages are presented correctly on demand. The con is that this is quite complex compared to the implementations above, and I'm wondering if there's an easier way to achieve it.
My question is: what's the way to go if you're building a serious chat application that ensures both fast communication and data persistence? What are some strategies that apps like Facebook, WhatsApp etc. use for this? I'm not looking for an exact example, but a few pointers will help.
Thanks.
I would go for option number 2. I've built chat apps in Node myself and found that this is the best option. Saving to a database takes a few milliseconds: the 0.x milliseconds to write to the database plus a few milliseconds of latency in communication ( https://blog.serverdensity.com/mongodb-benchmarks/ ).
So I would still consider this approach realtime. The good thing with this is that if the write fails, you can display a message to the sender that it failed, for whatever reason.
Facebook, WhatsApp and many other big messaging apps are based on XMPP (Jabber), which is a very, very big protocol for instant messaging, and everything is very well documented on how to do things. But it is based on XML, so you still have to parse everything; luckily there are very good libraries for dealing with XMPP. So if you want to go the common way, you can use XMPP, but most of the big players in this area no longer follow all the standards, since the standard does not have all the features we are used to today.
I would go with doing my own version; actually, I already have something made (similar to Slack), and if you want I could give you access to it in private.
So to end this: number 2 is the way to go (for me). XMPP is cool but also brings a lot of complexity.
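For what it's worth, a minimal sketch of option number 2 with socket.io and the mongodb driver. The event names are made up, and it assumes each user's socket joins a room named after their user id:

const { Server } = require('socket.io');
const { MongoClient } = require('mongodb');

async function main() {
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const messages = mongo.db('chat').collection('messages');
  const io = new Server(3000);

  io.on('connection', socket => {
    socket.on('chat_message', async msg => {
      try {
        // persist first: a message is only delivered if it is in the DB
        await messages.insertOne({ ...msg, sentAt: new Date() });
        io.to(msg.to).emit('chat_message', msg); // then push to the receiver
      } catch (err) {
        // the write failed, so tell the sender instead of delivering
        socket.emit('send_failed', { reason: err.message });
      }
    });
  });
}

main();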

XMLHttpRequest detect new data from server before res.end()

Is it possible to detect new data from the server as it is sent? For example, with express.js:
res.write('Processing 14% or something');
and then display that on the page with a progress bar.
Edit:
My original question was a bit confusing so let me explain the situation. I have created a page where users can upload song files. These files are then converted (using ffmpeg) to .ogg and .mp3 files for the web. The conversion takes a long time. Is it possible to send real time data about the conversion back to the client using the same XMLHttpRequest that sent the files?
If I understand correctly, you are trying to implement event-based updates. Yes, node.js has some excellent websocket libraries such as socket.io and SockJS.
You need to understand node.js's event-driven pattern.
The websocket protocol maintains a full-duplex connection between server and client. You can notify clients when any action happens on the server and, similarly, notify the server when any action happens on the client. The libraries also give you the flexibility to broadcast an event to all connected clients or only to selected ones.
So it is basically emit and on that you will be using most often.
Go through the documentation; it will not take much time to learn. Let me know if you need any help.
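For the ffmpeg case in the question, here is a rough sketch of how progress could flow over socket.io instead of the upload's XMLHttpRequest. The event names are made up, and runFfmpeg stands in for whatever wrapper you use to get percentages out of ffmpeg:

// server side: emit progress events while the conversion runs
const { Server } = require('socket.io');
const io = new Server(3000);

io.on('connection', socket => {
  socket.on('convert', file => {
    // runFfmpeg is hypothetical: it calls back with 0-100 and
    // returns a promise that resolves when the conversion finishes
    runFfmpeg(file, percent => {
      socket.emit('conversion:progress', { percent });
    }).then(() => socket.emit('conversion:done', { file }));
  });
});

// client side: drive the progress bar from the events
socket.on('conversion:progress', ({ percent }) => {
  progressBar.style.width = percent + '%';
});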

Node.js best socket.io practice (request-response vs broadcast)

I am new to node.js/socket.io and may not be asking this the right way or even asking the right question.
The goal of my app is to get data from an API, convert it to a JSON object and store it in MongoDB, then serve the data to the client side when needed. I have thought about two possibilities and was wondering what the best practice would be. My first notion was to broadcast all of the entries in the database every so often to all of the connections. The other idea was to have the client request from the server the data it needs, and then send the requested data to that client.
The data stored in the database is around 100 entries. The data would be updated from the API approximately every 30 seconds. If method 1 were chosen, the data would be broadcast every 5-10 seconds. If method 2 were chosen, the data would be sent when requested. The client side has different situations where not all of the data is needed all the time, and it would have to request data every so often to make sure its data is "fresh".
So my question is: what is the best practice, broadcasting a large chunk every x seconds or sending smaller chunks when requested?
Sorry if this doesn't make sense.
Thanks for your time.
The DDP protocol is definitely an interesting way to go, but it might be overkill. A simpler solution is to take the best from both methods 1 and 2: if latency is not so important and you have spare bandwidth, you can broadcast an "update" message to all clients when new data arrives. Each client then considers whether the update affects it and downloads the data it needs.
A slightly more complicated and more effective approach is a subscription procedure, very similar to DDP. For smaller projects you can implement it yourself fairly quickly. This is how it could work (there is a sketch after the list):
Client subscribes to a chunk of data.
Server sends this chunk to the client and remembers which clients subscribed to what.
If the chunk is updated the server goes through the subscription list and sends the new data to subscribers.
The client can unsubscribe at any time by sending a special message, by disconnecting, or optionally by subscribing to a different chunk.
By "chunk" I mean any way of identifying some data. It can be a record ID, a time range, a filter, or anything else that makes sense in your project.
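A minimal sketch of that subscription bookkeeping, assuming socket.io; the subscribe/unsubscribe/data event names and the loadChunk helper are made up for illustration:

const { Server } = require('socket.io');
const io = new Server(3000);

// chunk key -> set of subscribed socket ids
const subscriptions = new Map();

io.on('connection', socket => {
  socket.on('subscribe', chunkKey => {
    if (!subscriptions.has(chunkKey)) subscriptions.set(chunkKey, new Set());
    subscriptions.get(chunkKey).add(socket.id);
    // send the chunk once on subscribe (loadChunk is hypothetical)
    socket.emit('data', { chunkKey, data: loadChunk(chunkKey) });
  });

  socket.on('unsubscribe', chunkKey => {
    subscriptions.get(chunkKey)?.delete(socket.id);
  });

  socket.on('disconnect', () => {
    for (const subs of subscriptions.values()) subs.delete(socket.id);
  });
});

// call this when a chunk changes; only its subscribers get the update
function publish(chunkKey, data) {
  for (const id of subscriptions.get(chunkKey) ?? []) {
    io.to(id).emit('data', { chunkKey, data });
  }
}

Note that socket.io's built-in rooms (socket.join plus io.to(room).emit) can do most of this bookkeeping for you, with the chunk key as the room name.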

Broadcasting Messages at High Frequency. Using HTTP POST or something else?

We're looking at speccing out a system which broadcasts small amounts of frequently changing data (using JSON or XML or something) to multiple recipients at a reasonably high frequency (our updates will be 1000s per second).
We were initially thinking of using HTTP POST to broadcast the data to each endpoint, maybe once every few seconds (the clients will vary, as they're other people's webapps), but we're now wondering if there's a better way to hold up to the load/frequency we're hoping for. I imagine we'd need to version/timestamp the messages in some way at the very least.
We're using RabbitMQ for preparing all the things ready for sending and to choose what needs to go where (from a Django app, if that matters), but we can't get all of the endpoints to use a MQ.
The HTTP POST thing just doesn't seem quite right. What else should we be looking into? Is this where things like node or socket.io or some of the new real-time frameworks fit in? We're happy to find the right expertise to help with this; we just need steering in the correct direction.
Thanks!
You don't want to do thousands of POSTs per second to multiple clients. You're going to introduce the HTTP overhead on your end pushing it out, and for all you know, you might end up flooding the server on the other end with POSTs that just swamp it.
Option 1: For clients that can't or won't read a queue, POSTs could work, but to avoid killing the server and all the HTTP overhead, could you bundle updates? Once every minute or two, take all the aggregated data and post it to the client. This way, you don't have 60+ POST requests going to one client every minute or two for time and eternity. It'll help save on bandwidth as well, since you only send the header info once with more data, instead of sending the header information with each little piece of data.
Option 2: Have you thought about using a good ol' socket connection? Either you open a socket to the client, or vice versa, and push the data over that. That avoids the overhead of HTTP and lets the client read at the rate data arrives. If the client no longer wants to receive data, they can just close the connection. It's on the arcane side, but it'd avoid completely killing the target server.
If you can get clients to read an MQ, set up a group just for them and make your life easier, so you only have to deal with those that can't or won't read the queue instead of trying for a one-size-fits-all solution.
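To illustrate Option 1, here is a minimal Node sketch of the bundling idea (the endpoint URL and payload shape are invented; the same pattern applies from a Django app):

// accumulate updates and flush them as one POST per interval
const pending = [];

function queueUpdate(update) {
  // timestamp each message so the receiver can order/deduplicate them
  pending.push({ ...update, ts: Date.now() });
}

// every minute, send everything that accumulated as a single request,
// so the HTTP headers are paid once per batch instead of once per update
// (fetch is global in Node 18+)
setInterval(async () => {
  if (pending.length === 0) return;
  const batch = pending.splice(0, pending.length);
  await fetch('https://client.example.com/updates', { // hypothetical endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(batch),
  });
}, 60 * 1000);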
