Broadcasting Messages at High Frequency. Using HTTP POST or something else? - node.js

We're looking at speccing out a system which broadcasts small amounts of frequently changing data (using JSON or XML or something) to multiple recipients at a reasonably high frequency (our updates will be 1000s per second).
We were initially thinking of using HTTP POST to broadcast the data to each endpoint, maybe once every few seconds (the clients will vary as they're other people's webapps), but we're now wondering if there's a better way to hold up under the load/frequency we're hoping for. I imagine we'd need to version or timestamp the messages in some way at the very least.
We're using RabbitMQ for preparing everything for sending and for choosing what needs to go where (from a Django app, if that matters), but we can't get all of the endpoints to use an MQ.
The HTTP POST thing just doesn't seem quite right. What else should we be looking into? Is this where things like Node or socket.io or some of the new real-time frameworks fit in? We're happy to find the right expertise to help with this; we just need steering in the right direction.
Thanks!

You don't want to do thousands of POSTs per second to multiple clients. You'll pay the HTTP overhead on your end pushing it all out, and for all you know, you might end up flooding the server on the other end with POSTs that swamp it.
Option 1: For clients that can't or won't read a queue, POSTs could work, but to avoid killing the server and all the HTTP overhead, could you bundle updates? Once every minute or two, take all the aggregated data and POST it to the client. That way you don't have 60+ POST requests going to one client every minute or two for time and eternity. It'll save bandwidth too, since you send the header info once per batch with more data, instead of sending the header info once per piece of data.
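A rough sketch of that batching idea, assuming Node 18+ for the built-in fetch (the endpoint URL and payload shape are invented for illustration):

```js
// Batch outgoing updates and POST them to a client once per interval,
// instead of one POST per update.
const BATCH_INTERVAL_MS = 60_000; // once a minute; tune per client
const pending = []; // updates accumulated since the last flush

function queueUpdate(update) {
  pending.push({ ...update, ts: Date.now() }); // timestamp each update
}

async function flush() {
  if (pending.length === 0) return;
  const batch = pending.splice(0, pending.length); // take everything queued
  try {
    await fetch('https://client.example.com/updates', { // hypothetical endpoint
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ updates: batch }),
    });
  } catch (err) {
    pending.unshift(...batch); // on failure, requeue and retry next interval
  }
}

setInterval(flush, BATCH_INTERVAL_MS);
```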
Option 2: Have you thought about using a good ol' socket connection? Either you open a socket to the client, or vice versa, and push the data over that. That avoids the HTTP overhead and lets the client read at the rate the data arrives. If the client no longer wants to receive data, it can just close the connection. It's on the arcane side, but it avoids completely killing the target server.
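A bare-bones sketch of that with Node's net module (newline-delimited JSON is just one framing choice; the port is arbitrary):

```js
const net = require('net');

// Keep one long-lived TCP connection per client and write updates to it
// as they happen, framed as newline-delimited JSON.
const clients = new Set();

const server = net.createServer((socket) => {
  clients.add(socket);
  socket.on('close', () => clients.delete(socket)); // client hung up: stop sending
  socket.on('error', () => clients.delete(socket));
});

function broadcast(update) {
  const line = JSON.stringify({ ...update, ts: Date.now() }) + '\n';
  for (const socket of clients) socket.write(line);
}

server.listen(9000);
```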
If you can get clients to read an MQ, set up a group just for them and make your life easier: then you only have to deal with those that can't or won't read the queue, instead of trying for a one-size-fits-all solution.
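For the clients that can read a queue, a fanout exchange keeps them out of the POST path entirely. A sketch using the amqplib package (the connection URL, exchange name, and demo message are placeholders):

```js
const amqp = require('amqplib');

// Publish each update once to a fanout exchange; every MQ-capable
// client binds its own queue to the exchange and receives a copy.
async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertExchange('updates', 'fanout', { durable: false });

  function publish(update) {
    ch.publish('updates', '', Buffer.from(JSON.stringify(update)));
  }

  publish({ symbol: 'EXAMPLE', price: 1.23, ts: Date.now() }); // demo message
}

main().catch(console.error);
```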

Related

Node.js design approach: clients periodically polling the server

I'm trying to learn Node.js and adequate design approaches.
I've implemented a little API server (using express) that fetches a set of data from several remote sites, according to client requests that use the API.
This process can take some time (several fetch/await calls), so I want the user to know how their request is doing. I've read about socket.io / WebSockets, but maybe that's overkill for this case.
So what I did is:
For each client request, a requestID is generated and returned to the client.
With that ID, the client can query the API (via another endpoint) to know his request status at any time.
Using setTimeout() on the client page and some DOM manipulation, I can update and display the current request status every X seconds, like a polling approach.
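In code it looks roughly like this (a simplified sketch; the route names and the 2-second interval are just illustrative):

```js
const express = require('express');
const crypto = require('crypto');

const app = express();
const jobs = new Map(); // requestID -> { status, result }

app.post('/api/fetch', (req, res) => {
  const id = crypto.randomUUID();       // the generated requestID
  jobs.set(id, { status: 'pending' });
  runRemoteFetches(id);                 // fire and forget; updates the map as it goes
  res.json({ requestId: id });          // client gets the ID immediately
});

app.get('/api/status/:id', (req, res) => {
  res.json(jobs.get(req.params.id) ?? { status: 'unknown' });
});

async function runRemoteFetches(id) {
  // ...the several fetch/await calls to the remote sites go here...
  jobs.set(id, { status: 'done' });
}

app.listen(3000);
```

And on the client page, the setTimeout-based polling:

```js
// Poll the status endpoint until the request completes.
async function poll(id) {
  const res = await fetch(`/api/status/${id}`);
  const { status } = await res.json();
  document.querySelector('#status').textContent = status; // the DOM update
  if (status !== 'done') setTimeout(() => poll(id), 2000);
}
```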
Although the solution works fine, even with several clients connecting concurrently, maybe there's a better solution? Are there any caveats I'm not considering?
TL;DR The approach you're using is just fine, although it may not scale very well. Websockets are a different approach to solve the same problem, but again, may not scale very well.
You've identified what are basically the only two options for real-time (or close to it) updates on a web site:
polling the server - the client requests information periodically
using Websockets - the server can push updates to the client when something happens
There are a couple of things to consider.
How important are "real time" updates? If the user can wait several seconds (or longer), then go with polling.
What sort of load can the server handle? If load is a concern, then Websockets might be the way to go.
That last question is really the crux of the issue. If you're expecting a few or a few dozen clients to use this functionality, then either solution will work just fine.
If you're expecting thousands or more to be connecting, then polling starts to become a concern, because now we're talking about many repeated requests to the server. Of course, if the interval is longer, the load will be lower.
It is my understanding that the overhead for Websockets is lower, but still can be a concern when you're talking about large numbers of clients. Again, a lot of clients means the server is managing a lot of open connections.
The way large services handle this is to design their applications so that they can be distributed over many identical servers, with a load balancer deciding which server you connect to. This is true for either polling or Websockets.

How many fetch requests are too many?

I'm on the newer side of web development. I use React, so obviously I focus on client-side rendering, but for a certain application I was thinking of making a request to my server for every page (this probably isn't necessary, just a workaround due to ignorance on my part). However, the thought came to me: how many fetch requests are too many?
I want to divide this a little bit. I know different fetch requests can take different amounts of time; a GET request for one item of data is faster than a POST request that adds 20 rows, and you can't account for all the variations.
But in general:
1. How long does a fetch request to a server (performing some sort of CRUD operation on the database) take?
2. How long does a fetch request to a server (NOT performing any operation on the database) take?
Option 2 is obviously faster (if we're just imagining simple requests), and I know this can probably vary from server to server, but I think it would be helpful to know so I can structure my site more efficiently.
There is no general way to know how long a request will take; it depends on many factors, like internet speed (assuming it's not a local server), the amount of data being submitted (in a POST) or retrieved, the amount of processing done on the server before returning the response, etc.
If you had the answers to all of the above, you could estimate the time with a simple calculation.
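As a back-of-envelope illustration of that calculation (every number below is invented):

```js
// Rough model: total ≈ connection round trips + request round trip
//               + server work + transfer time.
const rttMs = 50;            // network round-trip time
const handshakeTrips = 2;    // TCP + TLS (fewer with keep-alive / TLS 1.3)
const serverMs = 20;         // DB query / processing; near zero for a static response
const bytes = 2_000;         // response size
const bandwidth = 1_000_000; // bytes per second

const totalMs =
  handshakeTrips * rttMs + rttMs + serverMs + (bytes / bandwidth) * 1000;
console.log(`~${totalMs.toFixed(0)} ms`); // ~172 ms with these numbers
```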

Websockets, SSE, or HTTP when auto updating a constantly open dashboard page

My app is built in Angular (2+) and NodeJS. One of the pages is basically a dashboard that shows the current tasks of a company, where this dashboard is shown all day on a TV to the company's staff.
Rarely is it refreshed or reloaded manually.
Tasks are updated every 5-10 mins by a staff member from another computer.
Tasks on dashboard need to be updated asap after any task is updated.
App should limit data transfer when updating dashboard, after task update.
I initially tried websockets but had a problem with connection reliability as sometimes the board would never get updated because the websocket would lose its connection. I could never figure this problem out and read that websockets can be unreliable.
Currently I'm just running an http call every 15 seconds to retrieve a new set of data from the backend. But this can be costly with data transfer as the app scales.
I've just recently heard about SSE but know nothing about it.
At the moment my next plan is to keep the HTTP call every 15 seconds, but pass a "last updated" time from the frontend, compare it to the backend's "last updated" time (which is updated whenever a task changes), and only return data if the frontend's copy is outdated, to reduce data transfer.
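In code the plan would look roughly like this (a sketch; the endpoint and renderTasks are placeholders):

```js
// Frontend: still poll every 15 s, but send our last-known update time so
// the backend can answer with an empty 204 when nothing has changed.
let lastUpdated = 0;

async function refresh() {
  const res = await fetch(`/api/tasks?since=${lastUpdated}`); // hypothetical endpoint
  if (res.status === 204) return;  // nothing new, almost no data transferred
  const { updatedAt, tasks } = await res.json();
  lastUpdated = updatedAt;
  renderTasks(tasks);              // placeholder for the Angular update
}

setInterval(refresh, 15_000);
```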
Does that sound like a good idea, or should I try websockets again, or SSE?
I initially tried websockets but had a problem with connection reliability as sometimes the board would never get updated because the websocket would lose its connection.
Handle the event for when the connection is lost and reconnect it.
I could never figure this problem out and read that websockets can be unreliable.
Don't let some random nonsense you read on the internet keep you from owning the problem and figuring it out. WebSockets are as reliable as anything else. And, like anything else, they can get disconnected. And, like many of the newer APIs, they leave reconnection logic up to you... the app developer. If you simply don't want to deal with it, there are many packages on npm for auto-reconnecting WebSockets which do exactly what I suggested: they handle the events for disconnection and immediately reconnect.
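The reconnection logic is only a few lines if you'd rather own it yourself (browser-side sketch; the URL is a placeholder):

```js
// Reopen the socket whenever it drops, with a small retry delay.
// The close event also fires after a failed connection attempt,
// so this covers both disconnects and an unreachable server.
function connect(url, onMessage) {
  const ws = new WebSocket(url);
  ws.onmessage = (e) => onMessage(JSON.parse(e.data));
  ws.onclose = () => setTimeout(() => connect(url, onMessage), 2000);
}

connect('wss://example.com/dashboard', (task) => {
  // update the board with the changed task here
});
```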
Currently I'm just running an http call every 15 seconds to retrieve a new set of data from the backend. But this can be costly with data transfer as the app scales.
It can be, yes.
I've just recently heard about SSE but know nothing about it.
From what little we know about your problem, SSE sounds like the right way to go. SSE is best for:
Evented data
Data that can be somehow serialized to text (JSON is fine, but don't base64 encode large binary streams as you'll make them too big)
Unidirectional messages, from server to client
Most implementations will reconnect for you, and the protocol even supports picking up where it left off (via the Last-Event-ID header) if a disconnection actually occurs.
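A minimal SSE setup looks something like this (a sketch; taskEmitter and the route name are stand-ins for however your backend signals a task change):

```js
const express = require('express');
const { EventEmitter } = require('events');

const app = express();
const taskEmitter = new EventEmitter(); // fired elsewhere when a task changes

// Server: keep the response open and write SSE frames as tasks change.
app.get('/events', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.flushHeaders();

  const send = (task) => {
    res.write(`id: ${task.updatedAt}\n`);           // enables Last-Event-ID resume
    res.write(`data: ${JSON.stringify(task)}\n\n`); // one event per frame
  };
  taskEmitter.on('update', send);
  req.on('close', () => taskEmitter.off('update', send)); // stop on disconnect
});

app.listen(3000);
```

And on the browser side, EventSource handles the reconnecting on its own:

```js
const es = new EventSource('/events');
es.onmessage = (e) => updateBoard(JSON.parse(e.data)); // updateBoard is yours to write
```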
If you only need to push data from the server to the client, it may be worth having a look at Server-Sent Events.
You can have a look at this article (https://streamdata.io/blog/push-sse-vs-websockets/) and this video (https://www.youtube.com/watch?v=NDDp7BiSad4) to get insight into this technology and whether it could fit your needs. They summarize the pros and cons of both SSE and WebSockets.

What is expected to be faster when collecting data from multiple APIs: WebSockets or HTTP requests?

I have to collect real-time ticker data on trading pairs (USD/EUR etc.) from the APIs of different sites. The data is usually a small JSON object with mostly ~10 numbers. The naive strategy is to make a request every 5 or so seconds to get the up-to-date ticker data from each of those sites. Some of them, though, provide a WebSocket option, which lets them notify me directly when a change occurs and, I believe, is more efficient. The issue is some of those sites don't offer that option, so the overall code will be simpler to organize and read if I use the same method for all sites (i.e., HTTP requests). I'm also not sure the data is heavyweight enough to justify that choice.
For the experts who have dealt with similar situations: is this a case where a relevant performance improvement can be expected from using WebSockets instead of timed HTTP requests, where they're available?
It depends:
WebSockets only make sense if you keep them open most of the time. If you instead open a new WebSocket connection each time you want new data, the overhead is larger than a simple HTTP request: it's not so much the bandwidth (though that too) as the extra round trips needed to get your data, which makes everything slower.
WebSockets take more resources on your end, because you have to keep a TCP connection open for each open WebSocket connection. If there are only a small number of sites you need to ask, it doesn't matter; if there are a lot, it will. While it can be an advantage (less latency) to keep normal HTTP connections alive too, you can close those when resources run low.
If most of the time the data you get back is the same, then WebSockets might be more efficient, because the server only sends data when it actually changes.
If you want to be informed of new data as soon as possible, WebSockets perform better. If you only need 5-second precision anyway, it doesn't matter much.
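In practice that often means one long-lived connection per site that offers WebSockets, and a timed poll for the rest. A rough sketch with the ws package and Node 18+'s fetch (both URLs and the message shapes are invented):

```js
const WebSocket = require('ws');

// Sites with a WebSocket feed: keep one connection open and react to pushes.
const ws = new WebSocket('wss://api.example-exchange.com/ticker'); // hypothetical
ws.on('message', (data) => handleTicker(JSON.parse(data)));
ws.on('close', () => { /* reconnect, as in the dashboard question above */ });

// Sites without one: fall back to a timed poll.
setInterval(async () => {
  const res = await fetch('https://api.other-site.com/ticker/usd-eur'); // hypothetical
  handleTicker(await res.json());
}, 5000);

function handleTicker(tick) {
  // normalize both sources into one shape here
}
```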

Node.js best socket.io practice (request-response vs broadcast)

I am new to node.js/socket.io and may not be asking this the right way or even asking the right question.
The goal of my app is to get data from an API, convert it to a JSON object, and store it in MongoDB, then serve the data to the client side when needed. I have thought about two possibilities and was wondering what the best practice would be. My first notion was to broadcast all of the entries in the database every so often to all of the connections. The other idea was to have the client request what data it needs from the server, which then sends the requested data to the client.
The data being stored in the database is around 100 entries. The data would be updated from the API approximately every 30 seconds. If method 1 was chosen the data would be broadcast every 5-10 seconds. If method 2 was chosen then the data would be sent when requested. The client side will have different situations where not all data will be needed all the time. The client side will have to request data every so often to make sure the data is "fresh".
So my question is: which is the better practice, broadcasting a large chunk every X seconds or sending smaller chunks when requested?
Sorry if this doesn't make sense.
Thanks for your time.
The DDP protocol is definitely an interesting way to go, but it might be overkill. A simpler solution is to take the best from both methods 1 and 2: if latency is not so important and you have spare bandwidth, you can broadcast an "update" message to all clients when new data arrives. Each client considers whether the update affects it and downloads the data it needs.
A slightly more complicated but more effective approach is a subscription procedure very similar to DDP. For smaller projects you can implement it yourself fairly quickly. This is how it could work:
Client subscribes to a chunk of data.
Server sends this chunk to the client and remembers which clients subscribed to what.
If the chunk is updated the server goes through the subscription list and sends the new data to subscribers.
Client can unsubscribe at any time by sending a special message, by disconnecting (or optionally by subscribing to a different chunk).
By "chunk" I mean any way to identify some data. It can be a record ID, a time range, a filter, or anything that makes sense in your project.
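With socket.io the bookkeeping is nearly free, since rooms can serve as the subscription list (a sketch; the event names and port are made up):

```js
const { Server } = require('socket.io');
const io = new Server(3000);

io.on('connection', (socket) => {
  // A room per chunk is the subscription list the server "remembers".
  socket.on('subscribe', (chunkId) => socket.join(chunkId));
  socket.on('unsubscribe', (chunkId) => socket.leave(chunkId));
  // Disconnecting leaves all rooms automatically.
});

// When a chunk changes (e.g. after the 30 s API refresh), push it
// only to the clients subscribed to it.
function publishChunk(chunkId, data) {
  io.to(chunkId).emit('update', { chunkId, data });
}
```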
