Node.js socket.io: need for synchronization?

I'm currently working with node.js, using the socket.io library, to implement a simple chat application. In this application, for irrelevant reasons, I want to set up a system in which a client can ask the server for a piece of information. The server then broadcasts this request to all other online sockets, which respond with the answer if they have it. Finally, the server returns the first response it receives to the client socket that made the original request.
Naturally, the client might receive multiple responses, while only one is needed. Therefore, as soon as one has been received, the others should be discarded. However, it feels like I should use some kind of synchronized data structure/code to make sure this check ("has an answer already been received?") works as intended.
I've done some searching on this subject, and I've seen several mentions that node.js uses an event-driven model and doesn't require any synchronized code/data structures, since there are no multiple threads. Is this true? Would my scenario need no special attention to synchronization, and would it just work? Or do I need some synchronization method, and if so, which one?
Code example:
```javascript
// processResponse (defined elsewhere) decrypts the data and ignores duplicates
socket.on('new_response', async data => {
  await processResponse(data);
});
```
Because I'm working with encryption, I have to use async/await, which complicates things further. The processResponse function checks whether a response has already been received; if not, it processes it, otherwise it ignores it.

I would suggest something as simple as including a uniqueID in each broadcast to the clients asking if they have a piece of information. The clients then include that same uniqueID in any response they send.
With that, your server can receive answers from the clients and keep track of which uniqueID values it has already received an answer for. If an answer has already arrived for a given uniqueID, it simply ignores any later clients that respond.
The uniqueID is server-side generated so it can literally just be an increasing number. You can store the numbers used so far in a server-side Set object so you can quickly look up if you've already received a response for that uniqueID.
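A minimal sketch of that bookkeeping, written in plain Node.js without socket.io so it runs standalone; `createRequest`, `acceptResponse` and `handleResponse` are hypothetical names, and `decrypt` stands in for your async encryption step. It also addresses the async/await worry in the question: the `has`/`add` pair runs synchronously before any `await`, so no other handler can run between the check and the mark.

```javascript
// Hypothetical helpers illustrating server-side de-duplication by request ID.
let nextRequestId = 1;       // server-generated, simply increasing
const answered = new Set();  // IDs that already have an accepted answer

// Called when a client asks for information; the ID goes out with the broadcast.
function createRequest() {
  return nextRequestId++;
}

// Called for every incoming response; returns true only for the first answer.
function acceptResponse(requestId) {
  // Check and mark in the same synchronous step: Node.js runs one callback
  // at a time, so nothing can slip in between these two lines.
  if (answered.has(requestId)) return false;
  answered.add(requestId);
  return true;
}

// Example handler: claim the ID first, then do the slow async work.
async function handleResponse(requestId, payload, decrypt) {
  if (!acceptResponse(requestId)) return null; // a later duplicate: ignore it
  return await decrypt(payload);               // safe: the ID is already claimed
}
```

If the `answered.add(...)` call were moved after the `await`, two responses arriving close together could both pass the check, because each handler yields control at the `await`.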
Then, the only thing left to do is to age these uniqueIDs out of the Set at some point so they don't accumulate forever. A simple way to do that is to replace the Set object with a fresh one every 15 minutes or so, keeping the previous generation around so you can check both.
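One way to sketch that two-generation aging; the class name `GenerationalSet` and the 15-minute period are illustrative choices, not part of any library. Because it exposes `has` and `add`, it can stand in wherever a plain Set was used for the lookups.

```javascript
// Hypothetical two-generation Set: IDs age out after one to two rotation periods.
class GenerationalSet {
  constructor() {
    this.current = new Set();
    this.previous = new Set();
  }
  has(id) {
    return this.current.has(id) || this.previous.has(id);
  }
  add(id) {
    this.current.add(id);
  }
  // Drop the oldest generation and start a fresh one.
  rotate() {
    this.previous = this.current;
    this.current = new Set();
  }
}

const seen = new GenerationalSet();
// Rotate every 15 minutes; unref() so this timer alone does not keep the process alive.
const rotation = setInterval(() => seen.rotate(), 15 * 60 * 1000);
rotation.unref();
```

An ID therefore survives at least one full period and at most two, which only needs to exceed how long you would realistically wait for stragglers.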

Related

NodeJS Polling per User Structure best practice

My project is a full-stack application where a web client subscribes to an unready object. When the subscription is triggered, the backend runs an observation loop on that unready object until it becomes ready. When that happens, it sends a message to the frontend through socketIO (suggestions are welcome; I'm not quite sure it's the best method). My question is how to construct the observation loop.
My frontend basically subscribes to the backend and gets a 200 response, then connects to the server over WebSocket (socketIO) if the subscription succeeded, or gets a 4XX error code if something went wrong. On the backend, when the user subscribes, it should start a "thread" for that user (I know Nodejs doesn't support threads, it's just for the mental image) that polls an API for information every 10 seconds or so.
I do that because the API I poll does not support WebHooks, so I need to observe the API response until it reaches the state I want (I've already figured out this part).
What I'm asking is: is there a third-party library meant for these kinds of tasks? Should I use worker threads, or simple setTimeouts abstracted by classes? The response will be sent over SocketIO, and that part already works as well; it's just the polling method that I'm not quite sure how to build.
I'm also open to use another fitting programming language that makes solving this case easier. I'm not in a hurry.
A polling network request (which it sounds like this is) is non-blocking and asynchronous so it doesn't really take much of your nodejs CPU unless you're doing some heavy-weight computation of the result.
So, a single nodejs thread can make a lot of network requests (for your polling and for sending data over socket.io connection) without adding WorkerThreads or clustering. This is something that nodejs is very, very good at.
I'm not aware of any third-party library specifically for this, since you have to write custom code to examine the results of the network request anyway, and that's most of the coding. There are a number of libraries for making HTTP requests to other servers from nodejs listed here. My favorite in that list is got(), but you can look at the choices and decide what you like.
As for making the repeated requests, I would probably just use either repeated setTimeout() calls or a setInterval() call.
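A minimal sketch of the repeated-setTimeout approach, assuming a hypothetical `checkStatus()` that wraps your HTTP request (with got(), fetch or similar) and resolves to `true` once the object is ready, and a hypothetical `onReady()` that would do the socket.io emit. Scheduling the next timer only after the previous check finishes means slow API responses never overlap, which is the main reason to prefer this over setInterval here.

```javascript
// Poll until checkStatus() resolves true, then call onReady() once.
// Returns a cancel function, e.g. to call when the client disconnects.
function pollUntilReady(checkStatus, onReady, intervalMs = 10000) {
  let timer = null;
  let stopped = false;

  async function tick() {
    if (stopped) return;
    try {
      if (await checkStatus()) {
        stopped = true;
        onReady(); // e.g. socket.emit('ready', ...)
        return;
      }
    } catch (err) {
      console.error('poll failed, will retry:', err.message);
    }
    // Schedule the next check only after this one has finished.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Wrapping this in a class per subscription is equally fine; the important parts are the single pending timer and a way to cancel it.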
You don't say whether you have to make separate requests for every single client that is subscribed to something or whether you can somehow combine all clients watching the same resource so that you use the same polling interval for all of them. If you can do the latter, that would certainly be more efficient.
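If clients can share a poller, one possible shape is a map from resource ID to a single polling loop plus the set of clients watching it. `SharedPollers` is a hypothetical name, and `startPolling(resourceId, onChange)` is an assumed function that starts a loop and returns a cancel function.

```javascript
// One poller per watched resource, shared by every subscribed client.
class SharedPollers {
  constructor(startPolling) {
    this.startPolling = startPolling;
    this.entries = new Map(); // resourceId -> { subscribers: Set, cancel }
  }

  subscribe(resourceId, notify) {
    let entry = this.entries.get(resourceId);
    if (!entry) {
      const subscribers = new Set();
      // Fan out each result to every client currently watching this resource.
      const cancel = this.startPolling(resourceId, state => {
        for (const fn of subscribers) fn(state);
      });
      entry = { subscribers, cancel };
      this.entries.set(resourceId, entry);
    }
    entry.subscribers.add(notify);

    // Unsubscribe: stop the poller once nobody is watching.
    return () => {
      if (!entry.subscribers.delete(notify)) return; // already unsubscribed
      if (entry.subscribers.size === 0) {
        entry.cancel();
        this.entries.delete(resourceId);
      }
    };
  }
}
```

With N clients watching the same object, this makes one request per interval instead of N.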
If, as you scale, you run into scaling issues, you can then move the polling code to one or more child processes or WorkerThreads and then just communicate back to the main thread via messaging when you have found a new state that needs to be sent to the client. But, I would not anticipate you would need to code that extra step until you reach larger scale. As with most scaling things, you would need to code up the more basic option (which should scale well by itself) and then measure and benchmark and see where any bottlenecks are and modify the architecture based on data, not speculation. Far too often, the architecture is over-designed and over-implemented based on where people think the bottlenecks might be rather than where they actually turn out to be. Not only does this make the development take longer and end up with more complicated implementation than required, but it can target development at the wrong part of the problem. Profile, measure, then decide.
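If measurement does show the polling needs its own thread, the hand-off could look roughly like this sketch using Node's built-in `worker_threads` module. The worker source is inlined with `eval: true` only so the example fits in one file, the "ready on the third poll" check is a stand-in for the real HTTP request, and `startPollingWorker` is a hypothetical name.

```javascript
const { Worker } = require('worker_threads');

// Worker body as a string for a self-contained sketch; a real project would
// normally put it in its own file and use new Worker('./poller.js').
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let polls = 0;
  const timer = setInterval(() => {
    polls++;
    // A real worker would make the HTTP request here and inspect the result.
    if (polls >= 3) {
      parentPort.postMessage({ id: workerData.id, state: 'ready' });
      clearInterval(timer); // nothing left to do, so the worker exits
    }
  }, workerData.intervalMs);
`;

function startPollingWorker(id, intervalMs, onState) {
  const worker = new Worker(workerSource, { eval: true, workerData: { id, intervalMs } });
  // Only state changes cross the thread boundary, as messages.
  worker.on('message', msg => onState(msg));
  worker.on('error', err => console.error('poller worker failed:', err));
  return worker;
}
```

The main thread would then forward `onState` results to the right socket.io client, exactly as it would have done with an in-process poller.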

Building Websites only on NodeJs and Express blocking requests over http

I have a question regarding the examples out there when using Nodejs, Express and Jade for templates.
All the examples show how to build some sort of a user administrative interface where you can add user profiles, delete them and manage them.
Those are considered beginner's guides to NodeJs. My concern is that if I have 10 users concurrently accessing the same interface and doing the same operations, surely NodeJs will block the requests of the other users, since they are all running on the same port.
So let's say I am pulling out a list of users, which may be something like 10000 entries. Yes, I can do paging, but that is not the point. While I am getting the list from the server, another 4 users want to access the application. They have to wait for my process to end. That is my question: how can one avoid that using NodeJS & Express?
I have been on this issue for a couple of months! I currently have something in place that does the following:
Run the main processing of stuff on a port
Run a Socket.io process on a different port
Use a sticky session
The idea is that I do a request (like getting a list of items), and immediately respond with some request reference but without the requested items, thus releasing the port.
In the background, "asynchronously", I then fetch the items. When that completes, I make an HTTP request from the main node process to the socket.io node's port, sending the items through.
When that is done I then perform a socket.io emit WITH the data and the initial request reference so that the correct user gets the message.
On the client side I have an event listening for the socket which then completes the ajax request by populating the list.
I have SOME success with this! It actually works to a degree! Online, I have an issue that complicates matters because of IP addresses, and socket.io behaving strangely.
I also have multiple workers using clustering. I use it in the following manner:
I create a master worker
I spawn workers
I take any connection request and pass it to the relevant worker.
I do that for the main node request as well as for the socket requests. Like I said I use 2 ports!
As you can see, I have put a lot of work into this and I am still not getting a proper solution!
My question is this: have I gone around the world 10 times only to miss something simple? This seems way too complicated for achieving a non-blocking, Node.js-only website.
I asked myself: surely all these tutorials would not have missed something as important as this! But they did!
I have researched, read, and tested a lot of code; this is the very first time I've asked anything on Stack Overflow!
Thank you for any assistance.
P.S. One example of the same approach: I request a report using jasper, passing parameters, and with the "delayed ajax response" approach described above I simply release the port while a very intensive report is generated in the background (this can be a very intensive process, as a lot of calculations are performed)! I really don't see a better approach; any help will be super appreciated!
Thank you for taking the time to read!
I'm sorry to say it, but yes, you have been going around the world 10 times only to have been missing something simple.
It's obvious that your previous knowledge/experience with web servers comes from a blocking point of view; if Node worked that way, your concerns would be valid.
Node.js is a runtime built around using a single thread to execute your code, which means that if that code does a blocking operation, no one else can get anything done.
There are some operations that can do this in Node, such as the synchronous file-system calls (fs.readFileSync and friends) or long-running CPU-heavy JavaScript, like an intensive report calculation done in Node itself. However, most Node I/O operations are asynchronous.
I believe you are familiar with the term, so I won't go into details. What asynchronous operations allow Node to do is keep this single thread idle as much as possible; by idle I mean open for other work. If your code is fully asynchronous, then handling 4 concurrent users (or even 400) shouldn't be a problem, even for a single thread.
Now, regarding your initial concern about ports: once a request is received on a given port, Node.js executes whatever code you have written for it until it reaches an asynchronous operation. As soon as that happens, it is available to pick up more requests on the same port.
The second concern you raise is the database operation. Here, Node.js sends the query to the database (which takes very little time) and the database does the actual execution of the query. In the meantime, Node is free to do whatever it wants until the database finishes and lets Node know there is a result to fetch.
You can recognize async operations by their structure: my_function(..., ..., callback). A function that takes a callback function is, in most cases, asynchronous.
So, bottom line: don't worry about blocking I/O, as you will hardly encounter any in Node. Use a single port if you want (with the built-in cluster module, multiple Node processes can even share the same port).
Hope this explains it well enough. If you have any further questions, let me know :)
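A small, self-contained illustration of that behavior, assuming a hypothetical `fakeQuery` that uses setTimeout to stand in for a real database driver call. Like a real driver, it returns immediately and calls the callback later, so the second "request" is handled while the first one is still waiting on the database.

```javascript
// Stand-in for a database driver call: returns immediately, calls back ~50 ms later.
function fakeQuery(sql, callback) {
  setTimeout(() => callback(null, [`row for ${sql}`]), 50);
}

const log = [];

// Simulated request handler: sends the query, then returns right away.
function handleRequest(user) {
  log.push(`${user}: query sent`);
  fakeQuery(user, (err, rows) => {
    if (err) {
      log.push(`${user}: error`);
      return;
    }
    log.push(`${user}: ${rows.length} row(s) sent back`);
  });
  // handleRequest returns here; the thread is free for the next request.
}

handleRequest('alice');
handleRequest('bob'); // runs immediately, without waiting for alice's query
```

Right after these two calls, `log` holds both "query sent" entries; the two results are appended about 50 ms later, once the simulated database answers.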

Sending messages between clients socket.io

I'm working on a chat application and using socket.io / node for that. Basically I came up with the following strategies:
Send a message from the client; the socket server receives it and sends it to the receiving client. In the background I store the message in the DB so it can be retrieved later if the user wants to see old conversations.
The pro of this approach is that the user gets the message almost instantly, since we don't wait for the DB operation to complete. The con is that if the DB operation fails and the client refreshes its page at exactly that moment to fetch the message, it won't get it.
Send the message from the client to the server; the server stores it in the DB first and only then sends it to the receiving client.
The pro is that the message reaches the client only if it's stored in the DB. The con is that it won't be anywhere near real time, since the DB operation in between slows down the message passing.
Send the message from the client; it is stored in a cache layer (Redis, for example) and instantly broadcast to the receiving client. In the background, keep fetching records from Redis and updating the DB. If the client refreshes the page, we look first in the DB and then in the Redis layer.
The pro is that communication is faster and messages are still presented correctly on demand. The con is that this is quite complex compared to the implementations above, and I'm wondering if there's an easier way to achieve this.
My question is: what's the way to go if you're building a serious chat application that ensures both fast communication and data persistence? What strategies do apps like Facebook, WhatsApp, etc. use for this? I'm not looking for an exact example, but a few pointers would help.
Thanks.
I would go for option number 2. I've built chat apps in Node myself, and I found this to be the best option. Saving to a database takes a few milliseconds, which includes the 0.x milliseconds to write to the database and the few milliseconds of network latency ( https://blog.serverdensity.com/mongodb-benchmarks/ ).
So I would consider this approach real time. The good thing about it is that if saving fails, you can show the sender a message that it failed, for whatever reason.
Facebook, WhatsApp and many other big messaging apps have been based on XMPP (Jabber), a very large instant-messaging protocol in which everything is well documented. It is XML-based, so you still have to parse everything, but luckily there are very good XMPP libraries. So if you want to go the common way, you can use XMPP, but most of the big players in this area no longer follow all of the standards, since XMPP lacks some of the features users are used to today.
I would go with building my own version; in fact, I've already built something similar to Slack, and I can give you private access to it if you want.
So, to sum up: number 2 is the way to go (for me). XMPP is cool, but it also brings a lot of complexity.
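A minimal sketch of option 2 as described: persist first, deliver only on success, and tell the sender either way. `saveMessage`, `deliverTo` and `notifySender` are hypothetical dependencies standing in for a database insert and two socket.io emits; they are passed in so the flow can be run without a database.

```javascript
// Persist first, then deliver; report success or failure back to the sender.
async function sendChatMessage(msg, { saveMessage, deliverTo, notifySender }) {
  let saved;
  try {
    saved = await saveMessage(msg); // e.g. an insert into MongoDB
  } catch {
    notifySender({ ok: false, error: 'message could not be saved' });
    return false;
  }
  deliverTo(msg.to, saved);                 // only stored messages are delivered
  notifySender({ ok: true, id: saved.id }); // lets the sender mark it as sent
  return true;
}
```

Because delivery happens only after the await succeeds, a page refresh on the receiving side can never miss a message the recipient has already seen.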

How do I track and handle an event in node

So let's say I want to make a Twitter bot. I want to send a certain message to whoever replies to it, so I need an event for that. Obviously one way is to fetch all the replies (or the last n replies) at some interval and find out which ones are new, but first of all that's not live, and it requires an extra query to find the new tweets.
Say we want to track some changes in a website. For instance, we want to handle an event when that change happens, instantly.
I used socket.io to handle some other kind of events, like when some changes happen in a particular port, but I couldn't figure out how I can handle these types of events.
The word "event" does not mean what you think it means!
In a DOM environment, an Event is a very specific (and core) concept which allows you to write code based on user interactions with elements on the screen.
In NodeJS, an Event is something that can be generated and announced by an instance of events.EventEmitter.
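To make that concrete, here is a minimal EventEmitter sketch. The event name `'reply'` and the tweet object are made up for illustration; nothing here talks to Twitter. The point is that your own code decides when to call `emit()`.

```javascript
const EventEmitter = require('events');

// A Node.js "event" is just a named notification emitted by an EventEmitter.
const bot = new EventEmitter();
const handled = [];

// Register a listener for a custom event name.
bot.on('reply', tweet => {
  handled.push(`thanks, @${tweet.user}!`);
});

// Something in your own code must decide that a reply happened and emit it;
// emit() calls the listeners synchronously and returns true if any existed.
const hadListeners = bot.emit('reply', { user: 'alice', text: 'hi bot' });
```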
In your question, an Event seems to refer to anything that happens on the internet, potentially anywhere.
Under that last definition, there is simply no single answer for how to "track an event."
If you want to write code that responds to change (which is just a more specific version of "reacting to an input"), you need a mechanism to identify that a change has occurred, followed by a mechanism to trigger whatever code you want to run in response (this last part is what you would normally call "emitting" an "event").
SocketIO accomplishes both of these things for certain situations, using a graceful degradation of protocols in order to explicitly emit local events that you can listen for and handle. It prefers WebSockets where available and falls back to more expensive techniques such as HTTP long-polling.
SocketIO only works if the source of the information or change has decided to support the protocol. In those cases, the source is actually emitting the event (over websockets) and socketIO listens for it.
In cases where the source of the information you are looking for does not support websockets (and hasn't been coded to explicitly notify your servers of changes), you are going to have to come up with your own solutions. However: You shouldn't think of this as a case of tracking "events". Rather, you are watching for changes.
How you watch for changes will depend on the nature of the change. Generally you'll probably have to poll for it.
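A sketch that combines both mechanisms: poll to detect a change, then emit a local event for it. `ChangeWatcher` is a hypothetical class, `fetchState` is an assumed async function (fetching a page, an API resource, etc.), and `'change'` is an event name chosen for illustration. States are compared with `!==`, so `fetchState` should return a primitive such as a string or a hash.

```javascript
const EventEmitter = require('events');

// Turn "watch for changes" into local events by polling and diffing.
class ChangeWatcher extends EventEmitter {
  constructor(fetchState, intervalMs) {
    super();
    this.fetchState = fetchState;
    this.intervalMs = intervalMs;
    this.last = undefined;
    this.timer = null;
  }

  start() {
    const tick = async () => {
      try {
        const current = await this.fetchState();
        // Mechanism 1: detect that something changed since the last poll.
        if (this.last !== undefined && current !== this.last) {
          // Mechanism 2: trigger the code that reacts to it.
          this.emit('change', { from: this.last, to: current });
        }
        this.last = current;
      } catch (err) {
        this.emit('error', err); // attach an 'error' listener, as with any emitter
      }
      // stop() sets timer to null, which ends the loop.
      if (this.timer !== null) this.timer = setTimeout(tick, this.intervalMs);
    };
    this.timer = setTimeout(tick, 0);
  }

  stop() {
    clearTimeout(this.timer);
    this.timer = null;
  }
}
```

Consumers then write `watcher.on('change', ...)` exactly as they would for any other Node.js event, without caring that polling sits underneath.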

Broadcasting Messages at High Frequency. Using HTTP POST or something else?

We're looking at speccing out a system which broadcasts small amounts of frequently changing data (using JSON or XML or something) to multiple recipients at a reasonably high frequency (our updates will be 1000s per second).
We were initially thinking of using HTTP POST to broadcast the data to each endpoint, maybe once every few seconds (the clients will vary, as they're other people's webapps), but we're now wondering if there's a better way to hold up under the load and frequency we're hoping for. I imagine we'd need to version/timestamp the messages in some way at the very least.
We're using RabbitMQ for preparing all the things ready for sending and to choose what needs to go where (from a Django app, if that matters), but we can't get all of the endpoints to use a MQ.
The HTTP POST thing just doesn't seem quite right. What else should we be looking in to? Is this where things like node or socket.io or some of the new real time frameworks fit in? We're happy to find the right expertise to help with this, just need steering the correct direction.
Thanks!
You don't want to do thousands of POSTs per second to multiple clients. You're going to introduce the HTTP overhead on your end pushing it out, and for all you know, you might end up flooding the server on the other end with POSTs that just swamp it.
Option 1: For clients that can't or won't read a queue, POSTs could work, but to avoid killing the server and all the HTTP overhead, could you bundle updates? Once every minute or two, take all the aggregated data and POST it to the client. This way, you don't have 60+ POST requests going to one client every minute or two, forever. It'll also save bandwidth, since you send the header information once per batch instead of once per piece of data.
Option 2: Have you thought about using a good ol' socket connection? Either you open a socket to the client, or vice versa, and push the data over it. That avoids the overhead of HTTP and lets the client read data as it arrives. If the client no longer wants to receive data, it can just close the connection. It's on the arcane side, but it would avoid completely killing the target server.
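A minimal batching sketch for option 1. `UpdateBatcher` is a hypothetical name and `sendBatch` is an assumed function that would POST the JSON body to one endpoint; each update gets a sequence number and timestamp, in line with the versioning the question mentions.

```javascript
// Buffer updates and send them as one batch per interval, so HTTP headers
// are paid once per batch rather than once per update.
class UpdateBatcher {
  constructor(sendBatch, intervalMs) {
    this.sendBatch = sendBatch;
    this.pending = [];
    this.seq = 0; // version number so the receiver can order and de-duplicate
    this.timer = setInterval(() => this.flush(), intervalMs);
  }

  add(update) {
    this.pending.push({ seq: ++this.seq, ts: Date.now(), data: update });
  }

  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.sendBatch(JSON.stringify(batch));
  }

  // Stop the timer and send whatever is still buffered.
  close() {
    clearInterval(this.timer);
    this.flush();
  }
}
```

One batcher per endpoint keeps a slow recipient from delaying the others.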
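A sketch of option 2 with Node's built-in `net` module, pushing updates as newline-delimited JSON. The framing (one JSON object per line) is an assumption; any format both ends agree on works. `broadcast` and `subscribe` are hypothetical helper names.

```javascript
const net = require('net');

// Server side: keep every connected subscriber and push each update to all of them.
const subscribers = new Set();

const server = net.createServer(socket => {
  subscribers.add(socket);
  // A client that no longer wants data simply closes the connection.
  socket.on('close', () => subscribers.delete(socket));
  socket.on('error', () => subscribers.delete(socket));
});

function broadcast(update) {
  const line = JSON.stringify(update) + '\n';
  for (const socket of subscribers) {
    // write() queues data in memory if the client reads slowly; real code
    // should check its return value and respect backpressure.
    socket.write(line);
  }
}

// Client side: TCP does not preserve message boundaries, so split on newlines.
function subscribe(host, port, onUpdate) {
  const client = net.connect(port, host);
  let buffered = '';
  client.setEncoding('utf8');
  client.on('data', chunk => {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      onUpdate(JSON.parse(buffered.slice(0, newline)));
      buffered = buffered.slice(newline + 1);
    }
  });
  return client;
}
```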
If you can get clients to read a MQ, set up a group just for them and make your life easier so you only have to deal with those that can't or won't read the queue instead of trying for a one size fits all solution.
