socket.io send data to client based on id passed by client - node.js

Say I have a REST endpoint which, when called, starts a long-running process server side, e.g.
http://host/api/program/start
and I want to push any updates / output from that process from the server side to a client.
I'm thinking the rest call would return some sort of unique id which the client could then use when connecting to the websocket to only receive updates about that particular process.
I'd have to think about buffering the output / updates from the process to send to the client if they didn't connect before the first output from the process but irrespective of that, what would be the best way of achieving the socket data handling for this? Could I make use of the socket.io rooms / namespaces in some way?

If you really want to do it this way, I would suggest generating the ID via the initial start call, then passing that to the long running process as an argument. Then that process publishes all messages to that ID (which appropriate clients are listening to as well).
However, I would discourage you from taking this approach. There are plenty of ways to go about handling a child process in Node, so you might want to look into these options a little more so you don't end up dealing with zombie processes all over the place.
The first that comes to mind is ChildProcess. Another option would be something like WebWorker Threads. Either of these would be right in the vein of what (I think) you're trying to do, but allow you to maintain much more control over the child processes.
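A minimal sketch of the ID-based routing described at the start of this answer, assuming plain Node.js with an in-memory EventEmitter standing in for socket.io rooms. The names startJob, publish and subscribe are hypothetical. With socket.io, subscribe would roughly correspond to socket.join(jobId) and publish to io.to(jobId).emit(...). Updates are buffered per ID so that a client connecting after the first output can replay what it missed.

```javascript
const { EventEmitter } = require('events');
const crypto = require('crypto');

// In-memory stand-in for socket.io rooms: one event name per job ID.
const rooms = new EventEmitter();
// jobId -> every update published so far, kept for late subscribers.
const buffers = new Map();

// Called by the REST handler: returns the ID the client will subscribe with.
function startJob() {
  const jobId = crypto.randomBytes(8).toString('hex');
  buffers.set(jobId, []);
  return jobId;
}

// Called by the long-running process whenever it has output.
function publish(jobId, update) {
  buffers.get(jobId).push(update);
  rooms.emit(jobId, update);
}

// Called when a client connects with its ID: replay the buffer, then go live.
function subscribe(jobId, listener) {
  for (const update of buffers.get(jobId) || []) listener(update);
  rooms.on(jobId, listener);
}

const jobId = startJob();
publish(jobId, 'step 1 done'); // emitted before any client connected
const received = [];
subscribe(jobId, (update) => received.push(update));
publish(jobId, 'step 2 done'); // delivered live
console.log(received);
```

In a real service the buffer would need a size limit and would be dropped once the job finishes and its clients are gone.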

Related

Building Websites only on NodeJs and Express blocking requests over http

I have a question regarding the examples out there when using Nodejs, Express and Jade for templates.
All the examples show how to build some sort of a user administrative interface where you can add user profiles, delete them and manage them.
Those are considered beginner's guides to NodeJs. My question is around the fact that if I have 10 users concurrently accessing the same interface and doing the same operations, surely NodeJs will block the requests for the other users as they are running on the same port.
So let's say I am pulling out a list of users which may be something like 10000. Yes I can do paging, but that is not the point. While I am getting the list from the server another 4 users want to access the application. They have to wait for my process to end. That is my question - how can one avoid that using NodeJS & Express?
I am on this issue for a couple of months! I currently have something in place that does the following:
Run the main processing of stuff on a port
Run a Socket.io process on a different port
Use a sticky session
The idea is that I do a request (like getting a list of items), and immediately respond with some request reference but without the requested items, thus releasing the port.
In the background, "asynchronously", I then do the process of getting the items. When that has completed, I do an HTTP request from the main node process to the socket node port, SENDING the items through.
When that is done I then perform a socket.io emit WITH the data and the initial request reference so that the correct user gets the message.
On the client side I have an event listening for the socket which then completes the ajax request by populating the list.
I have SOME success in doing this! It actually works to a degree! I have an issue online which complicates matters due to ip addresses, and socket.io playing funny.
I also have multiple workers using clustering. I use it in the following manner:
I create a master worker
I spawn workers
I take any connection request and pass it to the relevant worker.
I do that for the main node request as well as for the socket requests. Like I said I use 2 ports!
As you can see I have had a lot of work done on this and I am not getting a proper solution!
My question is this - have I gone all around the world 10 times only to have missed something simple? This sounds way too complicated to achieve a non-blocking nodejs only website.
I asked myself - surely all these tutorials would have not missed on something as important as this! But they did!
I have researched, read, and tested a lot of code - this is my very first time I ask anything on stackoverflow!
Thank you for any assistance.
P.S. One example of the same approach is this: I request a report using jasper, I pass parameters, and with the "delayed ajax response" approach as described above I simply release the port, and in the background a very intensive report is being generated (and this can be very intensive process as a lot of calculations are being performed)..! I really don't see a better approach - any help will be super appreciated!
Thank you for taking the time to read!
I'm sorry to say it, but yes, you have been going around the world 10 times only to have been missing something simple.
It's obvious that your previous knowledge/experience with web servers comes from a blocking point of view, and if that were how Node worked, your concerns would be valid.
Node.js is a runtime built around using a single thread to execute your code, which means that if it performs a blocking operation, no one else can get anything done until it finishes.
Some operations in node can block like this, such as the synchronous variants of reading/writing to disk (e.g. fs.readFileSync). However, most node operations are asynchronous.
I believe you are familiar with the term, so I won't go into details. What asynchronous operations allow node to do is keep this single thread idle as much as possible. By idle I mean open for other work. If your code is fully asynchronous, then handling 4 concurrent users (or even 400) shouldn't be a problem, even for a single thread.
Now, in regards to your initial problem of ports: once a request is received on a given port, Node.js executes whatever code you have written for it until it encounters an asynchronous operation. As soon as that happens, it is available to pick up more requests on the same port.
The second problem you inquire about is the database operation. In this case, Node.js sends the query to the database (which takes almost no time) and the database does the actual execution of the query. In the meantime, node is free to do whatever it wants until the database is finished and lets node know there is a result to fetch.
You can recognize async operations by their structure: my_function(..., ..., callback). A function that takes a callback is, in most cases, asynchronous.
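To make the point concrete, here is a small sketch in which two requests arrive back to back. It assumes a hypothetical fakeQuery function that simulates a database with a timer, and handleRequest is likewise an illustrative name, not part of Express.

```javascript
const log = [];

// Hypothetical async database call: the callback fires later, and the
// single Node thread is free to do other work in the meantime.
function fakeQuery(sql, callback) {
  setTimeout(() => callback(null, ['row 1', 'row 2']), 20);
}

function handleRequest(user) {
  log.push(`${user}: request received`);
  fakeQuery('SELECT * FROM users', (err, rows) => {
    if (err) throw err;
    log.push(`${user}: got ${rows.length} rows`);
  });
  // handleRequest returns here, long before the query finishes.
}

handleRequest('user A');
handleRequest('user B');
// Both requests were accepted before either query completed.
console.log(log);
```

At the point of the console.log call, the log holds only the two "request received" entries. The "got 2 rows" entries are appended later, once the timers fire.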
So bottom line: Don't worry about the problems around blocking IO, as you will hardly encounter any in node. Use a single port if you want (By creating multiple child processes, you can even have multiple node instances on the same port).
Hope this explains it well enough. If you have any further questions, let me know :)

Can I mq_send to reply after I mq_receive?

I have one or more daemon apps running, and to communicate with them I have a client app. The client app is something simple executed on the command line. Chances are only one will be up at a given moment. When I run a command such as daemon update-config, the client does mq_open and sends the command. For some commands, like list, I'd want results. It appears that if I run mq_send in my daemon after I receive, I may receive the message within the daemon app.
What's the best way to send the reply to the client w/o accidentally processing it in the daemon? After a quick lookup there didn't appear to be an obvious solution, so I do sleep(1), which seems to solve my problem completely even though it's a 'hack'. What's the best solution? Is sleep the most understandable and straightforward solution? I don't feel like generating random/unique values, passing them in and opening another mq to send the reply. The sleep for a second feels like the best solution, but I wonder what your solutions may be.
When using messaging systems, you can do RPC calls even if it is not the best paradigm to use messaging in general. The general approach to RPC with messaging is:
have distinct queues for requests and for replies (the latter ones can be ephemeral queues, created for each request, or persistent queues);
give each message a unique ID, which the reply carries so the client can tell which request it answers (it's called correlation_id in AMQP, for example).
I'd guess that you can use the same approach with POSIX queues as well.
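The two points above can be sketched as follows. This is only an illustration of the pattern, written in JavaScript with in-memory arrays standing in for the queues. With POSIX queues, each array would be a separate queue opened with mq_open. All function names and the queue name /client-1234 are hypothetical.

```javascript
// Each array stands in for one message queue (e.g. one mq_open name).
const requestQueue = [];       // client -> daemon
const replyQueues = new Map(); // reply queue name -> messages

let nextId = 1;

// Client side: send a request that names the queue to reply on.
function sendRequest(command, replyTo) {
  const id = nextId++;
  if (!replyQueues.has(replyTo)) replyQueues.set(replyTo, []);
  requestQueue.push({ id, command, replyTo });
  return id;
}

// Daemon side: read only from the request queue and write only to reply
// queues, so the daemon can never consume its own reply.
function daemonStep() {
  const request = requestQueue.shift();
  if (!request) return;
  const result = request.command === 'list' ? ['item-a', 'item-b'] : 'ok';
  replyQueues.get(request.replyTo).push({ correlationId: request.id, result });
}

// Client side: pick the reply whose correlation ID matches the request.
function receiveReply(replyTo, id) {
  const queue = replyQueues.get(replyTo);
  const index = queue.findIndex((m) => m.correlationId === id);
  return index === -1 ? undefined : queue.splice(index, 1)[0].result;
}

const requestId = sendRequest('list', '/client-1234');
daemonStep();
const reply = receiveReply('/client-1234', requestId);
console.log(reply);
```

Because requests and replies travel on different queues, no sleep is needed to keep the daemon from reading its own message.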

Node.js App with API endpoints which take 20sec+ :: Connections Left Open :: How to Optimize?

I have a Node.js RESTful API returning JSON data. One of the API calls can (and frequently does) take 10 - 20 seconds to finish. This long RTT is due to connecting to external APIs, like DiffBot, MailChimp, Facebook, Twitter, etc. I wish I could make the API call shorter, but I cannot.
Of course, I've implemented the node code in a nice async way, but the problem is that the client's inbound connection (to the node app) is alive while it waits for the server to finish, and thus might be killing my performance. In fact, I'm currently guessing that this may explain my long-running timeout issue in node.
I've already increased maxSockets to a huge number...
require('http').globalAgent.maxSockets = 9999;
For the sake of interest, I'm printing out the active sockets each time a new connection is made (here's the code).
Which gives me output like this:
SOCKETS: {} { 'graph.facebook.com:443': 5, 'api.instagram.com:443': 1 }
Nothing too enlightening there. The max connections I ever see is around 20 or so, total, across all hosts. But this doesn't really tell me anything about incoming connections, or how to optimize them so that my server does not choke when there are many of them alive at once (which I suspect it is).
You should optimize your architecture, not just the code.
First, I would change the way the client and server interact with each other. The server should end the request upon receipt and notify the client once all the tasks for that request are truly complete.
There are different ways to achieve that. For example, the client can query the status of the request using AJAX (poll) every X seconds. Another example would be to use WebSocket.
If you're going with this approach, look into Socket.IO. It supports many transports with the same API: if WebSocket is available, it will use that; otherwise, it falls back to other transports such as Flash Socket, long-polling, etc.
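As a rough sketch of the polling variant, the logic behind a "start" and a "status" endpoint might look like the following. The names submit, status and slowTask are hypothetical, the job store is a plain in-memory Map, and the slow external API calls are simulated with a timer.

```javascript
const crypto = require('crypto');

const jobs = new Map(); // jobId -> { status, result }

// Hypothetical slow task, standing in for the calls to external APIs.
function slowTask(input) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(input.toUpperCase()), 20);
  });
}

// What the "start" endpoint would do: register the job, kick off the work,
// and return at once so the HTTP request can end.
function submit(input) {
  const jobId = crypto.randomBytes(8).toString('hex');
  jobs.set(jobId, { status: 'pending', result: null });
  slowTask(input).then((result) => {
    jobs.set(jobId, { status: 'done', result });
  });
  return jobId;
}

// What the "status" endpoint would return each time the client polls.
function status(jobId) {
  return jobs.get(jobId) || { status: 'unknown', result: null };
}

const jobId = submit('hello');
console.log(status(jobId)); // first poll: the work has not finished yet
setTimeout(() => console.log(status(jobId)), 100); // a later poll
```

With Socket.IO the second poll would be replaced by the server emitting the result to the client as soon as the task resolves.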
Second, you shouldn't use one process to do all this work. You should use a queue (preferably a messaging system that supports queues), then, run workers (separate processes) to do the "heavy lifting".
Personally, I use AMQP due to its features and portability (it's an open standard), but feel free to use any other queue system with a persistent backend.
That way, if one or more process(es) crash(es) and you use the right queue, you wouldn't lose any data (such as the API tasks you mentioned).
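The reason a crash does not lose work is that a task is only dropped from the queue once a worker acknowledges it. Below is a minimal in-memory sketch of that idea. The TaskQueue class and its method names are hypothetical and are not the API of any AMQP client, and a real broker would also persist tasks to disk.

```javascript
// In-memory stand-in for a broker queue; a real one would persist to disk.
class TaskQueue {
  constructor() {
    this.ready = [];          // tasks waiting for a worker
    this.unacked = new Map(); // tag -> task handed out but not yet confirmed
    this.nextTag = 1;
  }

  publish(task) {
    this.ready.push(task);
  }

  // Hand a task to a worker; it stays tracked until the worker acks it.
  consume() {
    if (this.ready.length === 0) return null;
    const task = this.ready.shift();
    const tag = this.nextTag++;
    this.unacked.set(tag, task);
    return { tag, task };
  }

  ack(tag) {
    this.unacked.delete(tag);
  }

  // Called when a worker dies: put its unfinished tasks back in line.
  requeueUnacked() {
    for (const task of this.unacked.values()) this.ready.unshift(task);
    this.unacked.clear();
  }
}

const queue = new TaskQueue();
queue.publish({ api: 'twitter', user: 42 });
const delivery = queue.consume(); // worker 1 takes the task...
queue.requeueUnacked();           // ...and crashes before acking it
const retry = queue.consume();    // worker 2 receives the same task
queue.ack(retry.tag);
console.log(retry.task, queue.ready.length, queue.unacked.size);
```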
Hope it helps.

How to get Node.js processes communicate with one another

I have a Node.js chat app where multiple clients connect to a common chat room using socketio. I want to scale this to multiple node processes, possibly on different machines. However, clients that connect to the same room will not be guaranteed to hit the same node process. For example user 1 will hit node process A and user 2 will hit node process B. They are in the same room so if user 1 sends a message, user 2 should get it. What's the best way to make this happen since their connections are managed by different processes?
I thought about just having the node processes connect to redis. This at least solves the problem that process A will know there's another user, user 2, in the room but it still can't send to user 2 because process B controls that connection. Is there a way to register a "value changed" callback for redis?
I'm in a server environment where I can't control any of the routing or load balancing.
Both node.js processes can be subscribed to some channel through redis pub/sub and listen to messages which you pass to this channel. For example, when user 1 connects to process A on the first machine, you can store in redis information about this user along with the information which process on which machine manages it. Then when user 2, which is connected to process B on the second machine, sends a message to user 1, you can publish it to this channel and check which process on which machine is responsible for managing communication with user 1 and respond accordingly.
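A sketch of the pub/sub idea, assuming an in-memory EventEmitter as a stand-in for the Redis channel so that it runs without a Redis server. The createProcess helper and all other names are hypothetical. Note that this variant broadcasts every message to all processes and lets each one deliver to its own clients, rather than looking up the owning process in Redis first.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a Redis pub/sub channel shared by all Node processes.
const channel = new EventEmitter();

// Each "process" only knows the clients connected to it.
function createProcess(name) {
  const localClients = new Map(); // userId -> { room, inbox }

  // Every process hears every published message and delivers it locally.
  channel.on('chat', ({ room, from, text }) => {
    for (const [userId, client] of localClients) {
      if (client.room === room && userId !== from) {
        client.inbox.push(`${from}: ${text}`);
      }
    }
  });

  return {
    name,
    connect(userId, room) {
      const client = { room, inbox: [] };
      localClients.set(userId, client);
      return client;
    },
    // A message from a local client is published, never delivered directly.
    send(userId, text) {
      const { room } = localClients.get(userId);
      channel.emit('chat', { room, from: userId, text });
    },
  };
}

const processA = createProcess('A');
const processB = createProcess('B');
const user1 = processA.connect('user1', 'lobby');
const user2 = processB.connect('user2', 'lobby');
processA.send('user1', 'hello');
console.log(user2.inbox);
```

With Redis, channel.emit would become a publish on one connection and channel.on a subscription on another, with the message serialized (for example as JSON) in between.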
I have done some research on this. Below are my findings:
Like yojimbo87 said, you should first just use redis pub/sub (it is very optimized).
http://comments.gmane.org/gmane.comp.lang.javascript.nodejs/22348
Tim Caswell wrote:
It's been my experience that the bottleneck is the serialization and
de-serialization of the data, not the actual channel. I'm pretty sure
you can use named pipes, but I'm not sure what the API is. msgpack
seems like a good format for the data interchange. There are a few
libraries out there that implement msgpack or ipc frameworks on top of
it.
But when serialization/deserialization becomes your bottleneck, I would try https://github.com/pgriess/node-msgpack. I would also like to test this out, because I think the sooner you have this in place the better.

NodeJS - Child node process?

I'm using NodeJS to run a socket server (using socket.io). When a client connects, I am opening and running a module which does a bunch of stuff. Even though I am careful to try and catch as much as possible, when this module throws an error, it obviously takes down the entire socket server with it.
Is there a way I can separate the two so if the connected clients module script fails, it doesn't necessarily take down the entire server?
I'm assuming this is what child process is for, but the documentation doesn't mention starting other node instances.
I'd obviously need to kill the process if the client disconnected too.
I'm assuming these modules you're talking about are JS code. If so, you might want to try the vm module. This lets you run code in a separate context, and also gives you the ability to do a try / catch around execution of the specific code.
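A minimal sketch of the vm approach, assuming the client's module is available as a string of JS code. The runModule helper and the sandbox fields are hypothetical names. Only errors thrown synchronously are caught this way. Anything the code schedules for later can still throw outside the try / catch.

```javascript
const vm = require('vm');

// Run a client's module code in its own context and contain synchronous errors.
function runModule(code, clientId) {
  // Only these properties are visible as globals inside the context.
  const sandbox = { clientId, output: [] };
  try {
    vm.runInNewContext(code, sandbox, { timeout: 1000 });
    return { ok: true, output: sandbox.output };
  } catch (err) {
    // The failure is reported to the caller instead of crashing the server.
    return { ok: false, error: err.message };
  }
}

const good = runModule('output.push("hello " + clientId)', 'client-1');
const bad = runModule('throw new Error("module failed")', 'client-2');
console.log(good, bad);
```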
You can run node as a separate process and watch the data go by using spawn, then watch the stderr/stdout/exit events to track any progress. Then kill can be used to kill the process if the client disconnects. You're going to have to map clients and spawned processes though so their disconnect event will trigger the process close properly.
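The spawn approach could look roughly like this. The onClientConnect and onClientDisconnect functions are hypothetical hooks you would call from your socket server's connection and disconnect handlers, and the script passed in is a placeholder for the real module.

```javascript
const { spawn } = require('child_process');

const children = new Map(); // clientId -> ChildProcess

// Start a separate node process for this client's module.
function onClientConnect(clientId, script) {
  const child = spawn(process.execPath, ['-e', script]);
  children.set(clientId, child);

  child.stdout.on('data', (data) => console.log(`[${clientId}] ${data}`));
  child.stderr.on('data', (data) => console.error(`[${clientId}] error: ${data}`));
  child.on('error', (err) => console.error(`[${clientId}] spawn failed: ${err.message}`));

  // A crash in the child ends only the child; the server just cleans up.
  child.on('exit', (code, signal) => {
    console.log(`[${clientId}] exited (code=${code}, signal=${signal})`);
    children.delete(clientId);
  });
  return child;
}

// Kill the client's process when the client goes away.
function onClientDisconnect(clientId) {
  const child = children.get(clientId);
  if (child) child.kill();
}

const child = onClientConnect('client-1', 'setInterval(() => {}, 1000)');
onClientDisconnect('client-1');
```

The Map is the client-to-process mapping mentioned above. Entries are removed in the exit handler so that a crash and a deliberate kill are cleaned up the same way.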
Finally the uncaughtException event can be used as a "catch-all" for any missed exceptions, making it so that the server doesn't get completely killed (signals are a bit of an exception of course).
As the other poster noted, you could leverage the 'vm' module, but as you might be able to tell from the rest of the response, doing so adds significant complexity.
Also, from the 'vm' doc:
Note that running untrusted code is a tricky business requiring great care.
To prevent accidental global variable leakage, vm.runInNewContext is quite
useful, but safely running untrusted code requires a separate process.
While I'm sure you could run a new nodejs instance in a child process, the best practice here is to understand where your application can and will fail, and then program defensively to handle all possible error conditions.
If some part of your code "take(s) down the entire ... server", then you really need to understand why this occurred and solve that problem rather than rely on another process to shield you from the work required to design and build a production-quality service.
