I am building a web app that has two parts. One part uses a real-time connection between the server and the client; the other performs a CPU-intensive task to provide the relevant data.
I am implementing the real-time communication in Node.js and the CPU-intensive part in Python/Java. What is the best way for the Node.js server to take part in duplex communication with the other server?

For a basic solution you can use Socket.IO, especially if you are already using it and know how it works. It will get the job done, since it allows communication between a client and a server where the client can itself be another server written in a different language.
If you want a more robust solution with additional options and controls, or one that can handle higher throughput (though this shouldn't be an issue if you are ultimately just sending the data over the relatively slow internet), you can look at something like ØMQ (ZeroMQ). It is a messaging library that gives you more control and many communication patterns beyond plain request-response.
When you set either one up, I would recommend using your CPU-intensive server as the stable end (the server) and your web server(s) as the clients. This assumes that you are using a single server for the CPU-intensive tasks and running several Node.js instances to take advantage of multiple cores for the web server. It simplifies communication, since you want a single point to connect to.
If you foresee needing multiple CPU servers, you will want to set up a routing server that can route between multiple web servers and multiple CPU servers; in that case I would recommend the extra work of learning ØMQ.
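Whichever transport you pick, the shape is the same: the Node.js process dials the CPU server and keeps one connection open in both directions. Below is a minimal sketch of that idea using only Node's built-in net module and newline-delimited JSON. The framing, the host and the port are assumptions made for illustration; Socket.IO and ØMQ handle framing for you.

```javascript
const net = require('net');

// Turns a stream of text chunks into whole JSON messages, one per line.
// A chunk may hold half a message or several messages, so keep a buffer.
function createLineParser(onMessage) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      if (line.trim() !== '') onMessage(JSON.parse(line));
    }
  };
}

// The web server is the client: it connects to the CPU server and keeps the
// connection open so that either side can send at any time.
// The default host and port are placeholders.
function connectToCpuServer({ host = '127.0.0.1', port = 5555 }, onMessage) {
  const socket = net.connect(port, host);
  socket.setEncoding('utf8');
  socket.on('data', createLineParser(onMessage));
  return {
    send: (message) => socket.write(JSON.stringify(message) + '\n'),
    close: () => socket.end(),
  };
}
```

Each Node.js instance would call connectToCpuServer once at startup and reuse the returned send function for every task, which keeps the CPU server as the single point to connect to.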

You can use the http.request method that Node provides to make HTTP requests (much as curl would) from within your Node code.
http.request is also commonly used when calling authentication APIs.
You can put your callback in the request's response handler, and once the response data arrives in Node you can send it back to the user.
Meanwhile, in the background, the Java/Python server handles the request from Node and performs the CPU-intensive task.
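A minimal sketch of this approach, assuming the Java/Python service accepts a JSON POST and replies with JSON. The function names and the shape of the payload are made up for illustration.

```javascript
const http = require('http');

// Build the body and headers for a JSON POST.
function buildJsonRequest(payload) {
  const body = JSON.stringify(payload);
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body),
    },
  };
}

// Send `payload` to the CPU-bound service and pass the parsed reply to
// `callback(err, result)`. `target` holds host, port and path (placeholders).
function requestCpuTask(target, payload, callback) {
  const { body, headers } = buildJsonRequest(payload);
  const req = http.request({ ...target, method: 'POST', headers }, (res) => {
    let raw = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { raw += chunk; });
    res.on('end', () => {
      let result;
      try {
        result = JSON.parse(raw);
      } catch (err) {
        return callback(err);
      }
      callback(null, result);
    });
  });
  req.on('error', callback);
  req.end(body);
}
```

Note that this is plain request-response: the Node server always has to ask first, and the other server cannot push data on its own, so it is not truly duplex.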

I maintain a node.js application that intercommunicates among 34 tasks spread across 2 servers.
In your case, for communication between the web server and the app server you might consider mqtt.
I use MQTT for this kind of communication. There are MQTT clients for most languages, including Node/JavaScript, Python and Java. In my case I publish JSON messages to MQTT 'topics', and any task that has subscribed to a topic receives its data when it is published. If you search for "pub sub", "mqtt" and "mosquitto" you'll find lots of references and examples. Mosquitto (now an Eclipse project) is only one of a number of MQTT brokers that are available. Another very good broker, written in Java, is HiveMQ.
This is a very simple, reliable solution that scales well. In my case literally millions of messages reliably pass through mqtt every day.
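As a rough sketch of how the web server side could look: publish each task as JSON on one topic and match results arriving on another topic by id. The topic names and the message shape are assumptions. The client argument is expected to behave like the client returned by connect() in the mqtt npm package (subscribe, publish, and a 'message' event carrying the topic and a Buffer payload).

```javascript
// Returns a submit(input, callback) function that publishes a task and calls
// `callback(result)` when the matching result message arrives.
function createTaskBridge(
  client,
  { requestTopic = 'tasks/request', resultTopic = 'tasks/result' } = {}
) {
  const pending = new Map(); // task id -> callback waiting for the result
  let nextId = 1;

  client.subscribe(resultTopic);
  client.on('message', (topic, payload) => {
    if (topic !== resultTopic) return;
    const { id, result } = JSON.parse(payload.toString());
    const callback = pending.get(id);
    if (callback) {
      pending.delete(id);
      callback(result);
    }
  });

  return function submit(input, callback) {
    const id = nextId++;
    pending.set(id, callback);
    client.publish(requestTopic, JSON.stringify({ id, input }));
    return id;
  };
}
```

With the mqtt package installed, something like createTaskBridge(require('mqtt').connect('mqtt://localhost:1883')) would wire it to a local broker; the Python/Java worker would subscribe to the request topic and publish to the result topic.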

You are probably looking for Socket.IO.
Socket.IO enables real-time bidirectional event-based communication.
It works on every platform, browser or device, focusing equally on reliability and speed.
Sockets have traditionally been the solution around which most realtime systems are architected, providing a bi-directional communication channel between a client and a server.
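One possible arrangement, sketched under assumptions: the Node.js process hosts the Socket.IO server, and the Python/Java worker connects to it using a Socket.IO client for its language. The event names 'task:run' and 'task:result' are invented for this example. The io argument is expected to be a Socket.IO server instance; only its 'connection' event and the socket's on, emit and 'disconnect' are used.

```javascript
// Wires a Socket.IO server so that Node can push tasks to a connected worker
// and receive results back over the same connection.
function attachWorkerChannel(io, onResult) {
  const workers = new Set();

  io.on('connection', (socket) => {
    workers.add(socket);
    socket.on('disconnect', () => workers.delete(socket));
    // Worker -> Node: a finished result.
    socket.on('task:result', onResult);
  });

  // Node -> worker: hand a task to the first connected worker.
  return function dispatch(task) {
    const [worker] = workers;
    if (!worker) return false; // nobody connected yet
    worker.emit('task:run', task);
    return true;
  };
}
```

The returned dispatch function gives back false when no worker is connected, so the caller can decide whether to queue the task or reject the request.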


Node.js design approach: clients polling the server periodically

I'm trying to learn Node.js and adequate design approaches.
I've implemented a little API server (using express) that fetches a set of data from several remote sites, according to client requests that use the API.
This process can take some time (several fetch / await calls), so I want the user to know how the request is progressing. I've read about socket.io / websockets, but maybe that's overkill for this case.
So what I did is:
For each client request, a requestID is generated and returned to the client.
With that ID, the client can query the API (via another endpoint) to know his request status at any time.
Using setTimeout() on the client page and some DOM manipulation, I can update and display the current request status every X, like a polling approach.
Although the solution works fine, even with several clients connecting concurrently, maybe there's a better one? Are there any caveats I'm not considering?
TL;DR The approach you're using is just fine, although it may not scale very well. Websockets are a different approach to solve the same problem, but again, may not scale very well.
You've identified what are basically the only two options for real-time (or close to it) updates on a web site:
- polling the server: the client requests information periodically
- using Websockets: the server can push updates to the client when something happens
There are a couple of things to consider.
How important are "real time" updates? If the user can wait several seconds (or longer), then go with polling.
What sort of load can the server handle? If load is a concern, then Websockets might be the way to go.
That last question is really the crux of the issue. If you're expecting a few or a few dozen clients to use this functionality, then either solution will work just fine.
If you're expecting thousands or more to be connecting, then polling starts to become a concern, because now we're talking about many repeated requests to the server. Of course, if the interval is longer, the load will be lower.
It is my understanding that the overhead for Websockets is lower, but still can be a concern when you're talking about large numbers of clients. Again, a lot of clients means the server is managing a lot of open connections.
The way large services handle this is to design their applications in such a way that they can be distributed over many identical servers and which server you connect to is managed by a load balancer. This is true for either polling or Websockets.
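The polling design described in the question can be sketched roughly as follows. The names, the shape of the status object and the in-memory Map are assumptions; the two API endpoints would simply call createJob and getStatus.

```javascript
const crypto = require('crypto');

const jobs = new Map(); // requestId -> { status, done, total }

// Called by the endpoint that starts the work; the id goes back to the client.
function createJob(total) {
  const requestId = crypto.randomUUID();
  jobs.set(requestId, { status: 'running', done: 0, total });
  return requestId;
}

// Called each time one of the remote fetches completes.
function recordProgress(requestId) {
  const job = jobs.get(requestId);
  job.done += 1;
  if (job.done === job.total) job.status = 'finished';
}

// Called by the status endpoint.
function getStatus(requestId) {
  return jobs.get(requestId) || { status: 'unknown' };
}

// Client side: schedule the next poll only after the previous answer arrived.
// `fetchStatus` is a hypothetical function returning a promise of the status.
function poll(fetchStatus, intervalMs, onUpdate) {
  fetchStatus().then((status) => {
    onUpdate(status);
    if (status.status === 'running') {
      setTimeout(() => poll(fetchStatus, intervalMs, onUpdate), intervalMs);
    }
  });
}
```

Two caveats with this sketch: the Map lives in one process, so behind a load balancer every status request must reach the same server (or the state must move to a shared store), and finished entries need to be removed at some point or memory will grow.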

Is a node.js app that both serves a rest-api and handles web sockets a good idea?

Disclaimer: I'm new to node.js so I am sorry if this is a weird question :)
I have a node.js app using express.js to serve a REST-API. The data served by the REST-API is fetched from a nosql database by the node.js app. All clients only use HTTP-GET. There is one exception though: Data is PUT and DELETEd from the master database (a relational database on another server).
The thought for this setup is of course to let the 'node.js/nosql database' server(s) be a public front end and thereby protecting the master database from heavy traffic.
Potentially a number of different client applications will use the REST-API, but mainly it will be used by a client app with a long lifetime (typically 0.5 to 2 hours). Instead of letting this app constantly poll the REST-API for possible new data, I want to use websockets so that data is only sent to the client when there is new data. I will use a node.js app for this, and probably socket.io so that it can fall back to api-polling if websockets are not supported by the client. New data should be sent to clients each time the master database PUTs or DELETEs objects in the nosql database.
The question is if I should use one node.js for both the API and the websockets or one for the API and one for the websockets.
Things to consider:
- Performance: The app(s) will be hosted on a cluster of servers with a load balancer and a HTTP accelerator in front. Would one app handling everything perform better than two apps with distinct tasks?
- Traffic between apps: If I choose a two-app solution, the api app that receives PUTs and DELETEs from the master database will have to notify the websocket app every time it receives new data (or the master database will have to notify both apps). Could the doubled traffic be a performance issue?
- Code cleanliness: I believe two apps will result in cleaner and better code, but then again there will surely be some common code for both apps, which will lead to having two copies of it.
As to how heavy the load can be it is very difficult to say, but a possible peak can involve:
- 50000 clients
- each listening to up to 5 different channels
- new data being sent from master every 5 seconds
- new data should be sent to approximately 25% of the clients (for some data it should be sent to all clients, and for other data probably to fewer than 1% of the clients)
Thanks for the answers, guys. More food for thought here. I have decided to have two node.js apps, one for the REST-API and one for web sockets. The reason is that I believe it will be easier to scale them. To begin with, the whole system will be hosted on three physical servers, and one node.js app for the REST-API on each server should be sufficient, but for the websocket app there will probably need to be several instances on each physical server.
This is a very good question.
If you are looking at a legacy system and you already have a REST interface defined, there are not many advantages to adding WebSockets. Things that may point you to WebSockets would be:
a demand for server-to-client or client-to-client real-time data
a need to integrate with server-components using a classic bi-directional protocol (e.g. you want to write an FTP or sendmail client in javascript).
If you are starting a new project, I would try to have a hard split in the project between:
the serving of static content (images, js, css) using HTTP (that was what it was designed for) and
the serving of dynamic content (real-time data) using WebSockets (load-balanced, subscription/messaging based, automatic reconnect enabled to handle network blips).
So, why should we try to have a hard separation? Let's consider the advantages of a HTTP-based REST protocol.
The use of the HTTP protocol for REST semantics is an invention that has certain advantages:
Stateless Interactions: none of the client's context is to be stored on the server side between the requests.
Cacheable: Clients can cache the responses.
Layered System: undetectability of intermediaries
Easy testing: it's easy to use curl to test an HTTP-based protocol
On the other hand...
The use of a messaging protocol (e.g. AMQP, JMS/STOMP) on top of WebSockets does not preclude any of these advantages.
WebSockets can be transparently load-balanced, messages and state can be cached, efficient stateful or stateless interactions can be defined.
A basic reactive analysis style can define which events trigger which messages between the client and the server.
Key additional advantages are:
a WebSocket is intended to be a long-term persistent connection, usable for multiple different messaging purposes over a single connection
a WebSocket connection allows for full bi-directional communication, allowing data to be sent in either direction in sympathy with network characteristics.
one can use connection offloading to share subscriptions to common topics using intermediaries. This means with very few connections to a core message broker, you can serve millions of connected users efficiently at scale.
monitoring and testing can be implemented with an admin interface to send/receive messages (provided with all message brokers).
The cost of all this is that one needs to deal with re-establishment of state when the WebSocket needs to reconnect after being dropped. Many protocol designers build in the notion of a "sync" message to provide context from the server to the client.
Either way, your model object could be the same whether you use REST or WebSockets, but that might mean you are still thinking too much in terms of request-response rather than publish/subscribe.
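To make the publish/subscribe view concrete, here is a small in-process sketch of a channel registry that also sends a "sync" message to a client that subscribes late or reconnects. It is an illustration only: the class name and the message shapes are invented, and a real deployment would put a message broker behind it.

```javascript
class ChannelHub {
  constructor() {
    this.subscribers = new Map(); // channel -> Set of send functions
    this.latest = new Map();      // channel -> last published data
  }

  // `send` is whatever delivers one message to one client, for example a
  // wrapper around a WebSocket's send method.
  subscribe(channel, send) {
    if (!this.subscribers.has(channel)) {
      this.subscribers.set(channel, new Set());
    }
    this.subscribers.get(channel).add(send);
    // A (re)connecting client first receives the current state.
    if (this.latest.has(channel)) {
      send({ type: 'sync', channel, data: this.latest.get(channel) });
    }
    // The returned function removes this subscription again.
    return () => this.subscribers.get(channel).delete(send);
  }

  // Delivers `data` to every subscriber of `channel`; returns how many.
  publish(channel, data) {
    this.latest.set(channel, data);
    const targets = this.subscribers.get(channel) || new Set();
    for (const send of targets) {
      send({ type: 'update', channel, data });
    }
    return targets.size;
  }
}
```

The trigger for publish would be the event that changes the data (in the question, a PUT or DELETE arriving from the master database), rather than a request from the client.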
The first thing you must think about is how you're going to scale the servers and manage their state. With a REST API this is largely straightforward, as they are for the most part stateless, and every load balancer knows how to proxy http requests. Hence, REST APIs can be scaled horizontally, leaving the few bits of state to the persistence layer (database) to deal with. With websockets, it's often a different matter. You need to research what load balancer you're going to use (if it's a cloud deployment, this often depends on the cloud provider). Then figure out what type of websocket support or configuration the load balancer will need. Then, depending on your application, you need to figure out how to manage the state of your websocket connections across the cluster. Think about the different use cases, e.g. if a websocket event on one server alters the state of the data, will you need to propagate this change to a different user on a different connection? If the answer is yes, then you'll probably need something like Redis to manage your ws connections and communicate changes between the servers.
As for performance, at the end of the day it's still just HTTP connections, so I doubt there will be a big difference in separating the server functionality. However, I think two servers would go a long way toward improving code cleanliness, as long as you have another 'core' module to isolate code common to both servers.
Personally I would do them together, this is because you can share the models and most of the code between the REST and the WS.
At the end of the day, what Yuri said in his answer is correct, but it is not much work to load balance WS anyway; everyone does it nowadays. The approach I took is to have REST for everything and then create some WS "endpoints" for subscribing to real-time data sent from server to client.
From what I understood, your client would just get notifications with updates from the server, so I would definitely go with WS. You subscribe to some events and then get new results when there are any. Repeatedly asking with HTTP calls is not the best way.
We had this need and basically built a small framework around this idea http://devalien.github.io/Axolot/
Basically you can understand our approach in the controller (this is just an example, in our real world app we have subscriptions so we can notify when we have new data or when we finish a procedure). In actions there are the rest endpoints and in sockets the websockets endpoints.
    module.exports = {
        model: 'user', // Attach the user model, so CRUD operations are available (good for dev purposes)
        path: '/user', // This is the endpoint
        actions: {
            'get /': [
                function (req, res) {
                    var query = {};
                    Model.user.find(query).then(function (user) { // Find from the user model declared above
                        res.send(user);
                    }).catch(function (err) {
                        res.send(400, err);
                    });
                }
            ]
        },
        sockets: {
            getSingle: function (userId, cb) { // Callable from socket.io using "user:getSingle"
                Model.user.findOne(userId).then(function (user) {
                    cb(user);
                }).catch(function (err) {
                    cb({error: err});
                });
            }
        }
    };

redis in Node.js app environment

I am building an app with several Node.js instances as a backend (an http server, a socket server and a pool of several domain servers). Now I am trying to cover several communication and configuration aspects and am wondering if redis is an appropriate solution.
So, I would use it for the following purposes:
Implementation of a shared run-time lookup table. It's a table of several hundred relatively simple records, accessed and manipulated by 2 node instances.
Implementation of message queues. Each domain server receives commands from the http server and should execute them sequentially. A domain server should be able to listen for a redis event and execute each new command upon its arrival.
The socket server also has a redis message queue and listens for its events in order to push notifications to connected clients.
Is redis "too heavy" for such a purpose?
Does it offer all needed functionality?
I can definitely implement a look-up in a file and/or in memory, and a queue using sockets. However, redis might make the code cleaner and the solution more robust.
Redis is definitely not a heavy solution, on the contrary.
It's small, insanely fast (when using pipelining), and easy to deploy. I consider it a light solution, a kind of Swiss Army knife that can solve many problems.
Redis-based message queues are OK if you don't need any guarantee on message delivery. That is to say, Redis-based queues can't assure you that the client has received the message. If that's a problem for your application, you should consider a heavier solution, like 0mq or RabbitMQ.
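A common way to build such a queue is a Redis list: the http server appends commands with LPUSH and each domain server takes them off the other end with the blocking BRPOP. The sketch below shows only the sequential consumer loop. The popCommand argument is a hypothetical async wrapper around that blocking pop in whichever Redis client you use, and commands are assumed to be JSON strings.

```javascript
// Executes commands strictly one at a time, in arrival order.
// `popCommand` resolves with the next raw command string, or with null when
// the loop should stop. `execute` may be async; the next command is not
// fetched until it has finished.
async function runCommandLoop(popCommand, execute) {
  let processed = 0;
  for (;;) {
    const raw = await popCommand();
    if (raw === null) return processed;
    await execute(JSON.parse(raw));
    processed += 1;
  }
}
```

This also shows the delivery caveat mentioned above: once a command has been popped it is gone from Redis, so if the domain server crashes before execute finishes, that command is lost.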

multiple child_process with node.js / socket.io

This is more of a design question than an implementation one, but I am wondering if I can design something like this. I have an interactive app (similar to the python shell). I want to host a server (let's say using either the node.js http server or socket.io, since I am not sure which one would be better) which would spawn a new child_process for every client that connects to it and maintain a different context for that particular client. I am a complete noob in terms of node.js and socket.io. The most I have managed is to have one child process on a socket.io server and connect the client to it.
So the question is, would this work ? If not is there any other way in node to get it to work or am I better off with a local server.
Node.js is a single-process web platform. Using clustering (which is built on child_process), you create independent instances of the same application, each running in its own process.
Each such process costs memory, which is generally why most traditional systems do not scale well: they require a thread per client. For Node, a process per client would be extremely inefficient in terms of hardware resources.
Node is event-based, and you don't need to worry much about scope as long as your application logic does not exploit it.
The recommended number of workers is equal to the number of CPU cores on the machine.
There is always a master process that creates the workers. Each worker creates its own http + socket.io listeners, which are technically bound to the master's socket and routed from there.
HTTP requests are routed to different workers, while a socket is routed at connection time; after that, the same worker handles that socket until it disconnects.
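A minimal sketch of that layout with Node's built-in cluster module: one worker per CPU core, each with its own HTTP listener on a shared port. The port and the response text are placeholders, and socket.io is left out to keep the example dependency-free.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Fork `count` workers. `clusterModule` is a parameter only so that the
// function can be exercised without really forking processes.
function forkWorkers(count = os.cpus().length, clusterModule = cluster) {
  for (let i = 0; i < count; i += 1) {
    clusterModule.fork();
  }
  return count;
}

function main() {
  if (cluster.isPrimary) { // named cluster.isMaster before Node 16
    forkWorkers();
  } else {
    // Every worker listens on the same port; the master hands out connections.
    http
      .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
      .listen(8000); // placeholder port
  }
}
```

Calling main() at the bottom of the entry file starts the master, which then re-runs the same file once per worker. The number of processes is tied to the number of cores here, not to the number of connected clients.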

node.js server with socket.io handling 50000 simultaneous clients

We are developing a Javascript control which should be constantly connected to a server for receiving animation updates.
We are planning to host this stuff on an Amazon cloud.
The scenario is like this: server connects to activemq queue waiting for updates, for each update it broadcasts it to all connected clients.
Is it even possible to handle such load with node.js + socket.io?
Will a single node.js server be able to handle such load?
How to organize fast transport between different nodes if we will have to use more than one node?
Will single node.js server be able to handle such load?.. How to organize fast transport between different nodes if we will have to use more than one node
You say that you are planning to host on Amazon. So first off, nothing should be scoped for a single server. Amazon machines can simply "disappear", so you have to assume that you are going to use multiple machines.
...handling 50k simultaneous clients
So to start with, 50k connections for a single box is a very big number. Here's a very detailed blog post discussing "getting to 10k" with node.js+socket.io.
Here's a very telling quote:
"it seemed as though 10,000 clients simply required more serialization than my server was able to handle."
So a key component to "getting to 50k" is going to be the amount of work required just pushing data over the wire.
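One common way to cut that work for a pure broadcast, shown here as a generic sketch and not as something taken from the blog post: serialize each update once and write the same string to every connection. The client objects, with a writable flag and a send method, are hypothetical stand-ins for whatever connection type you use.

```javascript
// Serialize the update once and reuse the same string for every client,
// instead of paying for JSON.stringify once per connection.
function broadcast(clients, update) {
  const frame = JSON.stringify(update);
  let delivered = 0;
  for (const client of clients) {
    if (client.writable) {
      client.send(frame);
      delivered += 1;
    }
  }
  return delivered;
}
```

This removes the per-client serialization cost but not the cost of writing to 50k sockets, which is why spreading the connections over several machines still matters.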
How to organize fast transport between different nodes if we will have to use more than one node.
That blog post is the first of 3. When you're done with the first, read the other two. That should point you in the right direction.
