Multiple HTTP servers running on different ports in Node.js

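var http = require('http');
// onRequest is a request handler function(req, res) defined elsewhere
http.createServer(onRequest).listen(8888);
http.createServer(onRequest).listen(8080);
In this first snippet, I understand that two different HTTP servers are created that listen on different ports but share the same request handler.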
http.createServer(onRequestA).listen(8888);
http.createServer(onRequestB).listen(8080);
In this second snippet, the servers listen on different ports and also handle requests differently.
I have a few questions:
Are these two approaches commonly used in the real world?
Is there really an advantage to snippet 1?
If such multiple servers can be created, what is the maximum number of servers that can be created from a single Node instance?

To answer your questions directly:
Are these two approaches commonly used in the real world?
It depends on what you're trying to achieve on those ports.
Is there really an advantage to snippet 1?
That also depends on what you plan to do on each port: if you want to handle the same requests in the same way, it doesn't make much sense to run multiple ports.
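To make the difference between the two snippets concrete, here is a minimal runnable sketch of snippet 2: one Node process, two http.Server instances, each with its own handler. The handler bodies are hypothetical stand-ins for onRequestA and onRequestB, and startServers is an illustrative helper, not a Node API.

```javascript
const http = require('http');

// Hypothetical stand-ins for onRequestA / onRequestB from the question.
function onRequestA(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('handled by server A\n');
}

function onRequestB(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('handled by server B\n');
}

// Start one http.Server per { port, handler } pair in the same process and
// resolve with the servers once every one of them is listening.
function startServers(specs) {
  return Promise.all(
    specs.map(({ port, handler }) => new Promise((resolve, reject) => {
      const server = http.createServer(handler);
      server.once('error', reject); // e.g. EADDRINUSE or EACCES
      server.listen(port, () => resolve(server));
    }))
  );
}
```

Calling startServers([{ port: 8888, handler: onRequestA }, { port: 8080, handler: onRequestB }]) reproduces snippet 2; passing the same handler twice reproduces snippet 1. Both servers share one event loop, so this adds listening sockets, not CPU capacity.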
If such multiple servers can be created, what is the maximum number of servers that can be created from a single node instance?
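Note that, by standard practice on Unix-like systems, no non-root process may bind to a port below 1024. Also remember that port numbers are 16-bit, so the highest port is 65535 (0 to 65535, i.e. 65536 ports in total). Since each server needs its own port on a given address, that range is the upper bound on how many servers you can run, before other limits such as open file descriptors kick in.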
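A small helper makes those port rules concrete. classifyPort is a hypothetical name; the 1024 cutoff is the Unix convention described above, and port 0 is treated specially because server.listen(0) asks the OS to pick any free port.

```javascript
// Classify a port number for server.listen().
// Ports are 16-bit, so only 0..65535 are valid. Port 0 asks the OS to pick a
// free port. On Unix-like systems, ports below 1024 normally need root (or an
// equivalent privilege) to bind.
function classifyPort(port) {
  if (!Number.isInteger(port) || port < 0 || port > 65535) return 'invalid';
  if (port === 0) return 'os-assigned';
  if (port < 1024) return 'privileged';
  return 'unprivileged';
}

console.log(classifyPort(8888)); // unprivileged
console.log(classifyPort(80)); // privileged
```

Counting the 'unprivileged' results gives 64512 ports (1024 through 65535) available to a non-root process on one address.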

Related

Faye clustering multiple nodes NodeJS

I am trying to make a pub/sub infra using faye (nodejs). I wish to know whether horizontal scaling would be possible or not.
A single Node.js process runs on a single core, so when people talk about clustering, they mean creating multiple processes on the same machine that share a port and share data through Redis.
Like this:
http://www.davidado.com/2013/12/18/using-node-js-cluster-with-socket-io-for-push-notifications/
First, I don't understand how we make sure that each of the forked processes runs on a different core. If I fork 10 Node servers on a machine with 4 cores, is it guaranteed that they are distributed evenly?
What if I want to add a new machine and scale out that way? I have not seen any support for this anywhere, and I am not sure it is even possible.
Let's say somehow multiple nodes are being used and there is some load balancer. But one client will connect to only one server process. So when a client C1 publishes on a channel on which a client C2 has subscribed, and C1 is connected to process P1 and C2 is connected to process P2, how will P1 publish the message to C2 when it doesn't have the connection?
This would probably be possible in the case of a single machine, because the cluster module enables all processes to share the same port and the connections too.
I am fairly new to the web world, as well as nodejs and faye. Please enlighten me if there is something wrong in the question.
You are correct in thinking that the cluster module allows multiple cores to be used on a single machine. The cluster module allows the same application to be spawned multiple times whilst listening to the same port. The distribution amongst the cores is down to the operating system, so if you have 10 processes and 4 cores then the OS will figure out how best to distribute them (as long as they haven't been spawned with a set affinity). By default this shouldn't be a concern for you.
Load-balancing can be done through node too but that is separate from clustering. Instead you would have a separate application that would grab the load statistics on each running server and proxy the http request to the most appropriate server (using http-proxy as an example). A very primitive load balancer will send one request to each running server instance incrementally to give an even distribution.
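The "very primitive load balancer" described above can be sketched with Node's built-in http module alone. This swaps in plain http.request instead of the http-proxy library mentioned in the answer; roundRobin and createBalancer are hypothetical names, and the sketch has no health checks or retries.

```javascript
const http = require('http');

// Hand out backends in strict rotation: a, b, c, a, b, c, ...
function roundRobin(targets) {
  let i = 0;
  return () => {
    const target = targets[i];
    i = (i + 1) % targets.length;
    return target;
  };
}

// Forward each incoming request to the next backend in turn and stream the
// backend's response back to the client.
function createBalancer(targets) {
  const next = roundRobin(targets);
  return http.createServer((req, res) => {
    const { host, port } = next();
    const upstream = http.request(
      { host, port, method: req.method, path: req.url, headers: req.headers },
      (upstreamRes) => {
        res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(res);
      }
    );
    upstream.on('error', () => {
      // Backend unreachable: answer 502 unless a response already started.
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });
    req.pipe(upstream);
  });
}
```

For example, createBalancer([{ host: '127.0.0.1', port: 8001 }, { host: '127.0.0.1', port: 8002 }]).listen(8080) would spread requests over two local instances (the addresses are hypothetical). Choosing a backend by load statistics instead would mean replacing roundRobin with a function that inspects each backend.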
The final point about sharing messages between all the instances assumes that there is a single point where all the messages are held. In the article you linked to, they assume that there is only one server and all the processes share access to the Redis instance. As they all access the same Redis instance, all processes will be able to receive the same messages. If we're going to start thinking about multiple servers that are in different locations in the world that all have different message stores (i.e. their own Redis instances), then we get into the domain of 'replication'. Some data stores are built with this in mind and Redis is one of them. You end up with a 'master' set of data and a set of 'slaves' that will periodically update with the master and grab anything they are missing. It is important to note that messages will not be delivered in 'real time' unless you have a very intensive replication process.
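The C1/C2 question from above can be modeled in a few lines: processes never deliver directly to each other's clients; they publish to a shared broker, and every process forwards broker messages to the clients it holds. In this sketch an in-process EventEmitter stands in for Redis pub/sub (an assumption made so the example runs on its own; real processes would each use a Redis client), and createServerProcess is a hypothetical name.

```javascript
const { EventEmitter } = require('events');

// Model of one server process (P1, P2, ...). It only knows its own clients,
// stored as channel -> Set of delivery callbacks. `broker` stands in for a
// shared Redis instance that every process subscribes to.
function createServerProcess(name, broker) {
  const localClients = new Map();

  // Every message seen on the broker is forwarded to matching local clients.
  broker.on('message', ({ channel, payload }) => {
    for (const deliver of localClients.get(channel) || []) deliver(payload, name);
  });

  return {
    subscribe(channel, deliver) {
      if (!localClients.has(channel)) localClients.set(channel, new Set());
      localClients.get(channel).add(deliver);
    },
    // Publishing always goes through the broker, even for local subscribers.
    publish(channel, payload) {
      broker.emit('message', { channel, payload });
    },
  };
}

// C2 is connected to P2; C1 publishes through P1.
const broker = new EventEmitter();
const p1 = createServerProcess('P1', broker);
const p2 = createServerProcess('P2', broker);
p2.subscribe('news', (payload, via) => console.log(`C2 got "${payload}" via ${via}`));
p1.publish('news', 'hello'); // prints: C2 got "hello" via P2
```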
In conclusion, developers go through this chain of scaling for their applications. The first is to make the application multi-process (the cluster module). The second is to have a load balancer that proxies the http request to the appropriate server that is running the multi-process application. The third is to replicate the datastores so that the servers can run independently but keep in sync with each other.

How to make a distributed node.js application?

Creating a node.js application is simple enough.
var app = require('express')();
app.get('/', function (req, res) {
  res.send("Hello world!");
});
app.listen(3000); // port 3000 is an arbitrary choice
But suppose people became obsessed with your Hello World! application and exhausted your resources. How could this example be scaled up in practice? I don't understand it: yes, you could run several Node.js instances on different computers, but when someone accesses http://your_site.com/ the request goes directly to one specific machine, one specific port, one specific Node process. So how?
There are many ways to deal with this, but it boils down to two things:
being able to use more cores per server
being able to scale beyond a single server.
node-cluster
For the first option, you can use node-cluster or the same solution as for the second option. node-cluster (http://nodejs.org/api/cluster.html) is essentially a built-in way to fork the node process into one master and multiple workers. Typically, you'd want 1 master and n-1 to n workers (n being your number of available cores).
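A sketch of the "one master, n-1 to n workers" layout with Node's built-in cluster module. workerCount and startCluster are hypothetical helper names, and respawning dead workers is one common choice rather than something the answer prescribes.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// "n-1 to n workers": optionally leave one core for the master (primary)
// process, but always start at least one worker.
function workerCount(cores, reserveCoreForPrimary) {
  return Math.max(1, reserveCoreForPrimary ? cores - 1 : cores);
}

// In the primary: fork the workers. In each worker: run an ordinary HTTP
// server; the cluster module lets every worker listen on the same port.
function startCluster(port, handler, reserveCoreForPrimary = false) {
  // cluster.isPrimary exists since Node 16; older releases only have isMaster.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    const n = workerCount(os.cpus().length, reserveCoreForPrimary);
    for (let i = 0; i < n; i++) cluster.fork();
    // Replace any worker that exits so capacity stays constant
    // (a real deployment would add back-off to avoid crash loops).
    cluster.on('exit', () => cluster.fork());
  } else {
    http.createServer(handler).listen(port);
  }
}
```

Calling startCluster(3000, (req, res) => res.end('Hello world!\n')) at the top level of the entry script works because cluster.fork() re-runs that same script in each worker, where the else branch takes over.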
load balancers
The second option is to use a load balancer that distributes the requests amongst multiple workers (on the same server, or across servers).
Here you have multiple options as well. Here are a few:
a node based option: Load balancing with node.js using http-proxy
nginx: Node.js + Nginx - What now? (using more than one upstream server)
apache: (no clearly helpful link I could use, but a valid option)
One more thing: once you start having multiple processes serving requests, you can no longer use memory to store state; you need an additional service to store shared state. Redis (http://redis.io) is a popular choice, but by no means the only one.
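A tiny simulation of why in-memory state breaks with multiple processes. Each "worker" here is just a function with its own store; createMemoryStore, createWorker and simulate are hypothetical names, and the shared store is an in-memory stand-in for a service such as Redis, whose INCR command is atomic.

```javascript
// A store only needs incr(key) here. createMemoryStore keeps data in the memory
// of whoever creates it; in production a shared service such as Redis would
// play this role across processes.
function createMemoryStore() {
  const data = new Map();
  return {
    incr(key) {
      const value = (data.get(key) || 0) + 1;
      data.set(key, value);
      return value;
    },
  };
}

// Hypothetical request handler: count visits and return the new total.
function createWorker(store) {
  return () => store.incr('visits');
}

// Send `requests` requests round-robin across the workers.
function simulate(workers, requests) {
  const seen = [];
  for (let i = 0; i < requests; i++) seen.push(workers[i % workers.length]());
  return seen;
}

// Each worker with its own memory: the counts diverge.
console.log(simulate([createWorker(createMemoryStore()), createWorker(createMemoryStore())], 4)); // [ 1, 1, 2, 2 ]

// Both workers pointing at one shared store: a single consistent count.
const shared = createMemoryStore();
console.log(simulate([createWorker(shared), createWorker(shared)], 4)); // [ 1, 2, 3, 4 ]
```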
If you use services such as Cloud Foundry, Heroku, and others, they set this up for you, so you only have to worry about your app's logic (and about using a service to deal with shared state).
I've been working with Node for quite some time, but only recently got the opportunity to try scaling my Node apps. While researching the topic, I have come across the following prerequisites for scaling:
My app needs to be available on several distributed machines, each running multiple Node instances.
Each system should have a load balancer that helps distribute traffic across the node instances.
There should be a master load balancer that distributes traffic across the Node instances on the distributed machines.
The master balancer should always be running OR should have a dependable restart mechanism to keep the app stable.
For the above requirements, I've come across the following:
Use modules like cluster to start multiple instances of node in a system.
Always use nginx. It's one of the simplest mechanisms for creating a load balancer I've come across so far.
Use HAProxy to act as a master load balancer. A few pointers on how to use it and keep it forever running.
Useful resources:
Horizontal scaling node.js and websockets.
Using cluster to take advantages of multiple cores.
I'll keep updating this answer as I progress.
The basic way to use multiple machines is to put them behind a load balancer and point all your traffic at the load balancer. That way, when someone goes to http://my_domain.com, the domain points at the load balancer machine. The sole purpose of the load balancer (for this example anyway; in theory it could do more) is to delegate the traffic to a given machine running your application. This means you can have any number of machines running your application, yet an external client (in this case a browser) only needs the load balancer's address to reach one of them. The client doesn't (and doesn't have to) know which machine is actually handling its request. If you are using AWS, it's pretty easy to set up and manage this. Note that Pascal's answer has more detail about your options here.
With Node specifically, you may want to look at the Node Cluster module. I don't really have a lot of experience with this module, but it should allow you to spawn multiple processes of your application on one machine, all sharing the same port. Also note that, at the time of writing, it was still experimental, and I'm not sure how reliable it is.
I'd recommend taking a look at http://senecajs.org, a microservices toolkit for Node.js. It is a good starting point for beginners and helps you start thinking in "services" instead of monolithic applications.
Having said that, building distributed applications is hard: it takes time to learn and a LOT of time to master, and you will usually face a lot of trade-offs between performance, reliability, maintenance, etc.

Nginx or LVS for Node.js load balance?

Our project needs to load-balance TCP traffic to Node.js.
The proposal is: (Nginx or LVS) + Keepalived + Node Cluster
The questions:
The large number of concurrent client connections to the TCP server need to be long-lived. Which one is more suitable, Nginx or LVS?
We need to assign different priority levels to the Node masters on the master server (the localhost server should have a higher priority than the remote servers). Which one can do this, Nginx or LVS?
Which has lower CPU utilization and higher throughput, Nginx or LVS?
Are there any recommended documents for performance benchmarking or feature comparison between Nginx and LVS?
Finally, we wonder whether our proposal is reasonable. Are there any better proposals or components to choose?
I'm assuming you do not need nginx to serve static assets; otherwise LVS would not be an option.
1) At the time of writing, nginx supported TCP only via a third-party module, https://github.com/yaoweibin/nginx_tcp_proxy_module (nginx 1.9.0 and later ship a built-in stream module for TCP proxying). If you don't need a web server, I'd say LVS is more suitable, but see my additional comment after the numbered answers.
2) LVS supports priority, nginx does not.
3) Probably LVS: nginx runs in userland, LVS in the kernel.
4) Lies, Damned Lies and Benchmarks. You have to simulate your load on your own equipment: write a Node client script and pound your setup.
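A starting point for such a client script: it opens many concurrent, long-lived TCP connections and reports how many succeeded. openConnections is a hypothetical helper; the host, port and connection count are whatever your test setup uses, and a real benchmark would also send traffic and measure latency.

```javascript
const net = require('net');

// Open `count` TCP connections to host:port at once and resolve with how many
// succeeded. The sockets are left open, like the long-lived clients in the
// question; call destroyAll() to tear them down when the run is over.
function openConnections(host, port, count) {
  const sockets = [];
  const attempts = [];
  for (let i = 0; i < count; i++) {
    attempts.push(new Promise((resolve) => {
      const socket = net.connect({ host, port }, () => resolve(true));
      socket.on('error', () => resolve(false)); // refused, reset, ...
      sockets.push(socket);
    }));
  }
  return Promise.all(attempts).then((results) => ({
    connected: results.filter(Boolean).length,
    destroyAll: () => sockets.forEach((s) => s.destroy()),
  }));
}
```

For large counts, run it from several client machines: a single client is itself bounded by its open-file limit and its range of local ports.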
We are looking at going all-Node from front to back with up (https://github.com/LearnBoost/up). It's not in production yet, but we are pursuing this route for the following reasons:
1) We also have priority requirements, but they are custom and change dynamically. We are adjusting priority at runtime and it took us less than an hour to program node to do it.
2) We deploy a lot of code updates and up allows us to do it without interrupting existing clients. Because you can code it to do anything you want, we can spin up brand new processes to handle new connections and let the old ones die when existing connections are all gone.
3) We can see everything because we push any metric we want to see into a redis server.
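The "let the old ones die when existing connections are all gone" behavior in point 2 maps onto Node's server.close(), which stops accepting new connections but lets existing ones finish. A minimal sketch, assuming a new process is already accepting new connections elsewhere (not shown); trackConnections, retire and the grace period are illustrative and not part of up. It works for net and http servers alike, since http.Server extends net.Server.

```javascript
// Remember which sockets a server currently has open.
function trackConnections(server) {
  const sockets = new Set();
  server.on('connection', (socket) => {
    sockets.add(socket);
    socket.on('close', () => sockets.delete(socket));
  });
  return sockets;
}

// Retire an old server: stop accepting new connections right away, let the
// existing ones finish on their own, and force-close any still open after
// graceMs milliseconds. Resolves once the server has fully closed.
function retire(server, sockets, graceMs) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => sockets.forEach((s) => s.destroy()), graceMs);
    server.close(() => {
      clearTimeout(timer);
      resolve();
    });
  });
}
```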
I'm sure it's not the most performant per process/server, but the advantage of having so much programmatic control is worth it, and scaling out gives us more redundancy, so we are not looking to squeeze the last bit of performance out of the stack.
I just checked real quick to see if I could copy/paste a bunch of code, but we are rapidly coding it and it has a lot of references to stuff that would not be suitable for public consumption.

Concurrent networking in Scala

I have a working prototype of a concurrent Scala program using Actors. I am now trying to fine-tune the number of different Actors, etc.
One stage of the processing requires fetching new data via the internet. Of course, there is nothing I can really do to speed that aspect up. However, I figure if I launch a bunch of requests in parallel, I can bring down the total time. The question, therefore, is:
=> Is there a limit on concurrent networking in Scala or on Unix systems (such as the maximum number of sockets)? If so, how can I find out what it is?
On Linux, there is a limit on the number of file descriptors each process can have open, and every socket uses one. You can see it with ulimit -n. There is also a system-wide limit in /proc/sys/fs/file-max.
Another limit is the number of connections that the Linux firewall can track. If you are using the iptables connection tracking firewall this value is in /proc/sys/net/netfilter/nf_conntrack_max.
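On Linux these limits can also be read from a running process. A hedged sketch in JavaScript (Node.js) rather than Scala: the per-process limit is parsed from /proc/self/limits, whose "Max open files" soft value matches what ulimit -n reports; the system-wide limit is read from /proc/sys/fs/file-max and the conntrack limit from /proc/sys/net/netfilter/nf_conntrack_max. On other systems the files do not exist and the helpers return null. The function names are hypothetical.

```javascript
const fs = require('fs');

// Read one integer from a /proc file, or null if it is missing or unreadable
// (non-Linux systems, or netfilter connection tracking not loaded).
function readProcNumber(path) {
  try {
    const n = parseInt(fs.readFileSync(path, 'utf8').trim(), 10);
    return Number.isNaN(n) ? null : n;
  } catch (err) {
    return null;
  }
}

// Extract the soft/hard open-file limits (the soft one is what `ulimit -n`
// shows) from the text of /proc/self/limits.
function parseOpenFileLimit(limitsText) {
  const match = /^Max open files\s+(\S+)\s+(\S+)/m.exec(limitsText);
  if (!match) return null;
  const toNumber = (v) => (v === 'unlimited' ? Infinity : parseInt(v, 10));
  return { soft: toNumber(match[1]), hard: toNumber(match[2]) };
}

function describeLimits() {
  let limitsText = '';
  try {
    limitsText = fs.readFileSync('/proc/self/limits', 'utf8');
  } catch (err) {
    // Not Linux: perProcessOpenFiles stays null.
  }
  return {
    perProcessOpenFiles: parseOpenFileLimit(limitsText),
    systemWideOpenFiles: readProcNumber('/proc/sys/fs/file-max'),
    conntrackMax: readProcNumber('/proc/sys/net/netfilter/nf_conntrack_max'),
  };
}

console.log(describeLimits());
```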
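Another limit is of course TCP/IP itself. From a single local IP you can have at most about 64K connections to the same remote host and port, because each connection needs a unique combination of (localIP, localPort, remoteIP, remotePort), and only the local port can vary. In practice the limit is lower, since the OS assigns outgoing connections a local port from its ephemeral port range.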
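The 4-tuple argument as arithmetic: with the remote IP and port fixed, only the local IP and local port can vary, so the ceiling is the number of local IPs times the number of usable local ports. The helper names are hypothetical, and 32768 to 60999 is a common Linux default for /proc/sys/net/ipv4/ip_local_port_range; check the value on your own system.

```javascript
// Parse /proc/sys/net/ipv4/ip_local_port_range text, e.g. "32768\t60999\n".
function parsePortRange(text) {
  const [low, high] = text.trim().split(/\s+/).map(Number);
  return { low, high };
}

// Upper bound on simultaneous connections to ONE remote (IP, port) pair:
// every connection needs a distinct (localIP, localPort) combination.
function maxConnectionsToOneRemote(localIpCount, { low, high }) {
  return localIpCount * (high - low + 1);
}

const range = parsePortRange('32768\t60999\n');
console.log(maxConnectionsToOneRemote(1, range)); // 28232
console.log(maxConnectionsToOneRemote(2, range)); // 56464
```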
Regarding speeding things up via concurrent connections: it isn't as easy as just using more connections.
It depends on where the bottlenecks are. If your local connection is being fully used, adding more connections will only slow things down. If you are connecting to the same remote server and its connection is fully used, more will only slow it down.
Where you can get a benefit is when your local connection is not fully used and you are connecting to multiple remote hosts.
If you look at web browsers, you will see they have limits on how many connections will be made to the same remote server. They also have limits on how many connections will be made in total.
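The browser-style policy in the last paragraph, a per-server cap plus an overall cap, can be sketched as a small task limiter. This is an illustration in JavaScript, not a Scala or Actor API: createLimiter is a hypothetical name, and each task is any function returning a promise, for example one that wraps an http.get call.

```javascript
// Run promise-returning tasks under two caps, like a browser's per-server and
// overall connection limits. Tasks that cannot start yet wait in a FIFO queue;
// a task for a busy host does not block tasks for other hosts.
function createLimiter({ perHost, total }) {
  let active = 0;
  const activeByHost = new Map();
  const queue = [];

  const canStart = (host) =>
    active < total && (activeByHost.get(host) || 0) < perHost;

  function start({ host, task, resolve, reject }) {
    active++;
    activeByHost.set(host, (activeByHost.get(host) || 0) + 1);
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        activeByHost.set(host, activeByHost.get(host) - 1);
        pump();
      });
  }

  // Start every queued job that currently fits under both caps.
  function pump() {
    for (let i = 0; i < queue.length; ) {
      if (canStart(queue[i].host)) start(queue.splice(i, 1)[0]);
      else i++;
    }
  }

  return function run(host, task) {
    return new Promise((resolve, reject) => {
      queue.push({ host, task, resolve, reject });
      pump();
    });
  };
}
```

With const run = createLimiter({ perHost: 4, total: 16 }) (arbitrary numbers), each fetch goes through run('example.com', () => fetchPage()), where fetchPage is whatever request function you already have. Tuning the two caps is how you probe where the bottleneck described above actually sits.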

Comet and node.js - how many simultaneous connections could we expect on an EC2 server?

With a comet server running on node.js - how many simultaneous connections could we expect to get out of an EC2 server?
Anyone done this before and found a reasonable limit?
Our particular application only needs to push data to the clients fairly infrequently; it's the maximum number of simultaneous connections per server that worries us. We're looking at somewhere between 200k and 500k, I think, and I'm trying to figure out whether comet is going to be workable without a monstrous fleet of servers...
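For context on why connection count, not push frequency, is the constraint: in a long-poll (comet) design, every waiting client parks an open HTTP response on the server until the next push. A minimal sketch of that pattern; createLongPollHub and createCometServer are hypothetical names, and a production comet server would also time out and re-issue polls.

```javascript
const http = require('http');

// Each waiting client's response object is parked until the next publish,
// then answered and released. Every parked response holds an open socket.
function createLongPollHub() {
  const waiting = new Set();
  return {
    subscribe(res) {
      waiting.add(res);
      // 'close' also fires if the client disconnects before we answer.
      res.on('close', () => waiting.delete(res));
    },
    publish(message) {
      const body = JSON.stringify(message);
      const delivered = waiting.size;
      for (const res of waiting) {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(body);
      }
      waiting.clear();
      return delivered;
    },
    get pending() {
      return waiting.size;
    },
  };
}

// Hypothetical wiring: every request to this server becomes a long poll.
function createCometServer(hub) {
  return http.createServer((req, res) => hub.subscribe(res));
}
```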
If you are running Linux, get to know the contents of /proc/sys/net/ipv4
In particular, net.ipv4.netfilter.ip_conntrack_max will let you increase the maximum number of open connections, but when you start plugging in really big numbers you will run into other problems. For instance you might need to reduce orphan_retries because you will statistically be more likely to have orphans. And with really big numbers, it is entirely possible that kernel lookup algorithms will slow down significantly. You need to carefully tune the TCP settings.
If I were in your shoes, I would compare at least two OSes, such as Linux and FreeBSD or OpenSolaris/Illumos.
On FreeBSD you will need to change settings in /boot/loader.conf
On OpenSolaris/Illumos you will need to read the documentation for the ndd command.
