Does clustering in Node.js and auto-scaling web application using Kubernetes serve the same purpose? - node.js

Node.js has introduced the Cluster module to scale up applications for performance optimization. We have Kubernetes doing the same thing.
I'm confused if both are serving the same purpose? My assumption is clustering can spawn up to max 8 processes (if there are 4 cpu cores with 2 threads each) and there is no such limitation in Kubernetes.

Kubernetes and the Node.js Cluster module operate at different levels.
Kubernetes is in charge of orchestrating containers (amongst many other things). From its perspective, there are resources to be allocated, and deployments that require or use a specific amount of resources.
The Node.js Cluster module behaves as a load-balancer that forks N times and spreads the requests between the various processes it owns, all within the limits defined by its environment (CPU, RAM, Network, etc).
In practice, Kubernetes has the possibility to spawn additional Node.js containers (scaling horizontally). On the other hand, Node.js can only grow within its environment (scaling vertically). You can read about this here.
While from a performance perspective both approaches might be relatively similar (you can use the same number of cores in both cases); the problem with vertically scaling on a single machine is that you lose the high-availability aspect that Kubernetes provides. On the other hand, if you decide to deploy several Node.js containers on different machines, you are much more tolerant for the day one of them is going down.

Related

Kubernetes cluster architecture

Does it make sense to create a separate Kubernetes cluster for my Cassandra instances and one cluster for the application layer? Is the DB cluster accessible from the service cluster when both are in the same region and zone?
Or is it better to have one cluster with different pools - one pool for the service layer and one pool the DB nodes?
Thanks
This is more of a toss-up or opinion in terms of how you want to design your whole architecture. Here are some things to consider:
Same cluster:
Pros
Workloads don't need to go to a different podCidr to get its data.
You can optimize your resources in the same set of servers.
This is one of the main reasons people use containers orchestrators and containers.
It allows you to run multiple different types of workloads on the same set of resources.
Cons
If you have an issue with your cluster running Cassandra you risk losing your data. Or temporarily lose data if you have backups. (Longer downtime)
If you'd like to super isolate the db and app in terms of security, it may be harder.
Different clusters:
Pros
'Safer' if one of your clusters goes down.
More separation in terms of security for your data at rest.
Cons
Resources may not be optimally utilized. Leaving some CPUs, memory, etc idle.
More infrastructure management.
Different node pools:
Pros
Separation of data at rest
Still going through the same PodCidr.
Cons
More management of different node pools.
Resources may not be optimally utilized.

Should You Use PM2, Node Cluster, or Neither in Kubernetes?

I am deploying some NodeJS code into Kubernetes. It used to be that you needed to run either PM2 or the NodeJS cluster module in order to take full advantage of multi-core hardware.
Now that we have Kubernetes, it is unclear if one must use one or the other, to get the full benefit of multiple cores.
Should a person specify the number of CPU units in their pod YAML configuration?
Or is there simply no need to account for multiple cores with NodeJS in Kubernetes?
You'll achieve utilization of multiple cores either way; the difference being that with the nodejs cluster module approach, you'd have to "request" more resources from Kubernetes (i.e., multiple cores), which might be more difficult for Kubernetes to schedule than a few different containers requesting one core (or less...) each (which it can, in turn, schedule on multiple nodes, and not necessarily look for one node with enough available cores).

Scaling Node.js: Using autoscaling groups with small virtual servers or cluster processes on VM with many vCPUs?

In learning about Node.js's cluster module I've been turning over the following architecture in my head: Balancing costs with performance, would it be more beneficial (i.e. cheapest but still scalable) to run your Node.js application in a cloud service's autoscaling group using small servers with one virtual CPU (say, AWS's t2.small EC2, 1 vCPU, 2gb memory) or use a larger server (say, an m5.xlarge 4 vCPU, 16gb memory), run Node.js to cluster four child processes to use the 4 vCPUs, but still autoscale?
A possible trade-off is the time it takes AWS to deploy another small server to autoscale, but on a low-traffic app or utility app you'll have to take on the cost of running the larger server when usage is low. But if the time it takes to spin up another server to handle the load is nominal, does that negate the benefits of using the cluster module?
Specifically, my question is twofold: Are these two approaches feasible and, if so, is my presumption about the cluster module's usefulness in the small server approach correct?

Is it possible to isolate spark cluster nodes for each individual application

We have a spark cluster comprising of 16 nodes. Is it possible to limit nodes 1 & 2 for application 'A'; nodes 3,4,5 for application 'B'; nodes 10,11,12,15 for application 'C'; and so on?
From the documentation, I understand that we can set some properties to control spark executor cores, number of executors to be launched, memories etc. But, I am curious to know if I can achieve the above use case.
One obvious way to do that is to configure 3 different clusters with the desired topology, otherwise you're out of luck, spark does not have any provision,
because it is usually a bad idea and generally against the design principles of spark and clustering in general. Why? If you assign application A to specific hosts, but it gets idle, while application B is running at 100%, you have 2 idle hosts that could be working for B, so you would be wasting costly computing resources. Usually, what you want is to assign a certain number of resources per application and let the system decide how to allocate them (scheduling.. plain spark is pretty elementary, but running under YARN and Mesos you can be more sophisticated).
Another reason why it's a bad idea is that you don't want rules that specify a specific host or set of hosts. What if you assign node 1&2 to application A and they both go down? Beside not using your resources efficiently, tying your app to specific hosts makes it also difficult to make them resilient to failure by rescheduling them on other hosts.
You may have other ways to do something similar though, if you're running spark under YARN or Mesos, you can define queues or quotas and limit the amount of resources that each application can use at a given time.
In general, it depends on the reason, why do you want to statically allocate resources to applications. If it's for resource management, you should instead looking at schedulers and queues. If it's for security, you should have multiple clusters, keeping in mind that you'd be losing in performance.

clustering in node.js using mesos

I'm working on a project with Node.js that involves a server. Now due to large number of jobs, I need to perform clustering to divide the jobs between different servers (different physical machines). Note that my jobs has nothing to do do with internet, so I cannot use stateless connection (or redis to keep state) and a load balancer in front of the servers to distribute the connection.
I already read about the "cluster" module, but, from what i understood, it seems to scale only on multiprocessors on the same machine.
My question: is there any suitable distributed module available in Node.js for my work? What about Apache mesos? I have heard that mesos can abstract multiple physical machines into a single server? is it correct? If yes, it is possible to use the node.js cluster module on top of the mesos, since now we have only one virtual server?
Thanks
My question: is there any suitable distributed module available in Node.js for my work?
Don't know.
I have heard that mesos can abstract multiple physical machines into a single server? is it correct?
Yes. Almost. It allows you to pool resources (CPU, RAM, DISK) across multiple machines, gives you ability to allocate resources for your applications, run and manage the said applications. So you can ask Mesos to run X instances of node.js and specify how much resource does each instance needs.
http://mesos.apache.org
https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
If yes, it is possible to use the node.js cluster module on top of the mesos, since now we have only one virtual server?
Admittedly, I don't know anything about node.js or clustering in node.js. Going by http://nodejs.org/api/cluster.html, it just forks off a bunch of child workers and then round robins the connection between them. You have 2 options off the top of my head:
Run node.js on Mesos using an existing framework such as Marathon. This will be fastest way to get something going on Mesos. https://github.com/mesosphere/marathon
Create a Mesos framework for node.js, which essentially does what cluster node.js is doing, but across the machines. http://mesos.apache.org/documentation/latest/app-framework-development-guide/
In both these solutions, you have the option of letting Mesos create as many instances of node.js as you need, or, use Mesos to run cluster node.js on each machine and let it manage all the workers on that machine.
I didn't google, but there might already be a node.js mesos framework out there!

Resources