Node Cluster and/or Docker Cluster? - node.js

Trying to get the best performance from my application with as little setup as possible.
I'm struggling to find a consensus online of whether it would be better to use the Node cluster module in a Docker container, or to use a cluster of Docker instances instead.
OPINION: Node cluster first, then Docker cluster
OPINION: Don't use Node cluster in a Docker instance

It depends on what "best performance" means. What is the bottleneck in your case? CPU? RAM? Network? Disk I/O?
Advantages of a Node cluster:
All communication is in memory.
Disadvantage:
The solution doesn't scale beyond one host. If the host is overloaded, then so is your service.
Advantages of a Docker cluster:
High availability.
More network bandwidth and more resources, since you have more hosts.
Assuming you run your software as a service in Docker anyway, I don't see a problem with "as little setup as possible". Use both if it makes sense.
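For reference, here's a minimal sketch of the single-host approach with Node's built-in cluster module (the port and restart logic are illustrative, not prescriptive):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // fork one worker per CPU core on this host
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // replace workers that die so the service keeps using all cores
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, forking a replacement`);
    cluster.fork();
  });
} else {
  // all workers share one port; the master hands incoming connections to them in memory
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(3000);
}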

Related

Kafka using Docker for production clusters

We need to build a Kafka production cluster with 3-5 nodes.
We have the following options:
Kafka in Docker containers (the cluster includes ZooKeeper and Schema Registry on each node)
Kafka cluster without Docker (the cluster includes ZooKeeper and Schema Registry on each node)
Since we are talking about a production cluster, we need good performance: we have heavy reads/writes to disk (disk size is 10 TB), need good I/O performance, etc.
So does Kafka in Docker meet the requirements for production clusters?
more info - https://www.infoq.com/articles/apache-kafka-best-practices-to-optimize-your-deployment/
It can be done, sure. I have no personal experience with it, but if you don't otherwise have experience managing other stateful containers, I'd suggest avoiding it.
As far as "getting started" with Kafka in containers, Kubernetes is the most documented way, and Strimzi (free, optional commercial support by Lightbend) or Confluent Operator (commercial support by Confluent) can make this easy when using Kubernetes or Openshift. Or DC/OS offers a Kafka service over Mesos/Marathon. If you don't already have any of these services, then I think it's apparent that you should favor not using containers.
Bare-metal or virtualized deployments would be much easier to maintain than hand-deployed containerized ones, in my experience, particularly for logging, metric gathering, and statically assigned Kafka listener mappings over the network. Confluent provides Ansible scripts for deployments to such environments.
That isn't to say there aren't companies that have been successful at it, or have at least tried; IBM, Red Hat, and Shopify immediately pop up in my searches, for example.
Here are a few talks about things to consider when running Kafka in containers:
https://www.confluent.io/kafka-summit-london18/kafka-in-containers-in-docker-in-kubernetes-in-the-cloud
https://kafka-summit.org/sessions/running-kafka-kubernetes-practical-guide/

Should You Use PM2, Node Cluster, or Neither in Kubernetes?

I am deploying some NodeJS code into Kubernetes. It used to be that you needed to run either PM2 or the NodeJS cluster module in order to take full advantage of multi-core hardware.
Now that we have Kubernetes, it is unclear whether one must use one or the other to get the full benefit of multiple cores.
Should a person specify the number of CPU units in their pod YAML configuration?
Or is there simply no need to account for multiple cores with NodeJS in Kubernetes?
You'll utilize multiple cores either way. The difference is that with the Node.js cluster module approach, you have to "request" more resources from Kubernetes for a single pod (i.e., multiple cores), which can be harder for Kubernetes to schedule than several containers requesting one core (or less) each; those smaller pods can in turn be scheduled across multiple nodes, instead of requiring a single node with enough free cores.
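If you do fork inside the container, a rough sketch like the one below keeps the worker count configurable so it can match the pod's CPU request (WEB_CONCURRENCY is an assumed convention here, not something Kubernetes sets for you); with one worker it degenerates into the plain single-process container that you scale with replicas instead:

const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Assumed env var: set it to match the pod's CPU request.
// Note that os.cpus() reports the host's cores, not the container's CPU limit.
const workers = parseInt(process.env.WEB_CONCURRENCY, 10) || os.cpus().length;

if (cluster.isMaster && workers > 1) {
  for (let i = 0; i < workers; i++) {
    cluster.fork();
  }
} else {
  // single-process case (or a forked worker): just serve traffic
  http.createServer((req, res) => res.end('ok\n')).listen(8080);
}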

Drone slaves provided by CoreOs

I have a drone host and a CoreOS cluster with fleet.
Drone currently has only unix:///var/run/docker.sock in the nodes menu.
As I understand, I could add other docker nodes defined by docker URLs and certificates. However, once I have a CoreOS cluster, it seems logical to use that as the provider of the slaves. I am looking for a solution where
(1) I don't have to reconfigure the nodes whenever the CoreOS cluster configuration changes, and
(2) resources are managed correctly.
I could think of the following solutions:
Expose Docker URIs on the CoreOS cluster nodes and configure all of them directly in drone. In this case I would have to follow CoreOS cluster changes manually, and resource management would probably conflict with that of fleet.
Expose Docker URIs on the CoreOS cluster nodes and provide DNS round-robin based access. This seems to be a terrible way of managing resources and would most probably conflict with fleet.
Install Swarm on the CoreOS nodes. Resource management would probably conflict with that of fleet.
Have fleet or rkt expose a Docker URI and decide which node the container runs on. The problem is that I could not find any way to do this.
Have drone.io use fleet or rkt directly. Same problem. Is it possible?
Is there any way to satisfy all of my requirements with drone.io and CoreOS?
As I understand, I could add other docker nodes defined by docker URLs
and certificates. However once I have a CoreOS cluster, it seems
logical to use that as the provider of the slaves.
The newest version of drone supports build agents. Build agents are installed per server and communicate with the central drone server to pull builds from the queue, execute them, and send back the results.
docker run \
  -e DRONE_SERVER=http://my.drone.server \
  -e DRONE_SECRET=passcode \
  -v /var/run/docker.sock:/container/path/docker.sock \
  drone/drone:0.5 agent
This allows you to add and remove agents on the fly without having to register or manage them at the server level.
I believe this should solve the basic problem you've outlined, although I'm not sure it will provide the level of integration you desire with fleet and CoreOS. Perhaps a CoreOS expert can augment my answer.

clustering in node.js using mesos

I'm working on a project with Node.js that involves a server. Now, due to the large number of jobs, I need to perform clustering to divide the jobs between different servers (different physical machines). Note that my jobs have nothing to do with the internet, so I cannot use stateless connections (or Redis to keep state) and a load balancer in front of the servers to distribute the connections.
I have already read about the "cluster" module, but, from what I understood, it seems to scale only across the processors of a single machine.
My question: is there any suitable distributed module available in Node.js for my work? What about Apache Mesos? I have heard that Mesos can abstract multiple physical machines into a single server; is that correct? If yes, is it possible to use the Node.js cluster module on top of Mesos, since we would then have only one virtual server?
Thanks
My question: is there any suitable distributed module available in Node.js for my work?
Don't know.
I have heard that Mesos can abstract multiple physical machines into a single server; is that correct?
Yes, almost. It allows you to pool resources (CPU, RAM, disk) across multiple machines, gives you the ability to allocate resources to your applications, and lets you run and manage those applications. So you can ask Mesos to run X instances of Node.js and specify how many resources each instance needs.
http://mesos.apache.org
https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
If yes, is it possible to use the Node.js cluster module on top of Mesos, since we would then have only one virtual server?
Admittedly, I don't know anything about Node.js or clustering in Node.js. Going by http://nodejs.org/api/cluster.html, it just forks off a bunch of child workers and then round-robins the connections between them. You have two options off the top of my head:
Run Node.js on Mesos using an existing framework such as Marathon. This will be the fastest way to get something going on Mesos. https://github.com/mesosphere/marathon
Create a Mesos framework for Node.js, which essentially does what the cluster module does, but across machines. http://mesos.apache.org/documentation/latest/app-framework-development-guide/
In both of these solutions, you have the option of letting Mesos create as many instances of Node.js as you need, or using Mesos to run the Node.js cluster module on each machine and letting it manage all the workers on that machine.
I didn't Google it, but there might already be a Node.js Mesos framework out there!
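To make option 1 a bit more concrete, here's a rough sketch of asking Marathon to run several Node.js instances via its REST API (the host, app id, command, and resource numbers are all made up for illustration):

// hypothetical app definition for Marathon (option 1 above)
const appDefinition = {
  id: '/my-node-service',      // hypothetical application id
  cmd: 'node server.js',       // command Marathon runs in each allocated slot
  cpus: 0.5,                   // CPU share per instance
  mem: 256,                    // MB of RAM per instance
  instances: 4                 // Mesos/Marathon spreads these across machines
};

// POST the definition to Marathon's /v2/apps endpoint (Node 18+ global fetch)
fetch('http://marathon.example.com:8080/v2/apps', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(appDefinition)
})
  .then((res) => res.json())
  .then((app) => console.log('submitted', app.id))
  .catch((err) => console.error('submission failed', err));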

npm cluster package on a server cluster

So I have an app I am working on and I am wondering if I am doing it correctly.
I am running cluster on my Node.js app; here is a link to cluster. I couldn't find anywhere that states whether I should only run cluster on a single server or whether it is okay to run it on a cluster of servers. If I continue down the road I am on, I will have a cluster inside a cluster.
So that the answers are not just opinions, here is my question: was the cluster package made to do what I am doing (a cluster of workers on a single server, inside a cluster of servers)?
Thanks in advance!
Cluster wasn't specifically designed for that, but there is nothing about it which would cause a problem. If you've designed your app to work with cluster, it's a good indication that your app will also scale across multiple servers. The main gotcha would be if you're doing anything stateful on the filesystem. For example, if a user uploads a photo and you store it on the server disk, that would be problematic when scaling out across multiple servers (that don't share the same disk).
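To make that gotcha concrete, here's a tiny hypothetical upload handler (the path and function name are made up); the local-disk write is exactly the kind of state that breaks once requests can land on any of several servers:

const fs = require('fs');
const path = require('path');

// Hypothetical handler: the photo lands on *this* server's disk, so a later
// request routed to a different server (or to a replacement instance) won't find it.
function handlePhotoUpload(filename, buffer) {
  fs.writeFileSync(path.join('/var/uploads', filename), buffer); // local disk only
  // To scale across servers, push the bytes to storage every instance can reach
  // (object storage, a shared volume, a database) and keep only the key/URL here.
}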
