Why Cluster NodeJS vs Docker, And What About Einhorn?

I have been a fan of the ease with which I can create/compose application functionality using NodeJS. NodeJS, to me, is easy.
When looking at how to take advantage of multi-core machines (and also considering the additional complexity of port-specific apps, like a web app on 80/443), my original candidates were NodeJS Cluster (or something like pm2) and maybe a load balancer.
But I'm wondering what would be the downside (or the reason why it wouldn't work) of instead running multiple containers (to address the multi-core situation) and then load balancing across their respective external ports? Past that, would it be better to just use Einhorn or... how does Einhorn fit into this picture?
So, the question is - for NodeJS only (because I'm also thinking about Go) - am I correct in considering "clustering" vs "multiple docker containers with load balancing" as two possible ways to utilize multiple cores?
As a separate question, is Einhorn just an alternative third-party way to achieve the same thing as NodeJS clustering (one that could also be used to load balance a Go app, for example)?

Docker is starting to take on more and more of the clustering and load-balancing aspects we used to handle independently, either directly or through idiomatic usage patterns. With NodeJS, for example, you can have one nginx or haproxy container load balance between multiple NodeJS containers. I prefer using something like fig, and also setting the restart policy so that containers are restarted automatically. This removes the need for other clustering solutions in most cases.
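As a rough illustration, a minimal sketch in the fig / docker-compose style described above could look like the following. The image name, the mounted nginx.conf (which would contain an upstream block listing web1 and web2), and the fixed count of two app containers are all assumptions for the sketch, not part of the original answer:

web1:
  image: myorg/node-app:latest      # hypothetical NodeJS app image
  restart: always                   # restart policy so a crashed process comes back automatically
web2:
  image: myorg/node-app:latest
  restart: always
lb:
  image: nginx
  ports:
    - "80:80"
  volumes:
    - ./nginx.conf:/etc/nginx/nginx.conf:ro   # assumed config with an upstream block for web1/web2
  links:
    - web1
    - web2

Scaling to more cores then means adding more app services (or using the compose scale feature) and listing them in the nginx upstream, rather than managing worker processes inside a single container.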

Related

Dockerize VueJS app with node/express in single container

This question must have been asked many times before, but I can't find a solution that solves my problem.
I'm running a VueJS application with Express/NodeJS as the server, and I know the best approach is probably to separate them into two containers. But how can I make this work in one container, with a multi-stage build or some other way?
Any tips would be appreciated! Thank you!
Run multiple services in a container
A container’s main running process is the ENTRYPOINT and/or CMD at the end of the Dockerfile. It is generally recommended that you separate areas of concern by using one service per container. That service may fork into multiple processes (for example, Apache web server starts multiple worker processes). It’s ok to have multiple processes, but to get the most benefit out of Docker, avoid one container being responsible for multiple aspects of your overall application. You can connect multiple containers using user-defined networks and shared volumes.
If you need to run more than one service within a container, you can accomplish this in a few different ways.
Reference: https://docs.docker.com/config/containers/multi-service_container/
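If you do want the single-container route the question asks about, one common pattern is a multi-stage Dockerfile: the first stage builds the Vue front-end, and the second stage runs the Express server, which then serves the built assets. The sketch below is only illustrative and assumes the Vue build outputs to dist/ and the Express entry point is server/index.js; adjust both to your project layout:

# build stage: compile the Vue front-end
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build              # assumed to emit static files into /app/dist

# runtime stage: Express serves the API and the built static files
FROM node:18-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
COPY server ./server           # assumed location of the Express code
EXPOSE 3000
CMD ["node", "server/index.js"]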

How to make a cluster of GitLab instances

Is it possible to create a cluster of multiple GitLab instances (multiple machines)? My instance is over-utilized and I would like to add other machines, but this should be transparent to users accessing their projects: they shouldn't care which instance a project is hosted on.
What could be the best solution to help the users?
I'm on GitLab Community Edition 10.6.4
Thanks for your help,
Leonardo
I reckon you are talking about scaling the GitLab server, not GitLab runners.
GitLab Omnibus is a fairly complex system with multiple components, some are stateless and some are stateful.
If you currently have everything on the same server, the easiest option is to scale up (move to bigger machine).
If you can't, you can extract stateful components to host them separately: PostgreSQL, Redis, files to NFS.
Note that this can actually make performance worse.
As a next step, you can scale out the stateless components.
But it is in no way an easy task.
I'd suggest starting by setting up proper monitoring to see where your limits are (CPU, RAM, IO) and which components are the bottlenecks.
See docs, including some examples of scaling:
https://docs.gitlab.com/ee/administration/high_availability/
https://about.gitlab.com/solutions/high-availability/
https://docs.gitlab.com/charts/
https://docs.gitlab.com/ee/development/architecture.html
https://docs.gitlab.com/ee/administration/high_availability/gitlab.html

Should you create separate docker containers for services like redis and elastic search when they are being used by more than one other service?

I have my entire stack in a docker compose container setup. Currently, load is such that it can all run on a single instance. I have two separate applications and they both use redis and elastic search.
I have seen people suggest that, in cases like MySQL, proper container practice is to have two separate containers for two separate databases if two separate applications use them.
That seems fine for MySQL, because my understanding is that separate MySQL instances don't really add much memory or processor overhead.
I'm wondering if this same strategy should apply to redis and elasticsearch. My understanding is that both of these applications can come with considerable overhead. So it seems like it might be inefficient to run more than one instance of them.
It's an interesting question, but I'm not sure there is a universal answer; it mostly depends on your situation.
However, there are advantages and drawbacks you should know about if you use a single container for multiple applications. As an example, let's say you have only two application containers, A and B, and a shared DB container, whatever the technology behind it.
Advantages
Resource usage is lower. Nonetheless, as you state in your question, if the DB container's overhead is not that significant, this isn't really an advantage.
Drawbacks
If A and B are independent applications, the main disadvantage of sharing the DB is that you break that independence and tightly couple your applications through the DB:
You cannot update the DB container independently. The DB version has to be aligned for both applications: if A requires a new DB version (for new features, for example), the DB must be upgraded, potentially breaking B.
The DB configuration cannot differ between A and B: if A issues more writes than reads while B reads data intensively, you probably won't find a configuration that suits both usages.
A DB crash affects both applications: A could even take down B by crashing the DB.
Security concerns: even if A and B have separate databases inside the shared DB, A could potentially access B's database unless you set up distinct users/roles; it's probably easier to have one DB container per application and not worry about access, as long as they are on the same network (and the DB cannot be reached from outside, of course).
You have to put the A, B, and DB services in the same docker-compose file.
Conclusion
If A and B are already tightly coupled apps, you can probably go with one DB. If resources are scarce, you can also share the DB, but don't forget that doing so couples your apps, which you probably don't want. Otherwise, the cleanest solution is one DB per application, as sketched below.
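As a rough sketch of the "one DB per application" layout with Redis, using made-up service and image names (the document's own compose-style format):

app-a:
  image: example/app-a:latest       # hypothetical image names throughout
  environment:
    - REDIS_HOST=redis-a            # A only ever sees its own Redis
  links:
    - redis-a
redis-a:
  image: redis
app-b:
  image: example/app-b:latest
  environment:
    - REDIS_HOST=redis-b
  links:
    - redis-b
redis-b:
  image: redis

Each application can now be upgraded, tuned, and restarted independently, at the cost of running two Redis instances.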
The main benefit I see of having all linked services in the docker-compose stack is that Docker will then ensure that all required services are up. However, with services like Redis and Elasticsearch it is fine to run them stand-alone, with the application just pointing to them via environment variables passed in the docker-compose file.
e.g.
myapp:
  image: myawesomerepo/myapp:latest
  depends_on:
    - someother_svc_in_same_docker_compose
  environment:
    - DB_HOST=172.17.2.73
    - REDIS_HOST=172.17.2.103
    - APP_ENV=QA
    - APM_ENABLE=false
    - APM_URL=http://172.17.2.103:8200
    - CC_HOST=cc-3102
  volumes:
    - /opt/deploy/cc/config:/server/app/config
    - /opt/deploy/cc/slogs:/server/app/logs
  command: node ./app/scheduler/app.js
In the future, if you decide to use hosted versions of these services, for example, you just need to point the URLs in the right direction.

Why does nobody do this in Docker? (All-in-one container / "black box")

I need to run a number of different web applications and microservices.
I also need easy backup/restore and the ability to move them between servers/cloud providers.
I started studying Docker for this, and I'm puzzled when I see advice like: "create a first container for your application, a second container for your database, and link them together".
But why do I need a separate container for the database? If I understand correctly, Docker's main promise is to run and move applications, with all their dependencies, in an isolated environment. So, as I understand it, it is appropriate to place the application and all its dependencies in one container (especially if it's a small application with no need for an external database).
How I see the best way to use Docker in my case:
Take a base image (e.g. phusion/baseimage).
Build my own image based on it (with nginx, the database, and the application code).
Expose a port for interacting with my application.
Create a data volume based on this image on the target server (to store application data, the database, uploads, etc.), or restore the data volume from a previous backup.
Run this container and have fun.
Pros:
Easy to back up/restore/move the whole application (move only the data volume and simply start it on the new server/environment).
The application is a "black box", with no external dependencies to worry about.
If I need to store data in external databases or consume data from them, nothing prevents me from doing so (though it is rarely necessary), and I prefer to use the API of other black boxes instead of accessing their databases directly.
More isolation and security than with a single database shared by all containers.
Cons:
Greater consumption of RAM and disk space.
A little harder to scale. (If I need several app instances to handle thousands of requests per second, I can move the database into a separate container and point several app instances at it, but that is rarely needed.)
Why can't I find recommendations for this approach? What's wrong with it? What pitfalls have I not seen?
First of all, you need to understand that a Docker container is not a virtual machine; it is a wrapper around kernel features such as chroot, cgroups, and namespaces, using layered filesystems, with its own packaging format. A virtual machine is usually a heavyweight, stateful artifact with extensive configuration options for the resources available on the host machine, and you can set up complex environments within a VM.
A container is a lightweight, disposable runtime environment, with the recommendation to keep it as stateless as possible. All changes are stored within the container, which is just a running instance of the image, and you lose all those diffs when the container is deleted. Of course you can map volumes for persistent data, but that is available to a multi-container architecture too.
If you pack everything into one container, you lose the ability to scale the components independently of each other and you build in tight coupling.
With this tight coupling you can't implement fail-over, redundancy, and scalability in your app configuration. Most modern NoSQL databases are built to scale out easily, and data redundancy becomes possible when you run more than one backing database instance.
On the other hand, defining these single-responsibility containers is easy with docker-compose, where you can declare them in a simple YAML file.
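For instance, a minimal sketch of such a compose file, with made-up image names and one single-responsibility container per concern (web proxy, application, database), might look like this:

web:
  image: nginx
  ports:
    - "80:80"
  links:
    - app
app:
  image: example/myapp:latest       # hypothetical application image
  links:
    - db
db:
  image: postgres
  volumes:
    - ./pgdata:/var/lib/postgresql/data   # data lives outside the container, so the container stays disposable

Backup/restore then means copying the mounted data directory, while each container can still be replaced or scaled on its own.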

What is the best node.js cluster?

There is a cluster module in node http://nodejs.org/docs/v0.6.19/api/cluster.html
But I found some other implementations like this one https://github.com/learnboost/cluster
Which is best? Who has experience with them?
Another question:
Is it necessary to use nginx in production? If so, why? How many simultaneous connections can be handled by a single modern multi-core server running Node: 100k, 200k?
Thanks!
The cluster module from https://github.com/learnboost/cluster is only available for Node v0.2.x and v0.4.x, while the official cluster module has been baked into Node core since v0.6.x. Note that the API will change in v0.8.x (which is around the corner).
So you should use the latest version of Node, with Cluster built in.
nginx is faster for serving static files, but other than that I don't see any solid reason to use it. If you want a reverse proxy, something like HAProxy is better (or you can use a Node solution like node-http-proxy or bouncy).
Unless you are running a "Hello World" example in production, you cannot accurately predict how many simultaneous connections can be handled. Normally, a single Node process can handle thousands of concurrent connections.
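For reference, a minimal sketch of the built-in cluster module: one worker per CPU core, all sharing the same listening port. The file name and port number are arbitrary choices for the example.

// cluster-sketch.js - minimal use of Node's built-in cluster module
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // fork one worker per CPU core
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // restart a worker if it dies (roughly what pm2 or Einhorn would do for you)
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' died, forking a new one');
    cluster.fork();
  });
} else {
  // workers share the same listening socket; incoming connections are distributed among them
  http.createServer(function (req, res) {
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(8000);
}

Running node cluster-sketch.js starts one master and N workers, and repeated requests to port 8000 will report different worker pids.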
Resources:
https://github.com/nodejitsu/node-http-proxy
https://github.com/substack/bouncy
