K8s: deployment patterns for node.js apps with dbs

Hi!
My question is about deploying node.js apps via k8s, the architecture patterns involved, and connecting them with DBs.
alpha | beta | gamma1 | gamma2
I have the following node.js app services; some of them scale with multiple app instances (like gamma), others are standalone, and all of them are built into a single docker image from one Dockerfile and run from it.
I also have non-cloud DBs, like elastic & mongo, running from their own containers with .env: mongo | elastic
As of now, my docker-compose.yml looks like a typical node.js example app, but with a common volume and a bridge network (except that I have more than one node.js app):
version: '3'
services:
  node:
    restart: always
    build: .
    ports:
      - 80:3000
    volumes:
      - ./:/code
  mongo:
    image: mongo
    ports:
      - 27017:27017
    volumes:
      - mongodb:/data/db
volumes:
  mongodb:
networks:
  test-network:
    driver: bridge
Current deployment:
All of this runs on a single heavy VPS (X CPU cores, Y RAM, Z SSD, everything at about 70% load) from a single docker-compose.yml file.
What I want to ask and achieve:
Since one VPS is no longer enough, I'd like to start using k8s with Rancher. So the question is about the correct deployment:
For example, I have N VPSs connected within one private network; each VPS is a worker joined into one cluster (with Rancher, of course, one of them being the master node), which gives me X cores, Y RAM, and other shared resources.
Do I need another, separate cluster (or a VPS in the private network that is not part of the cluster) with the DBs running on it, or can I deploy the DBs in the same cluster? And what if each VPS (worker) in the cluster has only a 40GB volume and the DB grows beyond that? Do the shared resources from the workers include shared volume space?
Is it right to have one image from which I can start all my apps, or in the case of k8s should I have a separate docker image for each service? So if I have 5 node.js apps within one mono-repo, should I have 5 separate docker images rather than one common one?
I understand that my question may have a complex answer, so I'd be glad to see not just an answer but also links or anything else related to the problem. It's much easier to find or google something if you know how to ask for it.

A purist answer:
Each of your five services should have its own image, and its own database. It's okay for the databases to be in the same cluster, so long as you have a way to back them up, run migrations, and do other database-y things. If your cloud provider offers managed versions of these databases then storing the data outside the cluster is fine too, and can help get around some of the disk-space issues you cite.
I tend to use Helm for actual deployment mechanics as a way to inject things like host names and other settings at deploy time. Each service would have its own Dockerfile, its own Helm chart, its own package.json, and so on. Your CI system would build and deploy each service separately.
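For illustration, a minimal per-service values.yaml sketch in that spirit (the repository, hostnames, and extra keys here are hypothetical, not from the question):
# values.yaml for one service (illustrative values only)
repository: registry.example.com/myteam
image: gamma
tag: "20240101-abcdef1"
mongo:
  host: mongo.databases.svc.cluster.local   # injected into the app at deploy time
  database: gamma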
A practical answer:
There's nothing technically wrong with running multiple containers off the same image doing different work. If you have a single repository and a single build system now, and you don't mind a change in one service causing all of them to redeploy, this approach will work fine.
Whatever build system your repository has now, if you go with this approach, I'd put a single Dockerfile in the repository root and probably have a single Helm chart to deploy it. In the Helm chart Deployment spec you can override the command to run with something like
# This fragment appears under containers: in a Deployment's Pod spec
# (this is Helm chart, Go text/template templated, YAML syntax)
image: {{ .Values.repository }}/{{ .Values.image }}:{{ .Values.tag }}
command: ["node", "service3/index.js"]
Kubernetes's terminology here is slightly off from Docker's, particularly if you use an entrypoint wrapper script. Kubernetes command: overrides a Dockerfile ENTRYPOINT, and Kubernetes args: overrides CMD.
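A minimal sketch of that mapping (the container name and file path are hypothetical):
# Container spec fragment (illustrative)
containers:
  - name: gamma
    image: {{ .Values.repository }}/{{ .Values.image }}:{{ .Values.tag }}
    command: ["node"]               # overrides the Dockerfile ENTRYPOINT
    args: ["service3/index.js"]     # overrides the Dockerfile CMD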
In either case:
Many things in Kubernetes allocate infrastructure dynamically. For example, you can set up a horizontal pod autoscaler to set the replica count of a Deployment based on load, or a cluster autoscaler to set up more (cloud) instances to run Pods if needed. If you have a persistent volume provisioner then a Kubernetes PersistentVolumeClaim object can be backed by dynamically allocated storage (on AWS, for example, it creates an EBS volume), and you won't be limited to the storage space of a single node. You can often find prebuilt Helm charts for the databases; if not, use a StatefulSet to have Kubernetes create the PVCs for you.
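As an example, a minimal PersistentVolumeClaim sketch that a database Pod could mount; the storage class name is hypothetical and depends on your provisioner:
# Dynamically provisioned storage claim (names are illustrative)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard      # whatever your provisioner exposes
  resources:
    requests:
      storage: 100Gi              # not limited to a single node's 40GB disk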
Make sure your CI system produces images with a unique tag, maybe based on a timestamp or source control commit ID. Don't use ...:latest or another fixed string: Kubernetes won't redeploy on update unless the text of the image: string changes.
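For example (the registry path is hypothetical):
image: registry.example.com/myteam/app:20240101-abcdef1   # unique per build
# not: image: registry.example.com/myteam/app:latest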
Multiple clusters is tricky in a lot of ways. In my day job we have separate clusters per environment (development, pre-production, production) but the application itself runs in a single cluster and there is no communication between clusters. If you can manage the storage then running the databases in the same cluster is fine.
Several Compose options don't translate well to Kubernetes. I'd especially recommend removing the volumes: that bind-mount your code into the container, and validating that your image runs correctly, before you do anything Kubernetes-specific. If you're replacing the entire source tree in the image then you're not really running the image you built, and it'll be much easier to debug locally. In Kubernetes you also have almost no control over networks:, but they're not really needed in Compose either.
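A minimal sketch of the node service from your Compose file with that bind mount removed, so the code baked into the image is what actually runs:
services:
  node:
    restart: always
    build: .
    ports:
      - 80:3000
    # no "./:/code" bind mount: the image's own code is what runs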

I can't answer the part of your question about the VPS machine setup, but I can make some suggestions about the image setup.
While you asked this question about a node app, it's applicable to more than just node.
Regarding the docker image and whether to have a common image or separate ones: generally it's up to you and/or your company.
There are pros and cons to both methods:
You could "bake" the code into the image and have a different image per app, but if you run into any security vulnerabilities you have to patch, rebuild, and redeploy all the images. If you had 5 apps all using the same library, but that library was not in the base image, then you would have to patch it 5 times, once per image, rebuild, and redeploy.
Or you could use a single base image that includes the needed libraries and mount the codebase in (for example as a ConfigMap); that base image would never need to change unless you had to patch something in the underlying operating system. The same vulnerability mentioned in the paragraph above would only need to be patched in the base image, and the affected pods could be respun (no need to redeploy).
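A minimal sketch of that ConfigMap-mount approach (all names here are hypothetical; note that ConfigMaps are capped at roughly 1MiB, so this only suits small codebases):
# Pod spec fragment mounting code from a ConfigMap (illustrative only)
containers:
  - name: app
    image: node:18            # shared base image with the runtime and libraries
    command: ["node", "/app/index.js"]
    volumeMounts:
      - name: app-code
        mountPath: /app
volumes:
  - name: app-code
    configMap:
      name: app-code          # created from the codebase, e.g. by your CI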

Related

Can Docker containers running a node.js service on ECS in production share a node_modules volume mounted from EFS?

Is it good practice for node.js service containers running under AWS ECS to mount a shared node_modules volume persisted on EFS? If so, what's the best way to pre-populate the EFS before the app launches?
My front-end services run a node.js app, launched on AWS Fargate instances. This app uses many node_modules. Is it necessary for each instance to install the entire body of node_modules within its own container? Or can they all mount a shared EFS filesystem containing a single copy of the node_modules?
I've been migrating to AWS Copilot to orchestrate, but the docs are pretty fuzzy on how to pre-populate the EFS. At one point they say, "we recommend mounting a temporary container and using it to hydrate the EFS, but WARNING: we don't recommend this approach for production." (Storage: AWS Copilot Advanced Use Cases)
Thanks for the question! This points out some gaps in our documentation that have opened up as we released new features. There is actually a manifest field, image.depends_on, which mitigates the issue called out in the docs about prod usage.
To answer your question specifically about hydrating EFS volumes prior to service container start, you can use a sidecar and the image.depends_on field in your manifest.
For example:
image:
  build: ./Dockerfile
  depends_on:
    bootstrap: success

storage:
  volumes:
    common:
      path: /var/copilot/common
      read_only: true
      efs: true

sidecars:
  bootstrap:
    image: public.ecr.aws/my-image:latest
    essential: false   # Allows the sidecar to run and terminate without the ECS task failing
    mount_points:
      - source_volume: common
        path: /var/copilot/common
        read_only: false
On deployment, you'd build and push your sidecar image to ECR. It should include either your packaged data or a script to pull down the data you need, then move it over into the EFS volume at /var/copilot/common in the container filesystem.
Then, when you next run copilot svc deploy, the following things will happen:
Copilot will create an EFS filesystem in your environment. Each service will get its own isolated access point in EFS, which means service containers all share data but can't see data added to EFS by other services.
Your sidecar will run to completion. That means that all currently running services will see the changes to the EFS filesystem whenever a new copy of the task is deployed unless you specifically create task-specific subfolders in EFS in your startup script.
Once the sidecar exits successfully, the new service container will come up on ECS and operate as normal. It will have access to the EFS volume which will contain the latest copy of your startup data.
Hope this helps.
Is it good practice for node.js service containers running under AWS ECS to mount a shared node_modules volume persisted on EFS?
Asking whether it is "good" or not is a matter of opinion; it is a fairly common practice in ECS, however. You do have to be very cognizant of the IOPS your application is going to generate against the EFS volume: once an EFS volume runs out of burst credits it can really slow down and impact the performance of your application.
I have not ever seen an EFS volume used to store node_modules before. In all honesty it seems like a bad idea to me. Dependencies like that should always be bundled in your Docker image. Otherwise it's going to be difficult when it comes time to upgrade those dependencies in your EFS volume, and may require down-time to upgrade.
If so, what's the best way to pre-populate the EFS before the app launches?
You would have to create the initial EFS volume and mount it somewhere like an EC2 instance, or another ECS container, and then run whatever commands are necessary in that EC2/ECS instance to copy your files to the volume.
The quote in your question isn't present on the page you linked, so it's difficult to say exactly what other approach the Copilot team would recommend.

Use shared database docker container in microservice architecture

From the number of questions tagged with docker I assume StackOverflow is the right place to ask (instead of e.g. DevOps); if not, please point me to the right place or move this question accordingly.
My scenario is the following:
multiple applications consisting of a frontend (web GUI) and a backend (REST services) are being developed following SOA/microservice approaches; each application has its own git repository
some applications require an additional shared resource: the frontend needs an HTTP server and multiple backend applications need a database server (with persistent storage)
the focus is primarily on offline mobile development (on the road), so a quick setup of the required services/applications should be possible and the resource overhead should be minimal. But of course the whole thing will be deployed/published at some point, so I don't want to obstruct that if both can be managed
development is done on windows and linux host machines
access to all services from host machine is required for development purposes
What I am trying to achieve is a docker-compose.yaml file in each application repository which I invoke via docker-compose up and which then starts all required containers if they are not already running, e.g. the database container is started when I invoke docker-compose up in a backend application repository.
My approach was to have a new git repository which defines all shared docker images/containers, with its own docker-compose.yaml, where all devs would have to run docker-compose build whenever something changed (this might be automated with a git commit hook in the future). The central docker-compose.yaml looks like this:
version: "3"
services:
  postgres:
    build: ./images/postgres
    image: MY-postgres
    container_name: MY-postgres-server
    ports:
      - "5432:5432"
  httpd:
    build: ./images/httpd
    image: MY-httpd
    container_name: MY-httpd-server
    ports:
      - "80:80"
The Dockerfile describing how each image is built is in its own subfolder and I think it's not relevant to the question; basically they are the default alpine + apache/postgres images.
So the problem: what would a docker-compose.yaml in an application git repository look like that references the services/containers defined by the central docker-compose.yaml above?
Since this is not a new problem scenario, I did some research, and honestly the variety of approaches and proposed solutions was confusing: for one, the various versions and compatibilities, features that were deprecated, etc.
We want one single database instance for now, for performance reasons and simplicity (reddit), or is that exactly the problem because it is truly considered an anti-pattern (via this answer)? Each application would be using its own database within the container, so no sync is required at the application level.
I am reading about volumes or data-only containers to solve this problem, yet I can't understand how to implement them.
Some (single-host scenario) suggest links (with depends_on), but I think this concept has been superseded by networks; does it still apply? There seemed to be an extends option as well.
docker-compose has an option --no-deps which is described as "Don't start linked services." If I omit it, I would assume it does what I need, but here I think the problem is the difference in meaning between image/container/service.
Can a combination of multiple compose files solve this problem? This would add a hard requirement on project paths though.
If I can't start the containers from my application directory, I'd at least like to link to them; is external_links the right approach (see the sketch after this list)?
There are some feature requests (feature: including external docker-compose.yml, allow sharing containers across services), so maybe it's just not currently possible with docker means? Then how to solve it with a third party like dcao include (which doesn't support version 3)?
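As an illustration of the shared-network alternative to external_links (all names are hypothetical; this assumes the central compose file declares a network, say shared-net, and attaches MY-postgres-server to it), an application repository's compose file could join that network as external and reach the database by container name:
version: "3"
services:
  backend:
    build: .
    environment:
      DB_HOST: MY-postgres-server   # reachable by container name on the shared network
    networks:
      - shared-net
networks:
  shared-net:
    external: true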
Wow, that escalated quickly. But I wanted to show the research I have done, since I just can't believe that this is currently not possible.

Microservices on docker - architecture

I am building a micro-services project using docker.
One of my micro-services is a listener that should get data from a varying number of sources.
What I'm trying to achieve is the ability to start and stop getting data from sources dynamically.
For example, in this drawing, I have 3 sources connected to 3 docker containers.
My problem starts because I need to create another docker instance when a new source is available. In this example, let's say source #4 is now available and I need to get its data (I know when a new source becomes available), but I want it to be scaled automatically (with source #4's information for listening).
I came up with two solutions, each has advantages and disadvantages:
1) Create a pool of a large number of containers running the listener service, and every time a new source is available send a message (using rabbitmq, but I think that's less relevant) to an available container to start getting data.
With this solution I'm a little bit afraid of the memory consumption of the containers running for no reason, but it is not a very complex solution.
2) Whenever a new source becomes available, create a new container (with different environment variables).
With this solution I have a problem creating the container.
At the moment I have achieved this, but the service that starts the containers (let's call it the manager) is just a regular nodejs application executing commands on the same server, and I need it to be inside a docker container as well.
So the problem here is that I couldn't manage to create an ssh connection from the main container to create my new container.
I am not quite sure that either of my solutions is on track, and I would really appreciate any suggestions for my problem.
Your question is a bit unclear, but if you just want to scale a service horizontally you should look into a container orchestration technology that allows you to do that, for example Kubernetes. I recommend reading the introduction.
All you would need to do for adding additional service containers is to update the number of desired replicas in the Deployment configuration. For more information read this.
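For illustration, a minimal Deployment sketch where scaling out is just a matter of changing replicas (the names and image are hypothetical):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: listener
spec:
  replicas: 3                 # bump this to add more listener containers
  selector:
    matchLabels:
      app: listener
  template:
    metadata:
      labels:
        app: listener
    spec:
      containers:
        - name: listener
          image: registry.example.com/listener:1.0   # hypothetical image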
Using kubernetes (or k8s for short) you will benefit from deployment automation, self-healing, and service discovery, as well as load balancing capabilities, in addition to the horizontal scalability.
There are other orchestration alternatives, too (e.g. Docker Swarm), but I would recommend looking into kubernetes first.
Let me know if that solves your issue or if you have additional requirements that weren't so clear in your original question.
Links for your follow up questions:
1 - Run kubectl commands inside container
2 - Kubernetes autoscaling based on custom metrics
3 - Env variables in Pods

Limit useable host resources in Docker compose without swarm

I simply want to limit the resources of some Docker containers in a docker-compose file. The reason is simple: there are multiple apps/services running on the host, so I want to avoid a single container being able to use e.g. all the memory, which would harm the other containers.
From the docs I learned that this can be done using resources. But that sits under the deploy key. So I have to write my docker-compose file like the following example:
php:
  image: php:7-fpm
  restart: always
  volumes:
    - ./www:/www
  deploy:
    resources:
      limits:
        memory: 512M
This gave me the warning:
WARNING: Some services (php) use the 'deploy' key, which will be ignored. Compose does not support deploy configuration - use docker stack deploy to deploy to a swarm.
And that seems to be true: docker stats confirms that the container is able to use all the RAM from the host.
The documentation says:
Specify configuration related to the deployment and running of services. This only takes effect when deploying to a swarm with docker stack deploy, and is ignored by docker-compose up and docker-compose run.
But I don't need clustering. It seems that there is no other way to limit resources using a docker-compose file. Why is it not possible to specify some kind of memory key like the start parameter that docker run provides?
Example: docker run --memory=1g $imageName
This works perfectly for a single container. But I can't use this (at least without violating a clean separation of concerns), since I need to use two different containers.
Edit: Temp workaround
I found out that I'm able to use mem_limit directly after downgrading from version 3 to version 2 (placing version: '2' at the top). But we're currently on version 3.1, so this is not a long-term solution. And the docs say that deploy.resources is the new replacement for v2 keys like mem_limit.
Someday version 2 will be deprecated. So resource management isn't possible any more with the latest versions, at least without having a swarm? That seems like a regression to me; I can't believe this...
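For reference, a minimal sketch of the version 2 workaround described above, reusing the php service from the question:
version: '2'
services:
  php:
    image: php:7-fpm
    restart: always
    volumes:
      - ./www:/www
    mem_limit: 512m       # enforced by docker-compose up with the v2 format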
Since many Docker Compose users have complained about this incompatibility of compose v3 vs v2, the team has developed compatibility mode.
You can retain the same deploy structure that you provided and it will not be ignored, simply by adding the --compatibility flag to the docker-compose command (docker-compose --compatibility up), as explained here. I tested this with version 3.5 and verified with docker stats and can confirm that it works.
You can run the docker daemon in swarm mode on a single host. It will add some extra, unneeded features like built-in service discovery, but that's all behind the scenes.
The Docker documentation has a "note" about it here https://docs.docker.com/engine/swarm/swarm-tutorial/#three-networked-host-machines

rethinkdb nodejs container in cluster environment

Do rethinkdb and a nodejs+express app fit well in containers for a cluster environment?
The situation is as follows, in a docker container:
1. Running rethinkdb and the nodejs+express app in one container.
2. During boot-up, the nodejs app checks whether a specific database and table exist; if not, it creates the database and table.
Running this in one docker container works fine. But the problem is that we need to cluster rethinkdb as well as maintain a specific number of replicas of the table.
Putting all that clustering and replica logic in the nodejs app does not seem like a good idea. I'm kind of stuck on how to proceed.
Help is very much appreciated.
Running rethinkdb and nodejs+express app in one container
You should typically not do this. Put rethinkdb in its own container and put your application in a separate container.
I'd recommend using docker-compose and setting up a docker-compose.yml file for your service. Make sure to use the depends_on property on the web application declaration so that docker will start the rethinkdb container before the application container.
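A minimal sketch of what that might look like (the app service name and build path are hypothetical):
version: "3"
services:
  rethinkdb:
    image: rethinkdb
    volumes:
      - rethinkdb-data:/data
  web:
    build: .
    ports:
      - "3000:3000"
    depends_on:
      - rethinkdb        # start the database container first
volumes:
  rethinkdb-data: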
If you're hand spinning up your RethinkDB containers you should be totally set, but if you're using Swarm or some other scheduler, continue reading.
One problem RethinkDB currently has with automated / scheduled / containerized environments is the ephemerality of containers and the possibility that they will restart and come back with a different IP address. This requires some additional tooling around RethinkDB to modify the config tables.
For a bit of reading I'd recommend checking out how this was achieved in Kubernetes.
