I am building a microservices project using Docker.
One of my microservices is a listener that should get data from a varying number of sources.
What I'm trying to achieve is the ability to start and stop getting data from sources dynamically.
For example, in this drawing I have 3 sources connected to 3 containers.
My problem starts when I need to create another container instance because a new source has become available. In this example, let's say source #4 is now available and I need to get its data (I know when a new source becomes available), but I want it to be scaled automatically (with source #4's information for listening).
I came up with two solutions, each with advantages and disadvantages:
1) Create a pool of many containers running the listener service, and every time a new source becomes available send a message (using RabbitMQ, though I think that's less relevant here) to an available container to start getting data.
With this solution I'm a little worried about the memory consumption of containers running for no reason - but it is not a very complex solution.
2) Whenever a new source becomes available, create a new container (with different environment variables).
With this solution I have a problem creating the container.
At the moment I have achieved this, but the service that starts the containers (let's call it the manager) is just a regular Node.js application executing commands on the same server - and I need it to run inside a Docker container as well.
So the problem here is that I couldn't manage to create an SSH connection from the main container to create my new container.
I am not quite sure that either of my solutions is on the right track, and I would really appreciate any suggestions for my problem.
Your question is a bit unclear, but if you just want to scale a service horizontally you should look into a container orchestration technology that allows you to do that - for example Kubernetes. I recommend reading the introduction.
All you would need to do to add additional service containers is update the number of desired replicas in the Deployment configuration. For more information read this.
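For illustration, a minimal Deployment for your listener could look like the sketch below (the image name is hypothetical and not from your setup); adding containers then just means raising replicas, either in the file or with kubectl scale:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: listener
spec:
  replicas: 3                  # raise this number to run more listener containers
  selector:
    matchLabels:
      app: listener
  template:
    metadata:
      labels:
        app: listener
    spec:
      containers:
        - name: listener
          image: myregistry/listener:1.0   # hypothetical image name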
Using Kubernetes (k8s for short) you will benefit from deployment automation, self-healing, service discovery and load balancing capabilities in addition to horizontal scalability.
There are other orchestration alternatives, too (e.g. Docker Swarm), but I would recommend looking into Kubernetes first.
Let me know if that solves your issue or if you have additional requirements that weren't so clear in your original question.
Links for your follow-up questions:
1 - Run kubectl commands inside container
2 - Kubernetes autoscaling based on custom metrics
3 - Env variables in Pods
Hi!
My problem concerns deploying Node.js apps via k8s, architecture patterns, and connecting them with DBs.
I have the following Node.js app services; some of them are scalable with multiple app instances (like gamma), others are standalone, and all of them are built into a single Docker image from one Dockerfile and run from it:
alpha | beta | gamma1 | gamma2
I also have non-cloud DBs, like Elastic & Mongo, running from their own containers with .env files:
mongo | elastic
As of now, my docker-compose.yml is like a typical Node.js example app, but with a common volume and a bridge network (except that I have more than one Node.js app):
version: '3'
services:
  node:
    restart: always
    build: .
    ports:
      - 80:3000
    volumes:
      - ./:/code
  mongo:
    image: mongo
    ports:
      - 27017:27017
    volumes:
      - mongodb:/data/db
volumes:
  mongodb:
networks:
  test-network:
    driver: bridge
Current deployment:
All of this runs on a single heavy VPS (X CPU cores, Y RAM, Z SSD, everything at about 70% load) from a single docker-compose.yml file.
What I want to ask and achieve:
Since one VPS is no longer enough, I'd like to start using k8s with Rancher. So the question is about correct deployment:
For example, I have N VPSs connected within one private network; each VPS is a worker connected in one cluster (with Rancher, of course, one of them is the master node), which gives me X cores, Y RAM, and other shared resources.
Do I need another, separate cluster (or a VPS machine in the private network that is not part of the cluster) with the DB running on it? Or could I deploy the DB in the same cluster? And what if each VPS (worker) in the cluster has only a 40 GB volume and the DB grows beyond that? Do the shared resources from the workers include shared volume space?
Is it right to have one image from which I can start all my apps, or in the case of k8s should I have a separate Docker image for each service? So if I have 5 Node.js apps within one mono-repo, should I have 5 separate Docker images rather than one common one?
I understand that my question may have a complex answer, so I will be glad to see not just an answer but also links or anything else related to the problem. It's much easier to find or google something if you know how to ask.
A purist answer:
Each of your five services should have their own image, and their own database. It's okay for the databases to be in the same cluster so long as you have a way to back them up, run migrations, and do other database-y things. If your cloud provider offers managed versions of these databases then storing the data outside the cluster is fine too, and can help get around some of the disk-space issues you cite.
I tend to use Helm for actual deployment mechanics as a way to inject things like host names and other settings at deploy time. Each service would have its own Dockerfile, its own Helm chart, its own package.json, and so on. Your CI system would build and deploy each service separately.
A practical answer:
There's nothing technically wrong with running multiple containers off the same image doing different work. If you have a single repository and a single build system now, and you don't mind a change in one service causing all of them to redeploy, this approach will work fine.
Whatever build system your repository has now, if you go with this approach, I'd put a single Dockerfile in the repository root and probably have a single Helm chart to deploy it. In the Helm chart Deployment spec you can override the command to run with something like
# This fragment appears under containers: in a Deployment's Pod spec
# (this is Helm chart, Go text/template templated, YAML syntax)
image: {{ .Values.repository }}/{{ .Values.image }}:{{ .Values.tag }}
command: ["node", "service3/index.js"]
Kubernetes's terminology here is slightly off from Docker's, particularly if you use an entrypoint wrapper script. Kubernetes command: overrides a Dockerfile ENTRYPOINT, and Kubernetes args: overrides CMD.
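As a rough illustration of that mapping (the image name is made up): if your image has an entrypoint wrapper script you want to keep, override only the CMD via args:; using command: would replace the wrapper itself.

# Sketch only: args: replaces the Dockerfile CMD and is passed to the ENTRYPOINT wrapper;
# command: would replace the ENTRYPOINT.
containers:
  - name: service3
    image: myrepo/monorepo:abc123      # hypothetical image
    args: ["node", "service3/index.js"]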
In either case:
Many things in Kubernetes allocate infrastructure dynamically. For example, you can set up a horizontal pod autoscaler to set the replica count of a Deployment based on load, or a cluster autoscaler to set up more (cloud) instances to run Pods if needed. If you have a persistent volume provisioner then a Kubernetes PersistentVolumeClaim object can be backed by dynamically allocated storage (on AWS, for example, it creates an EBS volume), and you won't be limited to the storage space of a single node. You can often find prebuilt Helm charts for the databases; if not, use a StatefulSet to have Kubernetes create the PVCs for you.
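As a rough sketch of those two pieces (the names and numbers below are illustrative, not from your setup):

# Horizontal pod autoscaler: scales a hypothetical "gamma" Deployment based on CPU load.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gamma
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gamma
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# Dynamically provisioned storage: the cluster's volume provisioner creates the backing disk,
# so you are not limited to a single worker's local disk.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi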
Make sure your CI system produces images with a unique tag, maybe based on a timestamp or source control commit ID. Don't use ...:latest or another fixed string: Kubernetes won't redeploy on update unless the text of the image: string changes.
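For example (the registry name and tag format are just an assumption), a CI-rendered Pod spec fragment might read:

containers:
  - name: gamma
    # the tag changes on every build, so the image: string changes and triggers a rollout
    image: registry.example.com/gamma:2021-06-01-7f3a9c2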
Multiple clusters is tricky in a lot of ways. In my day job we have separate clusters per environment (development, pre-production, production) but the application itself runs in a single cluster and there is no communication between clusters. If you can manage the storage then running the databases in the same cluster is fine.
Several Compose options don't translate well to Kubernetes. I'd especially recommend removing the volumes: that bind-mount your code into the container, and validating that your image runs correctly, before you do anything Kubernetes-specific. If you're replacing the entire source tree in the image then you're not really running the image, and it'll be much easier to debug locally. In Kubernetes you also have almost no control over networks: but they're not really needed in Compose either.
I can't answer the part of your question about the VPS machine setup, but I can make some suggestions about the image setup.
While you have asked this question about a Node app, it's actually applicable to more than just Node.
Regarding the Docker image and having a common image or separate ones: generally it's up to you and/or your company whether you have a common or separate image.
There are pros and cons to both methods:
You could "bake" the code into the image and have a different image per app, but if you run into any security vulnerabilities you have to patch, rebuild, and redeploy all the images. If you had 5 apps all using the same library, and that library was not in the base image, then you would have to patch it 5 times, once in each image, rebuild the image and redeploy.
Or you could just use a single base image that includes the needed libraries and mount the codebase in (for example as a ConfigMap), and that base image would never need to change unless you had to patch something in the underlying operating system. The same vulnerability mentioned in the paragraph above would only need to be patched in the base image, and the affected pods could be respun (no need to redeploy).
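A minimal sketch of that "mount the code in" approach could look like this (the ConfigMap and file names are hypothetical; note that ConfigMaps are limited to about 1 MiB, so this suits small codebases):

# Assumes a ConfigMap named "app-code" holding the application files,
# e.g. created with: kubectl create configmap app-code --from-file=...
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: node:18                       # shared base image with the libraries baked in
      command: ["node", "/code/index.js"]
      volumeMounts:
        - name: code
          mountPath: /code
  volumes:
    - name: code
      configMap:
        name: app-code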
I want to run a Docker container that hosts a server that is going to be long-running (e.g. 24x7).
Initially I looked at Azure Container Instances (ACI), and whilst these seem to fit the bill perfectly, I've been advised they're not designed for long-running containers; they can also prove quite expensive to run all the time compared to a basic VM.
So I've been looking at what else I should run this as:
AKS - seems overkill for just one Docker container
App Service for Containers - my container doesn't have an HTTP endpoint, so I believe I will have issues with things like health checks
VM - this all seems a bit manual, as I'd really rather not deal with VM maintenance, and I'm also unsure I can use CI/CD techniques to build / spin up and down / do releases on a VM image (we're using Terraform to deploy infra)
Are there any best practice guides on this? I've tried searching but I'm not finding anything relevant, so I'm assuming I'm missing some key term to get going with this!
TIA
ACI is not designed for long-running (uninterrupted) processes; have a look here.
The recommendation is to use AKS, where you can fully manage the lifecycle of your machines, or just use VMs.
I need to run many different web applications and microservices.
Also, I need easy backup/restore and the ability to move them between servers/cloud providers.
I started studying Docker for this. And I'm puzzled when I see advice like this: "create a first container for your application, create a second container for your database and link these together".
But why do I need a separate container for the database? If I understand correctly, Docker's main message is: "run and move applications with all their dependencies in an isolated environment". That is, as I understand it, it is appropriate to place the application and all its dependencies in one container (especially if it's a small application with no need for an external database).
How I see the best way to use Docker in my case:
Take a base image (e.g. phusion/baseimage)
Build my own image based on this (with nginx, the database and the application code).
Expose a port for interaction with my application.
Create a data volume based on this image on the target server (to store application data, database, uploads etc.) or restore the data volume from a previous backup.
Run this container and have fun.
Pros:
Easy to backup/restore/move the whole application. (Move only the data volume and simply start it on the new server/environment.)
The application is a "black box", with no headaches from external dependencies.
If I need to store data in external databases or use data from them, nothing prevents me from doing it (but usually it is never necessary). And I prefer to use the APIs of other black boxes instead of direct access to their databases.
More isolation and security than in the case of a single database for all containers.
Cons:
Greater consumption of RAM and disk space.
A little bit harder to scale. (If I need several instances of the app to handle thousands of requests per second, I can move the database into a separate container and link several app instances to it. But that is needed only in very rare cases.)
Why haven't I found recommendations for this approach? What's wrong with it? What pitfalls have I not seen?
First of all, you need to understand that a Docker container is not a virtual machine; it is just a wrapper around the kernel features chroot, cgroups and namespaces, using layered filesystems, with its own packaging format. A virtual machine is usually a heavyweight, stateful artifact with extensive configuration options regarding the resources available on the host machine, and you can set up complex environments within a VM.
A container is a lightweight, throwaway runtime environment with a recommendation to make it as stateless as possible. All changes are stored within the container, which is just a running instance of the image, and you'll lose all the diffs if the container is deleted. Of course you can map volumes for more static data, but this is available in a multi-container architecture too.
If you pack everything into one container you lose the ability to scale the components independently of each other and you build in tight coupling.
With this tight coupling you can't implement fail-over, redundancy and scalability features in your app configuration. Most modern NoSQL databases are built to scale out easily, and data redundancy also becomes possible when you run more than one backing database instance.
On the other hand, defining these single-responsibility containers is easy with docker-compose, where you can declare them in a simple YAML file.
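For example, a minimal docker-compose.yml along those lines could look like the sketch below (the service names, ports and credentials are purely illustrative, assuming a web app built from the local Dockerfile plus a Postgres database):

version: '3'
services:
  app:
    build: .                 # the application and its own dependencies only
    ports:
      - 80:8080
    environment:
      - DATABASE_URL=postgres://app:secret@db:5432/app   # reach the db by its service name
    depends_on:
      - db
  db:
    image: postgres
    environment:
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD=secret
    volumes:
      - dbdata:/var/lib/postgresql/data
volumes:
  dbdata: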
I am now developing new content, so I am building the server.
On my server the base system is CentOS 7. I installed Docker, pulled the CentOS image, and set up a "web server" container running Django with uWSGI and nginx.
However, I now want to bring up another service (a database with Postgres). What is the best way to do it?
Install Postgres in my existing container (with the web server).
Build a new container only for the database.
I want to know the advantages and weak points of each.
It's idiomatic to use two separate containers. Also, this is simpler - if you have two or more processes in a container, you need a parent process to monitor them (typically people use a process manager such as supervisord). With only one process, you won't need to do this.
By monitoring, I mainly mean that you need to make sure that all processes are correctly shut down if the container receives a SIGTERM signal. If you don't do this properly, you will end up with zombie processes. You won't need to worry about this if you only have a single process or use a process manager.
Further, as Greg points out, having separate containers allows you to orchestrate and schedule the containers separately, so you can do update/change/scale/restart each container without affecting the other one.
If you want to keep the data in the database after a restart, the database shouldn't be in a container but on the host. I will assume you want the db in a container as well.
Setting up a second container is a lot more work. You need to find a way for the containers to know each other's addresses. The address changes each time you start a container, so you need some scripts on the host: the host must find out the IP addresses and inform the containers.
The containers might then update their /etc/hosts file with the address of the other container. When you want to emulate different servers and perform resilience tests, this is a nice solution. You will need quite a bit of bash knowledge before you get this running well.
In almost all other situations, choose one container. Installing everything in one container is easier to set up and to develop with afterwards. Setting up Docker is just the environment in which you want to do your real work. Tooling should help you with your real work, not take all your time and effort.
I work for a product company and we make a lot of releases of the product. In the current approach to testing multiple releases, we create a separate VM and install all the infrastructure software (DB, app server, etc.) on top of it. Later we deploy the application WARs on the respective VM. Recently I came across Docker and it seems very helpful, so I started exploring it with the examples listed on the site. But I am not able to figure out how Docker can be used to build environments suitable for the various releases.
Each product version will have DB schema changes.
Each application WAR will have enhancements/defect fixes, etc.
Consider the example below.
Every month our company releases a new version of the software, and in order to support it and fix defects we create VMs per release. The application's overall size is about 2 GB while the OS takes close to 5 GB (apart from disk space, it also takes up system resources as extra overhead). The VMs are required to restore any release and test any support issues reported against it. But looking at the additional infrastructure requirements, it seems like a very costly affair.
Can Docker have everything required to run an application inside a container/image?
Can Docker package an application that consists of multiple WARs/DB schemas and, when started, allocate the appropriate ports?
Will there be any space/memory/speed differences between VMs and Docker in the above scenario?
Do you think Docker is still an appropriate solution, or should we continue using VMs? Can someone share pointers on how I can achieve the above requirements with Docker?
tl;dr: Yes, docker can run most applications inside a container.
Docker runs a single process inside each container. When using VMs or real servers, this one process is usually the init system which starts all system services. With docker it is usually your app.
This difference will get you faster startup times for your app (not starting the whole operating system). The trade off is that, if you depend on system services (such as cron, sshd…) you will need to start them yourself. There are some base images that provide a more "VM-like" environment… check phusion's baseimage for instance. To start more than a single process, you can also use a process manager such as supervisord.
Going forward, the recommended (although not required) approach is to start one process in each container (one per application server, one per database server, and so on) and not use containers as VMs.
Docker has no problems allocating ports either. It even has an explicit instruction in the Dockerfile: EXPOSE. Exposed ports can also be published on the Docker host with the --publish argument of docker run, so you don't even need to know the IP assigned to the container.
Regarding used space, you will probably see important savings. Docker images are created by stacking filesystem layers… this means that the common layers are only stored once on the server. In your setup, you will likely only have one copy of the base operating system layer (with VMs, you have a copy on each VM).
On memory you will probably see less significant savings (mostly from not starting all the operating system services). Speed is still a subject of research… A few things that are clear so far are that for faster IO you will need to use Docker volumes, and that for network-heavy use cases you should use host networking. Check the IBM research "An Updated Performance Comparison of Virtual Machines and Linux Containers" for details. Or a summary like InfoQ's.