I have a docker-compose file that has several environment variables in it for database users. We have multiple instances of this application, each running on its own server, each with a different database user.
My question is: is the docker-compose.yaml file read only once, when you run docker-compose build, and not at any point after?
No. docker-compose reads the YAML file every time you execute a docker-compose command (build, up, and so on).
But if you're hoping to modify environment variables during the image build or while the container is running - sorry, that isn't going to work.
You can modify environment variables during a service's lifetime (when using Swarm), but this will restart the containers. The same happens when you run docker-compose up again on a running project.
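For example, with Swarm you can change a variable on a running service like this (a minimal sketch; the service name web and the variable DB_USER are just placeholders):
# Swarm rolls the service's tasks to apply the change, so containers are restarted
docker service update --env-add DB_USER=new_app_user web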
However, if you want to have a separate docker-compose file for each of your environments, with its own DB username and password, that will work when you run docker-compose up.
You can also take advantage of the ability to pass multiple YAML files to docker-compose; that way you can have a "base" YAML file with common definitions, and per-environment YAML files where you keep the credentials for each environment.
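A minimal sketch of that layout (file names, the service name app, and the credential values are placeholders):
# docker-compose.yml - common definitions
version: '3'
services:
  app:
    build: .
    ports:
      - 80:3000

# docker-compose.prod.yml - credentials for one environment
version: '3'
services:
  app:
    environment:
      - DB_USER=prod_user
      - DB_PASSWORD=prod_password

# later files override and extend earlier ones
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d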
However, if you are concerned about exposing passwords, environment variables are not the solution. Check out Docker secrets with Docker Swarm, or use an external key store, to keep passwords secure.
Hi!
My problem concerns the deployment of Node.js apps via k8s, architecture patterns, and connecting them with DBs.
I have the following Node.js app services, some of them scalable with multiple app instances (like gamma), others standalone; all of them are built into a single Docker image with a Dockerfile and run from it:
alpha | beta | gamma1 | gamma2
I also have non-cloud DBs, like Elastic and Mongo, running from their own containers with .env files: mongo | elastic
As of now, my docker-compose.yml looks like a typical Node.js example app, but with a common volume and bridge network (except that I have more than one Node.js app):
version: '3'
services:
  node:
    restart: always
    build: .
    ports:
      - 80:3000
    volumes:
      - ./:/code
  mongo:
    image: mongo
    ports:
      - 27017:27017
    volumes:
      - mongodb:/data/db
volumes:
  mongodb:
networks:
  test-network:
    driver: bridge
Current deployment:
All of this runs on a single heavy VPS (X CPU cores, Y RAM, Z SSD, everything at about 70% load) from a single docker-compose.yml file.
What I want to ask and achieve:
Since one VPS is no longer enough, I'd like to start using k8s with Rancher. So the question is about the correct deployment:
For example, I have N VPSs connected within one private network; each VPS is a worker joined to one cluster (with Rancher, of course, one of them is the master node), which gives me X cores, Y RAM, and other shared resources.
Do I need another, separate cluster (or a VPS machine in the private network, but not part of a cluster) with the DB running on it? Or could I deploy the DB in the same cluster? And what if each VPS (worker) in the cluster has only a 40GB volume, and the DB grows beyond that? Do the shared resources from the workers include shared volume space?
Is it right to have one image from which I can start all my apps, or, in the case of k8s, should I have a separate Docker image for each service? So if I have 5 Node.js apps within one mono-repo, should I have 5 separate Docker images rather than one common one?
I understand that my question may have a complex answer, so I'd be glad to see not just an answer but also links or anything else related to the problem. It's much easier to find or Google something if you know what to ask.
A purist answer:
Each of your five services should have its own image and its own database. It's okay for the databases to be in the same cluster so long as you have a way to back them up, run migrations, and do other database-y things. If your cloud provider offers managed versions of these databases then storing the data outside the cluster is fine too, and can help get around some of the disk-space issues you cite.
I tend to use Helm for actual deployment mechanics as a way to inject things like host names and other settings at deploy time. Each service would have its own Dockerfile, its own Helm chart, its own package.json, and so on. Your CI system would build and deploy each service separately.
A practical answer:
There's nothing technically wrong with running multiple containers off the same image doing different work. If you have a single repository and a single build system now, and you don't mind a change in one service causing all of them to redeploy, this approach will work fine.
Whatever build system your repository has now, if you go with this approach, I'd put a single Dockerfile in the repository root and probably have a single Helm chart to deploy it. In the Helm chart Deployment spec you can override the command to run with something like
# This fragment appears under containers: in a Deployment's Pod spec
# (this is Helm chart, Go text/template templated, YAML syntax)
image: {{ .Values.repository }}/{{ .Values.image }}:{{ .Values.tag }}
command: ["node", "service3/index.js"]
Kubernetes's terminology here is slightly off from Docker's, particularly if you use an entrypoint wrapper script. Kubernetes command: overrides a Dockerfile ENTRYPOINT, and Kubernetes args: overrides CMD.
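As a tiny illustration (the wrapper script and service path here are hypothetical), in the container spec:
# Dockerfile has:  ENTRYPOINT ["/entrypoint.sh"]  and  CMD ["node", "index.js"]
command: ["/entrypoint.sh"]             # replaces the Dockerfile ENTRYPOINT
args: ["node", "service3/index.js"]     # replaces the Dockerfile CMD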
In either case:
Many things in Kubernetes allocate infrastructure dynamically. For example, you can set up a horizontal pod autoscaler to set the replica count of a Deployment based on load, or a cluster autoscaler to set up more (cloud) instances to run Pods if needed. If you have a persistent volume provisioner then a Kubernetes PersistentVolumeClaim object can be backed by dynamically allocated storage (on AWS, for example, it creates an EBS volume), and you won't be limited to the storage space of a single node. You can often find prebuilt Helm charts for the databases; if not, use a StatefulSet to have Kubernetes create the PVCs for you.
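For example, a minimal PersistentVolumeClaim for one of the databases might look roughly like this (the name and size are placeholders; the storage class and backing store depend on your cluster's provisioner):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi   # with a cloud provisioner (e.g. EBS) this isn't limited to one worker's local disk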
Make sure your CI system produces images with a unique tag, maybe based on a timestamp or source control commit ID. Don't use ...:latest or another fixed string: Kubernetes won't redeploy on update unless the text of the image: string changes.
Multiple clusters is tricky in a lot of ways. In my day job we have separate clusters per environment (development, pre-production, production) but the application itself runs in a single cluster and there is no communication between clusters. If you can manage the storage then running the databases in the same cluster is fine.
Several Compose options don't translate well to Kubernetes. I'd especially recommend removing the volumes: that bind-mount your code into the container, and validating that your image runs correctly, before you do anything Kubernetes-specific. If you're replacing the entire source tree in the image then you're not actually running the image, and it'll be much easier to debug this locally. In Kubernetes you also have almost no control over networks:, but they're not really needed in Compose either.
I can't answer the part of your question about the VPS machine setup, but I can make some suggestions about the image setup.
While you have asked this question about a Node app, it's actually applicable to more than just Node.
Regarding the Docker image and whether to have a common image or separate ones: generally it's up to you and/or your company whether you use a common image or separate ones.
There are pros and cons to both methods:
You could "bake" the code into the image and have a different image per app, but if you run into any security vulnerabilities, you have to patch, rebuild, and redeploy all the images. If you had 5 apps all using the same library, but that library was not in the base image, then you would have to patch it 5 times, once in each image, rebuild the image, and redeploy.
Or you could use a single base image which includes the needed libraries and mount the codebase in (for example as a ConfigMap), and that base image would never need to change unless you had to patch something in the underlying operating system. The same vulnerability mentioned in the paragraph above would only need to be patched in the base image, and the affected pods could be respun (no need to redeploy).
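A rough sketch of the ConfigMap-mount idea (the names here are placeholders, and note that ConfigMaps are size-limited to roughly 1MiB, so this only suits small codebases or bundles):
# fragment of a Deployment's Pod spec
containers:
  - name: app
    image: mycompany/node-base:18      # shared base image with the libraries baked in
    command: ["node", "/code/index.js"]
    volumeMounts:
      - name: app-code
        mountPath: /code
volumes:
  - name: app-code
    configMap:
      name: myapp-code                 # created from the app's source files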
I have started working with Docker and, so far, everything works.
Until now, I have created a Dockerfile which builds an image for a container that runs a dotnet application inside it.
Now the question is whether the following task is possible:
I have, for example, 5 JSON files. Each Docker container relies on one JSON file because of the dotnet application (the JSON file contains credentials which are needed by the dotnet application).
So, is there a way to check how many JSON files are stored locally in path xy and, depending on that number, automatically start 5 containers and then pass one JSON file to each of them?
I did not find anything, and I don't know what the best approach is for such a scenario. A script would be great, maybe Linux shell or PowerShell? I don't think this task can be accomplished with a simple Dockerfile - but maybe I am wrong.
Thanks to everyone for any tips. :-)
What you need is a container orchestrator.
The simplest solution would be to write a shell script that spawns one container per JSON file, passing the file name as an argument.
You can also select which JSON file to use in the script and copy (or mount) only that file into the container.
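A minimal sketch of such a script (the image name, config directory, and mount path are placeholders; this mounts each file rather than copying it):
#!/bin/sh
# start one container per JSON file found in ./configs
for f in ./configs/*.json; do
  name=$(basename "$f" .json)
  docker run -d \
    --name "app-$name" \
    -v "$(pwd)/$f:/app/config.json:ro" \
    mycompany/dotnet-app /app/config.json   # the path is passed as an argument to the app
done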
Eventually, consider running these containers through Kubernetes or Docker Swarm. For Kubernetes, see https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/. You can define templates there; each JSON file can be named in sequence, and the parameter can be templated.
Eg:
config-1.json
config-2.json
Could someone explain to me what is happening when you map (in a volume) your vendor or node_modules files?
I had some speed problems with my Docker environment and read that I don't need to map the vendor files there, so I excluded them in the docker-compose.yml file and everything instantly became much faster.
So I wonder: what is happening under the hood when you have the vendor files mapped in your volume, and what happens when you don't?
Could someone explain that? I think this information would be useful to more people than just me.
Docker does some complicated filesystem setup when you start a container. You have your image, which contains your application code; a container filesystem, which gets lost when the container exits; and volumes, which have persistent long-term storage outside the container. Volumes break down into two main flavors, bind mounts of specific host directories and named volumes managed by the Docker daemon.
The standard design pattern is that an image is totally self-contained. Once I have an image I should be able to push it to a registry and run it on another machine unmodified.
git clone git@github.com:me/myapp
cd myapp
docker build -t me/myapp . # requires source code
docker push me/myapp
ssh me@othersystem
docker run me/myapp # source code is in the image
# I don't need GitHub credentials to get it
There are three big problems with using volumes to store your application code or your node_modules directory:
It breaks the "code goes in the image" pattern. In an actual production environment, you wouldn't want to push your image and also separately push the code; that defeats one of the big advantages of Docker. If you're hiding every last byte of code in the image during the development cycle, you're never actually running what you're shipping out.
Docker considers volumes to contain vital user data that it can't safely modify. That means that, if your node_modules tree is in a volume, and you add a package to your package.json file, Docker will keep using the old node_modules directory, because it can't modify the vital user data you've told it is there.
On macOS in particular, bind mounts are extremely slow, and if you mount a large application into a container, it will just crawl.
I've generally found three good uses for volumes: storing actual user data across container executions; injecting configuration files at startup time; and reading out log files. Code and libraries are not good things to keep in volumes.
For front-end applications in particular, there doesn't seem to be much benefit to trying to run them in Docker. Since the actual application code runs in the browser, it can't directly access any Docker-hosted resources, and it makes no difference whether your dev server runs in Docker or not. The typical build chains involving tools like TypeScript and Webpack don't have additional host dependencies, so your Docker setup really just turns into a roundabout way to run Node against the source code that exists only on your host. The production path of building your application into static files and then serving them with a web server like nginx still works well in Docker. I'd just run Node on the host to develop this sort of thing, and not have to think about questions like this one.
I have to deploy my application software, which is a Linux-based package (.bin) file, on a VM instance. As per the system requirements, it needs a minimum of 8 vCPUs and 32GB RAM.
Now, I was wondering whether it is possible to deploy this software across multiple containers that share the CPU and RAM load in a Kubernetes cluster, rather than installing the software on a single VM instance.
Is it possible?
Yes, it's possible to achieve that.
You can start using Docker Compose to build your custom Docker images and then build your applications quickly.
First, I'll show you my GitHub docker-compose repo. You can inspect the folders; they are separated by application or server, so each docker-compose.yml builds its app, and you only have to run the command docker-compose up -d.
If you need to create a custom image with Docker, you should use this command: docker build -t <user_docker>/<image_name> <path_of_files>
<user_docker> = your docker user
<image_name> = the image name that you choose
<path_of_files> = some local path; if you need to build in the current folder, use . (dot)
After that, you can upload this image to Docker Hub using the following commands.
Log in with your credentials:
docker login
You can check your images using the following command:
docker images
Upload the image to the Docker Hub registry:
docker push <user_docker>/<image_name>
Once the image is uploaded, you can use it in different projects; make sure to keep the image lightweight and useful.
Second, I'll show a similar repo, but this one has a k8s configuration in a folder called k8s. This configuration was made for Google Cloud, but I think you can analyze it and learn how to get started on your new project.
The Nginx service was replaced by an ingress service (ingress-service.yml), and an HTTPS certificate was added (the certificate.yml and issuer.yml files).
If you need to dockerize DBs, make sure the DB is lightweight; you need to create a persistent volume using a PersistentVolumeClaim (the database-persistent-volume-claim.yml file). If you store larger data in it, you should use a dedicated DB server or a managed DB service in the cloud.
I hope this information will be useful to you.
There are two ways to achieve what you want to do. The first one is to write a Dockerfile and build the image; more information about how to write a Dockerfile can be found here. Apart from that, you can create a container from a base image, install all the software and packages, and commit (export) it as an image, as sketched below. Then you can upload it to a Docker image repository like Docker Registry or Amazon ECR.
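A rough sketch of that second approach (the image, container, and repository names are placeholders):
# start a container from a base image and install your software interactively
docker run -it --name build-box ubuntu:22.04 bash
#   ...inside the container: install packages, run your .bin installer, then exit...

# commit the modified container as a new image
docker commit build-box myrepo/myapp:1.0

# push it to a registry (Docker Hub, Amazon ECR, ...)
docker push myrepo/myapp:1.0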
I read about Docker Swarm secrets and also did some testing.
As far as I understand, secrets can replace sensitive environment variables provided in a docker-compose.yml file (e.g. database passwords). As a result, when I inspect the docker-compose file or the running container, I will not see the password. That's fine - but how does it really help?
If an attacker is on my Docker host, he can easily take a look into /run/secrets:
docker exec -it df2345a57cea ls -la /run/secrets/
and can also look at the data inside:
docker exec -it df27res57cea cat /run/secrets/MY_PASSWORD
The same attacker can usually open a bash shell in the running container and look at how it's working...
Also, if an attacker is in the container itself, he can look around.
So I don't understand why Docker secrets are more secure than writing them directly into the docker-compose.yml file.
A secret stored in the docker-compose.yml is visible inside that file, which should also be checked into version control, where others can see the values in that file, and it will be visible in commands like docker inspect on your containers. From there, it's also visible inside your container.
A Docker secret, conversely, is encrypted on disk on the managers, stored only in memory on the workers that need it (the file visible in the containers is on a tmpfs, i.e. kept in RAM), and is not visible in the docker inspect output.
The key part here is that you are keeping your secret outside of your version control system. With tools like Docker EE's RBAC, you are also keeping secrets out of view from anyone that doesn't need access by removing their ability to docker exec into a production container or using a docker secret for a production environment. That can be done while still giving developers the ability to view logs and inspect containers which may be necessary for production support.
Also note that you can configure a secret inside the docker container to only be readable by a specific user, e.g. root. And you can then drop permissions to run the application as an unprivileged user (tools like gosu are useful for this). Therefore, it's feasible to prevent the secret from being read by an attacker that breaches an application inside a container, which would be less trivial with an environment variable.
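As a minimal sketch of the Swarm workflow (the stack, service, and secret names are placeholders): create the secret on a manager, reference it from the stack file, and deploy; the container then sees it as /run/secrets/db_password.
# on a Swarm manager: the value never appears in the compose file or in docker inspect
printf 'S3cretPassw0rd' | docker secret create db_password -

# docker-compose.yml (stack file, version 3.1+)
version: '3.1'
services:
  app:
    image: myapp
    secrets:
      - db_password
secrets:
  db_password:
    external: true

# deploy the stack
docker stack deploy -c docker-compose.yml mystack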
Docker secrets are for Swarm, not for a single node with some containers or a docker-compose setup on one machine (while they can be used there, that is not their main purpose). If you have more than one node, Docker secrets are more secure than deploying your secrets to every worker machine: the secret is distributed only to the machines that need it, based on which containers will be running there.
See this blog: Introducing Docker Secrets Management