Docker containers instead of multiprocessing - multithreading

One of the main application of Docker containers is load-balancing. For example, in the case of a web application, instead of having only one instance handling all requests, we have many containers doing exactly the same thing, but the requests are split toward all of these instances.
But can it be used to do the same service, but with different "parameters"?
For instance, let's suppose I want to create a platform storing crypto-currency data from different exchange platforms (Bitfinex, Bittrex, etc.).
A lot of these platforms are handling web sockets. So in order to create one socket per platform, I would do something at the "code layer" like (language agnostic):
foreach (platform in platforms)
client = createClient(platform)
socket = client.createSocket()
socket.GetData()
Now of course, this loop would be stuck on the first iteration, because the websocket is waiting (although I could use asynchrony, anyway). To circumvent that, I could use multiprocessing, something like:
foreach (platform in platforms)
client = createClient(platform)
socket = client.createSocket()
process = new ProcessWhichGetData(socket)
process.Launch()
Is there any way to do that at a "Docker layer", I mean to use Docker to make the different containers handling different platforms? I would have one Docker container for Bittrex, one Docker container for Bitfinex, etc.
I know this would imply that either the different containers would communicate between each other (who takes care of Bitfinex? who takes care of Bittrex?), or the container orchestrator (Docker Swarm / Kubernete) would handle itself this "repartition".
Is it something we could do, and, on top of that, is it something we want?

Docker containerization simply adds various layers of isolation around regular user-land processes. It does not by itself introduces coordination among several processes, though it certainly can be exploited in building a multi-process system where each process perform some jobs, no matter if these jobs are redundant or complementary.
If you can design your solution so that one process is launched for each "platform" (for example, passing the specific platform an instance should handle as a command line parameter), then indeed, this can technically be done in Docker.
I should however point out that it is not clear why you would want to run each process in a distinct container. Is isolation pertinent for security reasons? For resource accounting? To have each process dispatched to a distinct host in order to have access to more processing power? Also, is there coordination required among these processes, outside of the having to initially determine which process handle which platform? If so, do they need to have access to a shared storage, or be able to send signals to each others? These questions will help you decide how to approach the dockerization of your solution.
In the most simple case, assuming that all you want is to have the whole process be isolated from the rest of the system, but with no requirement that these processes be isolated from each other, then the most simple strategy would simply to have a single container that contains an entrypoint shell script, which will simply launch one process per platform.
entrypoint.sh (inside your docker image):
#!/bin/bash
platforms=Bitfinex Bittrex
for platform in ${platforms} ; do
./myprogram "${platform}" &
done
If you really need a distinct container for each platform, then you would use a similar script, but this time, it would be run directly on the host machine (that is, outside of a container), and would encapsulate each process inside a docker container.
launch.sh (directly on the host):
#!/bin/bash
for platform in ${platforms} ; do
docker -name "program_${platform}" my_program_docker \
/usr/local/bin/myprogram "$platform"
done
Alternatively, you could use docker-compose to define the list of docker containers to be launched, but I will not discuss more this option at present (just ask if this seems pertinent to your you case).
If you need containers to be distributed among several host machines, then that same loop could be used, but this time, processes would be launched using docker-machine. Alternatively, if using docker-compose, the processes could be distributed using Swarm.

Say you restructured this as a long-running program that handled only one platform at a time, and controlled which platform it was via a command-line option or an environment variable. Instead of having your "launch all the platforms" loop in code, you might write a shell script like
#!/bin/sh
for platform in $(cat platforms.txt); do
./run_platform $platform &
done
This setup is easy to transplant into Docker.
You should not plan on processes launching Docker containers dynamically. This is hard to set up and has significant security implications (by which I mean "a bug in your container launcher could easily root your host").
If the individual processing tasks can all run totally independently (maybe they use a shared database to store data) then you're basically done. You could replace that shell script with something like a Docker Compose YAML file that lists out all of the containers; if you want to run this on multiple hosts you can use tools like Ansible, or Docker Swarm, or Kubernetes to spread the containers out (with varying levels of infrastructure complexity).

You can bunch the different docker containers in a STACK and also configure networking so that the docker containers can remain isolated form the outside world but can communicate with each other.
More info here Docker Stack

Related

Understanding the technical (Docker) container architecture

I am new to containers and would like to get a good knowledge about how container technology (Docker) is made up from 'scratch'. I have to write a paper and hope that I have every important thing correctly understood so far.
The following diagram is made by me and shows my current understanding of containers.
Obviously we need an OS with a Kernel that allows us to use the hardware. For Docker this is Linux. Docker for Windows uses a VM with Linux for that.
On top of our Linux OS we then run our Docker Engine. Our Docker Engine is in charge of starting, building, configuring ... our images and containers. But most importantly the Docker Engine handles everything that has to do with isolating of containers, for example it maintains how namespaces or cgroups are used so that every container has it's own full filesystem.
Then we have our actual containers. Containers themselvesneed almost every time a kind of OS itself. This is mostly just a very compact one like Alpine or Busybox. They collect a small number of standard functions such as 'file', 'tar', 'grep' that most software definitely need. This compact OS is now using the Kernel from our full Linux OS. They don't have their own Kernel.
On top of the compact OS we then place our actual piece of software such as Node.js or a NGINX Server. This software is only using the compact OS which in return uses the Kernel from our full Linux OS. And all data or modifications that is generated or done in runtime are made on the writeable layer of our container.
And if I understood correctly, our container or everything that runs in our container is not using or interacting with our full Linux OS but just with it's Kernel?
I also don't quite understand how the writeable layer in a container works. Like how does my software for example know that a modified file from a read-only layer is now present in the writeable layer and should use this?
I would really appreciate some corrections or suggestions on what I have missed out so far. Thank you
And if I understood correctly, our container or everything that runs in our container is not using or interacting with our full Linux OS but just with it's Kernel?
The containers are just processes. For kernel, Docker daemon, NodeJS application and Nginx are processes. That's why containers don't have their own kernels. The difference between Docker daemon process (and other processes on a host) and processes that are running within containers is in their scope (it's called a namespace). Processes in containers are run in isolation and they don't see anything around their namespace. There are many different namespaces, for example, a pid namespace is one of them and it limits the visibility of other processes. That's why ps command in a container doesn't show processes from a host or other containers. Namespaces is a kernel things and they are more about what a process can see and access to while there is also cgroups that apply limits for CPU and memory usage.
I hope this helps you somehow, at least, I tried to put more attention to the kernel because Docker is just a daemon that spins new processes with configured namespaces, cgroups and own filesystem.
Here are some links that might be useful:
What even is a container: namespaces and cgroups
How containers work: overlayfs
See also other posts about containers on https://jvns.ca (I recommend it because Julia explains it in simple words and even provide illustrations.)
If you want to go deeper, I'd suggest to look at Namespaces: from chroot() to containers slides and read the article about creation of own containers.

How to create a docker image of current file and OS system?

I wonder if one can take all the current environment variables settings OS applications and create a simple docker layer on top of it all so that docker container user will not be able to damage host system even if he would remove all files, yet will have abilety to access all installed applications and system settings inside his docker layer?
Technically you might be able to hack together a solution that does this by copying in all data/apps, installing dependencies, re-configuring the applications and providing a bash shell to attach to for a user to play around with but this is not what Docker is designed for at all, not to mention that I would not recommend anyone to attempt this.
I always try to explain docker's usecase as processes which run in isolated containers with defined interfaces that may be exposed. Meaning you would ideally run one application within it which has an interface exposed for communication.
What you are looking for is essentially a VM with snapshots which you can provide to different users.

Is there a way to restrict untrusted container scheduler?

I have an application which I'd like to give the privilege to launch short-lived tasks and schedule these as docker containers. I was thinking of doing this simply via docker run.
As I want to make the attack surface as small as possible, I treat the application as untrusted. As such it can potentially run arbitrary docker run commands (if the codebase contained bug or the container was compromised, input was improperly escaped somewhere etc.) against a predefined docker API endpoint.
This is why I'd like to restrict that application (effectively a scheduler) in some ways:
prevent --privileged use
enforce --read-only flag
enforce memory & CPU limits
I looked at couple of options:
selinux
the selinux policies would need to be set on the host level and then propagated inside the containers via --selinux-enabled flag on the daemon level. The scheduler can however override this anyway via run --privileged.
seccomp profiles
these are only applied at a time of launching the container (seccomp flags are available for docker run)
AppArmor
this can (again) be overriden on the scheduler level via --privileged
docker daemon --exec-opts flag
only a single option is actually available for this flag (native.cgroupdriver)
It seems that Docker is designed to trust container schedulers by default.
Does anyone know if this is a design decision?
Is there any other possible solution available w/ current latest Docker version that I missed?
I also looked at Kubernetes and its Limit Ranges & Resource Quotas which can be applied to K8S namespaces, which looked interesting, assuming there's a way to enforce certain schedulers to only use certain namespaces. This would however increase the scope of this problem to operating K8S cluster.
running docker on a unix platform should be compatible with nice Or so I would think at first looking a little more closely it looks like you need somethign like -cpuset-cpus="0,1"
From the second link , "The --cpu-quota looks to be similar to the --cpuset-cpus ... allocate one or a few cores to a process, it's just time managed instead of processor number managed."

How to consist the containers in Docker?

Now I am developing the new content so building the server.
On my server, the base system is the Cent OS(7), I installed the Docker, pulled the cent os, and establish the "WEB SERVER container" Django with uwsgi and nginx.
However I want to up the service, (Database with postgres), what is the best way to do it?
Install postgres on my existing container (with web server)
Build up the new container only for database.
and I want to know each advantage and weak point of those.
It's idiomatic to use two separate containers. Also, this is simpler - if you have two or more processes in a container, you need a parent process to monitor them (typically people use a process manager such as supervisord). With only one process, you won't need to do this.
By monitoring, I mainly mean that you need to make sure that all processes are correctly shutdown if the container receives a SIGSTOP signal. If you don't do this properly, you will end up with zombie processes. You won't need to worry about this if you only have a signal process or use a process manager.
Further, as Greg points out, having separate containers allows you to orchestrate and schedule the containers separately, so you can do update/change/scale/restart each container without affecting the other one.
If you want to keep the data in the database after a restart, the database shouldn't be in a container but on the host. I will assume you want the db in a container as well.
Setting up a second container is a lot more work. You should find a way that the containers know about each other's address. The address changes each time you start the container, so you need to make some scripts on the host. The host must find out the ip-adresses and inform the containers.
The containers might want to update the /etc/hosts file with the address of the other container. When you want to emulate different servers and perform resilience tests this is a nice solution. You will need quite some bash knowledge before you get this running well.
In about all other situations choose for one container. Installing everything in one container is easier for setting up and for developing afterwards. Setting up Docker is just the environment where you want to do your real work. Tooling should help you with your real work, not take all your time and effort.

docker and product versions

I am working for a product company and we do make lot of releases of the product. In the current approach to test multiple releases, we create separate VM and install all infrastructure softwares(db, app server etc) on top of it. Later we deploy the application WARs on the respective VM. Recently, I came across docker and it seems to be much helpful. Hence I started exploring it with the examples listed on the site. But, I am not able to find a way as how docker can be applied to build environment suitable to various releases?
Each product version will have db schema changes.
Each application WARs will have enhancements/defects etc.
Consider below example.
Every month, our company is releasing a new version of software and hence in order to support/fix defects we create VMs per release. Given the fact that if the application's overall size is 2 gb and OS takes close to 5 gb (apart from space it will also take up system resources for extra overhead). The VMs are required to restore any release and test any support issues reported against it. But looking at the additional infrastructure requirements, it seems that its very costly affair.
Can docker have everything required to run an application inside a container/image?
Can docker pack an application which consists of multiple WARs/DB schemas and when started allocate appropriate port?
Will there be any space/memory/speed differences compared to VM and docker assuming above scenario?
Do you think docker is still appropriate solution or should we continue using VMs? Can someone share pointers on how I can achieve above requirements with docker?
tl;dr: Yes, docker can run most applications inside a container.
Docker runs a single process inside each container. When using VMs or real servers, this one process is usually the init system which starts all system services. With docker it is usually your app.
This difference will get you faster startup times for your app (not starting the whole operating system). The trade off is that, if you depend on system services (such as cron, sshd…) you will need to start them yourself. There are some base images that provide a more "VM-like" environment… check phusion's baseimage for instance. To start more than a single process, you can also use a process manager such as supervisord.
Going forward, the recommended (although not required) approach is to start one process in each container (one per application server, one per database server, and so on) and not use containers as VMs.
Docker has no problems allocating ports either. It even has an explicit command on the Dockerfile: EXPOSE. Exposed ports can also be published on the docker host with the --publish argument of run so you don't even need to know the IP assigned to the container.
Regarding used space, you will probably see important savings. Docker images are created by stacking filesystem layers… this means that the common layers are only stored once on the server. In your setup, you will likely only have one copy of the base operating system layer (with VMs, you have a copy on each VM).
On memory you will probably see less significant savings (mostly caused by not starting all the operating system services). Speed is still a subject of research… A few things clear so far is that for faster IO you will need to use docker volumes and that for network heavy use cases you should use host networking. Check the IBM research "An Updated Performance Comparison of Virtual Machines and Linux Containers" for details. Or a summary like InfoQ's.

Resources