Docker for a one shot CLI application - linux

Since I first knew of Docker, I thought it might be the solution for several problems we are usually facing at the lab. I work as a Data Analyst for a small Biology research group. I am using Snakemake for defining the -usually big and quite complex- workflows for our analyses.
From Snakemake, I usually call small scripts in R, Python, or even Command Line Applications such as aligners or annotation tools. In this scenario, it is not uncommon to suffer from dependency hell, hence I was thinking about wrapping some of the tools in Docker containers.
At this moment I am stuck at a point where I do not know if I have chosen technology badly, or if I am not able to properly assimilate all the information about Docker.
The problem is related to the fact that you have to run the Docker tools as root, which is something I would not like to do at all, since the initial idea was to make the dockerized applications available to every researcher willing to use them.
In AskUbuntu, the most voted answer proposes to add the final user to the docker group, but it seems that this is not good for security. In the security articles at Docker, on the other hand, they explain that running the tools as root is good for your security. I have found similar questions at SO, but related to the environment inside the container.
Ok, I have no problem with this, but as every moderate-complexity example I happen to find, it seems it is more oriented towards web-applications development, where the system could initially start the container once and then forget about it.
Things I am considering right now:
Configuring the Docker daemon as a TLS-enabled, TCP remote service, and provide the corresponding users with certificates. Would there be any overhead in running the applications? Security issues?
Create images that only make available the application to the host by sharing a /usr/local/bin/ volume or similar. Is this secure? How can you create a daemonized container that does not need to execute anything? The only example I have found implies creating an infinite loop.
The nucleotid.es page seem to do something similar to what I want, but I have not found any reference to security issues. Maybe they are running all the containers inside a virtual machine, where they do not have to worry about these issues, due to the fact that they do not need to expose the dockerized applications to more people.
Sorry about my verbosity. I just wanted to write down the mental process (possibly flawed, I know, I know) where I am stuck. To sum up:
Is there any possibility to create a dockerized command line application which does not need to be run using sudo, is available for several people in the same server, and which is not intended to run in a daemonized fashion?
Thank you in advance.
Regards.

If users will be able to execute docker run then will be able to control host system just because they could map files from host to container and in container they always could be root if they could use docker run or docker exec. So users should not be able to execute docker directly. I think easiest solution here to create scripts which run docker and these scripts could either have suid flag or users could have sudo access to them.

Related

Separate environments for learning or trying out vs production (sandboxes?)

Can you suggest me a way of separating learning/trying out vs production in the same computer? I am in such a place that I know a lot of JS and production ready skills whilst sometimes require probing or trying out simpler stuff or basics. I presume that a lot of engineers are also in a similar place.
This is the situation I am facing with right now.
I wanted to install redis and configure it while trying out something interested.
In a separate project I needed another clean redis configuration and installation.
In front-end side I tried and installed a few npm packages globally.
At some point I installed python 3.4 now require 3.6
At some point I installed nginx and configured it, now need another configuration and wipe the previous one out,
If I start a big project right now I feel like my computer will eventually let me down due to several attempts I previously done
et cetera, these all create friction on both my learning and exploration
Now, it crosses mind to use separate virtual box installations for trying out things, but this answer is trivial, please suggest something else.
P.S.: I am using Linux Mint.
You can install and use Docker, which is also trivial,
however, if your environment is Linux you can use LXC
There isn't really a single good answer to this sort of question of course; but some things that are generally a good idea are:
use git repos to keep the source "backed up" (obviously your local pc should not be the git server); commit your changes all the time, if you can't hold your breath for as long as the timespan between 2 commits, then you're doing it wrong (or you may have asthma, see a doctor).
Always build your project with there being not just multiple, but a variable amount of "deployments" in mind. That means not hardcoding absolute paths and database names/ports/hostnames and things like that. If your project needs database/api credentials then that should be in a configfile of sorts (or in the env); that configfile should be stored outside the codebase and shouldn't be checked into your git repos (though there can ofcourse be a config template in there).
Always have at least 2 deployments of any project actually deployed. Next to the (obvious) "live"/"production" deployment, which your clients/users use, you want a "dev"-version for yourself where you can freely shit the bed, and for bigger projects you may well want multiple. Each deployment would have its own database, and it's own copy of the code/assets.
It can be useful to deploy everything inside podman or docker containers, that makes it easier to have a near-identical system in both development and production (incase those are different servers), but that may be too much overhead for you.
Have a method (maybe a script) that makes it very easy to deploy updates from your gitrepo or dev-deployment, to the production deployment. Based on your description, i'm guessing if a client tells you she wants some minor cosmetic changes done, you do them straight on the live version; very convenient and fast, but a horrible thing in practice. once you switch from that workflow to having a seperate dev-deploy, you'll feel slowed down by that (which you are), but if you optimize that workflow over time you'll get to the point where you could still deploy cosmetic changes in a minute orso, while having fully separated deployments, it is worth the time investment.
Have a personal devtools git repo or something similar. You're likely using an IDE such as VS code ? Back up your vs code user config in that repo, update it reasonably frequently. Use a texteditor, photoshop/editor, etc etc, same deal. You hear that ticking sound ? that's the bomb that's been placed on your motherboard. It might go off tonight, it might not go off for years, but you never know, always expect it could be today or tomorrow, so have stuff backed up externally and/or on offline media.
There's a lot more but those are some of the basics that spring to mind.
I though Docker was only for containerizing your app with all the installation files and configurations before pushing to the production
Docker is useful whenever you need to configure the runtime environment in an isolated manner. Production, local development, other environments - all need the same runtime. All benefit from the runtime definition and isolation that docker provides. Arguably docker is even more useful in workstation-centric development, than it is in production.
I wanted to install redis and configure it while trying out something interested.
Instead of installing redis on your os directly, run the preexisting docker image for redis.
In a separate project I needed another clean redis configuration and installation.
Instantiate the docker image again and now you have 2 isolated redis servers running locally.
In front-end side I tried and installed a few npm packages globally.
Run your npm code within a nodejs docker container
At some point I installed python 3.4 now require 3.6
Different versions of python is a great use case for docker containers, which will tagged with specific python versions.
At some point I installed nginx and configured it, now need another configuration and wipe the previous one out,
Nginx also has a very useful official container.
If I start a big project right now I feel like my computer will eventually let me down due to several attempts I previously done
Yeah, it gets messy quick. That's why docker is such a great solution. Give every project dedicated services and use docker-compose to simplify the networking and building components. Fight the temptation to use a docker container for more than one service - instead stitch them together with docker networks.
Read https://docs.docker.com/get-started/overview/ to get started with docker.

Forbid npm update in Docker environment

guys,
For various projects, I'm creating single Docker environments. Each Docker container consists of Debian, Nginx, Node.js, etc. and is going to use by developers as well as in production via Google Cloud's Kubernetes. Since the Node.js/module version should be everywhere the same, I would like to restrict the access to certain npm commands (somehow). Often developers work with different Node.js and project modules and that caused a lot of trouble in the past. With the Docker containers, I can provide environments with everything you need for a project. To finish this step, I would like to restrict the npm command execution and only allow arguments like install, test, etc.
Please drop me a comment if you know how to resolve this :)
Cheers
It is almost impossible to limit your developers to run some commands in the container if they have an access to Dockerfiles and can somehow change a build flow.
But, because container providing isolation and you can build a custom container for which application based on your basic image, it can be not a big problem if the version of any package for one application will be changed somehow, as an example in a build step, because it will not affect other apps. They just have different containers.
So, you will not have a problem with compatibility like when you using one server with many application which using a shared environment.
The only one thing you need to do - make sure that nobody change container which you using as a base image.

How to create a docker image of current file and OS system?

I wonder if one can take all the current environment variables settings OS applications and create a simple docker layer on top of it all so that docker container user will not be able to damage host system even if he would remove all files, yet will have abilety to access all installed applications and system settings inside his docker layer?
Technically you might be able to hack together a solution that does this by copying in all data/apps, installing dependencies, re-configuring the applications and providing a bash shell to attach to for a user to play around with but this is not what Docker is designed for at all, not to mention that I would not recommend anyone to attempt this.
I always try to explain docker's usecase as processes which run in isolated containers with defined interfaces that may be exposed. Meaning you would ideally run one application within it which has an interface exposed for communication.
What you are looking for is essentially a VM with snapshots which you can provide to different users.

Do we really need security updates on docker images

This question has come to my mind many times and I just wanted everyone to pitch in with their thoughts on same.
The first thing that comes to my mind is container is not a VM and is almost equivalent to a process running in isolation in the host instance. Then why do we need to keep updating our docker images with security updates? If we have taken sufficient steps to secure our host instance then docker container should be safe. And even if we think from a different direction of multi layered security if the docker host is compromised then then there is no way to stop hacker from accessing all the containers running on the host; no matter how many security updates you did on the docker image.
Are there any particular scenarios which anybody can share where security updates for docker images has really helped?
Now I understand if somebody want's to update apache running in the container however are there reasons to do OS level security updates for images?
an exploit can be dangerous even if it does not give you access to the underlying operating system. Just being able to do something within the application itself can be a big issue. For example, imagine you could inject scripts into Stackoverflow, impersonate other users, or obtain a copy of the complete user database.
just like any software, Docker (and the underlying OS-provided container mechanism) is not perfect and can also have bugs. As such, there may be ways to bypass the isolation and break out of the sandbox.

How to consist the containers in Docker?

Now I am developing the new content so building the server.
On my server, the base system is the Cent OS(7), I installed the Docker, pulled the cent os, and establish the "WEB SERVER container" Django with uwsgi and nginx.
However I want to up the service, (Database with postgres), what is the best way to do it?
Install postgres on my existing container (with web server)
Build up the new container only for database.
and I want to know each advantage and weak point of those.
It's idiomatic to use two separate containers. Also, this is simpler - if you have two or more processes in a container, you need a parent process to monitor them (typically people use a process manager such as supervisord). With only one process, you won't need to do this.
By monitoring, I mainly mean that you need to make sure that all processes are correctly shutdown if the container receives a SIGSTOP signal. If you don't do this properly, you will end up with zombie processes. You won't need to worry about this if you only have a signal process or use a process manager.
Further, as Greg points out, having separate containers allows you to orchestrate and schedule the containers separately, so you can do update/change/scale/restart each container without affecting the other one.
If you want to keep the data in the database after a restart, the database shouldn't be in a container but on the host. I will assume you want the db in a container as well.
Setting up a second container is a lot more work. You should find a way that the containers know about each other's address. The address changes each time you start the container, so you need to make some scripts on the host. The host must find out the ip-adresses and inform the containers.
The containers might want to update the /etc/hosts file with the address of the other container. When you want to emulate different servers and perform resilience tests this is a nice solution. You will need quite some bash knowledge before you get this running well.
In about all other situations choose for one container. Installing everything in one container is easier for setting up and for developing afterwards. Setting up Docker is just the environment where you want to do your real work. Tooling should help you with your real work, not take all your time and effort.

Resources