We are trying to move to Docker for deployment purposes. Our architecture requires a Redis container, a MongoDB container, and several Node.js and Java based Docker containers.
So my question is: if the Redis/MongoDB container crashes, do we lose all the data it had?
We want isolation, but at the same time we don't want to lose data due to malfunctions or crashes. Is it even possible to achieve this with Docker, or is it something not relevant here?
Any help or comments will be greatly appreciated.
Thanks
The answer is: YES - if a container crashes so badly that it cannot be restored/restarted, the data is gone. But normally containers can be restarted and continued - in that case the data is not lost.
E.g. the following sequence from the Docker docs illustrates how container startup works. Note that the data is not lost here until the container is removed.
# Start a new container
$ JOB=$(sudo docker run -d ubuntu /bin/sh -c "while true; do echo Hello world; sleep 1; done")
# Stop the container
$ sudo docker stop $JOB
# Start the container
$ sudo docker start $JOB
# Restart the container
$ sudo docker restart $JOB
# SIGKILL a container
$ sudo docker kill $JOB
# Remove a container
$ sudo docker stop $JOB # Container must be stopped to remove it
$ sudo docker rm $JOB
Whenever you execute a docker run command you start a new container with fresh data. The data is based on the image you provide, and that data is consistent (unless you rebuild the image, of course).
So, how should you set up Docker to keep your data intact? I think a good approach is to keep the important data mounted in a volume. Volumes are simply external folders (i.e. folders from the host system) that hold the data, and this data will not be lost even if you reinstall the entire Docker daemon.
Example:
docker run -v /some/local/dir:/some/dir/in/redis-container my/redis
This mounts the host folder /some/local/dir as the folder /some/dir/in/redis-container inside the running container. If Redis stores its data in that folder, you're all set: the data will survive reboots and crashes.
For more info about Docker volumes check out the docs. Another great article, also from the Docker website, is Managing Data in Containers.
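For example, here is a minimal sketch of how you could verify that the data survives even a full container removal. It assumes the official redis image, which writes its data under /data when started with --appendonly yes; the host path and container name are just placeholders:
# start Redis with a host directory mounted as its data directory
$ docker run -d --name redis-test -v /some/local/dir:/data redis redis-server --appendonly yes
# write a key
$ docker exec redis-test redis-cli set mykey hello
# remove the container entirely
$ docker rm -f redis-test
# start a fresh container on the same host directory
$ docker run -d --name redis-test -v /some/local/dir:/data redis redis-server --appendonly yes
# the key is still there
$ docker exec redis-test redis-cli get mykey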
EDIT: After the comments I clarified the answer - the data is lost if the container can't be restarted at all (a total crash).
If a container crashes, you won't lose any data - at least not more than with a regular application crash.
The container itself is unlikely to crash (after all, it's only an envelope for your application(s)). Your application(s) running in a container can crash, and if they do, their data will still be on the container filesystem. All you have to do in such a situation is to restart the failed container.
One case where you could lose something is if you explicitly tell Docker to remove the container when it's not running anymore (--rm option).
That being said, for IO-intensive applications such as databases, it is highly recommended to host the data on Docker volumes, for performance reasons (a Docker volume is a traditional filesystem, while the container's default filesystem is a stack of layers and will be slower).
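As a sketch of what this can look like for a database (assuming the official mongo image, which writes its data to /data/db; the volume and container names are just examples), you can combine a named volume with a restart policy:
# data lives on a named volume, outside the container's layered filesystem
$ docker volume create mongo-data
# the restart policy brings the container back up automatically after a crash
$ docker run -d --name mongo --restart unless-stopped -v mongo-data:/data/db mongo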
Related
I'm using the docker/elk image to display my data in a Kibana dashboard (version 6.6.0) and it works pretty well. I started the service using the command below.
Docker Image git repo:
https://github.com/caas/docker-elk
Command:
sudo docker-compose up --detach
I expected it to run in the background, and it did. The server was up and running for two days, but on the third day Kibana alone stopped. I used the command below to bring it up and running again.
sudo docker run -d <Docker_image_name>
It shows as up and running when I use the docker ps command, but when I try to reach the Kibana server in the Chrome browser it says it is not reachable.
So I used the command below to restart the service.
sudo docker-compose down
After that I can see the Kibana server in the Chrome browser, up and running, but all my data is lost.
I used the below URL in Jenkins to collect the data.
http://hostname:9200/ecdpipe_builds/external
Any idea how can I resolve this issue?
I did not see a persistent storage configuration for the image you mentioned in their GitHub docker-compose file.
It is common to lose data from a Docker container if you do not provide a persistent storage configuration, so docker-compose down may cause you to lose data if there is no persistence configured in the docker-compose file.
Persisting log data
In order to keep log data across container restarts, this image mounts
/var/lib/elasticsearch — which is the directory that Elasticsearch
stores its data in — as a volume.
You may however want to use a dedicated data volume to persist this
log data, for instance to facilitate back-up and restore operations.
One way to do this is to mount a Docker named volume using docker's -v
option, as in:
$ sudo docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 \
-v elk-data:/var/lib/elasticsearch --name elk sebp/elk
This command mounts the named volume elk-data to
/var/lib/elasticsearch (and automatically creates the volume if it
doesn't exist; you could also pre-create it manually using docker
volume create elk-data).
So you can set these paths in your docker-compose file accordingly. Here is the link you can check: elk-docker-persisting-log-data
Use a Docker volume or a file location as persistent space.
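If you keep using docker-compose, the idea is the same: declare a named volume in the compose file and mount it at /var/lib/elasticsearch. As a quick shell-side sketch (the volume name elk-data is just taken from the quoted docs), you can pre-create and inspect such a volume like this:
$ docker volume create elk-data
# shows the mountpoint under /var/lib/docker/volumes/elk-data/_data
$ docker volume inspect elk-data
# named volumes are not removed by "docker-compose down" unless you pass -v
$ docker volume ls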
I have installed the official MongoDB docker image in a VM on AWS EC2, and the database already has data on it. If I stop the VM (to save expenses overnight), will I lose all the data contained in the database? How can I make it persistent in those scenarios?
There are multiple options to achieve this, but the two most common ways are:
Create a directory on your host to mount the data
Create a Docker volume to mount the data
1) Create a data directory on a suitable volume on your host system, e.g. /my/own/datadir. Start your mongo container like this:
$ docker run --name some-mongo -v /my/own/datadir:/data/db -d mongo:tag
The -v /my/own/datadir:/data/db part of the command mounts the /my/own/datadir directory from the underlying host system as /data/db inside the container, where MongoDB by default will write its data files.
Note that users on host systems with SELinux enabled may see issues with this. The current workaround is to assign the relevant SELinux policy type to the new data directory so that the container will be allowed to access it:
$ chcon -Rt svirt_sandbox_file_t /my/own/datadir
The source of this is the official documentation of the image.
2) Another possibility is to use a docker volume.
$ docker volume create my-volume
This will create a docker volume in the folder /var/lib/docker/volumes/my-volume. Now you can start your container with:
docker run --name some-mongo -v my-volume:/data/db -d mongo:tag
All the data will be stored in my-volume, i.e. in the folder /var/lib/docker/volumes/my-volume. So even when you delete your container and create a new mongo container linked to this volume, your data will be loaded into the new container.
You can also use the --restart=always option when you perform your initial docker run command. This means that your container will automatically restart after a reboot of your VM. When you've persisted your data as well, there will be no difference in your DB before and after the reboot.
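Putting both together, a minimal sketch (reusing the names from the examples above) might look like this:
# recreate the container with the same named volume and an automatic restart policy
$ docker run --name some-mongo --restart=always -v my-volume:/data/db -d mongo:tag
# after a VM reboot, the container comes back up with the data it had before
$ docker ps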
I'm using a non-root user on a secured environment to run a stock DB docker container (Elasticsearch). Of course I want the data to be mounted so I won't lose it when the container is destroyed.
The problem is that this container writes to that volume with root ownership, and then the host user doesn't have permission to move or remove those files.
I know that most docker images use the root user inside, but how can I control the file ownership on the host machine?
You can create a data container:
docker create -v /usr/share/elasticsearch/data --name esdata elasticsearch /bin/true
then use it in your container:
docker run -d --volumes-from esdata --name some-elasticsearch elasticsearch
This is a preferred data pattern for Docker; you can find out more on this Docker page.
To answer your question, use docker run --user "$(id -u)" ... - it will run the program within the container with the current user id. Then you might have the same question as I did.
I answered it in a way that I hope might be useful:
Docker with '--user' can not write to volume with different ownership
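As a rough sketch of the --user approach with a bind mount (the host path is just a placeholder, and it assumes the files should end up owned by your own uid/gid on the host), you can pre-create the data directory with the right ownership and run the container as that same uid:
# create the host directory and give it to the current (non-root) user
$ mkdir -p /home/me/es-data
$ sudo chown -R "$(id -u):$(id -g)" /home/me/es-data
# run the container as the same uid/gid so files it writes keep that ownership on the host
$ docker run -d --user "$(id -u):$(id -g)" \
    -v /home/me/es-data:/usr/share/elasticsearch/data elasticsearch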
Many organizations are using Docker specifically for the advantage of being able to seamlessly roll back deployed software. For instance, given an image called newapi, deployment looks like this:
# fetch latest
docker pull newapi:latest
# stop old one and terminate it
docker stop -t 10 newapi-container
docker rm -f newapi-container
# start new one
docker run ... newapi:latest
If something goes wrong, we can revert back to the previous version like this:
docker stop -t 10 newapi-container
docker rm -f newapi-container
docker run ... newapi:0.9.2
The problem is that over time our local Docker image index will get huge. Does Docker automatically get rid of old, unused images from its local index to save disk space, or do I have to manage these manually?
It doesn't do it for you but you can use the following commands to do it manually.
#!/bin/bash
# Delete all containers
sudo docker rm $(sudo docker ps -a -q)
# Delete all images
sudo docker rmi $(sudo docker images -q)
The documentation relating to the docker rm and rmi commands is here: https://docs.docker.com/reference/commandline/cli/#rm
The additional commands are standard bash.
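If you would rather not wipe everything, a gentler variant (a sketch using the standard filter flags of the Docker CLI) removes only exited containers and dangling images:
#!/bin/bash
# Delete only stopped (exited) containers
sudo docker rm $(sudo docker ps -a -q -f status=exited)
# Delete only dangling (untagged) images
sudo docker rmi $(sudo docker images -q -f dangling=true)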
Update Sept. 2016 for the upcoming Docker 1.13: PR 26108 and commit 86de7c0 introduce a few new commands to help visualize how much disk space the Docker daemon data is taking and to allow easy cleanup of "unneeded" excess.
docker system prune will delete ALL dangling data (i.e., in order: stopped containers, volumes not used by any container, and images not referenced by any container). With the -a option it even deletes unused data.
You also have:
docker container prune
docker image prune
docker network prune
docker volume prune
Taken from here.
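A typical cleanup session with these commands (Docker 1.13 or later) might look like this sketch; the until filter is only available on newer versions and its value is just an example:
# see how much space images, containers and local volumes are using
$ docker system df
# remove stopped containers, dangling images and unused networks (add -a to also remove all unused images)
$ docker system prune
# on newer versions, prune only images older than roughly 30 days
$ docker image prune -a --filter "until=720h"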
I would like to run a docker container that hosts a simple web application, but I do not understand how to design/run the image as a server. For example:
docker run -d -p 80:80 ubuntu:14.04 /bin/bash
This will start and immediately shut down the container. Instead we can start it interactively:
docker run -i -p 80:80 ubuntu:14.04 /bin/bash
This works, but now I have to keep the interactive shell open for every container that is running. I would rather just start it and have it running in the background. A hack would be to use a command that never returns:
docker run -d -p 80:80 {image} tail -F /var/log/kern.log
But now I cannot connect to the shell anymore, to inspect what is going on if the application is acting up.
Is there a way to start the container in the background (as we would do for a vm), in a way that allows for attaching/detaching a shell from the host? Or am I completely missing the point?
The final argument to docker run is the command to run within the container. When you run docker run -d -p 80:80 ubuntu:14.04 /bin/bash, you're running bash in the container and nothing more. You actually want to run your web application in a container and to keep that container alive, so you should do docker run -d -p 80:80 ubuntu:14.04 /path/to/yourapp.
But your application probably depends on some configuration in order to run. If it reads its configuration from environment variables, you can use the -e key=value arguments with docker run. If your application needs a configuration file to be in place, you should probably use a Dockerfile to set up the configuration first.
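As a small sketch of the environment-variable approach (the variable names and the application path are only placeholders):
# pass configuration as environment variables and run the app detached
$ docker run -d -p 80:80 -e NODE_ENV=production -e DB_HOST=mongo ubuntu:14.04 /path/to/yourapp
# check that it is running and inspect its logs
$ docker ps
$ docker logs <container_id>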
This article provides a nice complete example of running a node application in a container.
Users of Docker tend to assume a container is a complete VM, while the Docker design concept is more focused on optimal containerization than on mimicking a VM inside a container.
Both views are valid, but some implementation details are not easy to get familiar with in the beginning. I am trying to summarize some of the implementation differences in a way that is easier to understand.
SSH
SSH would be the most straightforward way to get inside a Linux VM (or container); however, many dockerized templates do not have an SSH server installed. I believe this is for optimization and security reasons.
docker attach
docker attach can be handy if it works out of the box. However, as of this writing it is not stable - https://github.com/docker/docker/issues/8521. It might be associated with the SSH setup, but I'm not sure when it will be completely fixed.
docker recommended practices (nsenter etc.)
Some alternatives (or best practices in some sense) are recommended by Docker at https://blog.docker.com/2014/06/why-you-dont-need-to-run-sshd-in-docker/
This practice basically separates the mutable elements out of the container and maps them to places on the Docker host so they can be manipulated from outside the container and/or persisted. This could be a good practice in a production environment, but less so right now, while most Docker-related projects are still around dev and staging environments.
bash command line
"docker exec -it {container id} bash" cloud be very handy and practical tool to get in to the machine.
Some basics
"docker run" creates a new container so previous changes will not be saved.
"docker start" will start an existing container so previous changes will still be in the container, however you need to find the correct container-id among many with a same image-id. Need to "docker commit" to suppress versions if wanted.
Ctrl-C will stop the container when exiting. You will want to append "&" at the end so the container can run background and gives you the prompt when hitting enter key.
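A minimal sketch of the run/start/commit distinction (ubuntu:14.04 is just an example image, and the names are placeholders):
# "docker run" creates a brand new container
$ docker run -it --name mybox ubuntu:14.04 /bin/bash
# make some changes inside, then exit
# "docker start" brings back the SAME container, with the changes still in it
$ docker start -ai mybox
# "docker commit" bakes the container's current state into a new image
$ docker commit mybox myimage:snapshot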
To the original question, you can tail some file, like you mentioned, to keep the process running.
To reach the shell, instead of "attach", you have two options:
docker exec -it <container_id> /bin/bash
Or
run an SSH daemon in the container, map the SSH port, and then ssh into the container.
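A small sketch of the first option (using the stock nginx image purely as an example of a long-running service):
# run the service detached
$ docker run -d -p 80:80 --name web nginx
# open a shell inside it without stopping the service
$ docker exec -it web /bin/bash
# exiting that shell leaves the container running; stop it explicitly when you are done
$ docker stop web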