For some scenarios a clustered file system is just too much. This is, if I understand it correctly, the use case for the data volume container pattern. But even CoreOS needs updates from time to time. If I still want to minimise application downtime, I would have to move the data volume container along with the app container to another host while the old host is being updated.
Are there any established best practices? A solution mentioned fairly often is to "back up" a container with docker export on the old host and docker import on the new host. But this would involve scp-ing tar files to another host. Can this be managed with fleet?
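For concreteness, a hedged sketch of that export/import flow (the container name `myapp-data` and host `newhost` are placeholders), with the caveat that docker export captures the container's filesystem but not data stored in volumes:

```shell
# On the old host: export the container filesystem to a tarball
docker export myapp-data > myapp-data.tar

# Ship it to the new host (hostname is a placeholder)
scp myapp-data.tar core@newhost:/tmp/

# On the new host: import the tarball as an image and recreate the container
docker import /tmp/myapp-data.tar myapp-data:migrated
docker create --name myapp-data myapp-data:migrated
```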
@brejoc, I wouldn't call this a solution, but it may help:
Alternative 1: Use another OS which does have clustering, or at least doesn't prevent it. I am now experimenting with CentOS.
Alternative 2: I've created a couple of tools that help in some use cases. The first tool retrieves data from S3 (usually artifacts) and is uni-directional. The second tool, which I call 'backup volume container', has a lot of potential in it, but needs some feedback. It provides a two-way backup/restore for data, from/to many persistent data stores including S3 (but also Dropbox, which is cool). As it is implemented now, when you run it for the first time it restores to the container. From that point on, it monitors the relevant folder in the container for changes, and upon changes (and after a quiet period) it backs up to the persistent store.
Backup volume container: https://registry.hub.docker.com/u/yaronr/backup-volume-container/
File sync from S3: https://registry.hub.docker.com/u/yaronr/awscli/
(docker run yaronr/awscli aws s3 etc etc - read aws docs)
Related
I have two docker containers A and B. On container A a django application is running. On container B a WEBDAV Source is mounted.
Now I want to check from container A if a folder exists in container B (in the WebDAV mount destination).
What is the best way to do something like that? Currently I solved it by mounting the Docker socket into container A to execute commands from A inside B. I am aware that mounting the Docker socket into a container is a security risk for the host and the whole application stack.
Other possible solutions would be to use SSH, or to share and mount the directory that should be checked. Of course there are further options, such as doing it with HTTP requests.
Because there are so many ways to solve a problem like that, I want to know if there is a best practice (considering security, implementation effort, and performance) for executing commands from container A in container B.
Thanks in advance
WebDAV provides a file-system-like interface on top of HTTP. I'd just directly use this. This requires almost no setup other than providing the other container's name in configuration (and if you're using plain docker run putting both containers on the same network), and it's the same setup in basically all container environments (including Docker Swarm, Kubernetes, Nomad, AWS ECS, ...) and a non-Docker development environment.
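As a sketch of that direct approach, container A could check whether the folder exists with a single WebDAV request; the hostname `container-b` and the path are assumptions:

```shell
# PROPFIND with "Depth: 0" asks only about the resource itself;
# a 207 Multi-Status reply means the folder exists, 404 means it doesn't.
status=$(curl -s -o /dev/null -w '%{http_code}' \
  -X PROPFIND -H 'Depth: 0' \
  http://container-b/webdav/some/folder/)

if [ "$status" = "207" ]; then
  echo "folder exists"
else
  echo "folder missing (HTTP $status)"
fi
```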
Of the other options you suggest:
Sharing a filesystem is possible. It leads to potential permission problems which can be tricky to iron out. There are potential security issues if the client container isn't supposed to be able to write the files. It may not work well in clustered environments like Kubernetes.
ssh is very hard to set up securely in a Docker environment. You don't want to hard-code a plain-text password that can be easily recovered from docker history; a best-practice setup would require generating host and user keys outside of Docker and bind-mounting them into both containers (I've never seen a setup like this in an SO question). This also brings the complexity of running multiple processes inside a container.
Mounting the Docker socket is complicated, non-portable across environments, and a massive security risk (you can very easily use the Docker socket to root the entire host). You'd need to rewrite that code for each different container environment you might run in. This should be a last resort; I'd consider it only if creating and destroying containers would need to be a key part of this one container's operation.
Is there a best practice to execute commands from container A in container B?
"Don't." Rearchitect your application to have some other way to communicate between the two containers, often over HTTP or using a message queue like RabbitMQ.
One solution would be to mount one filesystem read-only on one container and read-write on the other container.
See this answer: Docker, mount volumes as readonly
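A minimal sketch of that pattern with a named volume (volume and image names are placeholders):

```shell
# The writer container gets the volume read-write
docker run -d --name writer -v shared-data:/data my-writer-image

# The reader container mounts the same volume read-only (:ro)
docker run -d --name reader -v shared-data:/data:ro my-reader-image
```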
Is it good practice for node.js service containers running under AWS ECS to mount a shared node_modules volume persisted on EFS? If so, what's the best way to pre-populate the EFS before the app launches?
My front-end services run a node.js app, launched on AWS Fargate instances. This app uses many node_modules. Is it necessary for each instance to install the entire body of node_modules within its own container? Or can they all mount a shared EFS filesystem containing a single copy of the node_modules?
I've been migrating to AWS Copilot to orchestrate, but the docs are pretty fuzzy on how to pre-populate the EFS. At one point they say, "we recommend mounting a temporary container and using it to hydrate the EFS, but WARNING: we don't recommend this approach for production." (Storage: AWS Copilot Advanced Use Cases)
Thanks for the question! This is pointing out some gaps in our documentation that have been opened as we released new features. There is actually a manifest field, image.depends_on which mitigates the issue called out in the docs about prod usage.
To answer your question specifically about hydrating EFS volumes prior to service container start, you can use a sidecar and the image.depends_on field in your manifest.
For example:
image:
  build: ./Dockerfile
  depends_on:
    bootstrap: success

storage:
  volumes:
    common:
      path: /var/copilot/common
      read_only: true
      efs: true

sidecars:
  bootstrap:
    image: public.ecr.aws/my-image:latest
    essential: false # allows the sidecar to run and terminate without the ECS task failing
    mount_points:
      - source_volume: common
        path: /var/copilot/common
        read_only: false
On deployment, you'd build and push your sidecar image to ECR. It should include either your packaged data or a script to pull down the data you need, then move it over into the EFS volume at /var/copilot/common in the container filesystem.
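As a sketch, the sidecar's entrypoint could be a small script along these lines; the `/seed-data` source directory and the `hydrate` helper are assumptions, and the destination matches the mount path in the manifest above:

```shell
#!/bin/sh
# Hypothetical bootstrap entrypoint for the sidecar: copies seed data baked
# into the image onto the mounted EFS path, then exits 0 so the non-essential
# sidecar terminates and the service container is allowed to start.
set -eu

hydrate() {
  src="$1"   # directory packaged into the sidecar image, e.g. /seed-data
  dest="$2"  # the EFS mount point, e.g. /var/copilot/common
  mkdir -p "$dest"
  cp -R "$src"/. "$dest"/
}

# In the real sidecar this would be the only call:
# hydrate /seed-data /var/copilot/common
```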
Then, when you next run copilot svc deploy, the following things will happen:
Copilot will create an EFS filesystem in your environment. Each service will get its own isolated access point in EFS, which means service containers all share data but can't see data added to EFS by other services.
Your sidecar will run to completion. That means that all currently running services will see the changes to the EFS filesystem whenever a new copy of the task is deployed unless you specifically create task-specific subfolders in EFS in your startup script.
Once the sidecar exits successfully, the new service container will come up on ECS and operate as normal. It will have access to the EFS volume which will contain the latest copy of your startup data.
Hope this helps.
Is it good practice for node.js service containers running under AWS ECS to mount a shared node_modules volume persisted on EFS?
Whether it is "good" or not is a matter of opinion; it is a fairly common practice in ECS, however. You have to be very cognizant of the IOPS your application is going to generate against the EFS volume: once an EFS volume runs out of burst credits it can really slow down and impact the performance of your application.
I have never seen an EFS volume used to store node_modules before, and in all honesty it seems like a bad idea to me. Dependencies like that should always be bundled in your Docker image. Otherwise it's going to be difficult when it comes time to upgrade those dependencies in your EFS volume, and the upgrade may require downtime.
If so, what's the best way to pre-populate the EFS before the app launches?
You would have to create the initial EFS volume and mount it somewhere like an EC2 instance, or another ECS container, and then run whatever commands necessary in that EC2/ECS instance to copy your files to the volume.
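A hedged sketch of that one-off copy from an EC2 instance, using the EFS mount helper (the filesystem ID `fs-12345678` and the paths are placeholders):

```shell
# Mount the EFS filesystem (requires the amazon-efs-utils package)
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-12345678:/ /mnt/efs

# Copy the initial data into place, then unmount
sudo rsync -a ./node_modules/ /mnt/efs/node_modules/
sudo umount /mnt/efs
```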
The quote in your question isn't present on the page you linked, so it's difficult to say exactly what other approach the Copilot team would recommend.
I programmed an API with nodejs and express like a million others out there, and it will go live in a few weeks. The API currently runs on a single Docker host with one volume for persistent data containing images uploaded by users.
I'm now thinking about scalability and a high availability setup where the question about network volumes come in. I've read a lot about NFS volumes and potentially the S3 Driver for a docker swarm.
From the information I gathered, I sorted out two possible solutions for the swarm setup:
Docker Volume Driver
I could connect each docker host either to an S3 Bucket or EFS Storage with the compose file
Connection should work even if I move VPS Provider
Better security if I put a NFS storage on the private part of the network (no S3 or EFS)
API Multer S3 Middleware
No attached volume required since the S3 connection is done from within the container
Easier swarm and docker management
Things have to be re-programmed and a few files need to be migrated
On a GET request, the files will be provided by AWS directly instead of the API
Please tell me your opinion on this. Am I getting this right, or am I missing something? Which route should I take? Is there something to consider regarding latency or permissions when mounting from different hosts?
Tips on S3 and EFS are definitely welcome, since I have no experience with them yet.
I would not recommend saving to disk, instead use S3 API directly - create buckets and write in your app code.
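For instance, with the AWS CLI (the bucket and key names are placeholders); the same operations exist in every SDK, including the Node.js one your app would use:

```shell
# Create the bucket once, then write uploads straight to S3 from the app
aws s3 mb s3://my-app-uploads
aws s3 cp ./avatar.png s3://my-app-uploads/users/123/avatar.png

# Serve downloads via a time-limited pre-signed URL instead of through your API
aws s3 presign s3://my-app-uploads/users/123/avatar.png --expires-in 3600
```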
If you're thinking of mounting a single S3 bucket as your drive, there are severe limitations with that. There's the 5 GB single-upload object size limit. Any time you modify a file's contents in any way, the driver re-uploads the entire file. If there's any contention it has to retry. Years ago when I tried this, the FUSE drivers weren't stable enough to use as part of a production system; they'd crash and you'd have to remount. It was a nice idea but could only be used as an ad hoc kind of thing on the command line.
As far as NFS goes: for the love of god, don't do this to yourself - you'd be taking on responsibility for running it entirely by yourself.
On EFS I can't really comment; by the time it became available, most people had just learned to use S3, and S3 is cheaper.
I have a docker container in a hyperledger fabric setup. This stores all user credentials.
What happens if this container or machine goes down and is not available?
If I bring up a backup container, how can the entire state be restored?
I tried the commit option, but on bringing it back up it does not work as expected. Most likely the CA functionality tracks some container ID, since the CA server is a highly sensitive piece of the setup.
Overall, this is more of a strategy question, there are many approaches to backing up critical data - and you may or may not choose one that is specific to Docker containers.
On the technical questions that you asked:
If the container 'goes down', its files remain intact and will be there when it is restarted (that is, if you re-start the same container and don't create a new one). If the machine goes down, the container will come 'back up' if and when the machine is restarted. Depending on how you created the container, you may need to start it yourself or Docker may restart it automatically. If it went down hard and won't come back - you lose all data on it, including files in containers.
You can create a 'backup container' (or more precisely, a backup image), but if it was left on the same machine it will die with that machine. You will need to save it elsewhere (e.g., with 'docker push', though I don't recommend that unless you have your own docker registry to use for backups).
If you do 'commit', this simply creates a new container image, which has the files as they were when you did the commit. You should commit a stopped container, if you want a proper copy of all files - I don't think you can do it while there are active open files. This copy lives on the same machine where the container was, so you still need to save it away from that machine to protect it from loss. Note that to use the saved image, you should tag it and use it to start a new container. The image from which you started the old container is untouched by the 'commit' (using that old image will start the container as it was then, when you first created it).
IMO, an option better than 'commit' (which saves the entire container file system, along with all the junk like logs and temp. files) is to mount a docker volume to the path where important files are stored (e.g., /var/lib/mysql, if you run a mysql database) - and back up only that volume.
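A common sketch of that volume-only backup, using a throwaway container (the volume and file names are placeholders):

```shell
# Back up only the named volume, not the whole container filesystem
docker run --rm \
  -v mysql-data:/volume:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/mysql-data.tar.gz -C /volume .

# Restore into a fresh volume, e.g. on another machine
docker run --rm \
  -v mysql-data:/volume \
  -v "$(pwd)":/backup \
  alpine tar xzf /backup/mysql-data.tar.gz -C /volume
```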
Is it possible, or even advisable, to use an EBS volume that remains after instance termination to store database/website files, and to reattach it to a new Amazon instance in case of failure? Or should I back up a volume bundle to S3? Also, I need an application to accelerate terminal window functions intelligently. Can you tell I'm a Linux noob?
We do this with our Nexus installation - the data is stored on a separate EBS volume that's regularly snapshotted, but the root disk isn't (since we can use Puppet to create a working Nexus instance using the latest base AMI, Java, Tomcat and Nexus versions). The one drawback of this approach (vs your other approach of backing up to S3) is that you can't retrieve the data outside of AWS if needed - if that is an important use case I'd recommend uploading either a volume bundle or a .tar.gz backup to S3.
However, in your case, if you have a single EBS-backed EC2 instance which is your CMS server, you could run it with a large root volume and keep that regularly backed up (either using EBS snapshots or by backing up a .tar.gz to S3). If you're not particularly familiar with Linux, that's likely the easiest way to make sure all your data is backed up. And if you ever need to extract just the data, you can always do so by attaching that volume (or a volume created from a snapshot of it) to another machine - you'd also have access to all the config files, which may be of use.
Bear in mind that if you only want to run your server some of the time, you can always Stop the instance rather than Terminate it - the EBS volumes will remain. Once you take a snapshot your data is safe - if part of an EBS volume fails but it hasn't been modified since the last snapshot, AWS will transparently restore it with the EBS snapshot data.
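For reference, a snapshot can be taken with the AWS CLI (the volume ID is a placeholder):

```shell
aws ec2 create-snapshot \
  --volume-id vol-0abc1234567890def \
  --description "nightly backup of CMS data volume"
```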