Running command on EC2 launch and shutdown in auto-scaling group - linux

I'm running a Docker swarm deployed on AWS. The setup is an auto-scaling group of EC2 instances that each act as Docker swarm nodes.
When the auto-scaling group scales out (spawns new instance) I'd like to run a command on the instance to join the Docker swarm (i.e. docker swarm join ...) and when it scales in (shuts down instances) to leave the swarm (docker swarm leave).
I know I can do the first one with user data in the launch configuration, but I'm not sure how to act on shutdown. I'd like to make use of lifecycle hooks, and the docs mention I can run custom actions on launch/terminate, but it is never explained just how to do this. It should be possible to do without sending SQS/SNS/Cloudwatch events, right?
My AMI is a custom one based off of Ubuntu 16.04.
Thanks.

One of the core issues is that removing a node from a Swarm is currently a 2 or 3-step action when done gracefully, and some of those actions can't be done on the node that's leaving:
docker node demote, if leaving-node is a manager
docker swarm leave on leaving-node
docker swarm rm on a manager
This step 3 is what's tricky because it requires you to do one of three things to complete the removal process:
Put something on a worker that would let it do things on a manager remotely (ssh to a manager with sudo perms, or docker manager API access). Not a good idea. This breaks the security model of "workers can't do manager things" and greatly increases risk, so not recommended. We want our managers to stay secure, and our workers to have no control or visibility into the swarm.
(best if possible) Setup an external solution so that on a EC2 node removal, a job is run to SSH or API into a manager and remove the node from swarm. I've seen people do this, but can't remember a link/repo for full details on using a lambda, etc. to deal with the lifecycle hook.
Setup a simple cron on a single manager (or preferably as a manager-only service running a cron container) that removes workers that are marked down. This is a sort of blunt approach and has edge cases where you could potentially delete a node that's existing but considered down/unhealthy by swarm, but I've not heard of that happening. If it was fancy, it could maybe validate with AWS that node is indeed gone before removing.
WORST CASE, if a node goes down hard and doesn't do any of the above, it's not horrible, just not ideal for graceful management of user/db connections. After 30s a node is considered down and Service tasks will be re-created on healthy nodes. A long list of workers marked down in the swarm node list doesn't have an effect on your Services really, it's just unsightly (as long as there are enough healthy workers).
THERE'S A FEATURE REQUEST in GitHub to make this removal easier. I've commented on what I'm seeing in the wild. Feel free to post your story and use case in the SwarmKit repo.

Related

Correct container technolgy on azure for long running service

I want to run a docker container which hosts a server which is going to be long running (e.g. 24x7).
Initially I looked at Azure Container Instances (ACI) and whilst these seems to fit the bill perfectly I've been advised they're not designed for long running containers, also they can prove to be quite expensive to run all the time compared to a basic VM.
So I've been looking at what else I should run this as:
AKS - Seems overkill for just one docker container
App Service for containers - my container doesn't have an http endpoint so I believe I will have issues with things like health checks
VM - this seems all a bit manual as I'd really like not to deal with VM maintenance and I'm also unsure I can use CI/CD techniques to build / spin up-down / do releases on a VM image (we're using terraform to deploy infra).
Are there any best practise guides on this, I've tried searching but I'm not finding anything relevant, I'm assuming I'm missing some key term to get going with this!
TIA
ACI is not designed for long-running (uninterrupted) processes have a look here
Recommendation is to use AKS where you can fully manage lifecycle of your machines or just use VMs

Kubernetes cluster Nodes not creating automatically when other lost in Kubespray

I have successfully deployed a multi master Kubernetes cluster using the repo https://github.com/kubernetes-sigs/kubespray and everything works fine. But when I stop/terminate a node in the cluster, new node is not joining to the cluster.I had deployed kubernetes using KOPS, but the nodes were created automatically, when one deletes. Is this the expected behaviour in kubespray? Please help..
It is expected behavior because kubespray doesn't create any ASGs, which are AWS-specific resources. One will observe that kubespray only deals with existing machines; they do offer some terraform toys in their repo for provisioning machines, but kubespray itself does not get into that business.
You have a few options available to you:
Post-provision using scale.yml
Provision the new Node using your favorite mechanism
Create an inventory file containing it, and the etcd machines (presumably so kubespray can issue etcd certificates for the new Node
Invoke the scale.yml playbook
You may enjoy AWX in support of that.
Using plain kubeadm join
This is the mechanism I use for my clusters, FWIW
Create a kubeadm join token using kubeadm token create --ttl 0 (or whatever TTL you feel comfortable using)
You'll only need to do this once, or perhaps once per ASG, depending on your security tolerances
Use the cloud-init mechanism to ensure that docker, kubeadm, and kubelet binaries are present on the machine
You are welcome to use an AMI for doing that, too, if you enjoy building AMIs
Then invoke kubeadm join as described here: https://kubernetes.io/docs/setup/independent/high-availability/#install-workers
Use a Machine Controller
There are plenty of "machine controller" components that aim to use custom controllers inside Kubernetes to manage your node pools declaratively. I don't have experience with them, but I believe they do work. That link was just the first one that came to mind, but there are others, too
Our friends over at Kubedex have an entire page devoted to this question

Microservices on docker - architecture

I am building a micro-services project using docker.
one of my micro-services is a listener that should get data from various number of sources.
What i'm trying to achieve is the ability to start and stop getting data from sources dynamically.
For example in this drawing, i have 3 sources connected to 3 dockers.
My problem starts because i need to create another docker instance when a new source is available. In this example lets say source #4 is now available and i need to get his data (I know when a new source became available) but i want it to be scaled automatically (with source #4 information for listening)
I came up with two solutions, each has advantages and disadvantages:
1) Create a docker pool of a large number of docker running the listener service and every time a new source is available send a message (using rabbitmq but i think less relevant) to an available docker to start getting data.
in this solution i'm a little bit afraid of the memory consumption of the docker images running for no reason - but it is not a very complex solution.
2) Whenever a new source is becoming available create a new docker (with different environment variables)
With this solution i have a problem creating the docker.
At this moment i have achieved this one, but the service that is starting the dockers (lets call it manager) is just a regular nodejs application that is executing commands on the same server - and i need it to be inside a docker container also.
So the problem here is that i couldn't manage create an ssh connection from the main docker to create my new Docker.
I am not quite sure that both of my solutions are on track and would really appreciate any suggestions for my problem.
Your question is a bit unclear, but if you just want to scale a service horizontally you should look into a container orchestration technology that will allow you that - For example Kubernetes. I recommend reading the introduction.
All you would need to do for adding additional service containers is to update the number of desired replicas in the Deployment configuration. For more information read this.
Using kubernetes (or short k8s) you will benefit from deployment automation, self healing and service discovery as well as load balancing capabilities in addition to the horizontal scalability.
There are other orchestration alternatives, too ( e.g. Docker Swarm), but I would recommend to look into kubernetes first.
Let me know if that solves your issue or if you have additional requirements that weren't so clear in your original question.
Links for your follow up questions:
1 - Run kubectl commands inside container
2 - Kubernetes autoscaling based on custom metrics
3 - Env variables in Pods

Docker container management solution

We've NodeJS applications running inside docker containers. Sometimes, if any process gets locked down or due to any other issue the app goes down and we've to manually login to each container n restart the application. I was wondering
if there is any sort of control panel that allow us to easily and quickly restart those and see the whole health of the system.
Please Note: we can't use --restart flag because essentially application doesn't exist with exist code. It run into problem like some process gets blocked, things are just getting bogged down vs any crashes and exist codes. That's why I don't think restart policy will help in this scenario.
I suggest you consider using the new HEALTHCHECK directive in Docker 1.12 to define a custom check for your locking condition. This feature can be combined with the new Docker swarm service feature to specify how many copies of your container you want to have running.

How to consist the containers in Docker?

Now I am developing the new content so building the server.
On my server, the base system is the Cent OS(7), I installed the Docker, pulled the cent os, and establish the "WEB SERVER container" Django with uwsgi and nginx.
However I want to up the service, (Database with postgres), what is the best way to do it?
Install postgres on my existing container (with web server)
Build up the new container only for database.
and I want to know each advantage and weak point of those.
It's idiomatic to use two separate containers. Also, this is simpler - if you have two or more processes in a container, you need a parent process to monitor them (typically people use a process manager such as supervisord). With only one process, you won't need to do this.
By monitoring, I mainly mean that you need to make sure that all processes are correctly shutdown if the container receives a SIGSTOP signal. If you don't do this properly, you will end up with zombie processes. You won't need to worry about this if you only have a signal process or use a process manager.
Further, as Greg points out, having separate containers allows you to orchestrate and schedule the containers separately, so you can do update/change/scale/restart each container without affecting the other one.
If you want to keep the data in the database after a restart, the database shouldn't be in a container but on the host. I will assume you want the db in a container as well.
Setting up a second container is a lot more work. You should find a way that the containers know about each other's address. The address changes each time you start the container, so you need to make some scripts on the host. The host must find out the ip-adresses and inform the containers.
The containers might want to update the /etc/hosts file with the address of the other container. When you want to emulate different servers and perform resilience tests this is a nice solution. You will need quite some bash knowledge before you get this running well.
In about all other situations choose for one container. Installing everything in one container is easier for setting up and for developing afterwards. Setting up Docker is just the environment where you want to do your real work. Tooling should help you with your real work, not take all your time and effort.

Resources