How to create a liveness probe for a Node.js container that is not a server?

I have to create a readiness and liveness probe for a Node.js container (Docker) in Kubernetes. My problem is that the container is NOT a server, so I cannot use an HTTP request to see if it is live.
My container runs a node-cron process that downloads some CSV files every 12 h, parses them, and inserts the results into Elasticsearch.
I know I could add Express.js, but I would rather not do that just for a probe.
My question is:
Is there a way to use some kind of liveness command probe? If it is possible, what command can I use?
Inside the container, I have pm2 running the process. Can I use it in any way for my probe and, if so, how?

Liveness command
You can use a liveness command as you describe. However, I would recommend designing your job/task for Kubernetes instead.
Design for Kubernetes
My container runs a node-cron process that downloads some CSV files every 12 h, parses them, and inserts the results into Elasticsearch.
Your job does not execute very often; if you deploy it as a long-running service, it will take up resources all the time. And since you write that you want to use pm2 for your process, I would recommend another design: as I understand it, PM2 is a process manager, but Kubernetes is also a process manager in its own way.
Kubernetes native CronJob
Instead of handling a process with pm2, package your process as a container image and schedule your job/task with a Kubernetes CronJob, where you specify your image in the jobTemplate. With this design you don't need any livenessProbe, but your task will be restarted if it fails, e.g. if it fails to insert the result into Elasticsearch due to a network problem.
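A minimal CronJob manifest for this kind of task could look like the following sketch (the metadata name and image are illustrative assumptions; the schedule matches the every-12-hours job from the question):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: csv-to-elasticsearch        # illustrative name
spec:
  schedule: "0 */12 * * *"          # every 12 hours
  jobTemplate:
    spec:
      backoffLimit: 3               # retry a failed run up to 3 times
      template:
        spec:
          containers:
          - name: importer
            image: my-registry/csv-importer:latest  # your image here
          restartPolicy: OnFailure
```

With restartPolicy: OnFailure and backoffLimit, a run that dies (e.g. on a network error talking to Elasticsearch) is retried without any probe configuration.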

First, you should certainly consider a Kubernetes CronJob for this workload. That said, it may not be appropriate for your job, for example if your job takes the majority of the time between scheduled runs to complete, or if you need more complex interaction between your job's error handling and its scheduling. Finally, you may even want a liveness probe on the container spawned by the CronJob if you want to check that the job is making progress as it runs; this uses the same syntax as a normal Pod.
I'm less familiar with pm2, but I don't think you should need that additional process management inside of Kubernetes, which already provides most of what you need.
That said, it is certainly possible to use an arbitrary command for your liveness probe, and as you noted it is even explicitly covered in the Kubernetes liveness/readiness probe documentation.
You just add an exec member to the livenessProbe stanza for the container, like so:
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
If the command returns 0 (i.e. succeeds), then the kubelet considers the container to be alive and healthy. (In this trivial example, the container is considered healthy only while /tmp/healthy exists.)
In your case, I can think of several possibilities. As one example, the job could be configured to drop a sentinel file that indicates it is making progress, e.g. by appending the name and timestamp of the last file copied. The liveness command would then be a small script that reads that file and ensures there has been adequate progress (in the cron-job case, that a file has been copied within the last few minutes).
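A sketch of such a check, assuming the job touches a sentinel file as it makes progress (the path, the 15-minute threshold, and the script name are all illustrative, not part of the original answer):

```shell
#!/bin/sh
# liveness-check.sh (sketch): succeed only if the sentinel file exists
# and was modified within the last MAX_AGE seconds.
liveness_check() {
  sentinel="$1"
  max_age="${2:-900}"                  # default threshold: 15 minutes
  [ -f "$sentinel" ] || return 1       # no progress file yet -> not healthy
  now=$(date +%s)
  # GNU stat uses -c %Y; BSD stat uses -f %m, hence the fallback
  mtime=$(stat -c %Y "$sentinel" 2>/dev/null || stat -f %m "$sentinel")
  [ $(( now - mtime )) -le "$max_age" ]
}

# The kubelet would exec this script as the probe command; its exit
# status becomes the probe result.
liveness_check /tmp/last-progress 900
```

You would then point the probe at it, e.g. `command: ["/bin/sh", "/usr/local/bin/liveness-check.sh"]` in the exec stanza shown above, and have the cron task `touch` (or append to) /tmp/last-progress after each file it processes.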
Readiness probes probably don't make sense for the service you describe, since they are about not sending traffic to the application, but they use a similar stanza, just under readinessProbe rather than livenessProbe.

Related

Docker container runs great locally. Now I need it on a schedule in the cloud

I've containerized some logic that I have to run on a schedule. If I do my docker run locally (whether my image is local or pulled from the hub), everything works great.
Now, though, I need to run that "docker run" on a scheduled basis, in the cloud.
Azure would be preferred, but honestly, I'm looking for the easier and cheapest way to achieve this goal.
Moreover, my schedule can change, so maybe today that job runs once a day, in the future that can change.
What do you suggest?
You can create an Azure Logic App to trigger the start of an Azure Container Instance. As you have a "run-once" (every N minutes/hours/...) container, the restart policy should be set to "Never", so that the container runs once and then stops.
The Logic app needs to have the permissions to start the Container, so add a role assignment on the ACI to the managed identity of the logic App.
The screenshot (omitted here) shows the workflow with a Recurrence trigger that starts an existing container every minute.
This should be quite cheap and uses only Azure services, without any custom infrastructure.
Professionally, I have used four ways to run cron jobs / scheduled builds. Here is a quick summary of each with its pros and cons.
GitLab scheduled builds (free)
My personal preference would be to set up a scheduled pipeline in GitLab. Simply add the script to a .gitlab-ci.yml, configure the scheduled build, and you are done. This is the lightweight option and works in most cases, if the execution time is not too long. I used this approach for scraping simple pages.
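Such a .gitlab-ci.yml could be as small as this sketch (the job name, image, and script are illustrative; the schedule itself is configured in the GitLab UI under CI/CD > Schedules):

```yaml
# .gitlab-ci.yml -- minimal scheduled job sketch
scrape:
  image: node:18
  script:
    - node scrape.js
  rules:
    # run only when triggered by a pipeline schedule
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
```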
Jenkins scheduled builds (not free)
I used the same approach as with GitLab in Jenkins. But Jenkins comes with more overhead, and you have to set up and maintain Jenkins itself, possibly across multiple machines.
Kubernetes CronJob (expensive)
My third approach would be using a Kubernetes CronJob. However, I would only use this if the job consumes a lot of memory or has a long execution time. I used this approach for dumping really large data sets.
Run a cron job from a container (expensive)
My last option would be to deploy a Docker container on either a VM or a Kubernetes cluster and configure a cron job from within that container. You can even use Docker-in-Docker for that. This gives maximum flexibility, but comes with some challenges. Personally, I like the separation of concerns when it comes to downtimes etc. That's why I never run a cron job as the main process.

How to audit commands run by user inside a container in K8s

I want to audit commands that are being run by a user inside a running pod.
I know that kube-apiserver supports audit policies that allow you to log every request made to the API, but as far as I know the API audit only records the exec command itself, not the inner commands run afterwards.
An approach I thought of is to have a sidecar container running auditbeat, but it's too heavy and the user might be able to kill it.
A container should run a single process. It is not recommended to run commands inside a container except for testing; most of our images don't even have a shell.
If you have to spawn a shell and run a command inside, ask yourself whether it is possible to run that outside the container. If the main process terminates while your shell commands are still running in the container, Kubernetes might not terminate that pod and recreate a new one, which might impact HA.
There are some commercial products that allow you to do this. A few weeks ago I did a PoC for one of them.
The way it's implemented is that their product runs as a pod (with one container inside) at the host level (host namespace / hostPID) and tracks usage of the Docker daemon.

Cloud-based node.js console app needs to run once a day

I'm looking for what I would assume is quite a standard solution: I have a Node app that doesn't do any web work; it simply runs, outputs to a console, and ends. I want to host it, preferably on Azure, and have it run once a day, ideally also logging or sending me the output.
The only solution I can find is to create a VM on Azure and set a cron job; then I need to either fetch the debug logs daily or write Node code to email me the output. Anything more efficient available?
Azure Functions would be worth investigating. It can be timer triggered and would avoid the overhead of a VM.
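For reference, a timer-triggered Function declares its schedule in function.json using a six-field NCRONTAB expression; a sketch for a daily 8:00 run (the binding name is illustrative):

```json
{
  "bindings": [
    {
      "name": "dailyTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 8 * * *"
    }
  ]
}
```

The fields are second, minute, hour, day, month, day-of-week, so "0 0 8 * * *" fires once a day at 08:00.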
I would also investigate Azure Container Instances; this is a good match for your use case. You can have a container image with your Node app that you run on an ACI instance. https://learn.microsoft.com/en-us/azure/container-instances/container-instances-tutorial-deploy-app

Running command on EC2 launch and shutdown in auto-scaling group

I'm running a Docker swarm deployed on AWS. The setup is an auto-scaling group of EC2 instances that each act as Docker swarm nodes.
When the auto-scaling group scales out (spawns new instance) I'd like to run a command on the instance to join the Docker swarm (i.e. docker swarm join ...) and when it scales in (shuts down instances) to leave the swarm (docker swarm leave).
I know I can do the first one with user data in the launch configuration, but I'm not sure how to act on shutdown. I'd like to make use of lifecycle hooks, and the docs mention I can run custom actions on launch/terminate, but it is never explained just how to do this. It should be possible to do without sending SQS/SNS/CloudWatch events, right?
My AMI is a custom one based off of Ubuntu 16.04.
Thanks.
One of the core issues is that removing a node from a Swarm is currently a two- or three-step action when done gracefully, and some of those actions can't be done on the node that's leaving:
docker node demote, if the leaving node is a manager
docker swarm leave on the leaving node
docker node rm on a manager
Step 3 is what's tricky, because it requires you to do one of three things to complete the removal process:
Put something on a worker that would let it do things on a manager remotely (SSH to a manager with sudo perms, or Docker manager API access). Not a good idea: this breaks the security model of "workers can't do manager things" and greatly increases risk, so it's not recommended. We want our managers to stay secure, and our workers to have no control over or visibility into the swarm.
(best if possible) Set up an external solution so that on an EC2 node removal, a job is run to SSH or API into a manager and remove the node from the swarm. I've seen people do this, but can't remember a link/repo with full details on using a Lambda, etc., to deal with the lifecycle hook.
Set up a simple cron on a single manager (or preferably as a manager-only service running a cron container) that removes workers that are marked down. This is a somewhat blunt approach and has edge cases where you could potentially remove a node that still exists but is considered down/unhealthy by Swarm, but I've not heard of that happening. If it was fancy, it could maybe validate with AWS that the node is indeed gone before removing it.
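A minimal version of that cron cleanup could look like this sketch (the subcommands are real Docker CLI; the scheduling wrapper and any AWS validation are left out):

```shell
#!/bin/sh
# Sketch: remove swarm workers whose status is "Down".
# Run periodically on a single manager (e.g. from cron).
docker node ls --filter "role=worker" --format '{{.ID}} {{.Status}}' |
  awk '$2 == "Down" {print $1}' |
  while read -r id; do
    docker node rm "$id"
  done
```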
WORST CASE, if a node goes down hard and doesn't do any of the above, it's not horrible, just not ideal for graceful management of user/db connections. After 30 s a node is considered down and service tasks will be re-created on healthy nodes. A long list of workers marked down in the swarm node list doesn't really affect your services; it's just unsightly (as long as there are enough healthy workers).
THERE'S A FEATURE REQUEST in GitHub to make this removal easier. I've commented on what I'm seeing in the wild. Feel free to post your story and use case in the SwarmKit repo.

Kubernetes postStart hook seems that blocks all the containers startup. Is that possible?

I am in a situation where my pod is composed of three containers, called:
application
collectd
fluentd
I want application to start only when both collectd and fluentd are up & running.
So I created a simple postStart script, placed in the application container, that waits forever for the liveness/readiness probes of collectd and fluentd. I developed this part on the assumption that Kubernetes starts all the containers in parallel and, in case something is wrong (liveness probe down), automatically restarts the failing containers.
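For reference, the kind of hook described might look like this sketch (the container names come from the question; the images, the wait command, and fluentd's port 24224 are assumptions):

```yaml
containers:
- name: application
  image: my-app:latest              # illustrative
  lifecycle:
    postStart:
      exec:
        # wait until fluentd's forward port answers before continuing
        command: ["/bin/sh", "-c", "until nc -z localhost 24224; do sleep 2; done"]
- name: collectd
  image: collectd:latest            # illustrative
- name: fluentd
  image: fluentd:latest             # illustrative
```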
But I found two strange behaviours, strictly connected to each other:
The never-ending loop in the postStart script (waiting for the liveness of the other containers) completely blocks the spin-up of the other two, so fluentd and collectd are never started.
Kubernetes starts the containers defined inside a pod in alphabetical order. So if I call my first container zapplication (instead of application), everything works as expected.
Now what I am asking is if this behaviour is a feature or is it a bug?
(END OF QUESTION HERE, FREE THOUGHTS ABOUT THE POSSIBLE ANSWER FOLLOW)
I read a lot of articles and discussions stating that there should be no assumptions about the ordering of container starts, but this doesn't seem to be the case. Moreover, among all the possible kinds of ordering, the alphabetical one doesn't seem to me the most deterministic.
It also seems really strange to me that a postStart script on one container has the power to completely block the other two. But it may also be that this is exactly the feature that solves the problem of dependent containers!
========================================================================
UPDATE 2/12/2016: I'll link here as a reference the issue on github
https://github.com/kubernetes/kubernetes/issues/37805
