I have deployed a pod in a Kubernetes cluster that runs a Python script.
The problem is that I want Kubernetes to stop the container once the script has finished its job, and not to re-create another pod.
Be aware that I have already tried kind: Job, but it doesn't fulfil my need.
I tried two kinds: Job and Deployment.
With the Deployment, the pod first shows the status Completed and then crashes with a CrashLoopBackOff error.
With the Job, the pod shows the status Completed, but I have no way to re-execute it in an automated way.
Do you have any suggestions about that?
I have posted a community wiki answer to summarise the topic.
User Jonas has posted great suggestions:
A kind Job does exactly this. Use Job and your problem is solved.
If you deploy with kubectl create -f job.yaml and your job has a generateName: instead of name:, a new Job will be created each time.
For more information, look at the documentation about Jobs. See also the information about Generated values.
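For illustration, a minimal job.yaml using generateName could look like this (the name prefix and image are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  generateName: my-script-          # a unique suffix is appended on every create
spec:
  backoffLimit: 0                   # optional: don't retry the pod on failure
  template:
    spec:
      restartPolicy: Never          # stop the container when the script exits
      containers:
      - name: my-script
        image: registry.example.com/my-script:latest   # placeholder image

Note that generateName works with kubectl create, but not with kubectl apply.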
I have an AKS cluster running on which I enabled Container Insights.
The Log Analytics workspace has a decent amount of logs in it.
My applications run in a separate namespace, and there is one namespace with some Grafana containers running (which I also don't want in my captured logs).
So, I searched on how I could reduce the amount of captured logs and came across this Microsoft docs article.
I deployed the template ConfigMap to my cluster and for [log_collection_settings.stdout] and [log_collection_settings.stderr] I excluded the namespaces which I don't want to capture.
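For reference, the relevant part of that ConfigMap looks roughly like this (a sketch based on the Microsoft template; the namespace names are the ones from my cluster):

kind: ConfigMap
apiVersion: v1
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  log-data-collection-settings: |-
    [log_collection_settings]
      [log_collection_settings.stdout]
        enabled = true
        exclude_namespaces = ["kube-system", "grafana-namespace"]
      [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system", "grafana-namespace"]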
When calling kubectl edit configmap container-azm-ms-agentconfig -n kube-system, I can see my settings, which means that my config is actually in there.
Now when I open a query window in Log Analytics workspace and execute the following query:
KubePodInventory
| where Namespace == "kube-system"
I get plenty of results with a TimeGenerated column containing values from around 5 minutes ago, while I set up the ConfigMap a week ago.
In the logs of one of the omsagent-... pods, I see entries like the following:
Both stdout & stderr log collection are turned off for namespaces: '*.csv2,*_kube-system_*.log,*_grafana-namespace_*.log'
****************End Config Processing********************
****************Start Config Processing********************
config::configmap container-azm-ms-agentconfig for agent settings mounted, parsing values
config::Successfully parsed mounted config map
While looking around here on Stack Overflow, I found the following answers, which make me believe that what I did is right:
https://stackoverflow.com/a/63838009
https://stackoverflow.com/a/63058387
https://stackoverflow.com/a/72288551
So, I'm not sure what I am doing wrong here. Does anyone have an idea?
Since I hate it myself when people don't post an answer even though they already have one, here it is (although it's not the answer you want, at least for now).
I posted the issue on GitHub, in the repository where Container Insights is maintained.
The issue can be seen here on GitHub.
If you don't want to click the link, here is the answer from Microsoft:
We are working on adding support for namespace filtering for inventory and perf metrics tables and will update you as soon as this feature is available.
So, currently we are not able to exclude data from any table other than ContainerLog with this ConfigMap.
I've containerized some logic that I have to run on a schedule. If I do my docker run locally (whether my image is local or pulled from the hub), everything works great.
Now, though, I need to run that "docker run" on a schedule, in the cloud.
Azure would be preferred, but honestly, I'm looking for the easiest and cheapest way to achieve this goal.
Moreover, my schedule can change: maybe today the job runs once a day, but in the future that can change.
What do you suggest?
You can create an Azure Logic App to trigger the start of an Azure Container Instance. As you have a "run once" (every N minutes/hours/...) container, the restart policy should be set to "Never", so that the container executes once and then stops until the next scheduled start.
The Logic App needs permission to start the container, so add a role assignment on the ACI for the managed identity of the Logic App.
The screenshot shows the workflow with a Recurrence trigger that starts an existing container every minute.
This should be quite cheap and uses only Azure services, without any custom infrastructure.
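For example, a minimal ACI definition with this restart policy, deployable with az container create --resource-group <rg> --file aci.yaml, could look roughly like this (name, location, image and API version are placeholders):

apiVersion: '2019-12-01'             # placeholder API version
location: westeurope                 # placeholder location
name: scheduled-job
type: Microsoft.ContainerInstance/containerGroups
properties:
  restartPolicy: Never               # run once per start, then stop
  osType: Linux
  containers:
  - name: scheduled-job
    properties:
      image: myregistry.azurecr.io/scheduled-job:latest   # placeholder image
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 1.0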
Professionally, I have used four ways to run cron jobs / scheduled builds. Here is a quick summary of each, with its pros and cons.
GitLab scheduled builds (free)
My personal preference would be to set up a scheduled pipeline in GitLab. Simply add the script to a .gitlab-ci.yml, configure the scheduled build, and you are done. This is the lightweight option and works in most cases, if the execution time is not too long. I used this approach for scraping simple pages.
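As a sketch, a minimal .gitlab-ci.yml for such a job could look like this (the image and script name are placeholders; the schedule itself is configured under CI/CD > Schedules in the GitLab UI):

run-scheduled-task:
  image: python:3.11-slim        # placeholder image
  script:
    - python run_task.py         # placeholder script
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'   # only run for scheduled pipelines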
Jenkins scheduled builds (not free)
I used the same approach with Jenkins as with GitLab. But Jenkins comes with more overhead, and you have to configure and maintain Jenkins itself, possibly across multiple machines.
Kubernetes CronJob (expensive)
My third approach would be to use a Kubernetes CronJob. However, I would only use this if the job consumes a lot of memory/RAM or has a long execution time. I used this approach for dumping really large data sets.
Run a cron job from a container (expensive)
My last option would be to deploy a Docker container on either a VM or a Kubernetes cluster and configure a cron job from within that container. You can even use Docker-in-Docker for that. This gives maximum flexibility, but comes with some challenges. Personally, I like the separation of concerns when it comes to downtime etc. That's why I never run a cron job as the main process.
I'm calling the /clusters/events API with PowerShell to check if my Databricks cluster is up and ready for the next step in my setup process. Is this the best approach?
Currently, I grab the array of ClusterEvent and check the most recent ClusterEvent for its ClusterEventType. If it's RUNNING, we're good to go and we move on to the next step.
Recently, I discovered my release pipeline was hanging while checking the cluster status. It turns out that the cluster was in fact running, but its status was DRIVER_HEALTHY, not RUNNING. So, I changed my script and everyone is happy again.
Is there an official API call I can make that returns yes/no, true/false, etc., so I don't need to work out which ClusterEventType means the cluster is running?
There is no such API that gives a yes/no answer about the cluster status. You can use the Get command of the Clusters REST API - it returns information about the current state of the cluster, so you just need to wait until it gets to the RUNNING state.
P.S. If you're doing that as part of a release pipeline or something similar, you can look at the Terraform provider for Databricks - it will handle waiting for the cluster to be running (and other things) automatically, and you can combine it with other tasks, like provisioning of Azure resources, etc.
Situation:
I have a pipeline job that executes tests in parallel. I use Azure VMs that I start/stop on each build of the job through PowerShell. Before I run the job, it checks whether there are available (offline) VMs on Azure and then uses those VMs for that build. If there are no available VMs, I fail the job. Now, one of my requirements is that, instead of failing the build, I need to queue the job until one of the nodes is offline/available and then use those nodes.
Problem:
Is there any way for me to do this? Is there an existing plugin or a build wrapper that will allow me to queue the job based on the status of the nodes? I was forced to do this because we need to stop the Azure VMs to reduce cost.
At the moment, I am still researching whether this is possible, or whether there is any other way to achieve it. I am thinking of a Groovy script that will check the nodes and, if none are available, manually add the job to the build queue until at least one is available. The closest plugin I found is the Run Condition plugin, but I think it will not work.
I am open to any approach that will help me achieve this. Thanks!
I have to create readiness and liveness probes for a Node.js container (Docker) in Kubernetes. My problem is that the container is NOT a server, so I cannot use an HTTP request to see if it is live.
My container runs a node-cron process that downloads some CSV files every 12 h, parses them, and inserts the result into Elasticsearch.
I know I could add Express.js, but I would rather not do that just for a probe.
My question is:
Is there a way to use some kind of liveness command probe? If it is possible, what command can I use?
Inside the container, I have pm2 running the process. Can I use it in any way for my probe and, if so, how?
Liveness command
You can use a liveness command as you describe. However, I would recommend designing your job/task for Kubernetes.
Design for Kubernetes
My container runs a node-cron process that downloads some CSV files every 12 h, parses them, and inserts the result into Elasticsearch.
Your job does not execute very often; if you deploy it as a service, it will take up resources all the time. And since you write that you want to use pm2 for your process, I would recommend another design. As I understand it, PM2 is a process manager, but Kubernetes is also a process manager, in a way.
Kubernetes native CronJob
Instead of handling the process with pm2, implement your process as a container image and schedule your job/task with a Kubernetes CronJob, where you specify your image in the jobTemplate. With this design, you don't have any livenessProbe, but your task will be restarted if it fails, e.g. if it fails to insert the result into Elasticsearch due to a network problem.
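A minimal sketch of such a CronJob (name, image and schedule are placeholders; on clusters older than v1.21, use apiVersion: batch/v1beta1):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: csv-import                   # placeholder name
spec:
  schedule: "0 */12 * * *"           # every 12 hours, as described
  concurrencyPolicy: Forbid          # don't start a new run while the previous one is still active
  jobTemplate:
    spec:
      backoffLimit: 3                # retry a failed run a few times
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: csv-import
            image: registry.example.com/csv-import:latest   # placeholder image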
First, you should certainly consider a Kubernetes CronJob for this workload. That said, it may not be appropriate for your job, for example if your job takes most of the time between scheduled runs to complete, or if you need more complex interactions between error handling in your job and scheduling. Finally, you may even want a liveness probe running for the container spawned by the CronJob if you want to check that the job is making progress as it runs -- this uses the same syntax as you would use with a normal job.
I'm less familiar with pm2, but I don't think you should need additional job management inside Kubernetes, which should already provide most of what you need.
That said, it is certainly possible to use an arbitrary command for your liveness probe, and as you noted, it is even explicitly covered in the Kubernetes liveness/readiness probe documentation.
You just add an exec member to the livenessProbe stanza for the container, like so:
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
If the command returns 0 (i.e. succeeds), the kubelet considers the container to be alive and healthy. (In this trivial example, the container is considered healthy only while /tmp/healthy exists.)
In your case, I can think of several possibilities. As one example, the job could probably be configured to drop a sentinel file that indicates it is making progress in some way, for example by appending the name and timestamp of the last file copied. The liveness command would then be a small script that reads that file and ensures that there has been adequate progress (e.g. in the cron job case, that a file has been copied within the last few minutes).
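A sketch of such a probe, assuming the job touches a hypothetical /tmp/last-progress file whenever it makes progress:

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # fail unless the (hypothetical) sentinel file was updated in the last 30 minutes
    - test -n "$(find /tmp/last-progress -mmin -30 2>/dev/null)"
  initialDelaySeconds: 60
  periodSeconds: 300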
Readiness probes probably don't make sense in the context of the service you describe, since they're mostly about not sending application traffic to a pod that isn't ready, but they use a similar stanza, just under readinessProbe rather than livenessProbe.