How to change the schedule of a Kubernetes cronjob or how to start it manually? - cron

Is there a simple way to change the schedule of a kubernetes cronjob like kubectl change cronjob my-cronjob "10 10 * * *"? Or any other way without needing to do kubectl apply -f deployment.yml? The latter can be extremely cumbersome in a complex CI/CD setting because manually editing the deployment yaml is often not desired, especially not if the file is created from a template in the build process.
Alternatively, is there a way to start a cronjob manually? For instance, a job is scheduled to start in 22 hours, but I want to trigger it manually once now without changing the cron schedule for good (for testing or an initial run)?

You can update only the selected fields of a resource by patching it:
patch -h
Update field(s) of a resource using strategic merge patch, a JSON merge patch, or a JSON patch.
JSON and YAML formats are accepted.
Please refer to the models in
https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/blob/HEAD/docs/api-reference/v1/definitions.html
to find if a field is mutable.
As provided in the comments, for reference:
kubectl patch cronjob my-cronjob -p '{"spec":{"schedule": "42 11 * * *"}}'
Also, in current kubectl versions, to launch a one-time execution of a declared cronjob, you can manually create a job that adheres to the cronjob spec with
kubectl create job <job-name> --from=cronjob/mycron

The more recent versions of k8s (from 1.10 on) support the following command:
$ kubectl create job my-one-time-job --from=cronjobs/my-cronjob
Source is this solved k8s github issue.
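If you want to follow the run you just triggered (a usage sketch, reusing the job name from the command above), you can watch the job and tail its logs:
kubectl get job my-one-time-job --watch
kubectl logs job/my-one-time-job -f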

From @SmCaterpillar's answer above, kubectl patch cronjob my-cronjob -p '{"spec":{"schedule": "42 11 * * *"}}',
I was getting the error: unable to parse "'{spec:{schedule:": yaml: found unexpected end of stream
If someone else is facing a similar issue, replace the last part of the command with -
"{\"spec\":{\"schedule\": \"42 11 * * *\"}}"

I have a friend who developed a kubectl plugin that does exactly that!
It takes an existing cronjob and just creates a job out of it.
See https://github.com/vic3lord/cronjobjob
Look into the README for installation instructions.

And if you want to patch a k8s cronjob schedule with the Python kubernetes library, you can do it like this:
from kubernetes import client, config
# load credentials from ~/.kube/config
config.load_kube_config()
v1 = client.BatchV1beta1Api()
# patch only the schedule field of the cronjob
body = {"spec": {"schedule": "@daily"}}
ret = v1.patch_namespaced_cron_job(
    namespace="default", name="my-cronjob", body=body
)
print(ret)

Related

Spring Boot logs missing using crontab command

I run my Spring Boot/Batch jar to process a large number of files in my local environment like below:
java -jar /path/to_my_dir_for_jar/springbatch.jar --spring.profiles.active=qa
And logs get generated in the same directory: /path/to_my_dir_for_jar
No issues whatsoever till here.
Now I run the same job using linux crontab (11:30 am, daily):
30 11 * * * java -jar /path/to_my_dir_for_jar/springbatch.jar --spring.profiles.active=qa
Please note: I have to rely on linux crontab for scheduling and not a Spring cron expression, as Spring Batch won't read the new files in the staging directory after the first run.
Now I can NOT find my logs at /path/to_my_dir_for_jar. In the application, different types of logs are generated, such as:
service_log.gnxError.log
service_log.gcnDone.log
service_log.error.log
service_log.done.log
service_log.log
I tried to follow this and some other similar help, but none of it seems to work, as most attempts end with - No such file or directory
Where to look for them?
One possible way is to redirect the output of my crontab expression to a file like below:
30 11 * * * java -jar /path/to_my_dir_for_jar/springbatch.jar --spring.profiles.active=qa >> foobar.log
But this way I will lose my different types of logs, as everything will be fused into just one file.
Will I ever get the logs via linux crontab the way I get them when running the jar manually? What are my options in this situation?
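One detail worth keeping in mind is that cron runs commands from the user's home directory, so logs written to relative paths will not land next to the jar. A hedged sketch that preserves the working directory and also captures anything written to stdout/stderr (the extra log file name is just a placeholder):
30 11 * * * cd /path/to_my_dir_for_jar && java -jar springbatch.jar --spring.profiles.active=qa >> cron-stdout.log 2>&1
The application's own log files (service_log.log etc.) should then be created relative to /path/to_my_dir_for_jar again, as in the manual run.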

Databricks init scripts not working sometimes

Ok, it is very strange. I have some init scripts that I would like to run when a cluster starts
The cluster has the init script, which is in a file (in DBFS),
basically this
dbfs:/databricks/init-scripts/custom-cert.sh
Now, when I make the init script like this, it works (no SSL errors for my endpoints). Also, the event log for the cluster shows the duration as 1 second for the init script:
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
However, if I just put the init script in a bash script and upload it to DBFS through a pipeline, the init script does not do anything. It executes, as per the event log, but the execution duration is 0 sec.
I have the sh script in a file named
custom-cert.sh
with the same contents as above, i.e.
#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt"
but when I check /usr/local/share/ca-certificates/, it does not contain the orgcerts.crt copied from /dbfs/orgcertificates/, even though the cluster init script has run.
Also, I have compared the contents of the init script in both cases and, at least to the naked eye, I can't figure out any difference,
i.e.
%sh
cat /dbfs/databricks/init-scripts/custom-cert.sh
shows the same contents in both scenarios. What is the problem in the 2nd case?
EDIT: I read a bit more about init scripts and found that the logs of init scripts are written here
%sh
ls /databricks/init_scripts/
Looking at the err file in that location, it seems there is an error
sudo: update-ca-certificates
: command not found
Why is update-ca-certificates found in the first case but not when I put the same script in an sh file and upload it to DBFS (instead of executing the dbutils.fs.put within a notebook)?
EDIT 2: In response to the first answer. After running the command
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
the output is the file custom-cert.sh, and when I then restart the cluster with the init script location set to dbfs:/databricks/init-scripts/custom-cert.sh, it works. So it is essentially the same content that the init script is reading (the generated sh script). Why can't it read it if I do not use dbutils.fs.put but just put the contents in a bash file and upload it during the CI/CD process?
As we are aware, an init script is a shell script that runs during startup of each cluster node, before the Apache Spark driver or worker JVM starts. In case 2, running the bash command via the %sh magic command executes it only on the local driver node, so the worker nodes are not able to access the result. In case 1, by using dbutils.fs.put you write the file to the DBFS root, so the worker nodes can access that path along with the driver node.
Ref : https://docs.databricks.com/data/databricks-file-system.html#summary-table-and-diagram
It seems that the observations I made in the comments section of my question are the way to go.
I now create the init script using a databricks job that I run during the CI/CD pipeline from Azure DevOps.
The notebook has the commands
dbutils.fs.rm("/databricks/init-scripts/custom-cert.sh")
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/internal-certificates/certs.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
I then create a Databricks job (pointing to this notebook); the cluster is a job cluster, which is just temporary. Of course, in my case, even this job creation is automated using a PowerShell script.
I then call this Databricks job in the release pipeline, again using a PowerShell script.
This creates the file
/databricks/init-scripts/custom-cert.sh
I then use this file in any other cluster that accesses my org's endpoints (without certificate errors).
I do not know (or still do not understand) why the same script file can't just be part of a repo and be uploaded during the release process (instead of this Databricks job calling a notebook). I would love to know the reason. The other answer on this question does not hold true, as you can see: the cluster script is created by a job cluster and then accessed from another cluster as part of its init script.
It simply boils down to how the init script gets created.
But I get my job done. Just if it helps someone get their job done too.
I have raised a support case though to understand the reason.
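For completeness, one plausible explanation (an assumption on my part, not something confirmed in this thread) is that the file uploaded by the pipeline ends up with Windows (CRLF) line endings, which would also explain why the error prints ": command not found" on its own line (the trailing carriage return becomes part of the command name). A quick check from a notebook:
%sh
# lines ending in ^M$ indicate CRLF line endings
cat -A /dbfs/databricks/init-scripts/custom-cert.sh | head -5
If that is the cause, normalising the file to LF endings in the repo or in the release pipeline before uploading (e.g. dos2unix, or a .gitattributes rule such as *.sh text eol=lf) should make the uploaded script behave like the dbutils.fs.put version.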

check if Kubernetes deployment was sucessful in CI/CD pipeline

I have an AKS cluster with Kubernetes version 1.14.7.
I have built CI/CD pipelines to deploy newly created images to the cluster.
I am using kubectl apply to update a specific deployment with the new image. Sometimes, and for many reasons, the deployment fails, for example with ImagePullBackOff.
Is there a command to run after the kubectl apply command to check if the pod creation and deployment were successful?
For this purpose Kubernetes has kubectl rollout, and you should use the status option.
By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled over by another revision, use --revision=N where N is the revision you need to watch for.
You can read the full description here
If you use kubectl apply -f myapp.yaml and check the rollout status you will see:
$ kubectl rollout status deployment myapp
Waiting for deployment "myapp" rollout to finish: 0 of 3 updated replicas are available…
Waiting for deployment "myapp" rollout to finish: 1 of 3 updated replicas are available…
Waiting for deployment "myapp" rollout to finish: 2 of 3 updated replicas are available…
deployment "myapp" successfully rolled out
There is another way to wait for the deployment to become available with a configured timeout, like:
kubectl wait --for=condition=available --timeout=60s deploy/myapp
Otherwise kubectl rollout status can be used, but it may get stuck forever in some rare cases and will require manual cancellation of the pipeline if that happens.
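To guard against that, rollout status itself also accepts a timeout (a sketch with an arbitrary 120s value); it exits non-zero if the rollout has not finished in time, which fails the pipeline step:
kubectl rollout status deployment/myapp --timeout=120s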
You can parse the output through jq:
kubectl get pod -o=json | jq '.items[]|select(any( .status.containerStatuses[]; .state.waiting.reason=="ImagePullBackOff"))|.metadata.name'
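If the goal is to fail a CI step automatically, a hedged sketch building on the same filter (note it checks every pod in the current namespace, not just your deployment's) could look like:
BAD_PODS=$(kubectl get pod -o=json | jq -r '.items[]|select(any( .status.containerStatuses[]; .state.waiting.reason=="ImagePullBackOff"))|.metadata.name')
if [ -n "$BAD_PODS" ]; then
  echo "Pods stuck in ImagePullBackOff: $BAD_PODS"
  exit 1
fi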
It looks like the kubediff tool is a perfect match for your task:
Kubediff is a tool for Kubernetes to show you the differences between your running configuration and your version controlled configuration.
The tool can be used from the command line and as a Pod in the cluster that continuously compares YAML files in the configured repository with the current state of the cluster.
$ ./kubediff
Usage: kubediff [options] <dir/file>...
Compare yaml files in <dir> to running state in kubernetes and print the
differences. This is useful to ensure you have applied all your changes to the
appropriate environment. This tools runs kubectl, so unless your
~/.kube/config is configured for the correct environment, you will need to
supply the kubeconfig for the appropriate environment.
kubediff prints the status to stdout and returns a non-zero exit code when a difference is found. You can change this behavior using command line arguments.
You may also want to check this good article about validating YAML files:
Validating Kubernetes Deployment YAMLs

Ansible: setup a cron job on one host

I'm deploying a 2-host service that also needs to set up a cron job. This job should only run on one of the two machines (I don't care which). What's the easiest way to do so?
I know that the shell module in Ansible supports "run_once", but the cron module does not.
I could set up the cron job on both machines and then use the command "crontab -r" to remove all the jobs (provided no other jobs are needed there) on one machine. This is dirty, but very easy.
Any better ideas?
I know that the shell module in Ansible supports "run_once", but the cron module does not.
Wrong. run_once is a property of a task, not of action modules.
Use the cron module and set run_once for the task (mind the indentation level), for example:
- cron:
    name: "check dirs"
    minute: "0"
    hour: "5,2"
    job: "ls -alh > /dev/null"
  run_once: true

Compile a node's catalog to JSON without a master?

I'm looking to improve my org's Puppet work lifecycle by comparing differences for actual node catalogs. We came across this project which compiles catalogs for nodes and creates a diff for them, but it seems to require an online master.
I need to be able to do what this tool does, albeit without a master - I'd just like to compile a deterministic JSON or YAML blob which describes all of the resources that would be managed by Puppet for a given node and given a set of facts.
Is there a way for me to do this without an online master?
If you have Rspec-puppet set up, there is an easy way to do this. Just add a File.write statement inside one of your it blocks:
require 'spec_helper'
describe 'myclass' do
  it {
    File.write(
      'myclass.json',
      PSON.pretty_generate(catalogue)
    )
    #is_expected.to compile.with_all_deps
  }
end
I have more information in a blog post on this here.
If you can't use Rspec-puppet to do it (recommended), have a look at another blog post I wrote, Compiling a puppet catalog – on a laptop.
I have been looking for it for a long time.
In the end, Vagrant in combination with Puppet brought me the solution.
The only thing you need is to install puppet.
First you have to set the Facts. After that you can compile your catalog.
FACTER_server_role='webstack' \
... \
FACTER_hostname='hostname' \
FACTER_fqdn='hostname.fqdn' \
puppet catalog compile hostname.fqdn \
  --modulepath "./modules" \
  --hiera_config "./hiera.yaml" \
  --environmentpath ./environments/ \
  --environment production
If you want to get a clean JSON file, pipe the output through sed and send it to a file:
| sed -n '1!p' > $file
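Putting it together, a sketch of the full invocation (the fact values, paths and the output file name are only the placeholders used above):
FACTER_server_role='webstack' \
FACTER_hostname='hostname' \
FACTER_fqdn='hostname.fqdn' \
puppet catalog compile hostname.fqdn \
  --modulepath "./modules" \
  --hiera_config "./hiera.yaml" \
  --environmentpath ./environments/ \
  --environment production \
  | sed -n '1!p' > catalog.json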
