Ansible: set up a cron job on one host

I'm deploying a two-host service that also needs to set up a cron job. This job should only run on one of the two machines (I don't care which). What's the easiest way to do so?
I know that the shell module in Ansible supports "run_once", but the cron module does not.
I could set up the cron job on both machines and then use the command "crontab -r" to remove all the jobs (provided no other jobs are needed there) on one machine. This is dirty, but very easy.
Any better ideas?

I know that the shell module in Ansible supports "run_once", but the cron module does not.
Wrong. run_once is a property of a task, not of action modules.
Use the cron module and set run_once on the task (mind the indentation level), for example:
- cron:
    name: "check dirs"
    minute: "0"
    hour: "5,2"
    job: "ls -alh > /dev/null"
  run_once: true

Related

Databricks init scripts not working sometimes

OK, it is very strange. I have some init scripts that I would like to run when a cluster starts.
The cluster has the init script configured, which is in a file (in DBFS), basically this:
dbfs:/databricks/init-scripts/custom-cert.sh
Now, when I create the init script like this, it works (no SSL errors for my endpoints). Also, the event log for the cluster shows the duration as 1 second for the init script:
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
However, if I just put the init script in a bash script and upload it to DBFS through a pipeline, the init script does not do anything. It executes, as per the event log, but the execution duration is 0 seconds.
I have the sh script in a file named
custom-cert.sh
with the same contents as above, i.e.
#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt"
but when I check /usr/local/share/ca-certificates/, it does not contain the orgcerts.crt copied from /dbfs/orgcertificates/, even though the cluster init script has run.
Also, I have compared the contents of the init script in both cases and, at least to the naked eye, I can't figure out any difference, i.e.
%sh
cat /dbfs/databricks/init-scripts/custom-cert.sh
shows the same contents in both scenarios. What is the problem with the second case?
EDIT: I read a bit more about init scripts and found that the logs of init scripts are written here:
%sh
ls /databricks/init_scripts/
Looking at the err file in that location, it seems there is an error
sudo: update-ca-certificates
: command not found
Why is it that update-ca-certificates is found in the first case but not when I put the same script in an sh file and upload it to DBFS (instead of executing dbutils.fs.put within a notebook)?
EDIT 2: In response to the first answer: after running the command
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
the output is the file custom-cert.sh. I then restart the cluster with the init script location set to dbfs:/databricks/init-scripts/custom-cert.sh, and it works. So it is essentially the same content that the init script is reading (the generated sh file). Why can't it read it if I do not use dbutils.fs.put but just put the contents in a bash file and upload it during the CI/CD process?
As we know, an init script is a shell script that runs during startup of each cluster node, before the Apache Spark driver or worker JVMs start.
Case 2: when you run the bash command via the %sh magic command, you are executing it only on the local driver node, so the worker nodes are not able to access the result.
Case 1: by using the %fs magic command (dbutils.fs.put), you run the copy from the DBFS root, so the other worker nodes can access the path along with the driver node.
Ref : https://docs.databricks.com/data/databricks-file-system.html#summary-table-and-diagram
It seems that the observations I made in the comments section of my question are the way to go.
I now create the init script using a Databricks job that I run during the CI/CD pipeline from Azure DevOps.
The notebook has the commands:
dbutils.fs.rm("/databricks/init-scripts/custom-cert.sh")
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/internal-certificates/certs.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
I then create a Databricks job (pointing to this notebook); the cluster is a job cluster, which is just temporary. Of course, in my case, even this job creation is automated using a PowerShell script.
I then call this Databricks job in the release pipeline, again using a PowerShell script.
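For illustration only, the REST call the release script ends up making boils down to something like the following (sketched with curl rather than PowerShell; the workspace URL, token variable and job ID are placeholders, not values from my setup):
# Trigger the Databricks job that (re)creates the init script; all values are placeholders.
curl -X POST "https://<workspace-url>/api/2.1/jobs/run-now" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"job_id": 123}'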
This creates the file
/databricks/init-scripts/custom-cert.sh
I then use this file in any other cluster that accesses my org's endpoints (without certificate errors).
I do not know (or still do not understand) why the same script file can't just be part of a repo and uploaded during the release process (instead of this Databricks job calling a notebook). I would love to know the reason. The other answer on this question does not hold true, as you can see: the init script is created by a job cluster and then accessed from another cluster as part of its init script.
It simply boils down to how the init script gets created.
But it gets my job done; I'm sharing it in case it helps someone get their job done too.
I have raised a support case, though, to understand the reason.

Run PM2 cron only at the scheduled time

I run one of my Node.js scripts using the following command:
pm2 start sample.js --cron "0 1 * * *" -- SAMP
But the problem is that this program runs twice: the first time when I execute the command, and the second time at 1:00 AM (which is what we want).
So my question is: how can I set up the cron so that this program runs only once (at 1:00 AM only)?
My suggestion to you is to use crontab for this task. This is exactly what cron was designed for, and not what pm2 was designed for.
As the comment above states, the --cron option only specifies when the app should be restarted; --cron cannot be used to schedule the running of an app. In cases where we need to run a node app on a specific timetable and don't need all of the fancy pm2 capabilities of auto-restarting and clustering, we simply use crontab.
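A minimal crontab sketch of that approach, assuming the script lives at /path/to/sample.js and node is at /usr/bin/node (both paths and the log location are placeholders):
# run sample.js once a day at 01:00 and append its output to a log file
0 1 * * * /usr/bin/node /path/to/sample.js SAMP >> /var/log/sample.log 2>&1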

How to change the schedule of a Kubernetes cronjob or how to start it manually?

Is there a simple way to change the schedule of a kubernetes cronjob like kubectl change cronjob my-cronjob "10 10 * * *"? Or any other way without needing to do kubectl apply -f deployment.yml? The latter can be extremely cumbersome in a complex CI/CD setting because manually editing the deployment yaml is often not desired, especially not if the file is created from a template in the build process.
Alternatively, is there a way to start a cronjob manually? For instance, a job is scheduled to start in 22 hours, but I want to trigger it manually once now without changing the cron schedule for good (for testing or an initial run)?
You can update only the selected field of a resource by patching it.
From kubectl patch -h:
Update field(s) of a resource using strategic merge patch, a JSON merge patch, or a JSON patch.
JSON and YAML formats are accepted.
Please refer to the models in
https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/blob/HEAD/docs/api-reference/v1/definitions.html
to find if a field is mutable.
As provided in a comment, for reference:
kubectl patch cronjob my-cronjob -p '{"spec":{"schedule": "42 11 * * *"}}'
Also, in current kubectl versions, to launch a one-time execution of a declared cronjob, you can manually create a job that adheres to the cronjob spec with:
kubectl create job <job-name> --from=cronjob/mycron
The more recent versions of k8s (from 1.10 on) support the following command:
$ kubectl create job my-one-time-job --from=cronjobs/my-cronjob
Source is this solved k8s github issue.
From @SmCaterpillar's answer above, kubectl patch cronjob my-cronjob -p '{"spec":{"schedule": "42 11 * * *"}}',
I was getting the error: unable to parse "'{spec:{schedule:": yaml: found unexpected end of stream
If someone else is facing a similar issue, replace the last part of the command with:
"{\"spec\":{\"schedule\": \"42 11 * * *\"}}"
I have a friend who developed a kubectl plugin that answers exactly that!
It takes an existing cronjob and just creates a job out of it.
See https://github.com/vic3lord/cronjobjob
Look into the README for installation instructions.
And if you want to patch a k8s cronjob schedule with the Python kubernetes library, you can do it like this:
from kubernetes import client, config

config.load_kube_config()
v1 = client.BatchV1beta1Api()
# "@daily" is cron shorthand for "0 0 * * *"
body = {"spec": {"schedule": "@daily"}}
ret = v1.patch_namespaced_cron_job(
    namespace="default", name="my-cronjob", body=body
)
print(ret)

Specifying Parallel Environment on Google Compute Engine using Elasticluster

I recently created a Grid Engine cluster on Compute Engine using Elasticluster (http://googlegenomics.readthedocs.org/en/latest/use_cases/setup_gridengine_cluster_on_compute_engine/index.html).
I was wondering what the appropriate command is to run shared-memory multithreaded batch jobs on a cluster of Compute Engine virtual machines running Grid Engine.
In other words, what is the name (i.e. pe_name) of the Grid Engine parallel environment?
Let's say I want to run a job requesting 4 CPUs on 1 node; what would be the right qsub command?
So far I tried the following command:
qsub -cwd -l h_vmem=800G -pe smp 6 run.sh
Unable to run job: job rejected: the requested parallel environment "smp" does not exist.
qsub -cwd -l h_vmem=800G -pe omp 6 run.sh
Unable to run job: job rejected: the requested parallel environment "omp" does not exist.
Thank you for your help!
I don't believe that Elasticluster's Ansible playbook includes a parallel environment. You can see the main configuration run on the master here:
https://github.com/gc3-uzh-ch/elasticluster/blob/master/elasticluster/providers/ansible-playbooks/roles/gridengine/tasks/master.yml
I believe you can simply connect to the master and issue the "add parallel environment" command:
$ qconf -ap smp
and write a configuration file like:
pe_name smp
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves FALSE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
and then modify the queue configuration for all.q:
$ qconf -mq all.q
...
pe_list make smp
...
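With the smp parallel environment created and added to all.q, a request like the one from the question should then be accepted; for example (same flags as the original attempt, so run.sh and the resource values are the asker's, not mine):
$ qsub -cwd -l h_vmem=800G -pe smp 6 run.sh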
I would also suggest filing an issue with Elasticluster here:
https://github.com/gc3-uzh-ch/elasticluster/issues
I would expect that someone has already done this in a fork of Elasticluster and may be able to provide a pull request to the master fork.
Hope that helps.
-Matt

How to run a nodejs script every second

I need to run my Node.js script every second, similar to PHP cron jobs. I have tried some Node.js cron libraries like https://github.com/ncb000gt/node-cron, but the issue is that the first run has to be manual, i.e. I have to run the file containing the cron script manually the first time.
But PHP cron jobs are run by the server, so if the Apache server is running, the script starts automatically, and even if the script returns an error on one cycle, it runs again from the beginning on the next cycle.
So is there any way to achieve this in Node.js?
You have two options:
using Node as a daemon, with something like Supervisord to run your node-cron script. This alternative is wasteful of resources such as RAM, because Node and Supervisord are running all the time.
using the system's crontab, you can run your script by calling Node on the command line, such as * * * * * node /path/to/your/script.js. This alternative is highly efficient but lacks some control, like being able to log the output in case of an error, although you could just pipe the output to a file: node script.js > logfile
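A minimal crontab sketch of the second option, with output appended to a log file (the node and script paths are placeholders):
# run the script every minute (cron's smallest interval) and append stdout/stderr to a log
* * * * * /usr/bin/node /path/to/your/script.js >> /path/to/logfile.log 2>&1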
