cron job won't run R scripts on Google Compute Engine

Following these instructions, I got RStudio Server running on a Google Compute Engine instance: http://code.markedmondson.me/launch-rstudio-server-google-cloud-in-two-lines-r/
When I open a terminal from this RStudio Server, I notice that man, ps, vi, and cron are all absent:
bash: ps: command not found
My goal is to have a simple cron job periodically run an R-Script.
I manually installed cron with:
sudo apt-get update
sudo apt-get install cron
Still, I can't get cron to run this test:
cmd <- cron_rscript("/home/law9723/now_to_file.R")
cron_add(cmd, frequency = "*/1 * * * *", id = "now_to_file", description = "Write now to file every minute")
-Clearly Confused

I got things to work eventually by using these very helpful instructions: https://yuhuisdatascienceblog.blogspot.ca/2017/07/setting-up-r-studio-server-on-google.html
Using absolute path names with everything associated with cron is sage advice.
I think that when I created the VM with the command below, the sandbox that RStudio Server lives in is very minimal; hence no vi, man, cron, ps...
gce_vm(template = "rstudio",
       name = "my-rstudio",
       username = "mark", password = "mark1234",
       predefined_type = "n1-highmem-2")
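For reference, the crontab entry that cronR ends up writing should look roughly like the sketch below; the Rscript path and the log location are assumptions (check with which Rscript on your instance), the point being that every path is absolute:
# sketch of the entry cronR should write; all paths are absolute
*/1 * * * * /usr/lib/R/bin/Rscript /home/law9723/now_to_file.R >> /home/law9723/now_to_file.log 2>&1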

I have had success using the RStudio add-in. A reference is
http://www.bnosac.be/index.php/blog/51-new-rstudio-add-in-to-schedule-r-scripts.
I am also under the impression that you have to start cron with
sudo cron start
As in https://cran.r-project.org/web/packages/cronR/README.html.
I install the packages shinyFiles, miniUI, and cronR when I first get into RStudio on GCE, after using googleComputeEngineR locally like you. Then "Schedule R scripts on Linux/Unix" will appear in the add-ins list.
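A minimal sketch of those setup steps from the terminal, assuming a Debian-based image like the one the rstudio template provides:
sudo apt-get update && sudo apt-get install -y cron
sudo cron start    # per the cronR README; sudo service cron start also works on Debian
Rscript -e 'install.packages(c("cronR", "miniUI", "shinyFiles"))'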

Related

Databricks init scripts not working sometimes

OK, it is very strange. I have some init scripts that I would like to run when a cluster starts.
The cluster has the init script, which is in a file (in DBFS), basically this:
dbfs:/databricks/init-scripts/custom-cert.sh
Now, when I create the init script like this, it works (no SSL errors for my endpoints). Also, the event log for the cluster shows the duration as 1 second for the init script:
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
However, if I just put the init script in a bash script and upload it to DBFS through a pipeline, the init script does not do anything. It executes, as per the event log, but the execution duration is 0 seconds.
I have the shell script in a file named
custom-cert.sh
with the same contents as above, i.e.
#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt"
but when I check /usr/local/share/ca-certificates/, it does not contain orgcerts.crt, even though the cluster init script has run.
Also, I have compared the contents of the init script in both cases and, at least to the naked eye, I can't see any difference, i.e.
%sh
cat /dbfs/databricks/init-scripts/custom-cert.sh
shows the same contents in both scenarios. What is the problem in the second case?
EDIT: I read a bit more about init scripts and found that the logs of init scripts are written here
%sh
ls /databricks/init_scripts/
Looking at the err file in that location, it seems there is an error
sudo: update-ca-certificates
: command not found
Why is it that update-ca-certificates is found in the first case but not when I put the same script in a shell script and upload it to DBFS (instead of executing dbutils.fs.put within a notebook)?
EDIT 2: In response to the first answer: after running the command
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
the output is the file custom-cert.sh. I then restart the cluster with the init script location set to dbfs:/databricks/init-scripts/custom-cert.sh, and then it works. So it is essentially the same content that the init script is reading (the generated shell script). Why can't it read it if I do not use dbutils.fs.put but just put the contents in a bash file and upload it during the CI/CD process?
As we are aware, an init script is a shell script that runs during startup of each cluster node, before the Apache Spark driver or worker JVM starts. In case 2, when you run a bash command using the %sh magic command, you are executing it on the local driver node only, so the worker nodes are not able to access the result. In case 1, by using dbutils.fs.put you run the copy command from the DBFS root, so the path is accessible to the worker nodes as well as the driver node.
Ref : https://docs.databricks.com/data/databricks-file-system.html#summary-table-and-diagram
It seems that the observations I made in the comments section of my question are the way to go.
I now create the init script using a Databricks job that I run during the CI/CD pipeline from Azure DevOps.
The notebook has the commands:
dbutils.fs.rm("/databricks/init-scripts/custom-cert.sh")
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/internal-certificates/certs.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
I then create a Databricks job pointing to this notebook; the cluster is a job cluster, which is just temporary. Of course, in my case even this job creation is automated using a PowerShell script.
I then call this Databricks job in the release pipeline, again using a PowerShell script.
This creates the file
/databricks/init-scripts/custom-cert.sh
I then use this file in any other cluster that accesses my org's endpoints (without certificate errors).
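For illustration only: instead of PowerShell, a release step could trigger the same job with the Databricks CLI, assuming the CLI is configured on the pipeline agent; the job id below is a placeholder:
databricks jobs run-now --job-id 123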
I still do not know (or understand) why the same script file can't just be part of a repo and uploaded during the release process, instead of this Databricks job calling a notebook. I would love to know the reason. The other answer on this question does not hold true, as you can see: the cluster script is created by a job cluster and then accessed from another cluster as part of its init script.
It simply boils down to how the init script gets created.
But it gets my job done; I'm posting it in case it helps someone get their job done too.
I have raised a support case, though, to understand the reason.

Add a job to linux cron list programmatically with Nodejs

I have a Node.js job that I want to run every 30 minutes to scan the database and update product data in an e-commerce API with my Node.js program. Note that the Node.js program is serving a REST API (backend) for a React web application.
I searched and found that I can do this with a Node.js cron library like "node-schedule", but I think it would be more interesting to do it with Linux cron:
var j = schedule.scheduleJob('42 * * * *', function(){
  console.log('The answer to life, the universe, and everything!');
});
Is there any library that lets me add cron jobs to Linux using Node.js, or would I do it with "fs" only, i.e., open the cron file and add my command?
The crontab command, which is part of Vixie cron, allows you to create, edit, and delete per-user cron entries.
Or, if you are running as the root user (which you should not be doing), you can drop cron files into /etc/cron.d.
This is not always supported, and if you're running in a Docker-type container environment it is doubtful that you have any cron at all. In that environment you'd want your running Node.js process to handle scheduled jobs for you, or use some other kind of distributed scheduled-work system.
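If you do end up editing the crontab from your Node process rather than via a library, the usual approach is to shell out to crontab itself instead of editing files with fs; a minimal sketch (the script path and schedule are assumptions):
# append an entry to the current user's crontab without clobbering existing ones
( crontab -l 2>/dev/null; echo "*/30 * * * * /usr/bin/node /srv/app/scan-products.js" ) | crontab -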
You can put your cron job in a Node.js script. Then adding it to the crontab can be done with the cronbee module, via its API:
import { cronbee } from 'cronbee'

await cronbee.ensure({
  taskName: 'do smth',
  taskRun: `node my-script`,
  cron: '42 * * * *'
})
or you can ensure the cron job via CLI, if the module is installed globally or from npm scripts:
$ cronbee ensure mytasks.json

run command at interval on debian

So I have a Debian web mapping server for my Minecraft world. In order for the map to display the correct information, two commands have to be run periodically. I have tried following a few guides to use crontab but so far have failed (and even had to restore the Debian image -.-). I am new to Linux as a whole and need a step-by-step guide in plain English to do the following.
run:
"overviewer.py --config /home/mc/test.cfg"
every 30 minutes on the hour and
"overviewer.py --config /home/mc/test.cfg --genpoi"
every five minutes on the hour
It seems pretty straightforward, but I have literally spent the better part of two months on this because I keep screwing things up.
Thanks for any help!
Remember, if you are using crontab, to use the full path to the Python script. In Debian you can type pwd in the terminal to show the path to your current location.
Assuming the python script is also located in /home/mc/ you should use the command:
/home/mc/overviewer.py --config /home/mc/test.cfg
I would suggest you look into crontab again; the Ubuntu help page has a lot of information: https://help.ubuntu.com/community/CronHowto
For every 30 minutes:
0,30 * * * * /home/mc/overviewer.py --config /home/mc/test.cfg
And for every 5 minutes:
*/5 * * * * /home/mc/overviewer.py --config /home/mc/test.cfg --genpoi
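Putting it together: run crontab -e as the user who should own the jobs and add both lines. Redirecting output to log files (the log paths below are only suggestions) makes it much easier to see why a run failed:
crontab -e
# then add these two lines and save:
0,30 * * * * /home/mc/overviewer.py --config /home/mc/test.cfg >> /home/mc/overviewer.log 2>&1
*/5 * * * * /home/mc/overviewer.py --config /home/mc/test.cfg --genpoi >> /home/mc/overviewer-poi.log 2>&1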

How to run a nodejs script every second

I need to run my Node.js script every second, similar to PHP cron jobs. I have tried some Node.js cron libraries like https://github.com/ncb000gt/node-cron, but the issue is that the first run has to be manual, i.e., I have to run the file with the cron script manually the first time.
But PHP cron jobs are run by the server, so as long as the Apache server is running the script starts automatically, and even if the script returns an error for one cycle, it will run again from the beginning on the next cycle.
So is there any way to achieve this in Node.js?
You have two options:
using Node as a daemon, with something like Supervisord to run your node-cron script. This alternative is wasteful of resources such as RAM because Node and Supervisord are running all the time.
using the system's crontab: you can run your script by calling Node on the command line, such as * * * * * node /path/to/your/script.js. This alternative is highly efficient but lacks some control, like being able to log the output in case of an error, although you could just redirect the output to a file: node script.js > logfile
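A sketch of that crontab variant with the output captured in a log file (the node path and log location are assumptions); note that cron's finest granularity is one minute, so anything per-second has to be scheduled inside the running Node process itself:
* * * * * /usr/bin/node /path/to/your/script.js >> /var/log/myscript.log 2>&1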

Automating Linux EBS snapshots backup and clean-up

Are there any good updated shell scripts for EBS snapshots to S3, and clean-up of older snapshots?
I looked through SO, but most of the answers are from 2009, referring to links that are either broken or outdated.
Thanks.
Try the following shell script; I use it to create snapshots for most of my projects and it works well.
https://github.com/rakesh-sankar/Tools/blob/master/AmazonAWS/EBS/EBS-Snapshot.sh
You can fork the project and send me a pull request to add the functionality of cleaning up old entries. Also, watch this repo; when I find some time I will update the code to include clean-up functionality.
If it is OK to use PHP as a shell script, you can use my latest script with the latest AWS PHP SDK. This is much simpler because you do not need to set up an environment; just feed the script your API keys.
How to setup
Open SSH connection to your server.
Navigate to folder
$ cd /usr/local/
Clone this gist into an ec2 folder
$ git clone https://gist.github.com/9738785.git ec2
Go to that folder
$ cd ec2
Make backup.php executable
$ chmod +x backup.php
Open the releases page of the AWS PHP SDK GitHub project and copy the URL of the aws.zip button. Now download it onto your server.
$ wget https://github.com/aws/aws-sdk-php/releases/download/2.6.0/aws.zip
Unzip this file into the aws directory.
$ unzip aws.zip -d aws
Edit the backup.php file and set all the settings in lines 5-12:
$dryrun = FALSE;
$interval = '24 hours';
$keep_for = '10 Days';
$volumes = array('vol-********');
$api_key = '*********************';
$api_secret = '****************************************';
$ec2_region = 'us-east-1';
$snap_descr = "Daily backup";
Test it. Run the script:
$ ./backup.php
Check whether the snapshot was created.
If everything is OK, just add the cron job:
* 23 * * * /usr/local/ec2/backup.php
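Note that * 23 * * * fires every minute during hour 23. If the intent is one snapshot per day at 23:00, a narrower schedule avoids invoking the script 60 times; the log path below is just a suggestion:
0 23 * * * /usr/local/ec2/backup.php >> /var/log/ebs-backup.log 2>&1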
