I am quite new to Airflow, but I ran into the same timing and interval issues that novices face when dealing with the schedule interval. So I wanted to try triggering a DAG externally via the CLI. This can be done by simply going to the console and typing, for example:
airflow trigger_dag tutorial
(using the Airflow Docker image, version 1.10.9)
Next I wanted to see whether the same command works from a regular cron job, since I wanted to trigger the DAG on a cron schedule. So I created a cron job like this:
* * * * * airflow trigger_dag tutorial
However, this does not trigger the DAG.
After a few more experiments, I found that I can manually trigger the DAG with the same command from a shell script, but it does not work when the script is run via sh from the cron job.
(I have verified that cron itself works, since a test job that simply writes to a file runs fine.)
Can anybody tell me how I can trigger the DAG with a regular cron job?
Or what went wrong here?
Most likely you have some environment variables set in your user's profile (.bash_profile or .bashrc) that are not available in cron. Cron does not "source" any of the user profile files; if you need those variables, you have to source the files manually before running the script.
A nice article that also shows how to debug this is here: https://www.ibm.com/support/pages/cron-environment-and-cron-job-failures
Try updating the cron command to use the full path to Airflow. To get the installation path, you can execute:
which airflow
Now update the cron command with that path.
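For example, assuming which airflow returns /usr/local/bin/airflow (your path may differ), the crontab entry would look like:

* * * * * /usr/local/bin/airflow trigger_dag tutorial >> /tmp/airflow_trigger.log 2>&1

Redirecting stdout and stderr to a log file is optional, but it makes it much easier to see why the command fails under cron. If Airflow also relies on variables such as AIRFLOW_HOME from your profile, source the profile first in the same entry:

* * * * * . $HOME/.bash_profile; /usr/local/bin/airflow trigger_dag tutorial >> /tmp/airflow_trigger.log 2>&1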
I have created an HTTP Cloud Scheduler task. I expect it to have a maximum run time of 5 minutes. However, my task reports DEADLINE_EXCEEDED after exactly 1 minute.
When I run gcloud scheduler jobs describe MySyncTask to view the task, it reports attemptDeadline: 300s. The service I am calling is Cloud Run, where I have also set a 300s limit.
I am running the task manually by clicking "force a job run" in the GUI.
After exactly 1 minute, the logs report DEADLINE_EXCEEDED.
When you execute a job from the GUI, it will be executed using the default attemptDeadline value, which is 60 seconds according to this question.
If you want to run it manually, I suggest running the job from Cloud Shell and passing the --attempt-deadline flag with the desired value, as shown in this answer:
gcloud beta scheduler jobs update http <job> --attempt-deadline=1800s --project <project>
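After updating, you can confirm that the new deadline is in place by describing the job again (the job name and project are placeholders):

gcloud scheduler jobs describe <job> --project <project>

The output should now show attemptDeadline: 1800s.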
I accidentally removed a Slurm job submission script in the terminal using the rm command. As far as I know there is no (relatively easy) way of recovering that file anymore, and I hadn't saved it anywhere else. I have used that job submission script many, many times before, so there are a lot of Slurm job submissions (all of them finished) that have used it. Is it possible to somehow recover the job script from an old finished job?
If Slurm is configured with the ElasticSearch plugin, then you will find the submission script for all completed jobs in the ElasticSearch instance used in the setup.
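As a rough sketch, assuming the job completion plugin writes to an index called slurm on localhost:9200 (the actual endpoint comes from the JobCompLoc setting in slurm.conf), you could look up the record for a given job id with curl and read the submission script from the returned document:

curl -s 'http://localhost:9200/slurm/_search?q=jobid:123456' | python -m json.tool

The host, index name, and job id here are placeholders; check your site's JobCompLoc for the real values.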
Another option is to install sarchive.
I am trying to optimize how many times my cron job runs to create a Python script.
I want to know how I can find out when a process is about to die by itself, so that I can schedule my cron job accordingly.
I am trying to set up the time schedule of a cron job that runs a Python script on a Linux server.
I am using node-cron to schedule some tasks inside my Node app. This package has an API to create, start, and stop cron jobs. However, I can't seem to find these cron jobs when I run the crontab -l command in my OS terminal. I tried both on macOS and on CentOS.
Specific questions:
1. Do such Node packages create cron jobs at the OS level?
2. If the answer to 1 is yes, will these cron jobs execute regardless of whether my Node app is running?
3. If the answer to 2 is yes, how do I stop and clear out all such scheduled cron jobs?
Taking a quick look at the node-cron source code, you can verify that node-cron does not create any cron job at the OS level. It looks like just a long-timeout functionality, so I suppose that if the Node process is restarted, you lose the scheduled cron jobs.
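If you need a schedule that survives restarts of the Node app, one workaround (outside node-cron itself) is a regular OS crontab entry that invokes a small Node script; the paths below are placeholders:

* * * * * /usr/bin/node /home/me/app/run-task.js >> /home/me/app/cron.log 2>&1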
I am using /torque/4.2.5 to schedule my jobs, and I need to find a way to make a copy of the PBS launch script I used for jobs that are currently queued or running. The plan is to make a copy of that launch script in the output folder.
TORQUE has a job logging feature that can be configured to record job scripts used at launch time.
EDIT: if you have administrator privileges and want to read the stored file, you can inspect TORQUE_HOME/server_priv/jobid.SC
TORQUE_HOME is usually /var/spool/torque but is configurable.
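For example, with the default TORQUE_HOME and a job id of 123456 (both placeholders; on some installations the scripts sit in a jobs/ subdirectory under server_priv), you could copy the stored script into your output folder:

sudo cp /var/spool/torque/server_priv/123456.SC /path/to/output/folder/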