I was working on a small project and learned about cron jobs in OpenShift from https://docs.openshift.com/container-platform/3.11/dev_guide/cron_jobs.html#creating-a-cronjob, which showed me how to write the YAML file. But I am confused about what to put in the container field and how to associate a PV with the cron job.
I am working with a Persistent Volume and want to clear a log file on the PV at regular intervals through a cron job.
Can anyone help me understand how a cron job executes, and how to specify in the YAML file a command that deletes or clears a file inside the PV for a pod that has the PV mounted?
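For what it's worth, here is a minimal sketch of such a CronJob for OpenShift 3.11. The claim name logs-pvc, mount path /data/logs, file name app.log, schedule, and the busybox image are all assumptions; adjust them to your setup:

apiVersion: batch/v1beta1            # CronJob API version in OpenShift 3.11
kind: CronJob
metadata:
  name: clear-log-file
spec:
  schedule: "0 0 * * *"              # run once a day at midnight
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: clear-log
            image: busybox           # any small image with a shell works
            command: ["sh", "-c", ": > /data/logs/app.log"]   # truncate the log file in place
            volumeMounts:
            - name: log-volume
              mountPath: /data/logs  # must match the path the application writes to
          restartPolicy: OnFailure
          volumes:
          - name: log-volume
            persistentVolumeClaim:
              claimName: logs-pvc    # the existing PVC bound to your PV

The job's pod mounts the same PVC the application uses and simply truncates the file. If the claim is ReadWriteOnce, the cron job pod has to land on the same node as the application pod, since RWO is enforced per node.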
I am using the Kubernetes executor (https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html).
My requirement is as below. There is a DAG that has two tasks:
Bash Task A (BashOperator): creates a file at a temp location, using Python code.
Email Task B (EmailOperator): must access the file created above and send it as an email attachment.
Apparently, with the Kubernetes executor, each task instance runs in its own pod on the Kubernetes cluster. The worker pod runs the task, reports the result, and terminates, so after the worker pod shuts down everything inside it is lost. Thus any file that was created or downloaded is lost.
Note: no storage is mounted yet. I am exploring easy options, if any exist.
I would not like the Python code to send the email as well; instead, I want a separate task to send it.
- If you are looking for the easiest option, you can use a hostPath volume to mount the files onto the node; if you run your containers on a specific node pool, the pods will be able to access the files. Note: if the node goes down, your files will be gone.
- If you want to share the file system between pods, you have to use a ReadWriteMany PVC (see the sketch after this list).
- If you are on a cloud provider, you can use a managed file system, for example EFS on AWS.
- You can also set up GlusterFS or MinIO to provide a file system on Kubernetes itself and use it as the mount for the pods, so they can access and share it.
- You could also leverage an S3 bucket: upload the artifacts or files, and the new pod downloads them first to a temp location, sends the email, and terminates itself. This way the files are kept in S3 and no clean-up is required at the file-system or pod level.
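A minimal sketch of such a ReadWriteMany claim (the claim name airflow-shared-files and the storage class nfs-rwx are assumptions; the class you use must actually support RWX, e.g. one backed by NFS or a similar shared filesystem):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-shared-files        # hypothetical name
spec:
  accessModes:
    - ReadWriteMany                 # mountable read-write by many nodes/pods
  storageClassName: nfs-rwx         # assumed RWX-capable storage class
  resources:
    requests:
      storage: 1Gi

You would then mount this claim into both task pods (for example via the executor's pod template or a per-task pod override) so that Task B can read the file Task A wrote.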
I am quite new to Airflow. However, I bumped into the same timing and interval issues that novices face when dealing with the schedule interval. As such, I wanted to try to trigger a DAG externally via the CLI. This can be done by simply going to the console and typing, for example:
airflow trigger_dag tutorial
(using airflow docker image: 1.10.9)
Next, I wanted to see if the same command works from a regular cron job, since I want to trigger the DAG on a cron-like schedule. Hence I created a cron entry like this:
* * * * * airflow trigger_dag tutorial
However, this does not trigger the DAG.
After a few more experiments, I found that I can manually trigger the DAG with the same command from a shell script, but it does not work when that script is run via cron.
(I have verified that cron itself works by having it write to a plain file.)
Can anybody tell me how I can trigger the DAG with a regular cron job?
Or what went wrong here?
Most likely you have some environment variables set in your user's profile, possibly in .bash_profile or .bashrc, that are not available in cron. Cron does not "source" any of the user profile files; you need to source them manually before running the script if you want those variables set.
A nice article that also shows you how to debug this is here: https://www.ibm.com/support/pages/cron-environment-and-cron-job-failures
Try updating the cron command with the full path to the airflow binary. To get the installation path you can execute:
which airflow
Now update the cron command.
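For example, assuming which airflow returns /usr/local/bin/airflow (the path and the profile file are assumptions; adjust to your installation), the crontab entry could look like this:

# use the full path to the airflow binary
* * * * * /usr/local/bin/airflow trigger_dag tutorial
# or source the profile first so PATH and AIRFLOW_HOME are set
* * * * * . $HOME/.bash_profile; airflow trigger_dag tutorial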
I am working on an OpenShift project. I configured a cron job using a YAML config. In our application, all logs are appended to a file on a Persistent Volume. Once the cron job executes successfully and clears the file, further logs are no longer appended.
My observations:
Initially I thought the PV itself wasn't ReadWriteMany, so I changed it to ReadWriteMany and still observed the same behaviour.
There were some issues with the image mentioned in the cron job YAML, but I tried different images and still observed the same issue.
Can anyone explain this and suggest a plausible solution?
Edit:
Yes, the logs are configured with append mode when the file is opened. I am aware that PV access modes do not control I/O operations.
First you should check how the logs are written to the file on the PV's backend storage. For example, the log file should be opened in append mode when the application starts logging messages to it.
The PV access modes, such as ReadWriteMany and ReadWriteOnce, do not control I/O from the application's perspective; they apply only to nodes, not to pods or applications. Refer to Access Modes for more details.
ReadWriteOnce -- the volume can be mounted as read-write by a single node
ReadOnlyMany -- the volume can be mounted read-only by many nodes
ReadWriteMany -- the volume can be mounted as read-write by many nodes
In other words, Kubernetes does not manage file handling inside the pods and applications; the I/O behaviour depends on the backend storage or filesystem characteristics, such as block-level (iSCSI, EBS, ...) or file-level (NFS, GlusterFS, ...) I/O.
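One thing worth checking, as an assumption since the cron job's command is not shown: if the job removes the log file instead of truncating it, the application keeps writing to the deleted inode through its already-open file descriptor, and a newly created file never receives the new lines. Truncating the file in place avoids this, for example (the path is a placeholder):

# clears the file to zero length without removing it,
# so the application's open file descriptor stays valid
truncate -s 0 /data/logs/app.log
# avoid removing the file outright, e.g. rm /data/logs/app.log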
I'm using the Helm Chart to deploy Spark Operator to GKE. Then I define a SparkApplication specification in a YAML file. But after reading the User Guide I still don't understand:
Where should I store SparkApplication YAML files: on the Kubernetes cluster or in Google Storage?
Is it OK/possible to deploy them along with the Spark Operator Helm chart to the Spark master container?
Is it a good approach to load the SparkApplication configurations into Google Storage and then run kubectl apply -f <YAML GS file path>?
What are the best practices for storing SparkApplication configurations on Kubernetes cluster or GS that I may be missing?
To address your questions:
There are many possibilities for storing your YAML files. You can store them locally on your PC or laptop, or you can store them in the cloud. Going further, keeping your YAML files in a version control system (for example Git) would be one of the better options, because you will have the full history of changes, with the ability to check what you changed and to roll back if something fails. The main thing is that kubectl will need access to these files.
There is no such thing as a master container in Kubernetes; there is a master node. A master node is a machine that controls and manages a set of worker nodes (the workload runtime).
Please check the official documentation about Kubernetes components.
You can put your YAML files in a Google Storage bucket, but you will not be able to run kubectl apply -f FILE against them directly; kubectl cannot interpret a file location like gs://NAME_OF_THE_BUCKET/magical-deployment.yaml.
One way to run kubectl apply -f FILE_NAME.yaml is to have the file stored locally and synced externally.
You can access the data inside a bucket with gsutil. You could try to tinker with gsutil cat gs://NAME_OF_THE_BUCKET/magical-deployment.yaml and pipe it into kubectl, but I would not recommend that approach.
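For completeness, that piped variant would look roughly like this (using the same placeholder bucket and file names as above; kubectl apply -f - reads the manifest from stdin):

gsutil cat gs://NAME_OF_THE_BUCKET/magical-deployment.yaml | kubectl apply -f -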
Please refer to gsutil tool documentation in this case and be aware of:
The gsutil cat command does not compute a checksum of the downloaded data. Therefore, we recommend that users either perform their own validation of the output of gsutil cat or use gsutil cp or rsync (both of which perform integrity checking automatically).
-- https://cloud.google.com/storage/docs/gsutil/commands/cat
Let me know if you have any questions to this.
I am using torque/4.2.5 to schedule my jobs, and I need to find a way to make a copy of the PBS launch script that I used for jobs that are currently queued or running. The plan is to place a copy of that launch script in the output folder.
TORQUE has a job logging feature that can be configured to record job scripts used at launch time.
EDIT: if you have administrator privileges and want to read the stored file, you can inspect TORQUE_HOME/server_priv/jobid.SC.
TORQUE_HOME is usually /var/spool/torque but is configurable.
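As a rough illustration only (the job ID 12345 and the output path are hypothetical placeholders, TORQUE_HOME is assumed to be the default /var/spool/torque, and this needs admin access on the server host):

# copy the stored submit script for job 12345 into the job's output folder (both paths are placeholders)
cp /var/spool/torque/server_priv/12345.SC /path/to/output/12345_launch_script.pbs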