Kubernetes CronJob Failure to start a job "Timeout" and "Job already exists"

Kubernetes CronJob Failure to start a job "Timeout" and "Job already exists" - cron

I am trying to run a cronjob in kubernetes, but keep to having these two errors:
type: 'Warning' reason: 'FailedCreate' Error creating job: jobs.batch "dev-cron-1516702680" already exists
and
type: 'Warning' reason: 'FailedCreate' Error creating job: Timeout: request did not complete within allowed duration
Below are my cronjob yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
creationTimestamp: 2018-01-23T09:45:10Z
name: dev-cron
namespace: dev
resourceVersion: "16768201"
selfLink: /apis/batch/v1beta1/namespaces/dev/cronjobs/dev-cron
uid: 1a32eb94-0022-11e8-9256-065eb556d6a2
spec:
concurrencyPolicy: Allow
failedJobsHistoryLimit: 1
jobTemplate:
metadata:
creationTimestamp: null
spec:
template:
metadata:
creationTimestamp: null
spec:
containers:
- args:
- for country in th;
- do
- 'curl -X POST -d "{'footprint':'xxxx-xxxx'}"-H "Content-Type: application/json" https://dev.xxx.com/xxx/xxx'
- done
image: appropriate/curl:latest
imagePullPolicy: Always
name: cron
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
schedule: '* * * * *'
startingDeadlineSeconds: 10
successfulJobsHistoryLimit: 3
suspend: false
status: {}
I am not sure why this is keep happening. I am running Kubernetes version 1.9.1, in AWS cluster. Any idea why?

It turned out to happen because there is an auto injector by Istio initializer. Once I disabled Istio initializer injection for cronjobs, it works fine.

Related

Apache Flink Operator - enable azure-fs-hadoop

I am trying to perform a flink job, using Flink Operator (https://github:com/apache/flink-kubernetes-operator) on k8s, that using uses a connection to Azure Blob Storage described here: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/filesystems/azure/
Following the guideline I need to copy the jar file flink-azure-fs-hadoop-1.15.0.jar from one directory to another.
I have already tried to do it via podTemplate and command functionality, but unfortunately it does not work, and the file does not appear in the destination directory.
Can you guide me on how to do it properly?
Below you can find my FlinkDeployment file.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
namespace: flink
name: basic-example
spec:
image: flink:1.15
flinkVersion: v1_15
flinkConfiguration:
taskmanager.numberOfTaskSlots: "2"
serviceAccount: flink
podTemplate:
apiVersion: v1
kind: Pod
metadata:
name: pod-template
spec:
serviceAccount: flink
containers:
- name: flink-main-container
volumeMounts:
- mountPath: /opt/flink/data
name: flink-data
# command:
# - "touch"
# - "/tmp/test.txt"
volumes:
- name: flink-data
emptyDir: { }
jobManager:
resource:
memory: "2048m"
cpu: 1
podTemplate:
apiVersion: v1
kind: Pod
metadata:
name: job-manager-pod-template
spec:
initContainers:
- name: fetch-jar
image: cirrusci/wget
volumeMounts:
- mountPath: /opt/flink/data
name: flink-data
command:
- "wget"
- "LINK_TO_CUSTOM_JAR_FILE_ON_AZURE_BLOB_STORAGE"
- "-O"
- "/opt/flink/data/test.jar"
containers:
- name: flink-main-container
command:
- "touch"
- "/tmp/test.txt"
taskManager:
resource:
memory: "2048m"
cpu: 1
job:
jarURI: local:///opt/flink/data/test.jar
parallelism: 2
upgradeMode: stateless
state: running
ingress:
template: "CUSTOM_LINK_TO_AZURE"
annotations:
cert-manager.io/cluster-issuer: letsencrypt
kubernetes.io/ingress.allow-http: 'false'
traefik.ingress.kubernetes.io/router.entrypoints: websecure
traefik.ingress.kubernetes.io/router.tls: 'true'
traefik.ingress.kubernetes.io/router.tls.options: default

Since you are using the stock Flink 1.15 image this Azure filesystem plugin comes built-in. You can enable it via setting the ENABLE_BUILT_IN_PLUGINS environment variable.
spec:
podTemplate:
containers:
# Do not change the main container name
- name: flink-main-container
env:
- name: ENABLE_BUILT_IN_PLUGINS
value: flink-azure-fs-hadoop-1.15.0.jar
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#using-filesystem-plugins

Using a PV in an OpenShift 3 cron job

I have been able to successfully create a cron job for my OpenShift 3 project. The project is a lift and shift from an existing Linux web server. Part of the existing application requires several cron tasks to run. The one I am looking at the moment is a daily update to the applications database. As part of the execution of the cron job I want to write to a log file. There is already a PV/PVC defined for the main application and I was intending to use that hold the logs for my cron job but it seems the cron job is not being provided access to the PV.
I am using the following inProgress.yml for the definition of the cron job
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: in-progress
spec:
schedule: "*/5 * * * *"
concurrencyPolicy: "Replace"
startingDeadlineSeconds: 200
suspend: false
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
metadata:
labels:
parent: "cronjobInProgress"
spec:
containers:
- name: in-progress
image: <image name>
command: ["php", "inProgress.php"]
restartPolicy: OnFailure
volumeMounts:
- mountPath: /data-pv
name: log-vol
volumes:
- name: log-vol
persistentVolumeClaim:
claimName: data-pv
I am using the following command to create the cron job
oc create -f inProgress.yml
PHP Warning: fopen(/data-pv/logs/2022-04-27-app.log): failed to open
stream: No such file or directory in
/opt/app-root/src/errorHandler.php on line 75 WARNING: [2] mkdir():
Permission denied, line 80 in file
/opt/app-root/src/errorLogger.php WARNING: [2]
fopen(/data-pv/logs/2022-04-27-inprogress.log): failed to open stream:
No such file or directory, line 60 in file
/opt/app-root/src/errorLogger.php
Looking at the yml for pod that is executed, there is no mention of data-pv - it appears as though secret volumeMount, which has been added by OpenShift, is removing any further volumeMounts.
apiVersion: v1 kind: Pod metadata: annotations:
openshift.io/scc: restricted creationTimestamp: '2022-04-27T13:25:04Z' generateName: in-progress-1651065900- ...
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-n9jsw
readOnly: true ... volumes:
- name: default-token-n9jsw
secret:
defaultMode: 420
secretName: default-token-n9jsw
How can I access the PV from within the cron job?

Your manifest is incorrect. The volumes block needs to be part of the spec.jobTemplate.spec.template.spec, that is, it needs to be indented at the same level as spec.jobTemplate.spec.template.spec.containers. In its current position it is invisible to OpenShift. See e.g. this pod example.
Similarly, volumeMounts and restartPolicy are arguments to the container block, and need to be indented accordingly.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: in-progress
spec:
schedule: '*/5 * * * *'
concurrencyPolicy: Replace
startingDeadlineSeconds: 200
suspend: false
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
metadata:
labels:
parent: cronjobInProgress
spec:
containers:
- name: in-progress
image: <image name>
command:
- php
- inProgress.php
restartPolicy: OnFailure
volumeMounts:
- mountPath: /data-pv
name: log-vol
volumes:
- name: log-vol
persistentVolumeClaim:
claimName: data-pv

Thanks for the informative response larsks.
OpenShift displayed the following when I copied your manifest suggestions
$ oc create -f InProgress.yml The CronJob "in-progress" is invalid:
spec.jobTemplate.spec.template.spec.restartPolicy: Unsupported value:
"Always": supported values: "OnFailure", " Never"
As your answer was very helpful I was able to resolve this problem by moving restartPolicy: OnFailure so the final manifest is below.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: in-progress
spec:
schedule: "*/5 * * * *"
concurrencyPolicy: "Replace"
startingDeadlineSeconds: 200
suspend: false
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
metadata:
labels:
parent: "cronjobInProgress"
spec:
restartPolicy: OnFailure
containers:
- name: in-progress
image: <image name>
command: ["php", "updateToInProgress.php"]
volumeMounts:
- mountPath: /data-pv
name: log-vol
volumes:
- name: log-vol
persistentVolumeClaim:
claimName: data-pv

Nextcloud with Replicas on Azure Kubernetes - Failing to Mount Azure Files ReadWriteMany Volume

I'm trying to deploy Nextcloud w/HPA (replicas - horizontal scaling) on Azure Kubernetes with the official Nextcloud Helm chart and a ReadWriteMany volume created following these official instructions, but the volume never mounts, get this (or some version thereof) error:
kind: Event
apiVersion: v1
metadata:
name: nextcloud-6bc9b947bf-z6rlh.16bf7711bc2827a5
namespace: nextcloud
uid: c3c5619b-19da-4070-afbb-24bce111ddbe
resourceVersion: '55858'
creationTimestamp: '2021-12-10T18:08:27Z'
managedFields:
- manager: kubelet
operation: Update
apiVersion: v1
time: '2021-12-10T18:08:27Z'
fieldsType: FieldsV1
fieldsV1:
f:count: {}
f:firstTimestamp: {}
f:involvedObject: {}
f:lastTimestamp: {}
f:message: {}
f:reason: {}
f:source:
f:component: {}
f:host: {}
f:type: {}
involvedObject:
kind: Pod
namespace: nextcloud
name: nextcloud-6bc9b947bf-z6rlh
uid: 6106d13f-7033-4a4e-a6e9-a8e3947c52a4
apiVersion: v1
resourceVersion: '55764'
reason: FailedMount
message: >
MountVolume.MountDevice failed for volume "nextcloud-rwx" : rpc error: code =
Internal desc = volume(#azure-secret#aksshare#) mount
"//nextcloudcluster.file.core.windows.net/aksshare" on
"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/nextcloud-rwx/globalmount"
failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o
dir_mode=0777,file_mode=0777,gid=33,mfsymlinks,actimeo=30,<masked>
//nextcloudcluster.file.core.windows.net/aksshare
/var/lib/kubelet/plugins/kubernetes.io/csi/pv/nextcloud-rwx/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log
messages (dmesg)
source:
component: kubelet
host: aks-agentpool-16596208-vmss000002
firstTimestamp: '2021-12-10T18:08:27Z'
lastTimestamp: '2021-12-10T18:08:35Z'
count: 5
type: Warning
eventTime: null
reportingComponent: ''
reportingInstance: ''
Here is my PersistentVolume yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: nextcloud-rwx
namespace: nextcloud
spec:
capacity:
storage: 32Gi
accessModes:
- ReadWriteMany
azureFile:
secretName: azure-secret
shareName: aksshare
readOnly: false
mountOptions:
- dir_mode=0777
- file_mode=0777
- gid=33
- mfsymlinks
PersistentVolumeClaim yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nextcloud-rwx
namespace: nextcloud
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
resources:
requests:
storage: 32Gi
I've also tried changing uid and gid to 0, 1000, etc, and get an even more egregious permission denied message because it doesn't "match the fsgroup(33)" (hence why I tried with gid=33).
Any ideas would be greatly appreciated! Thank you for your time.

Application running in Kubernetes cron job does not connect to database in same Kubernetes cluster

I have a Kubernetes cluster running the a PostgreSQL database, a Grafana dashboard, and a Python single-run application (built as a Docker image) that runs hourly inside a Kubernetes CronJob (see manifests below). Additionally, this is all being deployed using ArgoCD with Istio side-car injection.
The issue I'm having (as the title indicates) is that my Python application cannot connect to the database in the cluster. This is very strange to me since the dashboard, in fact, can connect to the database so I'm not sure what might be different for the Python app.
Following are my manifests (with a few things changed to remove identifiable information):
Contents of database.yaml:
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: database
name: database
spec:
replicas: 1
selector:
matchLabels:
app: database
strategy: {}
template:
metadata:
labels:
app: database
spec:
containers:
- image: postgres:12.5
imagePullPolicy: ""
name: database
ports:
- containerPort: 5432
env:
- name: POSTGRES_DB
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_DB
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_USER
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_PASSWORD
resources: {}
readinessProbe:
initialDelaySeconds: 30
tcpSocket:
port: 5432
restartPolicy: Always
serviceAccountName: ""
volumes: null
status: {}
---
apiVersion: v1
kind: Service
metadata:
labels:
app: database
name: database
spec:
ports:
- name: "5432"
port: 5432
targetPort: 5432
selector:
app: database
status:
loadBalancer: {}
Contents of dashboard.yaml:
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: dashboard
name: dashboard
spec:
replicas: 1
selector:
matchLabels:
app: dashboard
strategy: {}
template:
metadata:
labels:
app: dashboard
spec:
containers:
- image: grafana:7.3.3
imagePullPolicy: ""
name: dashboard
ports:
- containerPort: 3000
resources: {}
env:
- name: POSTGRES_DB
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_DB
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_USER
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_PASSWORD
volumeMounts:
- name: grafana-datasource
mountPath: /etc/grafana/provisioning/datasources
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /
port: 3000
restartPolicy: Always
serviceAccountName: ""
volumes:
- name: grafana-datasource
configMap:
defaultMode: 420
name: grafana-datasource
- name: grafana-dashboard-provision
status: {}
---
apiVersion: v1
kind: Service
metadata:
labels:
app: dashboard
name: dashboard
spec:
ports:
- name: "3000"
port: 3000
targetPort: 3000
selector:
app: dashboard
status:
loadBalancer: {}
Contents of cronjob.yaml:
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: python
spec:
concurrencyPolicy: Replace
# TODO: Go back to hourly when finished testing/troubleshooting
# schedule: "#hourly"
schedule: "*/15 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- image: python-tool:1.0.5
imagePullPolicy: ""
name: python
args: []
command:
- /bin/sh
- -c
- >-
echo "$(POSTGRES_USER)" > creds/db.creds;
echo "$(POSTGRES_PASSWORD)" >> creds/db.creds;
echo "$(SERVICE1_TOKEN)" > creds/service1.creds;
echo "$(SERVICE2_TOKEN)" > creds/service2.creds;
echo "$(SERVICE3_TOKEN)" > creds/service3.creds;
python3 -u main.py;
echo "Job finished with exit code $?";
env:
- name: POSTGRES_DB
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_DB
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_USER
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: POSTGRES_PASSWORD
- name: SERVICE1_TOKEN
valueFrom:
secretKeyRef:
name: api-tokens-secret
key: SERVICE1_TOKEN
- name: SERVICE2_TOKEN
valueFrom:
secretKeyRef:
name: api-tokens-secret
key: SERVICE2_TOKEN
- name: SERVICE3_TOKEN
valueFrom:
secretKeyRef:
name: api-tokens-secret
key: SERVICE3_TOKEN
restartPolicy: OnFailure
serviceAccountName: ""
status: {}
Now, as I mentioned Istio is also a part of this picture so I have a Virtual service for the dashboard since it should be accessible outside of the cluster, but that's it.
With all of that out of the way, here's what I've done to try and solve this, myself:
Confirm the CronJob is using the correct connection settings (i.e. host, database name, username, and password) for connecting to the database.
For this, I added echo statements to the CronJob deployment showing the username and password (I know, I know) and they were the expected values. I also know those were the correct connection settings for the database because I used them verbatim to connect the dashboard to the database, which gave a successful connection.
The data source settings for the Grafana dashboard:
The error message from the Python application (shown in the ArgoCD logs for the container):
Thinking Istio might be causing this problem, I tried disabling Istio side-car injection for the CronJob resource (by adding this annotation to the metadata.annotations section: sidecar.istio.io/inject: false) but the annotation never actually showed up in the Argo logs and no change was observed when the CronJob was running.
I tried kubectl execing into the CronJob container that was running the Python script to debug more but was never actually able to since the container exited as soon as the connection error occurs.
That said, I've been banging my head into a wall for long enough on this. Could anyone spot what I might be missing and point me in the right direction, please?

I think the problem is that your pod tries to connect to the database before the istio sidecar is ready. And thus the connection can't be established.
Istio runs an init container that configures the pods route table so all traffic is routed through the sidecar. So if the sidecar isn't running and the other pod tries to connect to the db, no connection can be established.
There are two solutions.
First your job could wait for eg 30 seconds before calling main.py with some sleep command.
Alternatively you could enable holdApplicationUntilProxyStarts. By this main container will not start until the sidecar is running.

How to use kubernetes secrets in nodejs application?

I have one kubernetes cluster on gcp, running my express and node.js application, operating CRUD operations with MongoDB.
I created one secret, containing username and password,
connecting with mongoDB specifiedcified secret as environment in my kubernetes yml file.
Now My question is "How to access that username and password
in node js application for connecting mongoDB".
I tried process.env.SECRET_USERNAME and process.env.SECRET_PASSWORD
in Node.JS application, it is throwing undefined.
Any idea ll'be appreciated .
Secret.yaml
apiVersion: v1
data:
password: pppppppppppp==
username: uuuuuuuuuuuu==
kind: Secret
metadata:
creationTimestamp: 2018-07-11T11:43:25Z
name: test-mongodb-secret
namespace: default
resourceVersion: "00999"
selfLink: /api-path-to/secrets/test-mongodb-secret
uid: 0900909-9090saiaa00-9dasd0aisa-as0a0s-
type: Opaque
kubernetes.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:deployment.kubernetes.io/
revision: "4"
creationTimestamp: 2018-07-11T11:09:45Z
generation: 5
labels:
name: test
name: test
namespace: default
resourceVersion: "90909"
selfLink: /api-path-to/default/deployments/test
uid: htff50d-8gfhfa-11egfg-9gf1-42010gffgh0002a
spec:
replicas: 1
selector:
matchLabels:
name: test
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
name: test
spec:
containers:
- env:
- name: SECRET_USERNAME
valueFrom:
secretKeyRef:
key: username
name: test-mongodb-secret
- name: SECRET_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: test-mongodb-secret
image: gcr-image/env-test_node:latest
imagePullPolicy: Always
name: env-test-node
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: 2018-07-11T11:10:18Z
lastUpdateTime: 2018-07-11T11:10:18Z
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 5
readyReplicas: 1
replicas: 1
updatedReplicas: 1

Yourkubernetes.yaml file specifies which environment variable to store your secret so it is accessible by apps in that namespace.
Using kubectl secrets cli interface you can upload your secret.
kubectl create secret generic -n node-app test-mongodb-secret --from-literal=username=a-username --from-literal=password=a-secret-password
(the namespace arg -n node-app is optional, else it will uplaod to the default namespace)
After running this command, you can check your kube dashboard to see that the secret has been save
Then from you node app, access the environment variable process.env.SECRET_PASSWORD
Perhaps in your case the secretes are created in the wrong namespace hence why undefined in yourapplication.
EDIT 1
Your indentation for container.env seems to be wrong
apiVersion: v1
kind: Pod
metadata:
name: secret-env-pod
spec:
containers:
- name: mycontainer
image: redis
env:
- name: SECRET_USERNAME
valueFrom:
secretKeyRef:
name: mysecret
key: username
- name: SECRET_PASSWORD
valueFrom:
secretKeyRef:
name: mysecret
key: password
restartPolicy: Never

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Kubernetes CronJob Failure to start a job "Timeout" and "Job already exists" - cron

It turned out to happen because there is an auto injector by Istio initializer. Once I disabled Istio initializer injection for cronjobs, it works fine.

Related

Apache Flink Operator - enable azure-fs-hadoop

Using a PV in an OpenShift 3 cron job

Nextcloud with Replicas on Azure Kubernetes - Failing to Mount Azure Files ReadWriteMany Volume

Application running in Kubernetes cron job does not connect to database in same Kubernetes cluster

How to use kubernetes secrets in nodejs application?

Categories

Resources