Why doesn't my GKE cronjob exit after running `echo`?

I'm trying to run a thor task from my Rails app's server image.
The cronjob runs once, but the pod never exits.
I've tested with the example "hello world" job, and that one works fine:
# hello-world-cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo "Hello, World!"
          restartPolicy: OnFailure
$ kubectl get pods
NAME                                                            READY   STATUS      RESTARTS   AGE
hello-1557392100-2hnjc                                          0/1     Completed   0          3m
hello-1557392160-58mwb                                          0/1     Completed   0          2m
hello-1557392220-qbstx                                          0/1
send-reminders-events-starting-in-two-hours-1557391560-2qwtv    1/2     Running     0          11m
send-reminders-events-starting-in-two-hours-1557391740-9dm6q    1/2     Running     0          8m
send-reminders-events-starting-in-two-hours-1557391800-2tjdt    1/2     Running     0          8m
send-reminders-events-starting-in-two-hours-1557391860-q6qgb    1/2     Running     0          7m
send-reminders-events-starting-in-two-hours-1557391920-j9kdn    1/2     Running     0          6m
send-reminders-events-starting-in-two-hours-1557391980-sqg28    1/2     Running     0          5m
send-reminders-events-starting-in-two-hours-1557392040-twr4t    1/2     Running     0          4m
send-reminders-events-starting-in-two-hours-1557392100-skzbz    1/2     Running     0          3m
send-reminders-events-starting-in-two-hours-1557392160-2qgxl    1/2     Running     0          2m
send-reminders-events-starting-in-two-hours-1557392220-z7tk4    1/2     Running     0          1m
send-reminders-users-that-has-not-replied-1557391560-tmlnb      1/2     Running     0          11m
This is my cronjob.yaml. I only see one echo, then it hangs forever. Any idea why?
# cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: send-reminders-events-starting-in-two-hours
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          volumes:
          - name: cloudsql-instance-credentials
            secret:
              secretName: cloudsql-instance-credentials
          containers:
          - name: events-starting-in-two-hours
            image: eu.gcr.io/example/pepper:latest
            args:
            - /bin/sh
            - -c
            - echo "triggering send-reminders-events-starting-in-two-hours ======="
Dockerfile
FROM ruby:2.6.2-slim-stretch
COPY Gemfile* /app/
WORKDIR /app
RUN gem update --system
RUN gem install bundler
RUN bundle install --jobs 20 --retry 5
# Set the timezone to Stockholm
ENV TZ=Europe/Stockholm
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
COPY . /app/
RUN echo echo "RAILS_ENV is \$RAILS_ENV" >> ~/.bashrc
WORKDIR /app
CMD ["/usr/local/bundle/bin/rails", "s", "-b", "0.0.0.0", "-p", "3001"]

I fixed the issue by enabling private IP on the CloudSQL instance and connecting directly to that IP from my cronjob. That way I could skip the cloudsql-proxy sidecar.
Also make sure you have enabled VPC-native support for your GKE cluster.
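For illustration, a minimal sketch of what the job's container section can look like once the proxy sidecar is dropped; the DB_HOST value and the thor invocation are placeholders, not taken from the original manifests:

# Sketch only: no cloudsql-proxy sidecar, the app connects straight to the private IP
containers:
- name: events-starting-in-two-hours
  image: eu.gcr.io/example/pepper:latest
  env:
  - name: DB_HOST
    value: "10.0.0.5"   # placeholder: private IP of the Cloud SQL instance
  args:
  - /bin/sh
  - -c
  - bundle exec thor send_reminders:events_starting_in_two_hours   # placeholder task name

With no sidecar left running, the only container in the pod is the job itself, so the pod can reach Completed as soon as the task exits.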

Related

Kubernetes - Wait for available does not work as expected

I'm using a Gitlab CI/CD pipeline to deploy a few containers to a Kubernetes environment. The script (excerpt) basically just deploys a few resources like this:
.deploy-base:
  # For deploying, we need an image that can interact with k8s
  image:
    name: registry.my-org.de/public-projects/kubectl-gettext:1-21-9
    entrypoint: ['']
  variables:
    # Define k8s namespace and domain used for deployment:
    NS: $KUBE_NAMESPACE
  before_script:
    - echo $NS
    - cp $KUBECONFIG ~/.kube/config
    - export CI_ENVIRONMENT_DOMAIN=$(echo "$CI_ENVIRONMENT_URL" | sed -e 's/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/')
  script:
    - kubectl config get-contexts
    - kubectl config use-context org-it-infrastructure/org-fastapi-backend:azure-aks-agent
    # Make Docker credentials available for deployment:
    - kubectl create secret -n $NS docker-registry gitlab-registry-secret --docker-server=$CI_REGISTRY --docker-username=$CI_DEPLOY_USER --docker-password=$CI_DEPLOY_PASSWORD --docker-email=$GITLAB_USER_EMAIL -o yaml --dry-run | kubectl replace --force -n $NS -f -
    - kubectl -n $NS patch serviceaccount default -p '{"imagePullSecrets":[{"name":"gitlab-registry-secret"}]}'
    # Create config map for container env variables
    - envsubst < dev/config-map.yml | kubectl -n $NS replace --force -f -
    # Start and expose deployment, set up ingress:
    - envsubst < dev/backend-deploy.yml | kubectl -n $NS replace --force -f -
    # Set up ingress with env var expansion from template:
    - envsubst < dev/ingress.yml | kubectl -n $NS replace --force -f -
    # Wait for pod
    - kubectl -n $NS wait --for=condition=available deployment/backend --timeout=180s
The last command should wait for the deployment to become available and return as soon as it does. Since the latest GitLab 15 update and the switch from certificate-based to agent-based authentication towards Kubernetes, it doesn't work anymore and yields the following error message:
error: timed out waiting for the condition on deployments/backend
It also takes far longer than the specified 180s; it's more like 15-20 minutes.
The application is available and works as expected, and the deployment looks good:
$ kubectl -n org-fastapi-backend-development describe deployment backend
Name:                   backend
Namespace:              org-fastapi-backend-development
CreationTimestamp:      Thu, 02 Jun 2022 14:15:18 +0200
Labels:                 app=app
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=app
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=app
  Containers:
   app:
    Image:      registry.my-org.de/org-it-infrastructure/org-fastapi-backend:development
    Port:       80/TCP
    Host Port:  0/TCP
    Environment Variables from:
      backend-config  ConfigMap  Optional: false
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   backend-6bb4f4bcd5 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  83s   deployment-controller  Scaled up replica set backend-6bb4f4bcd5 to 1
As you can see, the Available condition has its status set to True, yet the wait command still does not return successfully.
Both kubectl and the Kubernetes environment (it's Azure AKS) are running version 1.21.9.
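A quick way to cross-check what the wait command is polling is to read the Available condition straight from the deployment status under the same agent-based context, for example:

kubectl -n $NS get deployment backend \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'

If that prints True right away while kubectl wait still times out, the problem is more likely in how the wait/watch request travels through the agent-based connection than in the deployment itself.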

Taking Thread dump/ Heap dump of Azure Kubernetes pods

We are running our Kafka Streams application, written in Java, on Azure Kubernetes. We are new to Kubernetes. To debug an issue, we want to take a thread dump of the running pod.
Below are the steps we are following to take the dump.
We build our application with the Dockerfile below:
FROM mcr.microsoft.com/java/jdk:11-zulu-alpine
RUN apk update && apk add --no-cache gcompat
RUN addgroup -S user1 && adduser -S user1 -G user1
USER user1
WORKDIR .
COPY target/my-application-1.0.0.0.jar .
We submit the image with the deployment YAML file below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application-v1.0.0.0
spec:
  replicas: 1
  selector:
    matchLabels:
      name: my-application-pod
      app: my-application-app
  template:
    metadata:
      name: my-application-pod
      labels:
        name: my-application-pod
        app: my-application-app
    spec:
      nodeSelector:
        agentpool: agentpool1
      containers:
      - name: my-application-0
        image: myregistry.azurecr.io/my-application:v1.0.0.0
        imagePullPolicy: Always
        command: ["java","-jar","my-application-1.0.0.0.jar","input1","$(connection_string)"]
        env:
        - name: connection_string
          valueFrom:
            configMapKeyRef:
              name: my-application-configmap
              key: connectionString
        resources:
          limits:
            cpu: "4"
          requests:
            cpu: "0.5"
To get a shell to a Running container you can run the command below:
kubectl exec -it <POD_NAME> -- sh
To get the thread dump, we run the command below:
jstack PID > threadDump.tdump
but we get a permission denied error.
Can someone suggest how to solve this, or the steps to take thread/heap dumps?
Thanks in advance.
Since you likely need the thread dump locally, you can bypass creating the file in the pod and just stream it directly to a file on your local computer:
kubectl exec -i POD_NAME -- jstack 1 > threadDump.tdump
If your thread dumps are large you may want to consider piping to pv first to get a nice progress bar.
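For example, a minimal sketch of that variant (assuming pv is installed on your local machine):

kubectl exec -i POD_NAME -- jstack 1 | pv > threadDump.tdump

Here jstack runs against PID 1 inside the container and the output is piped through pv locally, so nothing is written to the pod's filesystem and no extra permissions are needed there.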

Kubernetes Node 14 pod restarts and terminated with exit code 0

I have an Angular Universal application with the following Dockerfile:
FROM node:14-alpine
WORKDIR /app
COPY package.json /app
COPY dist/webapp /app/dist/webapp
ENV NODE_ENV "production"
ENV PORT 80
EXPOSE 80
CMD ["npm", "run", "serve:ssr"]
And I can deploy it to a Kubernetes cluster just fine but it keeps getting restarted every 10 minutes or so:
NAME                      READY   STATUS    RESTARTS   AGE
api-xxxxxxxxx-xxxxx       1/1     Running   0          48m
webapp-xxxxxxxxxx-xxxxx   1/1     Running   232        5d19h
Pod logs are clean and when I describe the pod I just see:
Last State:     Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Tue, 22 Sep 2020 15:58:27 -0300
  Finished:     Tue, 22 Sep 2020 16:20:31 -0300
Events:
  Type    Reason   Age                      From                           Message
  ----    ------   ----                     ----                           -------
  Normal  Created  3m31s (x233 over 5d19h)  kubelet, pool-xxxxxxxxx-xxxxx  Created container webapp
  Normal  Started  3m31s (x233 over 5d19h)  kubelet, pool-xxxxxxxxx-xxxxx  Started container webapp
  Normal  Pulled   3m31s (x232 over 5d18h)  kubelet, pool-xxxxxxxxx-xxxxx  Container image "registry.gitlab.com/..." already present on machine
This is my deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $CI_PROJECT_NAME
  namespace: $KUBE_NAMESPACE
  labels:
    app: webapp
    tier: frontend
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  selector:
    matchLabels:
      app: webapp
      tier: frontend
  template:
    metadata:
      labels:
        app: webapp
        tier: frontend
    spec:
      imagePullSecrets:
      - name: gitlab-registry
      containers:
      - name: $CI_PROJECT_NAME
        image: $IMAGE_TAG
        ports:
        - containerPort: 80
How can I tell the reason it keeps restarting? Thanks!
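A couple of standard commands that usually help narrow this kind of silent exit-code-0 restart down (the pod name is a placeholder):

# Logs from the previous, terminated container instance
kubectl logs webapp-xxxxxxxxxx-xxxxx --previous

# Recent cluster events, oldest first
kubectl get events --sort-by=.metadata.creationTimestamp

Since the exit code is 0, the Node process itself exited cleanly, so the previous container's logs are usually the first place to look for whatever made `npm run serve:ssr` stop.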

Passing in Arguments to Kubernetes Deployment Pod

I'm trying to create a kubernetes deployment that creates a pod.
I want this pod to run the command "cron start" on creation so that cron is automatically initialized.
This is currently how I am trying to run the command, though it clearly isn't working (kubernetes_deployment.yaml):
- containerPort: 8080
command: [ "/bin/sh" ]
args: ["cron start"]
Thank you in advance :)
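As a side note, /bin/sh only runs a command string when it is passed -c; with the snippet above it tries to open a script file literally named "cron start". A minimal sketch of that container section with -c added (whether `cron start` is actually runnable inside this particular image is an assumption):

ports:
- containerPort: 8080
command: ["/bin/sh", "-c"]
args: ["cron start"]   # assumes the image really provides a `cron start` command

Note also that if `cron start` just daemonizes and returns, the container's main process exits and the pod gets restarted, which is part of why the CronJob approach in the answer below is usually the cleaner fit.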
Maybe you could use Kubernetes CronJobs.
You can set a cron expression.
https://kubernetes.io/es/docs/concepts/workloads/controllers/cron-jobs/
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

Delete all k8s pods created by a cronjob

I have a cronjob that runs every 10 minutes. So every 10 minutes, a new pod is created. After a day, I have a lot of completed pods (not jobs, just one cronjob exists). Is there a way to automatically get rid of them?
That's a job for labels.
Use them on your CronJob and delete completed pods using a selector (-l flag).
For example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cron
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: periodic-batch-job
            is-cron: "true"
        spec:
          containers:
          - name: cron
            image: your_image
            imagePullPolicy: IfNotPresent
          restartPolicy: OnFailure
Delete all cron-labeled pods with:
kubectl delete pod -l is-cron
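If you only want to remove pods that have already finished (and leave a currently running one alone), you can additionally filter on the pod phase:

kubectl delete pod -l is-cron --field-selector=status.phase=Succeeded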
Specifically for my situation, my pods were not fully terminating because I was running one container with the actual job and another with the cloud sql proxy, and the proxy was preventing the pod from completing successfully.
The fix was to kill the proxy process after 30 seconds (my jobs typically take a couple of seconds). Then, once the job completes, successfulJobsHistoryLimit on the cronjob kicks in and keeps (by default) only the last 3 pods (a sketch of where that field sits follows the proxy snippet below).
- name: cloudsql-proxy
  image: gcr.io/cloudsql-docker/gce-proxy:1.11
  command: ["sh", "-c"]
  args:
  - /cloud_sql_proxy -instances=myinstance=tcp:5432 -credential_file=/secrets/cloudsql/credentials.json & pid=$! && (sleep 30 && kill -9 $pid 2>/dev/null)
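For reference, a minimal sketch of where those history limits sit on the CronJob spec; the values shown are the Kubernetes defaults, not taken from the poster's manifest:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cron
spec:
  schedule: "*/10 * * * *"
  successfulJobsHistoryLimit: 3   # default: keep only the last 3 successful jobs (and their pods)
  failedJobsHistoryLimit: 1       # default: keep only the last failed job
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cron
            image: your_image
          restartPolicy: OnFailure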
