Access CronWorkflow name for metrics labels - kustomize

I have a CronWorkflow that sends the following metric:
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: my-cron-wf
spec:
  schedule: "0 * * * *"
  suspend: false
  workflowSpec:
    metrics:
      prometheus:
        - name: wf_exec_duration_gauge
          help: "Duration gauge by workflow name and status"
          labels:
            - key: name
              value: my-cron-wf
            - key: status
              value: "{{workflow.status}}"
          gauge:
            value: "{{workflow.duration}}"
I would like to populate the metric's name label with the CronWorkflow name via a variable, to avoid hard-coding it, but I didn't find a variable for it.
I tried {{workflow.name}}, but that resolves to the generated Workflow name, not the desired CronWorkflow name.
I use Kustomize to manage Argo Workflows resources, so if there is a Kustomize way to achieve this, that would be great as well.

Argo Workflows automatically adds the name of the CronWorkflow as a label on every Workflow it creates. That label is accessible as a variable:
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: my-cron-wf
spec:
  schedule: "0 * * * *"
  suspend: false
  workflowSpec:
    metrics:
      prometheus:
        - name: wf_exec_duration_gauge
          help: "Duration gauge by workflow name and status"
          labels:
            - key: name
              value: "{{workflow.labels.workflows.argoproj.io/cron-workflow}}"
            - key: status
              value: "{{workflow.status}}"
          gauge:
            value: "{{workflow.duration}}"

Related

Using a PV in an OpenShift 3 cron job

I have been able to successfully create a cron job for my OpenShift 3 project. The project is a lift and shift from an existing Linux web server. Part of the existing application requires several cron tasks to run. The one I am looking at at the moment is a daily update to the application's database. As part of the execution of the cron job I want to write to a log file. There is already a PV/PVC defined for the main application and I was intending to use that to hold the logs for my cron job, but it seems the cron job is not being given access to the PV.
I am using the following inProgress.yml for the definition of the cron job
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Replace"
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: "cronjobInProgress"
        spec:
          containers:
            - name: in-progress
              image: <image name>
              command: ["php", "inProgress.php"]
      restartPolicy: OnFailure
      volumeMounts:
        - mountPath: /data-pv
          name: log-vol
      volumes:
        - name: log-vol
          persistentVolumeClaim:
            claimName: data-pv
I am using the following command to create the cron job
oc create -f inProgress.yml
When the job runs, the application fails to write its logs:

PHP Warning:  fopen(/data-pv/logs/2022-04-27-app.log): failed to open stream: No such file or directory in /opt/app-root/src/errorHandler.php on line 75
WARNING: [2] mkdir(): Permission denied, line 80 in file /opt/app-root/src/errorLogger.php
WARNING: [2] fopen(/data-pv/logs/2022-04-27-inprogress.log): failed to open stream: No such file or directory, line 60 in file /opt/app-root/src/errorLogger.php
Looking at the YAML of the pod that is executed, there is no mention of data-pv. It appears as though the secret volumeMount, which has been added by OpenShift, is removing any further volumeMounts.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: '2022-04-27T13:25:04Z'
  generateName: in-progress-1651065900-
...
    volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: default-token-n9jsw
        readOnly: true
...
  volumes:
    - name: default-token-n9jsw
      secret:
        defaultMode: 420
        secretName: default-token-n9jsw
How can I access the PV from within the cron job?
Your manifest is incorrect. The volumes block needs to be part of the spec.jobTemplate.spec.template.spec, that is, it needs to be indented at the same level as spec.jobTemplate.spec.template.spec.containers. In its current position it is invisible to OpenShift. See e.g. this pod example.
Similarly, volumeMounts and restartPolicy are arguments to the container block, and need to be indented accordingly.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: '*/5 * * * *'
  concurrencyPolicy: Replace
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: cronjobInProgress
        spec:
          containers:
            - name: in-progress
              image: <image name>
              command:
                - php
                - inProgress.php
              restartPolicy: OnFailure
              volumeMounts:
                - mountPath: /data-pv
                  name: log-vol
          volumes:
            - name: log-vol
              persistentVolumeClaim:
                claimName: data-pv
Thanks for the informative response, larsks.
OpenShift displayed the following when I applied your suggested manifest:
$ oc create -f InProgress.yml
The CronJob "in-progress" is invalid: spec.jobTemplate.spec.template.spec.restartPolicy: Unsupported value: "Always": supported values: "OnFailure", "Never"
As your answer was very helpful, I was able to resolve this by moving restartPolicy: OnFailure up to the pod spec; the final manifest is below.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Replace"
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: "cronjobInProgress"
        spec:
          restartPolicy: OnFailure
          containers:
            - name: in-progress
              image: <image name>
              command: ["php", "updateToInProgress.php"]
              volumeMounts:
                - mountPath: /data-pv
                  name: log-vol
          volumes:
            - name: log-vol
              persistentVolumeClaim:
                claimName: data-pv
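To confirm the fix took effect, you can inspect a pod created by the next scheduled run and check that the PVC-backed volume now appears in its spec (a quick check, relying on the parent label set above):

oc get pods -l parent=cronjobInProgress -o yaml | grep -B2 -A3 'claimName: data-pv'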

Event Hub triggered Azure Function running on AKS with KEDA does not scale out

I have deployed an Event Hub triggered Azure Function written in Java on AKS. The function should scale out using KEDA.
The function is correctly triggered and working, but it is not scaling out when the load increases. I added sleep calls to the function implementation to make sure it isn't burning through the events too fast and should be forced to scale out, but this did not change anything either.
kubectl get hpa shows the following output
NAME                   REFERENCE                TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-eventlogger   Deployment/eventlogger   64/64 (avg)   1         20        1          3m41s
This seems to be a first indicator that something is wrong, as I assume the first number in the TARGETS column is the number of unprocessed events in Event Hub. It stays the same no matter how many events I pump into the hub.
The Function was deployed using the following Kubernetes Deployment Manifest
data:
  AzureWebJobsStorage: <removed>
  FUNCTIONS_WORKER_RUNTIME: amF2YQ==
  EventHubConnectionString: <removed>
apiVersion: v1
kind: Secret
metadata:
  name: eventlogger
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eventlogger
  labels:
    app: eventlogger
spec:
  selector:
    matchLabels:
      app: eventlogger
  template:
    metadata:
      labels:
        app: eventlogger
    spec:
      containers:
        - name: eventlogger
          image: <removed>
          env:
            - name: AzureFunctionsJobHost__functions__0
              value: eventloggerHandler
          envFrom:
            - secretRef:
                name: eventlogger
          readinessProbe:
            failureThreshold: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 240
            httpGet:
              path: /
              port: 80
              scheme: HTTP
          startupProbe:
            failureThreshold: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 240
            httpGet:
              path: /
              port: 80
              scheme: HTTP
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: eventlogger
  labels:
    app: eventlogger
spec:
  scaleTargetRef:
    name: eventlogger
  pollingInterval: 5
  cooldownPeriod: 5
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: azure-eventhub
      metadata:
        storageConnectionFromEnv: AzureWebJobsStorage
        connectionFromEnv: EventHubConnectionString
---
The Event Hub connection string contains the "EntityPath=" section, as described in the KEDA Event Hub scaler documentation, and the credential has Manage permissions on the Event Hub namespace.
The output of kubectl describe ScaledObject is
Name:         eventlogger
Namespace:    default
Labels:       app=eventlogger
              scaledobject.keda.sh/name=eventlogger
Annotations:  <none>
API Version:  keda.sh/v1alpha1
Kind:         ScaledObject
Metadata:
  Creation Timestamp:  2022-04-17T10:30:36Z
  Finalizers:
    finalizer.keda.sh
  Generation:  1
  Managed Fields:
    API Version:  keda.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:app:
      f:spec:
        .:
        f:cooldownPeriod:
        f:maxReplicaCount:
        f:minReplicaCount:
        f:pollingInterval:
        f:scaleTargetRef:
          .:
          f:name:
        f:triggers:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-04-17T10:30:36Z
    API Version:  keda.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizer.keda.sh":
        f:labels:
          f:scaledobject.keda.sh/name:
      f:status:
        .:
        f:conditions:
        f:externalMetricNames:
        f:lastActiveTime:
        f:originalReplicaCount:
        f:scaleTargetGVKR:
          .:
          f:group:
          f:kind:
          f:resource:
          f:version:
        f:scaleTargetKind:
    Manager:      keda
    Operation:    Update
    Time:         2022-04-17T10:30:37Z
  Resource Version:  1775052
  UID:               3b6a68c1-c3b9-4cdf-b5d5-41a9721ac661
Spec:
  Cooldown Period:    5
  Max Replica Count:  20
  Min Replica Count:  0
  Polling Interval:   5
  Scale Target Ref:
    Name:  eventlogger
  Triggers:
    Metadata:
      Connection From Env:          EventHubConnectionString
      Storage Connection From Env:  AzureWebJobsStorage
    Type:                           azure-eventhub
Status:
  Conditions:
    Message:  ScaledObject is defined correctly and is ready for scaling
    Reason:   ScaledObjectReady
    Status:   False
    Type:     Ready
    Message:  Scaling is performed because triggers are active
    Reason:   ScalerActive
    Status:   True
    Type:     Active
    Status:   Unknown
    Type:     Fallback
  External Metric Names:
    s0-azure-eventhub-$Default
  Last Active Time:        2022-04-17T10:30:47Z
  Original Replica Count:  1
  Scale Target GVKR:
    Group:             apps
    Kind:              Deployment
    Resource:          deployments
    Version:           v1
  Scale Target Kind:   apps/v1.Deployment
Events:
  Type    Reason              Age   From           Message
  ----    ------              ----  ----           -------
  Normal  KEDAScalersStarted  10s   keda-operator  Started scalers watch
  Normal  ScaledObjectReady   10s   keda-operator  ScaledObject is ready for scaling
So I'm a bit stuck, as I don't see any errors but it's still not behaving as expected.
Versions:
Kubernetes version: 1.21.9
KEDA version: 2.6.1, installed using kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.6.1/keda-2.6.1.yaml
Azure Functions on Java 11; extensionBundle in host.json is configured with version [2.8.4, 3.0.0)
I was able to find a solution to the problem.
Event Hub triggered Azure Functions deployed on AKS show the same scaling characteristics as Azure Functions on App Service:
You only get one consumer per partition, to allow for ordering per partition.
This characteristic overrules maxReplicaCount in the Kubernetes deployment manifest.
So to solve my own issue: by increasing the number of partitions for the Event Hub, I get a pod per partition and KEDA scales the workload as expected.
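For illustration, the partition count can be checked and set with the Azure CLI. Note that on the Standard tier the partition count of an existing hub generally cannot be changed, so you may have to recreate the hub; the resource names below are placeholders:

# Check the current partition count (placeholder names)
az eventhubs eventhub show \
  --resource-group my-rg \
  --namespace-name my-eventhub-ns \
  --name my-hub \
  --query partitionCount

# Create a hub with more partitions so KEDA can scale to one pod per partition
az eventhubs eventhub create \
  --resource-group my-rg \
  --namespace-name my-eventhub-ns \
  --name my-hub-v2 \
  --partition-count 20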

FluXCD Helm deployment from Azure ACR - no chart name found error

I am attempting to deploy a Helm chart to AKS using FluxCD. The chart has been pushed to Azure ACR using the Helm CLI ("helm push ..."). The chart appears in the ACR as helm/release-services:0.1.0.
I am receiving the following error after a Flux reconcile:

'chart pull error: failed to get chart version for remote reference: no chart name found'
with helm-controller logs as follows
{"level":"info","ts":"2022-02-07T12:40:18.121Z","logger":"controller.helmrelease","msg":"HelmChart 'flux-system/release-services-test-release-services' is not ready","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"release-services","namespace":"release-services-test"}
{"level":"info","ts":"2022-02-07T12:40:18.135Z","logger":"controller.helmrelease","msg":"reconcilation finished in 15.458307ms, next run in 5m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"release-services","namespace":"release-services-test"}
Below is the HelmChart resource in AKS:
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmChart
metadata:
  creationTimestamp: "2022-02-07T07:30:16Z"
  finalizers:
    - finalizers.fluxcd.io
  generation: 1
  name: release-services-test-release-services
  namespace: flux-system
  resourceVersion: "105266699"
  selfLink: /apis/source.toolkit.fluxcd.io/v1beta1/namespaces/flux-system/helmcharts/release-services-test-release-services
  uid: e4820a70-8885-44a1-8dfd-0e2bf7256915
spec:
  chart: release-services
  interval: 5m0s
  reconcileStrategy: ChartVersion
  sourceRef:
    kind: HelmRepository
    name: psbombb-helm-acr-dev
  version: '>=0.1.0'
status:
  conditions:
    - lastTransitionTime: "2022-02-07T11:02:49Z"
      message: 'chart pull error: failed to get chart version for remote reference:
        no chart name found'
      reason: ChartPullFailed
      status: "False"
      type: Ready
  observedGeneration: 1
and the HelmRelease is as follows
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  creationTimestamp: "2022-02-07T04:34:14Z"
  finalizers:
    - finalizers.fluxcd.io
  generation: 9
  labels:
    kustomize.toolkit.fluxcd.io/name: apps
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: release-services
  namespace: release-services-test
  resourceVersion: "105341484"
  selfLink: /apis/helm.toolkit.fluxcd.io/v2beta1/namespaces/release-services-test/helmreleases/release-services
  uid: 6a6e5f5c-951d-4655-9c15-fa9fe7421a04
spec:
  chart:
    spec:
      chart: release-services
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: psbombb-helm-acr-dev
        namespace: flux-system
      version: '>=0.1.0'
  install:
    remediation:
      retries: 3
  interval: 5m
  releaseName: release-services
  timeout: 12m
  values:
    image:
      name: release-services
      pullPolicy: IfNotPresent
      registry: <repository>.azurecr.io
      repository: <repository>.azurecr.io/helm/release-services
      tag: 0.1.0
    postgres:
      secret:
        create: false
        existingName: release-services-secrets
status:
  conditions:
    - lastTransitionTime: "2022-02-07T08:27:13Z"
      message: HelmChart 'flux-system/release-services-test-release-services' is not
        ready
      reason: ArtifactFailed
      status: "False"
      type: Ready
  failures: 50
  helmChart: flux-system/release-services-test-release-services
  observedGeneration: 9
Is there anything I am missing that anyone can spot for me please?
Thank you kindly
I think your issue is that Azure Container Registry stores Helm charts as OCI artifacts.
The Flux source controller pulls the index.yaml from an HTTP Helm chart repository to look for tags, and this does not work against an OCI registry.
Here is the GitHub issue tracking this, where you can see that the Flux maintainers plan to work on it now that the OCI feature is stable as of Helm 3.8.0.
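For what it's worth, newer Flux releases later added native OCI support to the source controller. A sketch of what a HelmRepository pointing at an OCI registry looks like on a recent Flux version (the URL path is an assumption based on the chart reference above):

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: psbombb-helm-acr-dev
  namespace: flux-system
spec:
  type: oci                                # requires a Flux version with OCI support
  url: oci://<repository>.azurecr.io/helm  # registry path holding release-services
  interval: 5m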

AlertmanagerConfig namespace matcher does not work as expected

My AlertmanagerConfig:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: configlinkflowalertmanager
  labels:
    alertmanagerConfig: linkflowAlertmanager
spec:
  route:
    groupBy: ['alertname']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'webhook'
    matchers:
      - name: alertname
        value: KubePodCrashLooping
      - name: namespace
        value: linkflow
  receivers:
    - name: 'webhook'
      webhookConfigs:
        - url: 'http://xxxxx:1194/'
The Alertmanager web UI shows that the namespace matcher has become monitoring. Why? Only alerts from the monitoring namespace are sent out.
Can I match alerts from another namespace, or from all namespaces? This is the rendered route:
route:
  receiver: Default
  group_by:
    - namespace
  continue: false
  routes:
    - receiver: monitoring-configlinkflowalertmanager-webhook
      group_by:
        - namespace
      match:
        alertname: KubePodCrashLooping
        namespace: monitoring
      continue: true
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
This is a feature: the Prometheus Operator forces the namespace matcher of every route it generates from an AlertmanagerConfig to the namespace the resource lives in, which is why your linkflow matcher became monitoring. As the maintainers put it:
That's kind of the point of the feature, otherwise it's possible that Alertmanager configs in different namespaces conflict and Alertmanager won't be able to start.
There is an issue (#3737) to make namespace label matching optional / configurable. The related PR still has to be merged (as of today), but it will allow you to define global alerts.
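In the meantime, a common workaround (a sketch, assuming the operator's default enforced-namespace behavior) is to create the AlertmanagerConfig in the namespace whose alerts you want to route, so the injected matcher points there:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: configlinkflowalertmanager
  namespace: linkflow   # the operator injects namespace="linkflow" as the matcher
spec:
  route:
    receiver: 'webhook'
    matchers:
      - name: alertname
        value: KubePodCrashLooping
  receivers:
    - name: 'webhook'
      webhookConfigs:
        - url: 'http://xxxxx:1194/'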

Kubernetes CronJob Failure to start a job "Timeout" and "Job already exists"

I am trying to run a cron job in Kubernetes, but keep getting these two errors:

type: 'Warning' reason: 'FailedCreate' Error creating job: jobs.batch "dev-cron-1516702680" already exists

and

type: 'Warning' reason: 'FailedCreate' Error creating job: Timeout: request did not complete within allowed duration

Below is my CronJob YAML:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: 2018-01-23T09:45:10Z
  name: dev-cron
  namespace: dev
  resourceVersion: "16768201"
  selfLink: /apis/batch/v1beta1/namespaces/dev/cronjobs/dev-cron
  uid: 1a32eb94-0022-11e8-9256-065eb556d6a2
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
            - args:
                - for country in th;
                - do
                - 'curl -X POST -d "{'footprint':'xxxx-xxxx'}"-H "Content-Type: application/json" https://dev.xxx.com/xxx/xxx'
                - done
              image: appropriate/curl:latest
              imagePullPolicy: Always
              name: cron
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '* * * * *'
  startingDeadlineSeconds: 10
  successfulJobsHistoryLimit: 3
  suspend: false
status: {}
I am not sure why this keeps happening. I am running Kubernetes version 1.9.1 in an AWS cluster. Any idea why?
It turned out this was happening because of the automatic injection performed by the Istio initializer. Once I disabled Istio initializer injection for CronJobs, it worked fine.
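For reference, one common way to opt a workload out of Istio sidecar injection (assuming an injection setup that honors this annotation) is to annotate the job's pod template:

jobTemplate:
  spec:
    template:
      metadata:
        annotations:
          sidecar.istio.io/inject: "false"   # skip Istio injection for these pods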
