How to have different messages for FIRING and RESOLVED status in Prometheus Alertmanager?

Consider the following alerting template, which creates an alert for every job in .Values.jobs and fires when the job with the given job_name has not succeeded within a given interval:
{{- range $job, $value := .Values.jobs }}
- alert: Long Time Since Job Success ({{ $job }})
  expr: time() - max_over_time(max(mm_spark_succeeded{job_name="{{ $value.job_name }}"})[168h:1m]) > 3600
  labels:
    severity: {{ $value.severity }}
    slack: {{ $value.slack }}
  annotations:
    format: verbose
    summary: "Long time since the job `{{ $job }}` was last successful."
    exclude_labels: k8sCluster,region,env,severity
{{- end }}
This triggers an alert and sends a Slack message whenever the status of the alert is FIRING or RESOLVED.
In my use case, I want to show the annotations only when the status is FIRING. For RESOLVED status, I want to show the alert but without the annotations (i.e. without the summary message). How do I select or ignore annotations based on the status of the alert?
I tried the following, but it didn't work:
{{- range $job, $value := .Values.jobs }}
- alert: Long Time Since Job Success ({{ $job }})
  expr: time() - max_over_time(max(mm_spark_succeeded{job_name="{{ $value.job_name }}"})[168h:1m]) > 3600
  labels:
    severity: {{ $value.severity }}
    slack: {{ $value.slack }}
{{- if eq .Status "firing" }}
  annotations:
    format: verbose
    summary: "Long time since the job `{{ $job }}` was last successful."
    exclude_labels: k8sCluster,region,env,severity
{{- end }}
{{- end }}
It throws an error: eq: invalid type for comparison
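The error comes from Helm: the rule file is rendered at chart-install time, and .Status does not exist in Helm's template context, so eq ends up comparing "firing" against an untyped nil. .Status is only defined later, inside Alertmanager's notification templates, which is where status-dependent formatting belongs. A minimal sketch of a Slack receiver that only includes the summary while firing (receiver and channel names are placeholders, not from the question):

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
        text: >-
          {{ if eq .Status "firing" -}}
          {{ range .Alerts }}{{ .Annotations.summary }}
          {{ end -}}
          {{- else -}}
          Resolved: {{ .CommonLabels.alertname }}
          {{- end }}

Note that if this alertmanager.yml itself lives in a Helm chart, the {{ ... }} actions would need to be escaped so Helm does not try to render them.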

Related

Helm Chart: error converting YAML to JSON: yaml: mapping values are not allowed in this context

I am trying to install akv2k8s secrets using helm template, but it fails and I am unable to diagnose the issue in Helm; online YAML validators have not helped.
Using the --debug flag renders the expected manifest.
values.yaml
akv2k8s:
  enabled: true
  vaults:
    vaultcmms:
      secretkey: secretvalue
      secretkey1: secretvalue1
    vaulttenant:
      secretkey: secretvalue
      secretkey1: secretvalue2
akv28s.yaml
{{- if .Values.akv2k8s.enabled -}}
{{- range $vault, $content := .Values.akv2k8s.vaults }}
{{- range $key, $value := $content }}
apiVersion: spv.no/v2beta1
kind: AzureKeyVaultSecret
spec:
  vault: {{ $vault }}
  name: {{ $key }}
  object:
    name: {{ $value }}
    type: secret
{{- end }}
{{- end }}
{{- end }}
I was making a mistake by specifying the vault value at the wrong level of the hierarchy.
It should be:
spec:
  vault:
    name: {{ $vault }}
    object:
      name: {{ $value }}
      type: secret
This solved my issue.
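One extra pitfall when ranging over several vaults and keys (my own addition, not covered by the original answer; the metadata block is an assumption, since Kubernetes objects need a name): each iteration emits a separate object, so the template needs a YAML document separator per iteration, roughly:

{{- if .Values.akv2k8s.enabled -}}
{{- range $vault, $content := .Values.akv2k8s.vaults }}
{{- range $key, $value := $content }}
---
apiVersion: spv.no/v2beta1
kind: AzureKeyVaultSecret
metadata:
  name: {{ $key }}
spec:
  vault:
    name: {{ $vault }}
    object:
      name: {{ $value }}
      type: secret
{{- end }}
{{- end }}
{{- end }}

Without the --- the rendered documents run together, which is a classic source of "mapping values are not allowed in this context".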

If condition does not work in state triggered by reactor

I am using an if condition that relies on a grain item, inside a state triggered by a reactor,
and I get the error message: Jinja variable 'dict object' has no attribute 'environment'
=================================================
REACTOR config:
cat /etc/salt/master.d/reactor.conf
reactor:
  - 'my/custom/event':
    - salt://reactor/test.sls
==============================
test.sls
cat /srv/salt/reactor/test.sls
sync_grains:
  local.saltutil.sync_grains:
    - tgt: {{ data['id'] }}

{% if grains['environment'] in ["prod", "dev", "migr"] %}
test_if_this_works:
  local.state.apply:
    - tgt: {{ data['id'] }}
    - arg:
      - dummy_state
{% endif %}
===================================
dummy_state/init.sls
cat /srv/salt/dummy_state/init.sls
create_a_directory:
  file.directory:
    - name: /tmp/my_test_dir
    - user: root
    - group: root
    - makedirs: True
=================================================
salt 'salt-redhat-23.test.local' grains.item environment
salt-redhat-23.test.local:
    ----------
    environment:
        prod
=================================================
salt-redhat-23 ~]# cat /etc/salt/grains
role: MyServer
environment: prod
================================================
If I change test.sls to use a grain that the salt master provides by default instead of the custom grain, it works. It also works without the if condition in the state.
Do you know why this is happening?
Thank you all in advance.
Issue resolved.
You cannot use custom grains with the reactor directly: reactor SLS files are rendered on the master, where the minion's custom grains are not available, so you need to call another state and put the condition there.
For instance:
cat /etc/salt/master.d/reactor.conf
reactor:
  - 'my/custom/event':
    - salt://reactor/test.sls
test.sls
# run a state using reactor
test_if_this_works:
  local.state.apply:
    - tgt: {{ data['id'] }}
    - arg:
      - reactor.execute
execute.sls
{% set tst = grains['environment'] %}
{% if tst in ['prod', 'dev', 'test', 'migr'] %}
create_a_directory:
  file.directory:
    - name: /tmp/my_test_dir
    - user: root
    - group: root
    - makedirs: True
{% endif %}
This works with the if condition; if you put the if statement in test.sls itself, it will not, because the reactor SLS is rendered on the master. The called state (execute.sls) is rendered on the minion, where the custom grain exists.
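An alternative sketch (my own suggestion, not part of the original answer, and it assumes a Salt version whose reactor reactions accept tgt_type): let the master target by grain instead of evaluating the grain inside the reactor SLS, since grain matching is resolved against the minions themselves:

# salt://reactor/test.sls
test_if_this_works:
  local.state.apply:
    - tgt: 'G@environment:prod or G@environment:dev or G@environment:migr'
    - tgt_type: compound
    - arg:
      - dummy_state

Here the environment check happens at targeting time, so custom grains work without the intermediate execute.sls.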

Gitlab CICD sets wrong service url in the production environment

After a production deployment the application does not get the endpoint from environment.url in .gitlab-ci.yml, but a combination of the group name, project name and base domain:
<groupname>-<projectname>.basedomain.
The GitLab project belongs to a GitLab group, which has a Kubernetes cluster. The group has a base domain which is used in .gitlab-ci.yml:
# part of .gitlab-ci.yml
...
apply production secret configuration:
  stage: prepare-deploy
  extends: .auto-deploy
  needs: ["build", "generate production configuration"]
  dependencies:
    - generate production configuration
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy ensure_namespace
    - kubectl create secret generic tasker-secrets-development --from-file=config.tar --dry-run -o yaml | kubectl apply -f -
  environment:
    name: production
    url: http://app.$KUBE_INGRESS_BASE_DOMAIN
    action: prepare
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
...
I expected http://app.$KUBE_INGRESS_BASE_DOMAIN as the endpoint for the application.
The Ingress (I removed the minio part):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: {{ template "fullname" . }}
  labels:
    app: {{ template "appname" . }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}"
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
  annotations:
    cert-manager.io/cluster-issuer: {{ .Values.leIssuer }}
    acme.cert-manager.io/http01-edit-in-place: "true"
{{- if .Values.ingress.annotations }}
{{ toYaml .Values.ingress.annotations | indent 4 }}
{{- end }}
{{- with .Values.ingress.modSecurity }}
{{- if .enabled }}
    nginx.ingress.kubernetes.io/modsecurity-transaction-id: "$server_name-$request_id"
    nginx.ingress.kubernetes.io/modsecurity-snippet: |
      SecRuleEngine {{ .secRuleEngine | default "DetectionOnly" | title }}
{{- range $rule := .secRules }}
{{ (include "secrule" $rule) | indent 6 }}
{{- end }}
{{- end }}
{{- end }}
{{- if .Values.prometheus.metrics }}
    nginx.ingress.kubernetes.io/server-snippet: |-
      location /metrics {
        deny all;
      }
{{- end }}
spec:
{{- if .Values.ingress.tls.enabled }}
  tls:
    - hosts:
{{- if .Values.service.commonName }}
        - {{ template "hostname" .Values.service.commonName }}
{{- end }}
        - {{ template "hostname" .Values.service.url }}    # <<<<<<<<<<<<<<<<<<<
{{- if .Values.service.additionalHosts }}
{{- range $host := .Values.service.additionalHosts }}
        - {{ $host }}
{{- end -}}
{{- end }}
      secretName: {{ .Values.ingress.tls.secretName | default (printf "%s-cert" (include "fullname" .)) }}
{{- end }}
  rules:
    - host: {{ template "hostname" .Values.service.url }}    # <<<<<<<<<<<<<<<<<
      http:
        &httpRule
        paths:
          - path: /
            backend:
              serviceName: {{ template "fullname" . }}
              servicePort: {{ .Values.service.externalPort }}
{{- if .Values.service.commonName }}
    - host: {{ template "hostname" .Values.service.commonName }}
      http:
        <<: *httpRule
{{- end -}}
{{- if .Values.service.additionalHosts }}
{{- range $host := .Values.service.additionalHosts }}
    - host: {{ $host }}
      http:
        <<: *httpRule
{{- end -}}
{{- end -}}
What I have done so far:
- Removed the deployment from the cluster, cleared the GitLab runner caches and the GitLab cluster cache, deleted the environment (stop and delete), and created a new environment 'production' with the right URL under Operations > Environments > production > Edit. After a push, the URL was replaced with the wrong one again.
- Hard-coded the URL in the Ingress (at the arrows in the snippet); that worked.
- Changed the value in .gitlab-ci.yml to leave out http://. No result.
- Checked that 'apply production secret configuration' in .gitlab-ci.yml is actually used, by adding echo 'message!'. Conclusion: this part of the file is used for production.
- Added a CI/CD variable under Settings > CI/CD: GITLAB_ENVIRONMENT_URL. No effect.
UPDATE:
Maybe the .Values.gitlab.app is used for the URL.
The file .gitlab-ci.yml includes a template which overrides the value.
# .gitlab-ci.yml
include:
  - template: Jobs/Deploy.gitlab-ci.yml  # https://gitlab.com/gitlab-org/gitlab-foss/blob/master/lib/gitlab/ci/templates/Jobs/Deploy.gitlab-ci.yml
The override in the template:
.production: &production_template
  extends: .auto-deploy
  stage: production
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy
    - auto-deploy delete canary
    - auto-deploy delete rollout
    - auto-deploy persist_environment_url
  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN  # <<<<<<<<<<<<<<
  artifacts:
    paths: [environment_url.txt, tiller.log]
    when: always
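If the included template is what wins, a hedged way out (relying on standard GitLab CI include semantics; the job name production is an assumption about how the template instantiates the anchor): jobs defined in your own .gitlab-ci.yml take precedence over same-named jobs from included templates, so redefining the environment URL locally should override the template's value:

# .gitlab-ci.yml, after the include
production:
  environment:
    name: production
    url: http://app.$KUBE_INGRESS_BASE_DOMAIN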

New Line character in email alerts

How do I print a newline character when sending emails? I'm sending to Gmail. The character \n prints literally. I even tried a </br> tag and YAML multiline, and none of them work.
- alert: KubernetesPodImagePullBackOff
  expr: kube_pod_container_status_waiting_reason{reason=~"ContainerCreating|CrashLoopBackOff|ErrImagePull|ImagePullBackOff"} > 0
  for: 1s
  labels:
    severity: warning
  annotations:
    summary: "Kubernetes pod crash looping (instance {{ $labels.instance }})"
    description: "Pod {{ $labels.pod }} is crash looping\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
You need to override the default template Alertmanager uses for emails. Replace something like
{{ .Annotations.description }}
in the template with
{{ .Annotations.description | safeHtml }}
I wrote my own email template; if you do not have one yet, you can create it from
https://github.com/prometheus/alertmanager/blob/master/template/default.tmpl
and edit
{{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }}
in the same manner, using
{{ .Value | safeHtml }}
Also read this answer:
prometheus using html content in alerts annotations and using it in email template
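For completeness, a minimal sketch of wiring a custom template file into Alertmanager (the file path, receiver name and template name are placeholders, and SMTP settings are assumed to be configured in the global section):

# alertmanager.yml
templates:
  - /etc/alertmanager/templates/*.tmpl
receivers:
  - name: 'email'
    email_configs:
      - to: 'you@example.com'
        html: '{{ template "email.custom.html" . }}'

# /etc/alertmanager/templates/email.tmpl
{{ define "email.custom.html" }}
{{ range .Alerts }}{{ .Annotations.description | safeHtml }}<br>{{ end }}
{{ end }}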

Prometheus alertmanager - invalid leading UTF-8 octet

I am trying to configure Slack notifications from Prometheus Alertmanager with the yml below.
global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/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'
route:
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#monitoring-instances'
        send_resolved: true
        icon_url: https://avatars3.githubusercontent.com/u/3380462
        title: |-
          [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
          {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
          {{" "}}(
          {{- with .CommonLabels.Remove .GroupLabels.Names }}
          {{- range $index, $label := .SortedPairs -}}
          {{ if $index }}, {{ end }}
          {{- $label.Name }}="{{ $label.Value -}}"
          {{- end }}
          {{- end -}}
          )
          {{- end }}
        text: >-
          {{ range .Alerts -}}
          *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
          *Description:* {{ .Annotations.description }}
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
          {{ end }}
When I start my Alertmanager container it keeps restarting and shows the error below.
alertmanager | level=error ts=2021-01-12T04:08:19.040Z caller=coordinator.go:124 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/alertmanager.yml err="yaml: invalid leading UTF-8 octet"
I have validated the file with an online validator, where it is shown as valid YAML.
I also checked the encoding with Notepad++; it already shows as UTF-8. Is there any other way to fix this?
Even this code shows the same error:
slack_configs:
  - channel: '#monitoring-instances'
    send_resolved: false
    title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Monitoring Event Notification'
    text: >-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:> *Runbook:* <{{ .Annotations.runbook }}|:spiral_note_pad:>
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
I am using a CentOS 8.2 system. Is something wrong with my system? Can anyone help me out here?
In my case there was an issue with config/application.yml: it was all gibberish, so I had to delete it and recreate it.
After that the issue was resolved.
Resolved by removing the bullet (•) from the config and relaunching the container. It works now. (The bullet is a multi-byte character in UTF-8; if the file gets saved in a single-byte encoding such as Windows-1252, it becomes the lone byte 0x95, which is not a valid leading UTF-8 octet, hence the error.)
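A sketch of the changed lines, keeping the rest of the template intact and swapping the bullet for a plain ASCII dash:

text: >-
  {{ range .Alerts }}
  *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
  *Details:*
  {{ range .Labels.SortedPairs }} - *{{ .Name }}:* `{{ .Value }}`
  {{ end }}
  {{ end }}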
