Alertmanager randomly logging error "unexpected status code 422" - prometheus-alertmanager

I have deployed Prometheus from the community Helm chart (14.6.0), which also runs Alertmanager. Alertmanager shows errors from time to time (templating issues), and the error message itself contains nothing especially useful. I have re-checked the config with amtool and it reports no errors:
level=error ts=2021-08-17T14:43:08.787Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=2 err="opsgenie/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Request body is not processable. Please check the errors.\",\"errors\":{\"message\":\"Message can not be empty.\"},\"took\":0.0,\"requestId\":\"38c37c18-5635-48bc-bb69-bda03e232cce\"}"
level=debug ts=2021-08-17T14:43:08.798Z caller=notify.go:685 component=dispatcher receiver=opsgenie integration=opsgenie[0] msg="Notify success" attempts=1
level=error ts=2021-08-17T14:43:08.804Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=2 err="opsgenie/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Request body is not processable. Please check the errors.\",\"errors\":{\"message\":\"Message can not be empty.\"},\"took\":0.001,\"requestId\":\"70d2ac84-3422-4fe6-9d8b-e601fdc37b25\"}"
Monitoring works and I am receiving alerts; I would just like to understand how to interpret this error and what could be wrong, since enabling debug mode did not provide any more information.
alertmanager config:
global: {}
receivers:
- name: opsgenie
  opsgenie_configs:
  - api_key: XXX
    api_url: https://api.eu.opsgenie.com/
    details:
      Prometheus alert: ' {{ .CommonLabels.alertname }}, {{ .CommonLabels.namespace }}, {{ .CommonLabels.pod }}, {{ .CommonLabels.dimension_CacheClusterId }}, {{ .CommonLabels.dimension_DBInstanceIdentifier }}, {{ .CommonLabels.dimension_DBClusterIdentifier }}'
    http_config: {}
    message: '{{ .CommonAnnotations.message }}'
    priority: '{{ if eq .CommonLabels.severity "critical" }}P2{{ else if eq .CommonLabels.severity "high" }}P3{{ else if eq .CommonLabels.severity "warning" }}P4{{ else }}P5{{ end }}'
    send_resolved: true
    tags: ' Prometheus, {{ .CommonLabels.namespace }}, {{ .CommonLabels.severity }}, {{ .CommonLabels.alertname }}, {{ .CommonLabels.pod }}, {{ .CommonLabels.kubernetes_node }}, {{ .CommonLabels.dimension_CacheClusterId }}, {{ .CommonLabels.dimension_DBInstanceIdentifier }}, {{ .CommonLabels.dimension_Cluster_Name }}, {{ .CommonLabels.dimension_DBClusterIdentifier }} '
- name: deadmansswitch
  webhook_configs:
  - http_config:
      basic_auth:
        password: XXX
    send_resolved: true
    url: https://api.eu.opsgenie.com/v2/heartbeats/prometheus-nonprod/ping
- name: blackhole
route:
  group_by:
  - alertname
  - namespace
  - kubernetes_node
  - dimension_CacheClusterId
  - dimension_DBInstanceIdentifier
  - dimension_Cluster_Name
  - dimension_DBClusterIdentifier
  - server_name
  group_interval: 5m
  group_wait: 10s
  receiver: opsgenie
  repeat_interval: 5m
  routes:
  - group_interval: 1m
    match:
      alertname: DeadMansSwitch
    receiver: deadmansswitch
    repeat_interval: 1m
  - match_re:
      namespace: XXX
  - match_re:
      alertname: HighMemoryUsage|HighCPULoad|CPUThrottlingHigh
  - match_re:
      namespace: .+
    receiver: blackhole
  - group_by:
    - instance
    match:
      alertname: PrometheusBlackboxEndpoints
  - match_re:
      alertname: .*
  - match_re:
      kubernetes_node: .*
  - match_re:
      dimension_CacheClusterId: .*
  - match_re:
      dimension_DBInstanceIdentifier: .*
  - match_re:
      dimension_Cluster_Name: .*
  - match_re:
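
A note on interpreting the 422: OpsGenie is rejecting the request because the rendered message field is empty, i.e. {{ .CommonAnnotations.message }} produced an empty string for that alert group. CommonAnnotations only contains annotations shared by every alert in the group, so with num_alerts=2 this happens whenever the grouped alerts have no message annotation or have differing ones. A minimal sketch of a fallback message template, assuming a fallback to the alert name is acceptable (the fallback expression is an illustration, not part of the original config):

message: '{{ if .CommonAnnotations.message }}{{ .CommonAnnotations.message }}{{ else }}{{ .CommonLabels.alertname }}{{ end }}'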

Related

Kube Prometheus Stack Chart - Alertmanager

When I enable the alertmanager, a secret gets created with the name alertmanager-{chartName}-alertmanager, but no pods or statefulset of alertmanager get created.
When I delete this secret with kubectl delete and upgrade the chart again, new secrets get created: alertmanager-{chartName}-alertmanager and alertmanager-{chartName}-alertmanager-generated. In this case I can see the pods and statefulset of alertmanager, but the -generated secret only has default values which are null, while the secret alertmanager-{chartName}-alertmanager has the updated configuration.
I checked the alertmanager.yml with amtool and it shows as valid.
Chart - kube-prometheus-stack-36.2.0
# Configuration in my values.yaml
alertmanager:
  enabled: true
  global:
    resolve_timeout: 5m
    smtp_require_tls: false
  route:
    receiver: 'email'
  receivers:
  - name: 'null'
  - name: 'email'
    email_configs:
    - to: xyz#gmail.com
      from: abc#gmail.com
      smarthost: x.x.x.x:25
      send_resolved: true
# Configuration from the secret alertmanager-{chartName}-alertmanager
global:
  resolve_timeout: 5m
  smtp_require_tls: false
inhibit_rules:
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = critical
  target_matchers:
  - severity =~ warning|info
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = warning
  target_matchers:
  - severity = info
- equal:
  - namespace
  source_matchers:
  - alertname = InfoInhibitor
  target_matchers:
  - severity = info
receivers:
- name: "null"
- email_configs:
  - from: abc#gmail.com
    send_resolved: true
    smarthost: x.x.x.x:25
    to: xyz#gmail.com
  name: email
route:
  group_by:
  - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: email
  repeat_interval: 12h
  routes:
  - matchers:
    - alertname =~ "InfoInhibitor|Watchdog"
    receiver: "null"
templates:
- /etc/alertmanager/config/*.tmpl
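
To compare what actually reached Alertmanager with what is in values.yaml, the secret can be decoded directly; a sketch, assuming the operator's default key name alertmanager.yaml and the secret name placeholder used above:

kubectl get secret alertmanager-{chartName}-alertmanager -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d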

smb/cifs mountOptions failed to apply in kubernetes

What happened:
Unable to use mountOptions for an on-prem SMB mount.
What you expected to happen:
To create a manifest with mountOptions.
How to reproduce it (as minimally and precisely as possible):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "test"
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "smbmount.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "smbmount.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          resources:
            limits:
              memory: "16048Mi"
              cpu: "16000m"
          volumeMounts:
            - name: smb01
              mountPath: /smb/01
      volumes:
        - name: smb01
          csi:
            driver: file.csi.azure.com
            volumeAttributes:
              server: 10.10.10.100
              shareName: share01
              secretName: smbcreds
              mountOptions:
                - dir_mode=0777
The error I am getting:
Error: unable to build kubernetes objects from release manifest: error
validating "": error validating data:
ValidationError(Deployment.spec.template.spec.volumes[0].csi.volumeAttributes.mountOptions):
invalid type for io.k8s.api.core.v1.CSIVolumeSource.volumeAttributes:
got "array", expected "string"
Am I using the right place for mountOptions, or did I make a mistake somewhere in the deployment file?
values.yaml
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 2
image:
  repository: prxzzzzjjkkk.azurecr.io/smbtest
  pullPolicy: Always
  # Overrides the image tag whose default is the chart appVersion.
  tag: latest
imagePullSecrets:
  - name: acr-pull-secrets
nameOverride: ""
fullnameOverride: ""
serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""
podAnnotations: {}
podSecurityContext: {}
  # fsGroup: 2000
securityContext: {}
  # capabilities:
  #   drop:
  #   - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
  # runAsUser: 1000
service:
  type: ClusterIP
  port: 80
ingress:
  enabled: false
  className: ""
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local
resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80
nodeSelector: {}
tolerations: []
Please change deployment.spec.template.spec.volumes.csi.volumeAttributes.mountOptions to a string of comma-separated key=value pairs instead of an array.
So your modified Deployment manifest should have:
...
      volumes:
        - name: smb01
          csi:
            driver: file.csi.azure.com
            volumeAttributes:
              server: 10.10.10.100
              shareName: share01
              secretName: smbcreds
              mountOptions: "dir_mode=0777"  # correct format
instead of:
...
      volumes:
        - name: smb01
          csi:
            driver: file.csi.azure.com
            volumeAttributes:
              server: 10.10.10.100
              shareName: share01
              secretName: smbcreds
              mountOptions:  # incorrect format
                - dir_mode=0777
Reference: https://raw.githubusercontent.com/kubernetes-sigs/azurefile-csi-driver/master/deploy/example/nginx-pod-azurefile-inline-volume.yaml
The error clearly states, at CSIVolumeSource.volumeAttributes: got "array", expected "string", that it received an array where a string was expected.
Your YAML should look like this:
volumes:
  - name: smb01
    csi:
      driver: file.csi.azure.com
      volumeAttributes:
        server: 10.10.10.100
        shareName: share01
        secretName: smbcreds
        mountOptions: dir_mode=0777
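
If more than one mount option is needed, the same string takes a comma-separated list, as described above; a sketch with hypothetical extra options (file_mode, uid and gid are examples, not from the original manifest):

        mountOptions: "dir_mode=0777,file_mode=0777,uid=1000,gid=1000"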

Prometheus Alertmanager with multiple routes: receiver does not get alerts for one rule

I have this alert configuration and expect the following behavior.
If destination: bloom and severity: info, send to slack-alert-info - this works.
If destination: bloom and severity: warning|critical, send to slack-alert-multi - this is where the problem is: severity: warning is sent as expected to both Slack channels, but critical is sent only to the default channel.
Can someone help me understand my error, please?
amtool reports no error:
amtool config routes test --config.file=/opt/prometheus/etc/alertmanager.yml --tree --verify.receivers=slack-alert-multi severity=warning destination=bloom
Matching routes:
.
└── default-route
└── {destination=~"^(?:bloom)$",severity=~"^(?:warning|critical)$"} receiver: slack-alert-multi
slack-alert-multi
amtool config routes test --config.file=/opt/prometheus/etc/alertmanager.yml --tree --verify.receivers=slack-alert-multi severity=critical destination=bloom
Matching routes:
.
└── default-route
└── {destination=~"^(?:bloom)$",severity=~"^(?:warning|critical)$"} receiver: slack-alert-multi
slack-alert-multi
Alert configuration
...
labels:
  alerttype: infrastructure
  severity: warning
  destination: bloom
...
---
global:
  resolve_timeout: 30m
route:
  group_by: [ 'alertname', 'cluster', 'severity' ]
  group_wait: 30s
  group_interval: 30s
  repeat_interval: 300s
  receiver: 'slack'
  routes:
    - receiver: 'slack-alert-multi'
      match_re:
        destination: bloom
        severity: warning|critical
    - receiver: 'slack-alert-info'
      match_re:
        destination: bloom
        severity: info
receivers:
  - name: 'slack-alert-multi'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T0/B0/V2'
        channel: '#alert-upload'
        send_resolved: true
        icon_url: 'https://avatars3.githubusercontent.com/u/3380462'
        title: '{{ template "custom_title" . }}'
        text: '{{ template "custom_slack_message" . }}'
      - api_url: 'https://hooks.slack.com/services/T0/B0/J1'
        channel: '#alert-exports'
        send_resolved: true
        icon_url: 'https://avatars3.githubusercontent.com/u/3380462'
        title: '{{ template "custom_title" . }}'
        text: '{{ template "custom_slack_message" . }}'
  # Default receiver
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T0/B0/2x'
        channel: '#aws-notification'
        send_resolved: true
        icon_url: 'https://avatars3.githubusercontent.com/u/3380462'
        title: '{{ template "custom_title" . }}'
        text: '{{ template "custom_slack_message" . }}'
  - name: 'slack-alert-info'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T0/B0/EA'
        channel: '#alert-info'
        send_resolved: true
        icon_url: 'https://avatars3.githubusercontent.com/u/3380462'
        title: '{{ template "custom_title" . }}'
        text: '{{ template "custom_slack_message" . }}'
templates:
  - '/opt/alertmanager_notifications.tmpl'
Try adding
continue: true
to the route:
- receiver: 'slack-alert-info'
  match_re:
    destination: bloom
    severity: info
  continue: true
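
For context: by default an alert stops at the first matching route, so without continue: true it never reaches any later sibling route, and an alert that matches no child route falls back to the top-level receiver. A sketch of the routes block with continue set on both child routes, as an illustration of the mechanism rather than the exact fix above:

routes:
  - receiver: 'slack-alert-multi'
    match_re:
      destination: bloom
      severity: warning|critical
    continue: true  # keep evaluating the following sibling routes
  - receiver: 'slack-alert-info'
    match_re:
      destination: bloom
      severity: info
    continue: true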

Alertmanager email route

I am trying to configure the "route" section of Alertmanager; below is my configuration:
route:
  group_by: ['instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 7m
  receiver: pager
  routes:
    - match:
        severity: critical
      receiver: email
    - match_re:
        severity: ^(warning|critical)$
      receiver: support_team
receivers:
  - name: 'email'
    email_configs:
      - to: 'xxxxxx#xx.com'
  - name: 'support_team'
    email_configs:
      - to: 'xxxxxx#xx.com'
  - name: 'pager'
    email_configs:
      - to: 'alert-pager#example.com'
Right now the e-mail is only sent to the default receiver "pager" and is not routed any further to the custom ones.
You need this line on each route when you want alerts to also be routed to the other ones:
continue: true
e.g.
route:
  group_by: ['instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 7m
  receiver: pager
  routes:
    - match:
        severity: critical
      receiver: email
      continue: true
    - match_re:
        severity: ^(warning|critical)$
      receiver: support_team
      continue: true
receivers:
  - name: 'email'
    email_configs:
      - to: 'xxxxxx#xx.com'
  - name: 'support_team'
    email_configs:
      - to: 'xxxxxx#xx.com'
  - name: 'pager'
    email_configs:
      - to: 'alert-pager#example.com'
Btw, IMHO receiver should be at the same level as match in the YAML structure.
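
One way to verify which receivers an alert would hit after such a change is amtool's route tester, as used earlier on this page; a sketch, assuming the config file is alertmanager.yml in the current directory:

amtool config routes test --config.file=alertmanager.yml --tree severity=critical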

alertmanager won't send out e-mail

I am trying to use e-mail to receive alerts from Prometheus via Alertmanager; however, it keeps printing log lines like "Error on notify: EOF" source="notify.go:283" and "Notify for 3 alerts failed: EOF" source="dispatch.go:261". My Alertmanager config is below:
smtp_smarthost: 'smtp.xxx.com:xxx'
smtp_from: 'xxxxx#xxx.com'
smtp_auth_username: 'xxxx#xxx.com'
smtp_auth_password: 'xxxxxxx'
smtp_require_tls: false
route:
  group_by: ['instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 7m
  receiver: email
  routes:
    - match:
        severity: critical
      receiver: email
    - match_re:
        severity: ^(warning|critical)$
      receiver: support_team
receivers:
  - name: 'email'
    email_configs:
      - to: 'xxxxxx#xx.com'
  - name: 'support_team'
    email_configs:
      - to: 'xxxxxx#xx.com'
  - name: 'pager'
    email_configs:
      - to: 'alert-pager#example.com'
Any suggestions?
Using smtp.xxx.com:587 fixed the issue, but I also needed to set smtp_require_tls: true.
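
A sketch of the corresponding global SMTP settings with the submission port and STARTTLS enabled (host and credentials are the placeholders from the question, not real values):

global:
  smtp_smarthost: 'smtp.xxx.com:587'
  smtp_from: 'xxxxx#xxx.com'
  smtp_auth_username: 'xxxx#xxx.com'
  smtp_auth_password: 'xxxxxxx'
  smtp_require_tls: true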
