I tried setting up inhibition so that the critical version of an alert inhibits the warning version of the same alert.
The configs below didn't work. Please suggest what the issue is with this config.
inhibit_rules:
- source_match:
    alertname: Inhibit
    severity: critical
  target_match:
    severity: warning
    alertname: KubePodNotReady
  equal: ['alertname', 'namespace', 'pod', 'prometheus']
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'namespace', 'pod', 'prometheus']
The config below works for me. (In the first attempt above, equal includes alertname while source_match and target_match require different alertnames, so the source and target alerts can never agree on that label and the rule never inhibits anything.)
inhibit_rules:
- source_match:
    severity: critical
    alertname: KubePodNotReady
  target_match:
    severity: warning
    alertname: KubePodNotReady
  equal: ['namespace', 'pod', 'prometheus']
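If both behaviours are wanted at once, the two rules should sit under a single inhibit_rules: key rather than repeating the key; a minimal sketch, reusing the rule contents from the attempts above:
inhibit_rules:
# Critical KubePodNotReady inhibits its warning counterpart for the same pod.
- source_match:
    severity: critical
    alertname: KubePodNotReady
  target_match:
    severity: warning
    alertname: KubePodNotReady
  equal: ['namespace', 'pod', 'prometheus']
# Generic rule: any critical alert inhibits the warning alert with the same name.
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'namespace', 'pod', 'prometheus']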
When I enable Alertmanager, a secret named alertmanager-{chartName}-alertmanager gets created, but no Alertmanager pods or StatefulSet get created.
When I delete this secret with kubectl delete and upgrade the chart again, two new secrets get created: alertmanager-{chartName}-alertmanager and alertmanager-{chartName}-alertmanager-generated. In this case I can see the Alertmanager pods and StatefulSet, but the -generated secret contains only the default (null) values, while alertmanager-{chartName}-alertmanager contains the updated configuration.
I checked the alertmanager.yml with amtool and it reports it as valid.
Chart - kube-prometheus-stack-36.2.0
# Configuration in my values.yaml
alertmanager:
  enabled: true
  global:
    resolve_timeout: 5m
    smtp_require_tls: false
  route:
    receiver: 'email'
  receivers:
  - name: 'null'
  - name: 'email'
    email_configs:
    - to: xyz@gmail.com
      from: abc@gmail.com
      smarthost: x.x.x.x:25
      send_resolved: true
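For reference, kube-prometheus-stack nests the Alertmanager configuration under alertmanager.config in values.yaml; a minimal sketch of that layout, assuming the snippet above lost the config: level when it was pasted (addresses are the same placeholders as above):
alertmanager:
  enabled: true
  config:
    global:
      resolve_timeout: 5m
      smtp_require_tls: false
    route:
      receiver: 'email'
    receivers:
    - name: 'null'
    - name: 'email'
      email_configs:
      - to: xyz@gmail.com
        from: abc@gmail.com
        smarthost: x.x.x.x:25
        send_resolved: true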
# Configuration from the secret alertmanager-{chartName}-alertmanager
global:
  resolve_timeout: 5m
  smtp_require_tls: false
inhibit_rules:
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = critical
  target_matchers:
  - severity =~ warning|info
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = warning
  target_matchers:
  - severity = info
- equal:
  - namespace
  source_matchers:
  - alertname = InfoInhibitor
  target_matchers:
  - severity = info
receivers:
- name: "null"
- email_configs:
  - from: abc@gmail.com
    send_resolved: true
    smarthost: x.x.x.x:25
    to: xyz@gmail.com
  name: email
route:
  group_by:
  - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: email
  repeat_interval: 12h
  routes:
  - matchers:
    - alertname =~ "InfoInhibitor|Watchdog"
    receiver: "null"
templates:
- /etc/alertmanager/config/*.tmpl
I am trying to control the severity level of PagerDuty alerts through the Alertmanager configuration.
I hard-coded the severity level to warning in the Alertmanager receiver:
- name: 'whatever_pd_service'
  pagerduty_configs:
  - send_resolved: true
    service_key: SERVICE_KEY
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    severity: 'warning'
    description: '{{ (index .Alerts 0).Annotations.summary }}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      information: '{{ range .Alerts }}{{ .Annotations.information }}
        {{ end }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
but the severity of the generated PagerDuty alerts was still set to critical.
Is there a way to set the severity level in PagerDuty?
I found out why the severity field in the Alertmanager receiver configuration has no effect: we are using a Prometheus (Events API v1) integration in the PagerDuty service, and according to the specification of the PagerDuty Events API v1 (https://developer.pagerduty.com/docs/ZG9jOjExMDI5NTc4-send-a-v1-event), there is no severity field.
So there are two ways to solve this problem (achieve dynamic notifications for PagerDuty): either use Events API v2, or use Service Orchestrations (https://support.pagerduty.com/docs/event-orchestration#service-orchestrations).
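A minimal sketch of what the receiver could look like with an Events API v2 integration, assuming the alerts carry a severity label (ROUTING_KEY is a placeholder for the integration key):
- name: 'whatever_pd_service'
  pagerduty_configs:
  - send_resolved: true
    # Events API v2 integrations use routing_key instead of the v1 service_key.
    routing_key: ROUTING_KEY
    # Pass the alert's own severity label through instead of hard-coding it.
    severity: '{{ .CommonLabels.severity }}'
    description: '{{ (index .Alerts 0).Annotations.summary }}'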
My AlertmanagerConfig:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: configlinkflowalertmanager
  labels:
    alertmanagerConfig: linkflowAlertmanager
spec:
  route:
    groupBy: ['alertname']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'webhook'
    matchers:
    - name: alertname
      value: KubePodCrashLooping
    - name: namespace
      value: linkflow
  receivers:
  - name: 'webhook'
    webhookConfigs:
    - url: 'http://xxxxx:1194/'
The Alertmanager web UI shows that the namespace matcher became monitoring. Why? Only alerts from the monitoring namespace get sent out.
Can I send alerts from other namespaces, or from all namespaces? The generated route looks like this:
route:
  receiver: Default
  group_by:
  - namespace
  continue: false
  routes:
  - receiver: monitoring-configlinkflowalertmanager-webhook
    group_by:
    - namespace
    match:
      alertname: KubePodCrashLooping
      namespace: monitoring
    continue: true
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
This is a feature:
That's kind of the point of the feature, otherwise it's possible that alertmanager configs in different namespaces conflict and Alertmanager won't be able to start.
There is an issue (#3737) to make namespace label matching optional / configurable. The related PR still has to be merged (as of today), but it will allow you to define global alerts.
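Until then, one workaround consistent with that behaviour is to create the AlertmanagerConfig in the namespace whose alerts you want routed, since the operator derives the enforced namespace matcher from the resource's own namespace; a minimal sketch, assuming the Alertmanager is configured (via alertmanagerConfigNamespaceSelector) to pick up AlertmanagerConfig resources from that namespace:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: configlinkflowalertmanager
  namespace: linkflow        # the enforced matcher then becomes namespace="linkflow"
  labels:
    alertmanagerConfig: linkflowAlertmanager
spec:
  route:
    groupBy: ['alertname']
    receiver: 'webhook'
    matchers:
    - name: alertname
      value: KubePodCrashLooping
  receivers:
  - name: 'webhook'
    webhookConfigs:
    - url: 'http://xxxxx:1194/'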
I have these iamRoleStatements in my serverless.yml, which should allow those actions for my Lambda functions:
- Effect: Allow
  Action:
    - dynamodb:Query
    - dynamodb:Scan
    - dynamodb:GetItem
    - dynamodb:PutItem
    - dynamodb:UpdateItem
    - dynamodb:DeleteItem
    - dynamodb:BatchWriteItem
    - dynamodb:BatchReadItem
  Resource: "arn:aws:dynamodb:${self:provider.region}:*:table/${self:custom.tableName}"
And this is my Lambda function definition:
functions:
  scraping:
    handler: handler.scraping
    memorySize: 1536
    layers:
      - !Sub 'arn:aws:lambda:${AWS::Region}:764866452798:layer:chrome-aws-lambda:22'
    timeout: 15
    events:
      - schedule:
          rate: ${self:custom.scheduleRate}
          name: schedule-scraping-${self:provider.stage}
          description: scraping each 5 minute
          enabled: ${self:custom.enabled}
In my handler function, I try to insert an item, but I get this error:
AccessDeniedException: User: arn:aws:sts::006977245882:assumed-role/BestSellers-qa-us-east-1-lambdaRole/BestSellers-qa-scraping is not authorized to perform: dynamodb:BatchWriteItem on resource: arn:aws:dynamodb:us-east-1:006977245882:table/TABLE_NAME
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20) ...
Unless you've edited/redacted TABLE_NAME in the error message, my guess is that you're inadvertently attempting to write to a table which probably doesn't exist (TABLE_NAME).
You haven't posted your handler code, but I'd check your code and verify that your actual table name is being set/interpolated correctly before your handler code attempts to insert an item with the DynamoDB API.
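One way to keep the table name the handler uses in sync with the name in the IAM statement is to pass it into the function as an environment variable in serverless.yml; a minimal sketch, assuming the custom.tableName entry from the question (the value shown is a placeholder):
provider:
  name: aws
  environment:
    # The handler then reads process.env.TABLE_NAME instead of hard-coding a name.
    TABLE_NAME: ${self:custom.tableName}
custom:
  tableName: best-sellers-${self:provider.stage}   # placeholder value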
I am trying to use email to receive alerts from Prometheus via Alertmanager; however, it keeps printing log lines like "Error on notify: EOF" source="notify.go:283" and "Notify for 3 alerts failed: EOF" source="dispatch.go:261". My Alertmanager config is below:
smtp_smarthost: 'smtp.xxx.com:xxx'
smtp_from: 'xxxxx@xxx.com'
smtp_auth_username: 'xxxx@xxx.com'
smtp_auth_password: 'xxxxxxx'
smtp_require_tls: false
route:
  group_by: ['instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 7m
  receiver: email
  routes:
  - match:
      severity: critical
    receiver: email
  - match_re:
      severity: ^(warning|critical)$
    receiver: support_team
receivers:
- name: 'email'
  email_configs:
  - to: 'xxxxxx@xx.com'
- name: 'support_team'
  email_configs:
  - to: 'xxxxxx@xx.com'
- name: 'pager'
  email_configs:
  - to: 'alert-pager@example.com'
Any suggestions?
Using smtp.xxx.com:587 fixed the issue, but I also needed to set smtp_require_tls: true.
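Put into config form, a minimal sketch of the changed SMTP settings (host and credentials remain placeholders):
smtp_smarthost: 'smtp.xxx.com:587'   # submission port instead of the original port
smtp_from: 'xxxxx@xxx.com'
smtp_auth_username: 'xxxx@xxx.com'
smtp_auth_password: 'xxxxxxx'
smtp_require_tls: true               # port 587 uses STARTTLS, so TLS must be required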