I'm trying to send an alert message from Alertmanager (my instance has pods restarting in the production namespace), using Lambda to publish to SNS when the alert fires. The flow is: Alertmanager (webhook configured to an API Gateway endpoint) > API Gateway > Lambda > SNS > SMS to my phone.
My problem is how to pick out the different firing alerts from the event that Lambda receives. The message body is pretty big, and I need a way to extract a specific alert so that only it triggers the SNS message.
Should I simply search the body as text and trigger on my own description set in the Alertmanager receiver config?
Or is there a faster, better way to do it?
The relevant part of the Helm chart for Prometheus:
additionalPrometheusRules:
  - name: custom-pod-restarts
    groups:
      - name: pod-restarts
        rules:
          - alert: HighFrontendRestarts
            expr: increase(kube_pod_container_status_restarts_total{namespace="production"}[1h]) > 2
            for: 10m
            labels:
              severity: critical
            annotations:
              summary: "High restart rate in frontend pod"
              description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} in the frontend deployment has restarted more than 2 times in the last hour."
          - alert: HighBackendRestarts
            expr: increase(kube_pod_container_status_restarts_total{namespace="production"}[1h]) > 2
            for: 10m
            labels:
              severity: critical
            annotations:
              summary: "High restart rate in backend pod"
              description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} in the backend deployment has restarted more than 2 times in the last hour."
The receiver/webhook part of the Alertmanager chart:
- name: 'aws-lambda-webhook'
  webhook_configs:
    - url: 'https://XXXXXXX.execute-api.us-east-2.amazonaws.com/tested-production/alerts'
      send_resolved: true
route:
  group_by: ['namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'aws-lambda-webhook'
  routes:
    - receiver: 'aws-lambda-webhook'
      matchers:
        - alertname = "Watchdog"
templates:
  - '/etc/alertmanager/config/*.tmpl'
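For reference, with a webhook_configs receiver like the one above, Alertmanager POSTs one JSON document per notification group, and the individual alerts sit in its "alerts" list, each with its own "status", "labels" and "annotations". The sample below is hand-written to follow the documented webhook format (version "4"). The pod name, URLs, timestamps and fingerprint are made up for illustration, and list_alerts is a hypothetical helper, not part of any library.

```python
import json

# Hand-written sample following Alertmanager's documented webhook format.
# All values are illustrative and were not captured from a real cluster.
SAMPLE_WEBHOOK_BODY = json.dumps({
    "version": "4",
    "groupKey": '{}:{namespace="production"}',
    "status": "firing",
    "receiver": "aws-lambda-webhook",
    "groupLabels": {"namespace": "production"},
    "commonLabels": {"namespace": "production", "severity": "critical"},
    "commonAnnotations": {},
    "externalURL": "http://alertmanager.example:9093",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "HighFrontendRestarts",
                "namespace": "production",
                "pod": "frontend-abc123",
                "severity": "critical",
            },
            "annotations": {"summary": "High restart rate in frontend pod"},
            "startsAt": "2024-01-01T00:00:00Z",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://prometheus.example:9090/graph",
            "fingerprint": "0123456789abcdef",
        }
    ],
})


def list_alerts(body: str) -> list[tuple[str, str, str]]:
    """Return (alertname, status, pod) for every alert in a webhook body."""
    payload = json.loads(body)
    return [
        (
            alert.get("labels", {}).get("alertname", ""),
            alert.get("status", ""),
            alert.get("labels", {}).get("pod", ""),
        )
        for alert in payload.get("alerts", [])
    ]


print(list_alerts(SAMPLE_WEBHOOK_BODY))
```

Because every alert carries its rule name in labels.alertname, matching on that label is likely more robust than searching the description text.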
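The Lambda part that extracts the message from Alertmanager:
import json

# Assumes API Gateway Lambda proxy integration, where event['body'] is the raw JSON string sent by Alertmanager
payload = json.loads(event.get('body') or '{}')
records = payload.get('alerts', [])
print("printing indent 2 of event :", json.dumps(event, indent=2))
if records:
    print(f"these are the records: {records[0]}")
    # each alert carries its own labels and annotations
    message = records[0].get('annotations', {}).get('description', '')
else:
    print("no records, or unprintable this way")
# add condition here in lambda that detects/searches/extracts from records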
Where can I read how the message is formatted and how to process it in the Lambda? Again, Alertmanager can fire lots of alerts, and I only want specific ones to trigger SNS.
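One possible way to do the filtering, sketched under a few assumptions: API Gateway uses Lambda proxy integration (so the Alertmanager JSON arrives as a string in event["body"]), only firing alerts should send an SMS, and the alerts of interest are identified by their alertname label. WANTED_ALERTS, TOPIC_ARN, select_alerts, format_sms and the extra publish parameter are all hypothetical names introduced for this sketch. The ARN is a placeholder.

```python
import base64
import json

# Hypothetical configuration: which alert rules should reach SNS.
WANTED_ALERTS = {"HighFrontendRestarts", "HighBackendRestarts"}
TOPIC_ARN = "arn:aws:sns:us-east-2:123456789012:pod-restart-alerts"  # placeholder


def select_alerts(event, wanted=WANTED_ALERTS):
    """Return the firing alerts in a proxy event whose alertname is wanted."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    payload = json.loads(body)
    return [
        alert
        for alert in payload.get("alerts", [])
        if alert.get("status") == "firing"
        and alert.get("labels", {}).get("alertname") in wanted
    ]


def format_sms(alert):
    """Build a short text from the alert's labels and annotations."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    text = annotations.get("description") or annotations.get("summary") or ""
    return f"[{labels.get('severity', 'unknown')}] {labels.get('alertname')}: {text}"


def lambda_handler(event, context, publish=None):
    """Publish one SNS message per selected alert.

    `publish` is an optional callable taking the message text; it exists
    only so the filter can be exercised without AWS credentials.
    """
    if publish is None:
        import boto3  # imported lazily; available in the Lambda Python runtime

        sns = boto3.client("sns")

        def publish(message):
            sns.publish(TopicArn=TOPIC_ARN, Message=message)

    selected = select_alerts(event)
    for alert in selected:
        publish(format_sms(alert))
    return {"statusCode": 200, "body": json.dumps({"published": len(selected)})}
```

An alternative worth considering is to filter before the Lambda is ever called: the default route here already points at 'aws-lambda-webhook', so every alert is forwarded. Pointing the default route at a different receiver and keeping the webhook only on a child route whose matchers select the wanted alertname values would mean the Lambda receives just those alerts.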