Prometheus: Alertmanager Web UI alerts Silence
Configuring Alertmanager's Web UI to temporarily mute alert notifications with Silences.
The frequency at which Alertmanager re-sends notifications for active alerts is configured with the repeat_interval option in the /etc/alertmanager/config.yml file.
We have this interval set to 15 minutes, so notifications about the same alerts land in our Slack every fifteen minutes.
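For reference, a minimal sketch of what the relevant routing section in /etc/alertmanager/config.yml might look like (the receiver name and grouping options here are illustrative, not our real values):
...
route:
  receiver: 'slack-notifications'   # illustrative receiver name
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 15m              # a still-firing alert is re-notified every 15 minutes
...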
Still, some alerts are "known issues": we have already started investigating or fixing them, but the alert keeps being sent to Slack.
To stop such alerts from being sent over and over, they can be muted by marking them as "silenced".
An alert can be silenced via the Alertmanager Web UI; see the documentation.
So, what we will do in this post:
update Alertmanager's startup options to enable the Web UI
update an NGINX virtualhost to get access to the Alertmanager Web UI
check and configure the Prometheus server to send alerts to it
add a test alert and see how to Silence it
Alertmanager Web UI configuration
We have our Alertmanager running from a Docker Compose file. Let's add two parameters to its command field: web.route-prefix, which specifies the URI of the Alertmanager Web UI, and web.external-url, which sets its full URL.
This full URL will look like dev.monitor.example.com/alertmanager. Add them:
...
  alertmanager:
    image: prom/alertmanager:v0.21.0
    networks:
      - prometheus
    ports:
      - 9093:9093
    volumes:
      - /etc/prometheus/alertmanager_config.yml:/etc/alertmanager/config.yml
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--web.route-prefix=/alertmanager'
      - '--web.external-url=https://dev.monitor.example.com/alertmanager'
...
Alertmanager is running in a Docker container and is accessible at localhost:9093 from the monitoring host:
root@monitoring-dev:/home/admin# docker ps | grep alert
24ae3babd644 prom/alertmanager:v0.21.0 "/bin/alertmanager -…" 3 seconds ago Up 1 second 0.0.0.0:9093->9093/tcp prometheus_alertmanager_1
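To make sure the new route prefix is picked up, you can query Alertmanager's health endpoint on the new path directly (assuming the container is published on localhost:9093 as above):

root@monitoring-dev:/home/admin# curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9093/alertmanager/-/healthy

It should return 200, while the old http://localhost:9093/-/healthy path should now answer with 404.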
In the NGINX virtualhost config, add a new upstream pointing to the Alertmanager Docker container:
...
upstream alertmanager {
server 127.0.0.1:9093;
}
...
Also, add a new location block in this file that will proxy all requests for dev.monitor.example.com/alertmanager to this upstream:
...
location /alertmanager {
    proxy_redirect off;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass http://alertmanager$request_uri;
}
...
Save and reload NGINX and Alertmanager.
Now, open the dev.monitor.example.com/alertmanager URL, and you should see the Alertmanager Web UI:
There are no alerts here yet, so wait for Prometheus to send new ones.
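If the page does not open, a quick way to check the whole chain (NGINX plus the route prefix) is to request the external URL and look only at the returned status line:

root@monitoring-dev:/home/admin# curl -sI https://dev.monitor.example.com/alertmanager/ | head -1

A 200 response means NGINX proxies the request and Alertmanager answers on the /alertmanager prefix; a 404 usually means the location and the route prefix do not match.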
Prometheus: "Error sending alert" err="bad response status 404 Not Found"
After a new alert appears on the Prometheus server, you may see the following error in its log:
caller=notifier.go:527 component=notifier alertmanager=alertmanager:9093/api/v1/alerts count=3 msg="Error sending alert" err="bad response status 404 Not Found"
It happens because the alertmanagers section in the Prometheus config is currently set as:
...
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
...
So, we need to add the Alertmanager's URI prefix by using the path_prefix setting:
...
alerting:
  alertmanagers:
    - path_prefix: "/alertmanager/"
      static_configs:
        - targets:
            - alertmanager:9093
...
Restart Prometheus and wait for alerts again:
This time, you should see them in the Alertmanager Web UI as well:
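You can also verify from the Prometheus side that Alertmanager is now discovered with the correct path via Prometheus's /api/v1/alertmanagers endpoint (the local port 9090 and the /prometheus URI prefix below are assumptions based on this setup):

root@monitoring-dev:/home/admin# curl -s http://localhost:9090/prometheus/api/v1/alertmanagers | jq '.data.activeAlertmanagers'

Each active Alertmanager URL in the output should now include the /alertmanager prefix.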
Alertmanager: silencing an alert
Now, let's add a Silence for an alert to stop its notifications.
For example, to disable re-sending of the alert with alertname="APIendpointProbeSuccessCritical", click the + button on its right side:
Then click the Silence button:
The alertname label is added to the silence's matching condition with the default duration of 2 hours; add an author and a description of why it was silenced:
Click Create — and it’s done:
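The created silence itself can be checked via the API as well, for example by listing all silences with their matchers (the same v1 API that is used for alerts below):

root@monitoring-dev:/home/admin# curl -s http://localhost:9093/alertmanager/api/v1/silences | jq '.data[] | {id, status, matchers}'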
You can now check this alert via the API:
root@monitoring-dev:/home/admin# curl -s http://localhost:9093/alertmanager/api/v1/alerts | jq '.data[1]'
{
  "labels": {
    "alertname": "APIendpointProbeSuccessCritical",
    "instance": "http://push.example.com",
    "job": "blackbox",
    "monitor": "monitoring-dev",
    "severity": "critical"
  },
  "annotations": {
    "description": "Cant access API endpoint http://push.example.com!",
    "summary": "API endpoint down!"
  },
  "startsAt": "2020-12-30T11:25:25.953289015Z",
  "endsAt": "2020-12-30T11:43:25.953289015Z",
  "generatorURL": "https://dev.monitor.example.com/prometheus/graph?g0.expr=probe_success%7Binstance%21%3D%22https%3A%2F%2Fokta.example.com%22%2Cjob%3D%22blackbox%22%7D+%21%3D+1&g0.tab=1",
  "status": {
    "state": "suppressed",
    "silencedBy": [
      "ec11c989-f66e-448e-837c-d788c1db8aa4"
    ],
    "inhibitedBy": null
  },
  "receivers": [
    "critical"
  ],
  "fingerprint": "01e79a8dd541cf69"
}
So, this alert will not be sent to Slack or anywhere else because of the "state": "suppressed" field:
…
"status": {
  "state": "suppressed",
  "silencedBy": [
    "ec11c989-f66e-448e-837c-d788c1db8aa4"
  ],…
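When the underlying issue is fixed, there is no need to wait for the silence to end: it can be expired early from the Silences tab of the Web UI or, if amtool is available, from the command line. A sketch, using the silence ID from the output above and the external URL configured earlier:

root@monitoring-dev:/home/admin# amtool silence expire ec11c989-f66e-448e-837c-d788c1db8aa4 --alertmanager.url=https://dev.monitor.example.com/alertmanager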
Done.
Originally published at RTFM: Linux, DevOps and system administration.