Silencing Alerts
Sometimes alerts need to be silenced, for example during planned maintenance or while an outage is being tracked in an issue for follow-up. Silencing is not a rare occurrence, and this wiki page documents how to handle the process.
Alertmanager is where alerts are deduplicated, grouped, and sent out as messages, so this is where all silencing takes place. Note that the alert will continue to fire in upstream systems (i.e. Prometheus), and this is expected: we want the downtime and alert condition to be tracked, just not messaged, to avoid creating noise.
There are two steps to silencing the alert: identifying the pattern and writing the rest of the details. The steps are as follows:
To silence a single alert directly:
- Navigate to Alertmanager
- Confirm that the alert you wish to silence is appearing on this page; if you’re uncertain as to which alert relates to a specific message, click on the “+ Info” button to see that detail
- Once you’ve found the alert that you wish to silence, click on the “Silence” button
- Move on to the “Writing the rest of the details” section below
To silence alerts by matching on their labels:
- Navigate to Alertmanager
- Confirm that the alert you wish to silence is appearing on this page; if you’re uncertain as to which alert relates to a specific message, click on the “+ Info” button to see that detail
- Once you’ve found the alerts that you wish to silence, review the relevant labels displayed for them, as these values are how you will define a silencing rule; most of the time, you’ll want to silence an alert for a given instance and job; for instance, you may want to silence all alerts matching instance="allstar.leb.memhamwan.net:5038" and job="allstar"
- In the top section of the page under the “Filter” heading, add the matchers one at a time, and confirm that just the item you wish to silence is being displayed; the right level of specificity must be given to avoid accidentally silencing future alerts too
- Click on the “Silence” button in order to create the new silence pre-populated with your matchers
- Move on to the “Writing the rest of the details” section below
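The effect of adding matchers one at a time can be sketched in a few lines of Python. This is only an illustration of exact-match label filtering, not how Alertmanager itself is implemented; the `matching_alerts` helper and the sample list of firing alerts (other than the `allstar` example above) are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matching_alerts(alerts: List[Labels], matchers: Labels) -> List[Labels]:
    """Return the alerts whose labels equal every matcher (exact match only)."""
    return [
        labels
        for labels in alerts
        if all(labels.get(name) == value for name, value in matchers.items())
    ]


# Hypothetical sample of firing alerts, each described by its labels.
firing = [
    {"alertname": "InstanceDown", "instance": "allstar.leb.memhamwan.net:5038", "job": "allstar"},
    {"alertname": "InstanceDown", "instance": "other.example.net:9100", "job": "node"},
    {"alertname": "HighLatency", "instance": "other.example.net:5038", "job": "allstar"},
]

# Matching on job alone is too broad: it also catches an unrelated alert.
too_broad = matching_alerts(firing, {"job": "allstar"})

# Adding the instance matcher narrows the selection to the intended alert.
specific = matching_alerts(
    firing,
    {"instance": "allstar.leb.memhamwan.net:5038", "job": "allstar"},
)

print(len(too_broad))  # 2
print(len(specific))   # 1
```

A silence built from the broader set of matchers would also hide the unrelated alert, which is why each matcher should be checked against the filtered list before the silence is created.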
Writing the rest of the details
- Confirm that there is a GitLab issue logged that describes what needs to be done for the silence to go away; add the “alertmanager silence” label to the issue
- Consider how long the silence should be active; err on the side of underestimating rather than overestimating the timeframe, to avoid mistakenly leaving a silence running for too long; this value should be stated in terms of hours (“h”), days (“d”), or weeks (“w”); enter this value in the “Duration” field
- In the “Creator” field, enter your GitLab username
- In the “Comment” field, write a brief explanation of why you are creating the silence; there should always be an accompanying GitLab issue, so be sure to link to it in your comment
- Submit the silence
- Copy and paste the URL to the silence into a comment on the GitLab issue so that it can be traced
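The fields above map onto the silence object that Alertmanager's v2 HTTP API accepts at /api/v2/silences. The following is a minimal sketch of assembling that object, assuming the API is reachable in your deployment; this page only covers the web UI, so treat it as an illustration of how the duration, creator, and comment fit together. The helper names, the username, and the issue URL are placeholders, not real values.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Units this page asks for: hours, days, or weeks.
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text: str) -> timedelta:
    """Convert a value such as "2h", "3d", or "1w" into a timedelta."""
    match = re.fullmatch(r"(\d+)([hdw])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def build_silence(
    matchers: Dict[str, str],
    duration: str,
    creator: str,
    comment: str,
    now: Optional[datetime] = None,
) -> dict:
    """Assemble a silence payload with exact-match (non-regex) matchers."""
    start = now or datetime.now(timezone.utc)
    end = start + parse_duration(duration)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in matchers.items()
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": creator,
        "comment": comment,
    }


silence = build_silence(
    matchers={"instance": "allstar.leb.memhamwan.net:5038", "job": "allstar"},
    duration="2h",
    creator="example-gitlab-user",  # placeholder username
    comment="Planned maintenance, see https://gitlab.example/issues/1",  # placeholder URL
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(silence["endsAt"])  # 2024-01-01T14:00:00+00:00
```

Keeping the duration short, as recommended above, means the computed end time acts as a safety net if nobody remembers to expire the silence by hand.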