Introduction
DataSet Alerts have a number of helpful features that can be used to test and diagnose missing alert notifications. This article covers some of the best practices we use.
Testing
Instead of waiting until your alert is triggered, you can test your alert's notifications immediately by setting the trigger to a boolean value of true.
After 2-3 minutes, you should begin to see notification activity.
Note: If you are using the #lastLogLines# token, no log lines will be displayed with a trigger of true (since no logs were actually processed).
Confirm your alert's status
Review the Alert's Status
The tag='alertState' query returns log events associated with alerts on your DataSet account. These diagnostics simplify the process of verifying whether the alert is behaving as expected.
Note: A new alert may not trigger within the usual 2-3 minutes because its histograms are still being created. Once these histograms have been generated, the alert will function as expected. If no matching log lines are returned by the tag='alertState' query for a relatively new alert, there's a good chance that histogram generation is the underlying cause.
Checking For State Changes
When testing alerts, you want to be sure that the specified condition is actually triggering the alert, and that the alert is therefore working as expected.
The simplest way to check this is by using the tag='alertStateChange' query, as it returns the events that are recorded when an alert is triggered or returns to a dormant state. For example:
- The alert changes from a dormant state to an active state (lastStatus=OK -> status=TRIGGERED)
- ~5m later, the alert is resolved and returns to a dormant state (lastStatus=TRIGGERED -> status=OK)
- For the purposes of our discussion, the most important fields are the description (used for filtering) and the status attributes.
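If you export the alertStateChange events for a single alert, the transitions described above can be paired up to see how long each alert episode lasted. The following is a minimal sketch, not an official DataSet tool. It assumes the events have already been loaded as Python dictionaries carrying the lastStatus and status fields. The timestamp key and the active_periods helper are hypothetical names chosen for illustration.

```python
from datetime import datetime

def active_periods(events):
    """Return (periods, open_since) for one alert's alertStateChange events.

    periods    -- list of (triggered_at, duration) for completed episodes
    open_since -- time of a trigger that has not yet resolved, or None
    """
    periods = []
    open_since = None
    # "timestamp" is an assumed key; adapt it to however your export names it.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        change = (event["lastStatus"], event["status"])
        if change == ("OK", "TRIGGERED"):
            open_since = event["timestamp"]
        elif change == ("TRIGGERED", "OK") and open_since is not None:
            periods.append((open_since, event["timestamp"] - open_since))
            open_since = None
    return periods, open_since

# Made-up sample data mirroring the two transitions described above.
sample = [
    {"timestamp": datetime(2024, 1, 1, 12, 0), "lastStatus": "OK", "status": "TRIGGERED"},
    {"timestamp": datetime(2024, 1, 1, 12, 5), "lastStatus": "TRIGGERED", "status": "OK"},
]
periods, open_since = active_periods(sample)
for triggered_at, duration in periods:
    print(f"triggered at {triggered_at}, active for {duration}")
```

An open_since value other than None after the loop means the alert triggered and has not yet returned to a dormant state.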
Review the Alert Notifications
The status of any notifications that are issued relative to the alert can be examined with tag='alertNotification'. I typically use a query similar to:
tag in ('alertNotification','alertStateChange') description contains 'DESCRIPTION_TITLE'
This displays notifications relative to the alertStateChange events that triggered them, in chronological order. As highlighted in the above screenshot (with arrows):
- status=TRIGGERED indicates a notification for an active alert
- status=OK indicates a resolution notification for an alert that returned to a dormant state
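When the combined result set is long, it can help to check mechanically that every state change was followed by a notification with the same status. The sketch below is one way to do that, assuming the query results have been exported as a list of dictionaries with tag and status fields. The timestamp key and the changes_without_notification helper are hypothetical names used only for illustration.

```python
def changes_without_notification(events):
    """Return alertStateChange events that no matching alertNotification followed.

    A notification matches when it appears after the state change, before the
    next state change, and carries the same status value.
    """
    missing = []
    pending = None  # latest state change still waiting for its notification
    # "timestamp" is an assumed key; sorted() is stable, so ties keep input order.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["tag"] == "alertStateChange":
            if pending is not None:
                missing.append(pending)
            pending = event
        elif event["tag"] == "alertNotification":
            if pending is not None and event["status"] == pending["status"]:
                pending = None
    if pending is not None:
        missing.append(pending)
    return missing

# Made-up sample: the trigger was notified, the resolution was not.
sample_events = [
    {"timestamp": 0, "tag": "alertStateChange", "status": "TRIGGERED"},
    {"timestamp": 1, "tag": "alertNotification", "status": "TRIGGERED"},
    {"timestamp": 300, "tag": "alertStateChange", "status": "OK"},
]
unnotified = changes_without_notification(sample_events)
for change in unnotified:
    print(f"no notification seen for change to {change['status']}")
```

Any state change reported here is a good starting point for checking the notification channel itself.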
Check for Webhook errors
If the alert is functioning as expected but you are still not receiving notifications, confirm that the notification mechanism(s) are working. When testing alerts for the first time, I typically use my email address to verify that the notifications are being delivered as expected.
Webhooks with JSON payloads, for example, may require more initial effort due to their formatting requirements. One way to quickly spot a problem is to search for tag='webhookError'. If a faulty payload is causing your notifications to be rejected, you'll see log events similar to:
In the above example, I realized that I had incorrectly escaped (\\\") my JSON payload in the alert's configuration, which resulted in the payload error (and no notifications being sent to Slack).
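A quick local check can catch this kind of escaping mistake before the payload is pasted into the alert configuration. The sketch below simply tries to parse the payload with Python's standard json module. The check_webhook_payload helper and both sample payloads are illustrative assumptions and are not taken from an actual DataSet configuration. It also assumes the payload is valid JSON before any alert tokens are substituted.

```python
import json

def check_webhook_payload(payload):
    """Return (is_valid, message) after trying to parse payload as JSON."""
    try:
        json.loads(payload)
    except json.JSONDecodeError as err:
        return False, f"{err.msg} (line {err.lineno}, column {err.colno})"
    return True, "ok"

# Raw strings keep the backslashes exactly as they would be typed.
good = r'{"text": "Alert triggered"}'
over_escaped = r'{\"text\": \"Alert triggered\"}'  # quotes escaped one level too many

for candidate in (good, over_escaped):
    is_valid, message = check_webhook_payload(candidate)
    print("valid" if is_valid else "invalid", "-", message)
```

If the payload parses locally but tag='webhookError' events still appear, the problem is more likely in how the payload is quoted inside the alert configuration than in the JSON itself.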