Introduction:
Grouped Alerts are similar to traditional alerts, but they let you disaggregate on one field: the alert is evaluated, and notifications are sent, for each value of that field. With Grouped Alerts you set the alert condition once, and it is applied to each value in the field. For example, you can create an alert for high latency, select the serverHost field, and the alert will be evaluated separately for each server. On the Alerts page, a Grouped Alert is managed as a single alert, so you are not overwhelmed with multiple instances.
In this article I will walk you through the basics of Grouped Alerting. We will cover Grouped Alert Count, Grouped Alert Notifications, and Grouped Alert by Value.
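To make the difference concrete, here is a small Python sketch of the idea. It is a conceptual model, not DataSet's implementation or API: only the serverHost field comes from this article, while the latencyMs field, the host names, and the 500 ms threshold are hypothetical. A traditional alert answers one yes/no question for all of the data, while a grouped alert applies the same condition to each value of the chosen field and reports which values are triggered.

```python
from collections import defaultdict

# Hypothetical request samples. Only the field name "serverHost" comes from
# the article; latencyMs, the host names, and the threshold are illustrative.
samples = [
    {"serverHost": "web-1", "latencyMs": 120},
    {"serverHost": "web-1", "latencyMs": 950},
    {"serverHost": "web-2", "latencyMs": 140},
    {"serverHost": "web-3", "latencyMs": 180},
]
THRESHOLD_MS = 500

def traditional_alert(samples, threshold):
    """One condition over all data: answers 'is latency high somewhere?'."""
    return max(s["latencyMs"] for s in samples) > threshold

def grouped_alert(samples, field, threshold):
    """The same condition, applied separately to each value of `field`.

    Returns the sorted list of field values whose worst latency exceeds
    the threshold, i.e. the groups that would each be in a triggered state.
    """
    worst = defaultdict(int)
    for s in samples:
        worst[s[field]] = max(worst[s[field]], s["latencyMs"])
    return sorted(value for value, ms in worst.items() if ms > threshold)

print(traditional_alert(samples, THRESHOLD_MS))            # True
print(grouped_alert(samples, "serverHost", THRESHOLD_MS))  # ['web-1']
```

The traditional version only tells you that something is slow; the grouped version tells you that web-1 is the server to look at, which is the benefit described above.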
Grouped Alert Count:
In this example, we will set up a grouped alert for when someone enters the wrong password more than once. With traditional alerts, you can set up an alert that tells you someone is having trouble logging in. The beauty of grouped alerts is that, because we are alerting on a specific field, the alert can show you exactly who is having trouble logging in. In our Invalid Login Credentials grouped alert example, we can point to the specific email addresses experiencing the issue.
We start by creating the alert from a simple search.
Step 1. Log into DataSet and click Search, then New Search, in the top menu.
Step 2. Type the query you would like to run into the search bar.
In my example below, notice that I am running an Invalid Login Credentials search.
(Note: If you try this on your own, you may need to perform a different search if you don't have a tag set for authAudit or a status field equal to invalidLoginCredentials. Feel free to replace it with search criteria that are relevant to you.)
Search Syntax:
tag = 'authAudit' status = 'invalidLoginCredentials' |
Step 3. Click the Save button (the one with a star to its left) and choose Save as Alert.
Step 4. Once on the alerts page, you can name the alert. Notice that the filter will be automatically populated with our previous search criteria.
Step 5. From here, make sure Count is highlighted (if it is not already); this counts the number of log events that match the filter over a given period.
Step 6. Click the +ForEach button and select the field you wish to group by. In our Invalid Login Credentials grouped alert, we group by userEmail.
Step 7. Set the alert conditions. We will trigger if the number of events in 10 minutes is greater than 1, so if any given person has more than one invalid login, we get a notification about that person.
Step 8. Click Add Email, enter the recipient email address (or Webhook details), and click Save.
Step 9. Now that the alert has been created, wait a few minutes and open the Alerts page; the most recent alert is at the bottom. If the alert has triggered, you will see red lines, as in the Testing grouped alert example below.
Step 10. Click the alert name to open the alert details page.
You will notice in my screenshot above that this differs from a traditional alert's details page: the two disaggregated values are listed here. Alice and Bob were the two users with invalid logins, so each is listed individually. We can also see the history of the triggered status, so if Alice had entered a bad login earlier, her alert would show as triggered for longer.
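The count-based grouped alert built in the steps above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not how DataSet evaluates alerts internally: the tag, status, and userEmail fields and the "more than 1 in 10 minutes" condition come from this example, while the event dictionaries, timestamps, and the carol@ and dave@ addresses are made up for illustration. The counts for Alice (4) and Bob (2) match the triggering values described in the notification section.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
THRESHOLD = 1  # trigger if the count is greater than 1

def invalid_login_alerts(events, now, window=WINDOW, threshold=THRESHOLD):
    """Count matching events per userEmail in the trailing window.

    Applies the article's filter (tag = 'authAudit',
    status = 'invalidLoginCredentials') and returns {userEmail: count}
    for every group whose count exceeds the threshold.
    """
    start = now - window
    counts = Counter(
        e["userEmail"]
        for e in events
        if e["tag"] == "authAudit"
        and e["status"] == "invalidLoginCredentials"
        and start < e["timestamp"] <= now
    )
    return {user: n for user, n in counts.items() if n > threshold}

# Hypothetical evaluation time and events.
now = datetime(2024, 1, 1, 12, 0)

def ev(user, minutes_ago):
    """Build one hypothetical invalid-login event."""
    return {"tag": "authAudit", "status": "invalidLoginCredentials",
            "userEmail": user, "timestamp": now - timedelta(minutes=minutes_ago)}

events = (
    [ev("alice@example.com", m) for m in (1, 2, 3, 4)]
    + [ev("bob@example.com", m) for m in (5, 6)]
    + [ev("carol@example.com", 2)]                   # one failure: stays OK
    + [ev("dave@example.com", m) for m in (15, 20)]  # outside the 10-minute window
)
print(invalid_login_alerts(events, now))
# {'alice@example.com': 4, 'bob@example.com': 2}
```

Each key in the result corresponds to one row on the alert details page; a user with a single bad login, or with failures older than the window, does not appear.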
Grouped Alert Notifications:
Now I will check my email, where I expect to get notifications about the two users who had bad logins. When I log into my email account, I see I have one email. I have only one email because we automatically batch email notifications to avoid spamming people. If the two alerts had triggered at slightly different times, you would have received two emails.
Notice that the subject includes the user's name, because we grouped by the userEmail field. There are two instances, and the email also includes the triggering value for each, which shows that Alice had 4 bad logins while Bob had 2. Notifications sent via Slack or other notification channels are not batched, so there you would receive two individual notifications: one for Alice and one for Bob.
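The batching behavior described above can be sketched in Python. This article does not state how close together triggers must be to share an email, so BATCH_WINDOW, the trigger dictionaries, and both helper functions are hypothetical; the sketch only illustrates why near-simultaneous triggers end up in one email while an unbatched channel such as Slack produces one message per group.

```python
from datetime import datetime, timedelta

# Assumption: the real batching window is not given in the article;
# one minute is a hypothetical value used only for illustration.
BATCH_WINDOW = timedelta(minutes=1)

def batch_email(triggers, window=BATCH_WINDOW):
    """Group triggers into emails. A trigger joins the current email if it
    fires within `window` of that email's first trigger; otherwise it
    starts a new email."""
    emails = []
    for trig in sorted(triggers, key=lambda item: item["time"]):
        if emails and trig["time"] - emails[-1][0]["time"] <= window:
            emails[-1].append(trig)
        else:
            emails.append([trig])
    return emails

def per_group_messages(triggers):
    """Unbatched channels (the article cites Slack): one message per trigger."""
    return [[trig] for trig in triggers]

t0 = datetime(2024, 1, 1, 12, 0)
# Triggering values from the article's example: Alice 4, Bob 2.
triggers = [
    {"userEmail": "alice@example.com", "triggeringValue": 4, "time": t0},
    {"userEmail": "bob@example.com", "triggeringValue": 2,
     "time": t0 + timedelta(seconds=5)},
]
print(len(batch_email(triggers)))         # 1 email covering both users
print(len(per_group_messages(triggers)))  # 2 Slack-style messages

# If Bob's alert had triggered a few minutes later, he would get his own email.
later = [triggers[0], {**triggers[1], "time": t0 + timedelta(minutes=5)}]
print(len(batch_email(later)))            # 2
```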
Grouped Alert by Value:
Similar to our grouped alert count, we will create this alert from a search.
The search example here is one our SRE team uses to measure CPU consumption.
Step 1. Log into DataSet and click Search, then New Search, in the top menu.
Step 2. Type the query you would like to run into the search bar.
Search Syntax:
metric='proc.stat.cpu_rate' type='user' source='tsdb' |
Step 3. In the left-hand menu under Fields, click value and select Graph Values. Then open the breakdown menu and break down by serverHost.
Step 4. Click Show mean under the line chart in the upper right-hand corner of the UI, and in the Line chart settings, set the interval to 5 mins.
Step 5. Click Save as Alert.
Step 6. Once on the alerts page, you can name the alert. Notice that the filter will be automatically populated with our previous search criteria.
Step 7. From here, make sure Value is highlighted (if it is not already).
Step 8. Click the +ForEach button and select the field you wish to group by (in our example, serverHost).
Step 9. Set the alert conditions. We will trigger the alert if the mean value over the past 5 minutes is greater than 20.
Step 10. Add an email recipient, keep the defaults for the rest of the alert options, and save the alert.
If a long time has passed and the alert has not triggered, you can always go back and edit the alert to use a lower threshold, for example greater than 5 instead of greater than 20.
If you have updated your thresholds and are still experiencing issues, please refer to our Troubleshooting Alerts article.
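The value-based alert can be modeled the same way: average the metric per serverHost over the trailing 5 minutes and flag hosts whose mean exceeds the threshold. This is a simplified Python sketch, not DataSet's evaluation logic (the Show mean chart uses 5-minute intervals, whereas this sketch uses a single trailing window). The filter values come from the search above; the data points, host names, and CPU values are hypothetical. The second call shows the effect of lowering the threshold from 20 to 5.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def cpu_alerts(points, now, window=timedelta(minutes=5), threshold=20):
    """Mean of `value` per serverHost over the trailing window.

    Applies the article's filter (metric='proc.stat.cpu_rate', type='user',
    source='tsdb') and returns {serverHost: mean} for hosts above threshold.
    """
    start = now - window
    by_host = defaultdict(list)
    for p in points:
        if (p["metric"] == "proc.stat.cpu_rate" and p["type"] == "user"
                and p["source"] == "tsdb" and start < p["timestamp"] <= now):
            by_host[p["serverHost"]].append(p["value"])
    means = {host: mean(values) for host, values in by_host.items()}
    return {host: m for host, m in means.items() if m > threshold}

# Hypothetical evaluation time and data points.
now = datetime(2024, 1, 1, 12, 0)

def pt(host, value, minutes_ago):
    """Build one hypothetical CPU data point."""
    return {"metric": "proc.stat.cpu_rate", "type": "user", "source": "tsdb",
            "serverHost": host, "value": value,
            "timestamp": now - timedelta(minutes=minutes_ago)}

points = [
    pt("db-1", 30, 1), pt("db-1", 26, 3),    # mean 28: triggers at 20
    pt("web-1", 10, 1), pt("web-1", 8, 4),   # mean 9: triggers only at 5
    pt("web-2", 4, 2), pt("web-2", 2, 3),    # mean 3: never triggers
    pt("web-2", 90, 12),                     # outside the 5-minute window
]
print(cpu_alerts(points, now))               # db-1 only
print(cpu_alerts(points, now, threshold=5))  # db-1 and web-1
```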