Monday, June 15, 2015

How to Configure Alerts to Prevent Too Many Server Notifications

This is a guest post by Ashish Mohindroo, who leads Product for Happy Apps, a new uptime and performance monitoring system.

It isn't uncommon for system administrators to receive a stream of alarms for situations that either can wait to be addressed, will remedy themselves, or simply weren't problems to begin with. The other side of the equation is missing the reports that indicate a problem that has to be addressed right away. Options for customizing server notifications let you decide the conditions that trigger alerts, set the level of alerts, and choose the recipients based on each alert's importance.

The only thing worse than receiving too many notifications is not receiving the one alert that would keep a small glitch from becoming a big problem. Preventing over-notification requires fine-tuning system alerts so that the right people find out about problems and potential problems at the right time. Here's a three-step approach to customizing server alerts:

  • First, set the conditions that will trigger alerts with various levels of importance, usually along the lines of "failure," "warning," and "information/update."

  • Second, determine the action that each trigger setting will generate, whether an automatic reboot, a specific settings adjustment, or a subsequent alert.

  • Third, decide who will receive the notifications at each level of importance, how the notification will be delivered, and whether to delay the alert or repeat it at a predetermined interval.
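
The three steps can be sketched as a small data model. This is only an illustration in Python, not the configuration format of OpenNMS or Happy Apps: the names `Severity`, `AlertRule`, and `classify_cpu`, along with the CPU thresholds and recipient labels, are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    """Importance levels, ordered so they can be compared."""
    INFO = 1
    WARNING = 2
    FAILURE = 3


@dataclass
class AlertRule:
    name: str
    severity: Severity                      # step one: level the condition maps to
    action: str                             # step two: what the trigger generates
    recipients: List[str] = field(default_factory=list)  # step three: who is told
    delay_minutes: int = 0                  # step three: wait before notifying
    repeat_minutes: Optional[int] = None    # step three: optional repeat interval


def classify_cpu(percent: float) -> Severity:
    """Step one: map a measurement to a severity (thresholds are hypothetical)."""
    if percent >= 95:
        return Severity.FAILURE
    if percent >= 80:
        return Severity.WARNING
    return Severity.INFO


# One rule per severity level; actions and recipients are placeholders.
RULES = {
    Severity.FAILURE: AlertRule("cpu-failure", Severity.FAILURE, "reboot", ["on-call"], 0, 15),
    Severity.WARNING: AlertRule("cpu-warning", Severity.WARNING, "notify", ["ops-team"], 10),
    Severity.INFO: AlertRule("cpu-info", Severity.INFO, "log"),
}

rule = RULES[classify_cpu(87.5)]
print(rule.name, rule.action, rule.recipients, rule.delay_minutes)
```

Keeping the condition, the action, and the recipients in one record makes it easy to see at a glance which alerts interrupt someone and which are merely logged.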

System monitors have evolved in recent years from the traditional command-line approach to a console-based interface. The old and new methods are represented here by the OpenNMS open-source network management system, and the new Happy Apps app-management service.

Step one: Set the conditions that trigger the alerts

Event management and notifications make up one of OpenNMS's three functional areas; the others are determining network-service availability and using SNMP to collect performance data. A tutorial on the OpenNMS wiki explains how to set up notifications in the system. Listed below is a configuration that sends an OpenNMS trap with nodelabel in the varbind:

https://s3-us-west-1.amazonaws.com/happyapps-staging/system/spud_media/53/original/serveralerts1.png?1431008154

The OpenNMS "SnmpTrapNotificationStrategy" listed here sends a trap with nodelabel in the varbind. Source: OpenNMS Wiki

Parameters defined in the trap are passed to the notification command as switches that appear in the notificationCommands.xml file:

https://s3-us-west-1.amazonaws.com/happyapps-staging/system/spud_media/54/original/serveralerts2.png?1431008242

The trap parameters are transmitted to the OpenNMS notification command as switches, as shown in the notificationCommands.xml file. Source: OpenNMS Wiki

Any condition defined as a parameter in the notification must also be defined as a switch in the notification command, the defaults for which are shown below:

https://s3-us-west-1.amazonaws.com/happyapps-staging/system/spud_media/55/original/serveralerts3.png?1431008319

Any parameter defined in the OpenNMS notification must be defined as a switch in the notification command. Source: OpenNMS Wiki

In this example, the single allowed "trapVarbind" is sent with an object id of ".1.3.6.1.4.1.5813.20.1" and an object type of "DisplayString".

Or skip the commands and use a console to configure alerts

In contrast to OpenNMS's command approach to setting notifications, the Happy Apps app-monitoring service presents all the required configuration options in clear, straightforward dialogs and menus. For example, the Add Alert dialog lets you create an alert with custom settings in just a few seconds. To open the dialog, click the Add Rule button on the right side of the Alert Rules window under the Admin tab.

Happy Apps Creating Alerts - uptime performance

Open the Add Alert dialog by clicking the Add Rule button in the Alert Rules window under the Admin tab. 

The Alert Rules feature in the Happy Apps app-monitoring service lets you set notifications based on severity for individual components, groups of components, or all the components comprising an app. To create an alert, click the Admin tab at the top-right of the main Happy Apps screen, choose Alert Rules in the Admin menu bar, and select the Add Rule button.

In the Add Alert dialog, enter a name for the alert, and choose the minimum severity level that will trigger the alert (Critical, Warning, or Info). Set the minimum duration before the alert is transmitted in the text box below the severity-level drop-down menu. The default setting is Immediate, but you can enter any number of minutes to delay the notification, either by typing it or using the up and down arrow keys. To prevent the alert from sending any notifications, uncheck the Active option in the top-right corner.

https://s3-us-west-1.amazonaws.com/happyapps-staging/system/spud_media/57/original/serveralerts5.png?1431008432

Give the new alert a name, select the severity level that will cause a notification to be sent, and set the delay before sending the alert. 
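
The way a minimum severity and a minimum duration combine can be sketched in a few lines. This is a guess at the logic the dialog settings imply, not Happy Apps code; `should_notify` and its parameters are hypothetical names, and a delay of 0 stands in for the Immediate default.

```python
from datetime import datetime, timedelta
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def should_notify(active, min_severity, delay_minutes,
                  incident_severity, incident_start, now):
    """Return True when a rule with these settings would send a notification."""
    if not active:
        # The Active option is unchecked: the rule never notifies.
        return False
    if incident_severity < min_severity:
        # The incident is below the rule's minimum severity.
        return False
    # The incident must have lasted at least delay_minutes (0 means immediate).
    return now - incident_start >= timedelta(minutes=delay_minutes)


start = datetime(2015, 6, 15, 12, 0)
print(should_notify(True, Severity.WARNING, 5, Severity.CRITICAL,
                    start, start + timedelta(minutes=6)))
```

A short delay is one of the simplest ways to cut noise, because a problem that clears up within the window never produces a notification.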

Step two: Determine the actions that will be triggered by the alert

In OpenNMS, the "notifd" notification daemon runs by default and is configured through three files: destinationPaths.xml (the destination), notificationCommands.xml (the method, including Java notification methods in addition to executing external commands), and notifications.xml (the notifications themselves). When the system starts, notifd builds a list of event UEIs to listen for based on the notification-configuration settings in notifications.xml. For each event received from the "eventd" event daemon, a check is performed to determine whether notifications are turned on, the event settings match, and the name and value match with an event parameter.

Once all these criteria are satisfied, the notification is sent. Where, when, and how notifications are sent is determined by the destinationPaths.xml file. (See step three below for more on OpenNMS's destination-path settings.)
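
The matching checks described above can be sketched roughly as follows. This is not OpenNMS code and does not parse its XML files: the dictionary layout, the `status` and `parameters` keys, and the function names are assumptions chosen for the example.

```python
def ueis_to_listen_for(notifications):
    """Collect the UEIs of all notifications that are switched on."""
    return {n["uei"] for n in notifications if n.get("status") == "on"}


def matches(notification, event):
    """Return True if the event should trigger the notification."""
    if notification.get("status") != "on":
        return False                      # notification is turned off
    if notification["uei"] != event["uei"]:
        return False                      # a different kind of event
    wanted = notification.get("parameters", {})
    actual = event.get("parameters", {})
    # Every configured name/value pair must appear among the event parameters.
    return all(actual.get(name) == value for name, value in wanted.items())


notifications = [
    {"name": "node-down", "status": "on",
     "uei": "uei.opennms.org/nodes/nodeDown",
     "parameters": {"site": "dc1"}},      # "site" is a made-up parameter
]
event = {"uei": "uei.opennms.org/nodes/nodeDown",
         "parameters": {"site": "dc1", "rack": "7"}}

print(ueis_to_listen_for(notifications))
print([n["name"] for n in notifications if matches(n, event)])
```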

The information in each notification includes the name, event, description, rule, destination path, subject, text message, and on/off status. You can also configure an automatic acknowledgement when the outage generating the alert is resolved. For example, the nodeDown and nodeUp events are paired this way in the notifd-configuration.xml code section below:

https://s3-us-west-1.amazonaws.com/happyapps-staging/system/spud_media/58/original/serveralerts7.png?1431012035

OpenNMS notifications can be configured to be acknowledged automatically, such as when an event pair (nodeDown and nodeUp here) indicates the problem has been resolved.
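
The idea behind pairing a problem event with the event that resolves it can be sketched as a lookup table. This shows the concept only, under simplifying assumptions: the `AUTO_ACK` table, the `outstanding_notices` function, and the node names are hypothetical, and real OpenNMS configuration is written in XML rather than Python.

```python
# Resolving event -> the problem event it acknowledges (pairing from the example above).
AUTO_ACK = {"nodeUp": "nodeDown"}


def outstanding_notices(events):
    """Replay (node, event_name) pairs and return notices still unacknowledged."""
    outstanding = set()
    for node, name in events:
        if name in AUTO_ACK:
            # A resolving event clears the matching notice for the same node.
            outstanding.discard((node, AUTO_ACK[name]))
        else:
            outstanding.add((node, name))
    return sorted(outstanding)


history = [("web1", "nodeDown"), ("db1", "nodeDown"), ("web1", "nodeUp")]
print(outstanding_notices(history))
```

With the pairing in place, nobody has to clear the web1 notice by hand once the node is back, while the db1 notice stays open because no matching nodeUp has arrived.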

Rather than configuring notifications by editing system files, Happy Apps simplifies the process by offering a straightforward interface for setting your notification options. In the Filters section of the Add Alert dialog you determine the apps, groups, and checks that the alert will apply to. You can either choose the specific components the alert will affect, or associate the alert with all your apps, groups, or checks by checking the Select All option at the top of the list. The chosen items appear in the Selected section.

https://s3-us-west-1.amazonaws.com/happyapps-staging/system/spud_media/59/original/serveralerts6.png?1431012126

Choose the apps, groups, and checks the alert will affect in the Filters section of the Add Alert dialog, or check the Select All option to apply it to all apps, groups, or checks.

Step three: Specify the notification recipients

As the OpenNMS Wiki describes, notification recipients are specified in the destinationPaths element of the XML configuration file. The destination path is "walked" each time a notification passes the event test above. Once the walk starts, the first alert goes out as soon as the initial wait period has expired; the default initial delay is zero seconds. Subsequent notifications are sent based on the configuration settings.
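
Walking a destination path amounts to building a delivery schedule. The sketch below is a deliberately simplified model, not OpenNMS's implementation: `walk_path`, the tuple layout of each step, and the recipient names are assumptions, and it ignores details such as acknowledgements.

```python
def walk_path(steps, initial_delay=0):
    """Return (seconds_from_start, recipient) pairs in delivery order.

    steps is a list of (delay_after_previous_step, recipients) tuples.
    """
    schedule = []
    offset = initial_delay            # default of zero means "send right away"
    for delay, recipients in steps:
        offset += delay               # later steps wait on top of earlier ones
        for recipient in recipients:
            schedule.append((offset, recipient))
    return schedule


# First the on-call admin, then the team lead 15 minutes later (hypothetical).
path = [(0, ["on-call"]), (900, ["team-lead"])]
for seconds, who in walk_path(path):
    print(seconds, who)
```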

By contrast, to add a recipient to receive the alert in Happy Apps, click the Add Recipient button in the Recipients section at the bottom of the Add Alert dialog. Enter the recipient's name in the user field, which automatically searches your contacts as you type. Select the notification method in the drop-down menu that appears once a name is selected: email, SMS, both, or neither. Check the Notify on Change box to have the alert sent to the person when an incident escalates. Select the Notify on Close box to alert the user when the check is passing again and the incident closes.

https://s3-us-west-1.amazonaws.com/happyapps-staging/system/spud_media/60/original/serveralerts8.png?1431012204

When you add an alert recipient, you enter a name from your contacts, choose the notification method, and set the criteria that will trigger the alert. 
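
The recipient options boil down to two questions: which channels to use, and which incident events to report. Here is a minimal sketch of that logic, assuming an incident can be opened, changed, or closed; the `Recipient` class, the `deliveries` function, and the event names are hypothetical and not part of Happy Apps.

```python
from dataclasses import dataclass

# Notification method -> channels actually used.
CHANNELS = {"email": ["email"], "sms": ["sms"],
            "both": ["email", "sms"], "neither": []}


@dataclass
class Recipient:
    name: str
    method: str = "email"
    notify_on_change: bool = False   # tell them when the incident escalates
    notify_on_close: bool = False    # tell them when the check passes again


def deliveries(recipient, incident_event):
    """Return (name, channel) pairs for an 'open', 'change', or 'close' event."""
    if incident_event == "change" and not recipient.notify_on_change:
        return []
    if incident_event == "close" and not recipient.notify_on_close:
        return []
    return [(recipient.name, channel) for channel in CHANNELS[recipient.method]]


admin = Recipient("Pat", method="both", notify_on_close=True)
for kind in ("open", "change", "close"):
    print(kind, deliveries(admin, kind))
```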

When the time comes for an alert to be altered, you can do so by clicking the pencil icon to the right of its entry on the Alert Rules screen. Make the required changes in the Edit Alert dialog that opens, and then click the Save button in the bottom-left corner of the window. To delete an alert, simply click the trashcan icon next to the Edit button in its listing on the main Alert Rules screen.

One of the simplest and most efficient methods of configuring system alerts is by using the Happy Apps app-management service. All checks performed on your apps are collected in easy-to-read reports that can be analyzed to identify repeating patterns and performance glitches over time.

Happy Apps is a robust app-management service that supports SSH and agent-based connectivity to all your apps on public, private, and hybrid clouds. The service provides dependency maps for determining the impact your IT systems will have on other apps.

 

Reader Comments (2)

Well,

I am a bit sad to see a product comparison biased by the author working for the company of one of the exposed products...

If you really want to help SMC/TMC/NOC or other people working on fault and alerting management, you should talk about the facts and the main guidelines to achieve your goal and not saying "blah blah Haapy Apps" is better because everything is in clear...

My 2 cents ;)

July 6, 2015 | Unregistered Commenter damajor

What about baseline thresholding? Instead of setting static thresholds, something like NetCrunch can alert you when performance parameters exceed historical norms. This way you can be alerted if CPU usage exceeds the historical norm by 20%. You can even have it ignore the spikes, so you're only alerted if that condition is met and maintained over a specific time or number of checks.

July 8, 2015 | Unregistered Commenter Michael Rojek
