Overview
This page reviews monitoring alert notifications.
1.0 About Notifications and Downtime
A monitoring system is more useful if it can send messages when monitored equipment fails. Notifications and escalations connect the alerts raised when monitors detect failures or over-limit conditions to effective communication with configured contacts.
If a threshold value is breached for a monitored attribute, an event is generated, and you can create notification rules to warn you about this type of event. The decision to send notifications is made in the service check and host check logic. A notification is sent when a hard state change occurs (that is, when the defined maximum number of check attempts is reached) or when a host or service remains in a hard non-OK state, provided the various host and service filters are passed.
Each host and service definition has a contact group that receives notifications, and the contact filters determine whether a given contact is notified. You can be notified of problems and recoveries in almost any way you want, including by cell phone, email, instant message, or audio alert.
GroundWork Monitor makes full use of the notification and escalation features of Nagios and has also integrated the Notification Manager application (NoMa), which may be a preferable option for configuring system notifications. Note that Nagios notifications are only effective for Nagios-monitored objects, and GroundWork includes monitoring that goes beyond Nagios. GroundWork needed a way to send alerts for GDMA, Cacti, Cloud Hub discovered elements, and syslog and trap sources that are external to Nagios. NoMa notifications and escalations work for both Nagios and non-Nagios configured elements.
Scheduling downtime is useful during system maintenance because it suppresses notifications for the entities in downtime. The GroundWork Monitor Downtimes tool manages the scheduling of both one-time and recurring (e.g., daily, weekly, monthly, yearly) downtime for all monitored entities, including hosts, services, host groups, and service groups.
During the specified downtime, alert notifications will not be sent out about the monitored entities. This is useful in the event of taking a server down for an upgrade or maintenance, etc. Scheduling downtime also avoids alarm fatigue, provides more accurate data for SLA reporting, and reinforces change control discipline. When the scheduled downtime expires, notifications for the hosts and services will resume as normal. Scheduled downtimes are preserved across program shutdowns and restarts.
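The suppression rule described above can be sketched in a few lines of Python. This is only an illustration, not GroundWork's implementation: the `Downtime` class, the `should_notify` function, and the host names are hypothetical, and the window is assumed to be half-open (the end time is excluded).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Downtime:
    """Hypothetical scheduled-downtime window for one monitored entity."""
    entity: str      # host or service name (illustrative only)
    start: datetime  # first instant covered by the window
    end: datetime    # first instant after the window (assumed exclusive)


def in_downtime(entity, when, downtimes):
    """Return True if `entity` is inside any scheduled window at `when`."""
    return any(d.entity == entity and d.start <= when < d.end for d in downtimes)


def should_notify(entity, when, downtimes):
    """Notifications are suppressed while the entity is in scheduled downtime."""
    return not in_downtime(entity, when, downtimes)


# Example: a four-hour maintenance window for a hypothetical host "db01".
windows = [Downtime("db01", datetime(2024, 1, 6, 22, 0), datetime(2024, 1, 7, 2, 0))]
```

With this data, an alert for `db01` at 23:00 would be suppressed, while one at 02:00 the next morning (after the window expires) or one for any other host would be sent as normal.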
When working with very large host groups, see How to set downtime for large host groups.
2.0 Notifications Subsystem (NoMa) Overview
Since its inception, GroundWork has relied upon the notification and escalation functionality of Nagios to notify and escalate alerts to contact groups and contacts. With the introduction of Cloud Hub and the associated RESTful API, GroundWork has introduced a free-standing notification and escalation subsystem that no longer requires Nagios to alert contacts and contact groups. Cloud Hub bypasses Nagios for several reasons. First, it permits changes to the server virtual infrastructure to be made fluidly and automatically, because alerts no longer need to pass through a batch configuration commit process. Second, bypassing Nagios for alerts avoids the associated processing overhead, which in some scenarios removes a capacity limitation.
A free-standing notification subsystem also permits changes to notification and escalation schedules to be made at run time by roles with lower privileges than the system administration role, which makes the system more flexible to operate. NoMa also incorporates typical business rules and conditions into its user interface, making them easier to understand and configure. Finally, by removing notifications and escalations from the Nagios configuration system as part of the multi-release re-engineering of that subsystem, we simplify the task of making the subsystem fully multi-tenant.
For the present release, notifications and escalations using Nagios remain available for configuration and maintenance, as shown by the dotted line in the image below. This permits customers who have invested time or scripting in Nagios alerts to continue to use these methods, which are documented in the How to configure notifications using Nagios reference. Alternatively, all versions of GroundWork since release 6.7 can be reconfigured so that Nagios alerts are sent directly to NoMa, which is configured to handle them, in order to gain the benefits described above.
As shown below, the alert flow coming from Cloud Hub and Cacti since release 7.0.2 is processed by unique feeders that send alerts to the RESTful API for the Foundation database. In turn, NoMa subscribes to alert messages via the REST API to perform notifications and if needed escalations. This figure shows the relationship between Nagios, NoMa, and the GroundWork data management subsystem.
Figure: Simplified data flow
2.1 The NoMa Schema
By using the NoMa front-end, hosts and services can be assigned to notifications. When NoMa receives notifications either directly from Nagios or indirectly through the GroundWork REST API, it searches for matching host and service definitions in its database. If a matching setup (configured notification) has been found, escalation levels, receivers and methods are determined and notifications will be sent. The following diagram from the NoMa documentation has been modified to reflect the way the product is employed within GroundWork Monitor.
Figure: NoMa schema
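The lookup described in this section (find the configured notifications that match the incoming host and service, then collect receivers and methods) might be sketched as follows. This is an illustrative model only: the `NotificationRule` class, its field names, and the shell-style wildcard syntax handled by Python's `fnmatch` module are assumptions, not NoMa's actual schema or matching code.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class NotificationRule:
    """Hypothetical stand-in for one configured notification in NoMa."""
    host_pattern: str     # e.g. "web*" (wildcard syntax is an assumption)
    service_pattern: str  # e.g. "*" to match every service
    contacts: list = field(default_factory=list)
    methods: list = field(default_factory=list)


def matching_rules(host, service, rules):
    """Return every rule whose host and service patterns both match."""
    return [
        r for r in rules
        if fnmatchcase(host, r.host_pattern) and fnmatchcase(service, r.service_pattern)
    ]


def deliveries(host, service, rules):
    """Expand the matching rules into (contact, method) pairs to be sent."""
    return [(c, m) for r in matching_rules(host, service, rules)
            for c in r.contacts for m in r.methods]


# Example rules with made-up names.
rules = [
    NotificationRule("web*", "http", ["alice"], ["email"]),
    NotificationRule("db*", "*", ["bob"], ["email", "sms"]),
]
```

An alert that matches no rule produces no deliveries, which mirrors the behavior described above: notifications are sent only when a matching setup is found.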
2.2 NoMa and GroundWork Monitor
NoMa uses a notifier script that takes notifications from Nagios and processes them as defined via the NoMa front-end. The addition of NoMa to the GroundWork platform provides an alternative way to configure and use alerting on state changes that is more flexible than the traditional Nagios notification and escalation method. NoMa requires certain libraries and Perl modules to be present, all of which have been included in the build.
Standard NoMa - The standard NoMa daemon is started from init and presented to the user as a web app served by Apache, so there is an init script and an Apache configuration file. Users are authenticated in the browser via local Apache, LDAP, or another method. The NoMa daemon keeps the record of how and when it should produce alerts in a database, which can be MySQL or SQLite3. NoMa lets the user select the conditions for a notification by choosing from a list of variables that exist in Nagios/Icinga, including a recipient (contact), host, service, host group, and service group.
The selections are made according to a lookup in the NDOUtils database, which must be present as the event broker for Nagios. The combination is stored in the NoMa database as one of many filters, also called Notifications.
A script is provided for inclusion with Nagios as the notification command that will be triggered for every host or service you desire to be controlled via NoMa.
The following updated commands are included in the GroundWork Monitor 7.2.0 release and above. Administrators should check existing commands and update if necessary for prior versions of GroundWork Monitor.
- host-notify-by-noma
/usr/local/groundwork/noma/notifier/alert_via_noma.pl -c h -s "$HOSTSTATE$" -H "$HOSTNAME$" -G "$HOSTGROUPNAMES$" -n "$NOTIFICATIONTYPE$" -i "$HOSTADDRESS$" -o "$HOSTOUTPUT$" -t "$TIMET$" -u "$$(( $HOSTPROBLEMID$ ? $HOSTPROBLEMID$ : $LASTHOSTPROBLEMID$ ))" -A "$$([ -n "$NOTIFICATIONAUTHORALIAS$" ] && echo "$NOTIFICATIONAUTHORALIAS$" || echo "$NOTIFICATIONAUTHOR$")" -C "$NOTIFICATIONCOMMENT$" -R "$NOTIFICATIONRECIPIENTS$"
- service-notify-by-noma
/usr/local/groundwork/noma/notifier/alert_via_noma.pl -c s -s "$SERVICESTATE$" -H "$HOSTNAME$" -G "$HOSTGROUPNAMES$" -E "$SERVICEGROUPNAMES$" -S "$SERVICEDESC$" -o "$SERVICEOUTPUT$" -n "$NOTIFICATIONTYPE$" -a "$HOSTALIAS$" -i "$HOSTADDRESS$" -t "$TIMET$" -u "$$(( $SERVICEPROBLEMID$ ? $SERVICEPROBLEMID$ : $LASTSERVICEPROBLEMID$ ))" -A "$$([ -n "$NOTIFICATIONAUTHORALIAS$" ] && echo "$NOTIFICATIONAUTHORALIAS$" || echo "$NOTIFICATIONAUTHOR$")" -C "$NOTIFICATIONCOMMENT$" -R "$NOTIFICATIONRECIPIENTS$"
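In a Nagios command line, `$$` yields a literal `$`, so the `-u` and `-A` arguments in the commands above are evaluated by the shell after Nagios has substituted its macros. The `-u` expression selects the current problem ID when it is non-zero and otherwise falls back to the last problem ID, as would be the case for a recovery. The `-A` expression prefers the author alias and falls back to the author name. The Python below only restates that selection logic for readability; the function names are hypothetical and are not part of `alert_via_noma.pl`.

```python
def problem_id(current_id, last_id):
    """Mirror of the shell arithmetic `(( current ? current : last ))`.

    A current ID of 0 is treated as "no active problem", so the last
    problem ID is used instead.
    """
    return current_id if current_id else last_id


def author_name(alias, name):
    """Mirror of `[ -n "alias" ] && echo "alias" || echo "name"`.

    An empty alias falls back to the plain author name.
    """
    return alias if alias else name
```

For example, `problem_id(0, 17)` evaluates to 17, and `author_name("", "jdoe")` evaluates to `"jdoe"`.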
You must associate that command with a contact, that contact with a contact group, and the contact group with every object that will use NoMa for alerting, along with the appropriate notification conditions and time periods. When a state change occurs, the notification command calls the script, which passes the details to the NoMa daemon. The daemon compares those details with the filters stored in the NoMa database and, on a match, sends a NoMa message to the contacts named in the matching filter. NoMa sends the message using a different script, chosen from the several available in the filter setup.
NoMa in GroundWork Monitor - GroundWork has made these changes to the way NoMa is integrated into GroundWork Monitor:
- Administration of notifications, contacts, contact groups, holidays, timeframes, and methods via secure web front-end user interface.
- Easily set up host and service definitions using wildcards.
- Authentication via the GroundWork single sign on and role based access controls (JOSSO and JBoss Portal).
- Viewer for the notification log.
- Filter for non-privileged users, who can only view and edit their "own" notifications.
- Notification plugins are easily created to extend the functionality.
- The Apache configuration also adds a pointer to a distinct PHP instance, which has a php.ini stored in the same directory (for ease of locating it). This php.ini and the PHP that is called differ from the main ones in GroundWork Monitor by including the yaml.so library. That library is not compatible with our Java-aware apps, so we keep a separate copy for NoMa.
- The NoMa daemon startup is included with the GroundWork control script rather than being a stand alone startup. We pay attention to the NoMa daemon generated Process ID and use that in the normal way to start and stop the process.
- The SQLite3 database choice is enforced. This is a lightweight file based engine suitable for the low volume use case. This is where NoMa keeps the notification settings and other configuration choices.
- The NoMa code is adjusted to look up filter conditions in a view of the Foundation database instead of the NDOUtils database. This means that hosts, host groups, services, and service groups generated anywhere in the system can feed notifications, and that notifications no longer depend solely on Nagios. The recipient choice may be null or may be a choice from the NoMa contact list. We expect you to use the same user name (for example, noma) for the NoMa contact as for the contact you create in Nagios for the forwarding.
- The ability to select a Nagios contact as a condition of a NoMa notification filter is lost. This element is not present anywhere but in Nagios and Monarch, and it is only significant for those who must rely on Nagios settings. If it is necessary to your use case, you must import the contacts that you use in Nagios into NoMa so that they are available for selection.
- Preview of host and services has been deactivated for now.
- Host and Service downtime status is checked before notifications are sent out.
- Changes in notification configuration will create an audit entry via REST API.
- The NoMa daemon is set to poll for changes every minute.
2.3 NoMa/Nagios Configuration Process
The NoMa user interface includes tabs for notification rules (definitions), contacts, contact groups, holidays (to indicate when a contact is not to be notified), time frames for delivery, and methods of distribution. These are all used to maintain the configuration that determines who gets notified and when. The last tab in the UI, Logs, displays run-time activity so that you can monitor the notification and escalation process.
To use NoMa as an alerting agent for Nagios you must perform the following in NoMa: 1) Add the contacts to use as Recipient names in the notification setup, from the Nagios configuration, 2) Add the contacts to be alerted when NoMa sends out an alert, these may be different than the Nagios set, and 3) Create notifications in NoMa, associating objects (chosen from Foundation) with contacts, time periods, and conditions for alerting.
And in Nagios: 1) Add the contact to Nagios that you will associate with the forwarding of the alerting, 2) Associate the alert via NoMa script as a notification command for hosts and for services with that contact, and 3) Add that contact to the contact groups according to your needs.
3.0 The Notification Process in Nagios
Contact notifications are communications to contacts or contact groups about the status of a monitored element. Notifications can be configured for circumstances including any hard state change, if a host or service remains in a non-OK state, and for acknowledgments.
When do notifications occur?
- When a hard state change occurs and all filters are passed.
- When a host or service remains in a hard non-OK state and the time specified by the <notification_interval> option in the host or service definition has passed since the last notification was sent out (for that specified host or service).
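The two triggers listed above can be sketched as a small state tracker. This is a deliberately simplified model, not the Nagios check logic itself: the `CheckState` class and `process_result` function are hypothetical, time is counted in plain seconds, and the filters discussed later in this section are not applied here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckState:
    """Hypothetical per-host or per-service tracking state."""
    max_check_attempts: int
    notification_interval: float           # seconds (a simplification)
    hard_state: str = "OK"
    attempts: int = 0                      # consecutive non-OK results seen
    last_notified: Optional[float] = None  # time of the last problem notification


def process_result(s, result, now):
    """Return True if a notification is due for this check result."""
    if result == "OK":
        s.attempts = 0
        if s.hard_state != "OK":
            # Hard recovery: the hard state changes back to OK.
            s.hard_state = "OK"
            s.last_notified = None
            return True
        return False
    if result != s.hard_state:
        # Possible problem: count attempts until the maximum is reached.
        s.attempts += 1
        if s.attempts >= s.max_check_attempts:
            s.hard_state = result          # hard state change
            s.last_notified = now
            return True
        return False
    # Same hard non-OK state persists: renotify once the interval has passed.
    if s.last_notified is not None and now - s.last_notified >= s.notification_interval:
        s.last_notified = now
        return True
    return False
```

With `max_check_attempts=3` and a 60-second interval, the first two failed checks produce nothing, the third produces a notification, and a later failed check renotifies only once 60 seconds have elapsed since the previous notification.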
Who gets notified?
- Host - Each host may belong to one or more host groups. Each host group has a <contact_groups> option that specifies what contact groups receive notifications for hosts in that particular host group.
- Service - Each service definition has a <contact_groups> option that specifies what contact groups receive notifications for that particular service.
What filters must be passed in order for notification to be sent out?
- Program Wide Filters
- Host and Service Filters
- Contact Filters
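The three filter stages can be pictured as a chain in which every stage must pass before a contact is notified. The sketch below is illustrative only: the class names, the `notify_on` sets, and the `recipients` function are hypothetical, and time period checks are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical notification candidate."""
    state: str         # e.g. "CRITICAL", "WARNING", "OK"
    in_downtime: bool  # True while the object is in scheduled downtime


@dataclass
class ObjectConfig:
    """Hypothetical host or service settings."""
    notify_on: set     # states this host or service notifies for


@dataclass
class Contact:
    """Hypothetical contact settings."""
    name: str
    notify_on: set     # states this contact wants to hear about


def recipients(event, notifications_enabled, obj, contacts):
    """Apply program-wide, host/service, and contact filters in order."""
    # 1. Program-wide filter: notifications can be switched off globally.
    if not notifications_enabled:
        return []
    # 2. Host and service filters: downtime and per-object state options.
    if event.in_downtime or event.state not in obj.notify_on:
        return []
    # 3. Contact filters: each contact decides for itself.
    return [c.name for c in contacts if event.state in c.notify_on]


contacts = [Contact("alice", {"CRITICAL", "WARNING"}), Contact("bob", {"CRITICAL"})]
service = ObjectConfig({"CRITICAL", "WARNING"})
```

Note that the first two stages stop the notification for everyone, while the contact stage only removes individual recipients.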
3.1 Notification Objects
Contact groups are associated with escalation trees, which are then used for host and service notifications. Notifications in GroundWork Monitor are communications made to contacts or contact groups about the status of a monitored element. Notifications can be configured (through the contact template directives) for circumstances including any hard state change, if a host or service remains in a non-OK state, and for acknowledgments. The notification objects are described below, each with a process flow.
Contact Templates - Typically store generalized contact information that is consistent across multiple contact definitions, such as time periods, the specific host and service states for which notifications can be sent out, and the commands used to notify of a host or service problem or recovery. Specific contact information such as e-mail addresses and phone numbers is contained in the contact definition. Contact definitions inherit generalized information from contact templates.
Figure: Contact templates
Contacts - Contacts contain individual settings defining who should get notified in the event of a problem on your network. Contact definitions also indicate which notification options will be used for the contact based on the selected contact template.
Figure: Contacts
Contact Groups - Contact Groups are definitions of one or more contacts for the purpose of sending out alert/recovery notifications to one or more contacts. Contact definitions can be grouped into contact groups typically by area of expertise or geographic location. For example you might have one contact group called network-administrators and perhaps another contact group called baltimore-support. Then, when a host or service has a problem or recovers, Nagios will find the appropriate contact groups to send notifications to and notify all contacts in those contact groups.
Figure: Contact groups
Escalations - Service escalations are used to escalate notifications for a particular service. Host escalations are used to escalate notifications for a particular host. An escalation tree is a grouping of multiple escalations which can be applied to a host, host profile, host group, or a service.
Figure: Escalations
Escalation Trees - Notification escalation trees are optional and are used in the GroundWork Monitor Nagios engine to alert users when monitoring services and hosts change between states. Escalation trees combine specified contact groups that are to be notified when a notification is escalated.
There are two methods for assigning contact groups for notifications. The first is a direct contact group assignment, made either through a host or service template applied to an object (e.g., host, service) or directly in a host group or service group definition. The second is an escalation tree assignment to an object (host, host group, or service).
GroundWork Monitor does not assume that just because a notification has been sent that the underlying problem is under control. It requires the recipient of the page to log into the system and acknowledge having received a notification. If that acknowledgment does not occur within a period of time identified by the notification interval, subsequent notifications will be sent out. The monitoring System Administrator can configure how many notifications get sent at each escalation level before escalating the problem to a higher level of support.
Notifications are escalated if one or more escalation definitions match the current notification that is being sent out. If a host or service notification does not have any valid escalation definitions that apply to it, the contact group(s) specified in either the host group or service definition will be used for the notification.
Figure: Escalation trees
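The escalation rule described above (use the matching escalations if any exist, otherwise fall back to the directly assigned contact groups) can be sketched as follows. The field names follow the `first_notification` and `last_notification` directives of Nagios escalation definitions, where a `last_notification` of 0 means no upper limit. The `Escalation` class, the `groups_for` function, and the group names are hypothetical, and escalation time periods and state options are left out.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    """Simplified escalation definition (illustrative only)."""
    first_notification: int  # first notification number this applies to
    last_notification: int   # last notification number, 0 = no upper limit
    contact_groups: list


def groups_for(notification_number, escalations, default_groups):
    """Return the contact groups to notify for the Nth notification."""
    matched = [
        e for e in escalations
        if e.first_notification <= notification_number
        and (e.last_notification == 0 or notification_number <= e.last_notification)
    ]
    if not matched:
        # No escalation applies: use the directly assigned contact groups.
        return list(default_groups)
    groups = []
    for e in matched:
        for g in e.contact_groups:
            if g not in groups:  # keep order, drop duplicates
                groups.append(g)
    return groups


# Example tree with made-up group names.
tree = [Escalation(3, 5, ["managers"]), Escalation(5, 0, ["directors"])]
```

In this example, the first two notifications go to the default groups, notifications 3 and 4 go to `managers`, notification 5 reaches both `managers` and `directors` because the two ranges overlap, and everything after that goes to `directors` alone.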