Deprecated - Please install GWME-7.1.1-10 - Updated NoMa fixes instead.
Problem
NoMa has been found to have a few bugs affecting the reliable delivery of notifications.
- GWMON-10961: The fas.executor.interrupt property should be present in our standard foundation.properties file.
- GWMON-12574: Error in handling a socket file descriptor.
- GWMON-12857: Perl warning messages produced by the NoMa daemon.
- GWMON-12863: Foundation may time out the alert_via_noma.pl script before it can even start.
- GWMON-12997: NoMa does not escalate beyond the first run of rules, even when using rollover.
- GWMON-13006: NoMa.yaml needs restricted permissions.
Solution
This patch rolls up all the available NoMa-related fixes into one patch for the GWME 7.1.1 release. Some NoMa files are replaced, and the config/foundation.properties file is augmented with a new configuration option.
The new option (fas.executor.interrupt) controls how long the Java thread that runs the alert_via_noma.pl script for CloudHub-related notifications can run. Field experience shows that the historical hardcoded timeout has been too small for reliable operation in the context of NoMa. Exposing this parameter in the config file allows it to be adjusted if necessary. The default in the config file is now set an order of magnitude larger, which should be sufficient to prevent problems even on large, heavily-loaded systems.
Installing
- Download the patch file tar archive to, for example, the /tmp directory.
- Stop NoMa, then unpack the downloaded tar archive. The patch files will appear in the TB7.1.1-9.noma_fixes/ subdirectory. Go there and run the install script.
service groundwork stop noma
tar xvfz TB7.1.1-9.noma_fixes.tar.gz
cd TB7.1.1-9.noma_fixes
./TB7.1.1-9_install
The original files which are affected by this patch are first backed up, then the changes are applied, and the patch directory is adjusted to reflect the application of this patch.
- Bounce NoMa, to run using the replacement files. Also bounce Foundation, to pick up the non-default setting for the fas.executor.interrupt parameter.
service groundwork restart noma
service groundwork restart gwservices
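After the restart, it can be useful to confirm that the fas.executor.interrupt option is actually present in the properties file. The following is a minimal sketch, not part of the patch: the get_property helper is hypothetical, and the full path /usr/local/groundwork/config/foundation.properties is an assumption based on the install prefix used elsewhere in this article. Adjust it to your installation.

```shell
#!/bin/sh
# get_property FILE KEY: print the value of KEY from a Java-style properties file.
# Hypothetical helper for illustration; it is not shipped with the patch.
get_property() {
    # Escape the dots in the key so they match literally in the sed pattern.
    key_re=$(printf '%s' "$2" | sed 's/[.]/\\./g')
    # Lines starting with "#" do not match; if the key appears twice, the last one wins.
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Assumed location of the file; override by setting PROPS in the environment.
PROPS="${PROPS:-/usr/local/groundwork/config/foundation.properties}"

if [ -r "$PROPS" ]; then
    echo "fas.executor.interrupt = $(get_property "$PROPS" fas.executor.interrupt)"
fi
```

An empty value in the output would suggest that the option was not added to the file.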
Uninstalling
- Go back to the patch directory, and run the uninstall script.
service groundwork stop noma
cd TB7.1.1-9.noma_fixes
./TB7.1.1-9_uninstall
The backup directory will be accessed to restore the original files, and the patch directory will be processed to reflect the restoration of those files.
- Bounce NoMa and Foundation, to revert to the original files and the original setting for the fas.executor.interrupt parameter.
service groundwork restart noma
service groundwork restart gwservices
Configuration
The Nagios commands that send notifications to NoMa use the alert_via_noma.pl script. The value passed with the incident ID flag (-u) must be changed to use the PROBLEMID macros instead of the NOTIFICATIONID macros. This change is necessary for subsequent notifications on the same incident to function properly.
These are the updated commands:
- host-notify-by-noma command line:
/usr/local/groundwork/noma/notifier/alert_via_noma.pl -c h -s "$HOSTSTATE$" -H "$HOSTNAME$" -G "$HOSTGROUPNAMES$" -n "$NOTIFICATIONTYPE$" -i "$HOSTADDRESS$" -o "$HOSTOUTPUT$" -t "$TIMET$" -u "$$(( $HOSTPROBLEMID$ ? $HOSTPROBLEMID$ : $LASTHOSTPROBLEMID$ ))" -A "$$([ -n "$NOTIFICATIONAUTHORALIAS$" ] && echo "$NOTIFICATIONAUTHORALIAS$" || echo "$NOTIFICATIONAUTHOR$")" -C "$NOTIFICATIONCOMMENT$" -R "$NOTIFICATIONRECIPIENTS$"
- service-notify-by-noma command line:
/usr/local/groundwork/noma/notifier/alert_via_noma.pl -c s -s "$SERVICESTATE$" -H "$HOSTNAME$" -G "$HOSTGROUPNAMES$" -E "$SERVICEGROUPNAMES$" -S "$SERVICEDESC$" -o "$SERVICEOUTPUT$" -n "$NOTIFICATIONTYPE$" -a "$HOSTALIAS$" -i "$HOSTADDRESS$" -t "$TIMET$" -u "$$(( $SERVICEPROBLEMID$ ? $SERVICEPROBLEMID$ : $LASTSERVICEPROBLEMID$ ))" -A "$$([ -n "$NOTIFICATIONAUTHORALIAS$" ] && echo "$NOTIFICATIONAUTHORALIAS$" || echo "$NOTIFICATIONAUTHOR$")" -C "$NOTIFICATIONCOMMENT$" -R "$NOTIFICATIONRECIPIENTS$"
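The -u and -A arguments in both commands rely on shell expansion. As far as we understand Nagios macro handling, "$$" in a command definition becomes a literal "$", so the shell sees an arithmetic expansion for -u and a command substitution for -A. The sketch below reproduces those two expressions with made-up sample values. The function names are hypothetical, and the sketch assumes that the current PROBLEMID macro is 0 once the host or service has recovered, which is why the command falls back to the LAST...PROBLEMID macro.

```shell
#!/bin/sh
# Sketch of what the shell evaluates after Nagios has substituted the macros.
# All IDs and names below are sample values, not real data.

# pick_problem_id CURRENT LAST
# Mirrors: $(( $HOSTPROBLEMID$ ? $HOSTPROBLEMID$ : $LASTHOSTPROBLEMID$ ))
pick_problem_id() {
    echo "$(( $1 ? $1 : $2 ))"
}

# pick_author ALIAS NAME
# Mirrors: $([ -n "ALIAS" ] && echo "ALIAS" || echo "NAME")
pick_author() {
    [ -n "$1" ] && echo "$1" || echo "$2"
}

pick_problem_id 42 37        # problem still active: the current ID is used
pick_problem_id 0 42         # recovery: current ID is 0, so the last ID is used
pick_author "Jane Doe" jdoe  # alias is set: the alias is used
pick_author "" jdoe          # alias is empty: fall back to the author name
```

In this sketch, a problem notification and the later recovery for the same incident both yield 42, which is what lets NoMa tie subsequent notifications to the same incident.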