Setting up Trend Graphs

Background

GroundWork includes a plugin called check_rrd_ls.pl for calculating, checking, and displaying trends in RRD-based performance data. This article describes how to use the plugin and provides a small sample profile. It also describes how to set thresholds so that you get an alarm when the predicted value falls outside a specified range, which helps you anticipate and avoid outages.

Here is a sample of the graphing output:

Here is the plugin help text, for reference:

# /usr/local/groundwork/nagios/libexec/check_rrd_ls.pl --help
check_rrd_ls v$Revision: 1.0 $ (nagios-plugins 1.4.15)
The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute
copies of the plugins under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
Copyright (c) 2004 Thomas Stocking
Perl RRD Trend check plugin for Nagios
Usage: check_rrd_ls
        [-r ] (Reference RRD name and path)
        [-R ] (Reference data store name)
        [-w <low:high>] (actual warning threshold)
        [-W ] (Trend Warning - minutes)
        [-c <low:high>] (actual critical threshold - must be outside -w range)
        [-C]  (Trend Critical - minutes. Must be higher than -W)
        [-p] (constrain to positive real values)
        [-i] (Trend interval to consider - minutes)
        [-G] (Graphing RRD toggle)
        [-D] (debug) [-h] (help) [-V] (Version)
-r, --reference_rrd
   The RRD which you want to check for trends (required)
-R, --reference_data_store
   Data store in reference rrd to check (required)
-i, --interval
   Time in minutes to consider for trending (default 60)
-w, --warning
   Actual warning threshold <low:high>
-W, --trend_warning
   Time threshold for predicted warnings (in minutes). Set this to the amount of time within which a predicted warning will result in a warning state.
-c, --critical
        Actual critical threshold <low:high>
-C, --trend_critical
   Time threshold for predicted critical errors (in minutes). Set this to the amount of time within which a predicted critical error will result in a critical state.
-p, --positive
   Set if you want to restrict data range considered to positive real values.
-D, --debug
   Turn on debugging.  (Verbose)
-h, --help
   Print help
-V, --Version
   Print version of plugin
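
To make the trend options more concrete, here is a minimal Python sketch of a least-squares trend check. It is not the plugin's actual code: the function names (fit_line, minutes_until, trend_state) are hypothetical, the sample data is made up, and the thresholds are simplified to single upper bounds instead of the <low:high> ranges the plugin accepts. It reflects one reading of the help text: fit a straight line to the samples in the trend interval, extrapolate it, and raise a state if the line reaches a threshold within the -W or -C time window.

```python
from typing import Optional, Sequence, Tuple


def fit_line(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of value against time; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def minutes_until(threshold: float, times: Sequence[float],
                  values: Sequence[float]) -> Optional[float]:
    """Minutes after the last sample at which the fitted line reaches threshold.

    Returns None when the fitted line is flat or moving away from the threshold.
    """
    slope, intercept = fit_line(times, values)
    fitted_now = slope * times[-1] + intercept
    gap = threshold - fitted_now
    if gap == 0:
        return 0.0
    if slope == 0 or (gap > 0) != (slope > 0):
        return None
    return gap / slope


def trend_state(times, values, warn, crit, trend_warn_min, trend_crit_min) -> str:
    """Assumed semantics: a predicted crossing inside the time window raises that state."""
    to_crit = minutes_until(crit, times, values)
    if to_crit is not None and to_crit <= trend_crit_min:
        return "CRITICAL"
    to_warn = minutes_until(warn, times, values)
    if to_warn is not None and to_warn <= trend_warn_min:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Made-up samples: time in minutes, value growing by 10 units per minute.
    times = [0, 10, 20, 30, 40, 50]
    values = [1000, 1100, 1200, 1300, 1400, 1500]
    print(minutes_until(3500, times, values))  # 200.0
    print(trend_state(times, values, 3500, 6000, 1440, 1800))
```

In this sketch the time windows play the role of -W and -C, and the value thresholds play the role of -w and -c. The real plugin reads its samples from the reference RRD rather than from in-memory lists.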

Installation

The plugin is included with GroundWork, in the directory /usr/local/groundwork/nagios/libexec.

Import the profile XML using the Profile Importer, which can upload the XML files as well as import them. We suggest you also upload the perfconfig XML files, as the graphs are a bit tricky to set up the first time. To do this, download the attached XML files to your hard disk, then navigate to Configuration > Profiles > Profile Importer. At the bottom of the right-hand panel, select the XML files one by one and upload them:

Select service-profile-ssh_disk_trends.xml and import it:

Using the Plugin

Now you can trend any RRD that exists on disk. Usually, this means looking at an RRD that has been set up by an existing service, such as "ssh_disk_root", which uses a plugin to check disk space on the root partition of Linux systems. The main service performs the check and creates the RRD, with the data source name "root". The supplied profile passes this name as an argument. Some default thresholds are supplied as well, which may or may not make sense for your disk check. Let's break the arguments down so you can adjust them. Here's the command line:

check_rrd_least_squares!ssh_disk_root!root!120!3500!6000!1440!1800

Note that the plugin will take two more arguments:
-p This restricts the metric to positive numbers. It helps the thresholds make sense in this case, since otherwise the trending would need to accept negative thresholds.
-G This makes the plugin generate a temporary RRD, with the same name as the reference RRD with _graph.rrd appended to the file name. This ancillary RRD is used in graphing.
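
As a reading aid for the command line above, the following Python sketch splits it the way standard Nagios syntax does: the first "!"-separated field is the command name, and each later field becomes $ARG1$, $ARG2$, and so on. The helper name split_check_command is hypothetical, and the sketch ignores escaped separators. How each $ARGn$ maps onto a plugin option (-r, -R, -i, and so on) is determined by the command definition in the imported profile, which is not reproduced here, so the sketch does not guess at it.

```python
from typing import List, Tuple


def split_check_command(check_command: str) -> Tuple[str, List[str]]:
    """Split a Nagios check_command string into (command_name, [ARG1, ARG2, ...])."""
    name, *args = check_command.split("!")
    return name, args


if __name__ == "__main__":
    name, args = split_check_command(
        "check_rrd_least_squares!ssh_disk_root!root!120!3500!6000!1440!1800"
    )
    print("command:", name)
    # Nagios numbers the argument macros starting from 1.
    for position, value in enumerate(args, start=1):
        print(f"$ARG{position}$ = {value}")
```

Compare the numbered values with the command definition in Configuration to see which option each one feeds.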

Graphing the trend

The RRD that the plugin produces with the -G option can be used for graphing, but the graph command needs to specify this special RRD. If you are not familiar with custom RRD graph commands, you might want to review the bookshelf section called "Home > USING APPLICATIONS > Configuration > Configuration Scenarios > Creating Performance Graphs", which explains the macros and substitutions available for graph commands. This example will work as imported, but you will probably want to extend it. Here are exact instructions:

This is what the perfdata configuration looks like in our example:

Once you have this working, you can copy the configuration and create more trending services, adjusting the "ssh_disk_root" and "root" strings to match the new services you make. For example, say you want to trend disk usage on the "var" partition. You will need a source service like "ssh_disk_var", which creates an RRD with the DS named "var".
Then you can create a service "ssh_disk_var_trend", with a command line like:

check_rrd_least_squares!ssh_disk_var!var!120!3500!6000!1440!1800

Then copy the ssh_disk_root_trend performance config, and make it look like this:

That should get you trending on the usage of the var partition. The same approach can be used for any data you want to trend.
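
If you need trend services for several partitions, the copy-and-adjust step can be scripted. The Python sketch below builds one command line per partition. It assumes the naming pattern used in this article (a source service called ssh_disk_<partition> whose RRD has a data source named after the partition) and reuses the default numeric arguments from the sample profile; the helper name trend_command and the partition list are hypothetical, so adjust them to your own services.

```python
from typing import Sequence

# Numeric arguments taken from the sample profile's command line.
DEFAULT_NUMERIC_ARGS = ("120", "3500", "6000", "1440", "1800")


def trend_command(source_service: str, data_source: str,
                  numeric_args: Sequence[str] = DEFAULT_NUMERIC_ARGS) -> str:
    """Build a check_rrd_least_squares command line for one source service."""
    fields = ["check_rrd_least_squares", source_service, data_source, *numeric_args]
    return "!".join(fields)


if __name__ == "__main__":
    # Hypothetical partitions; each needs an existing ssh_disk_<partition> service.
    for partition in ("root", "var", "home"):
        service = f"ssh_disk_{partition}"
        print(f"{service}_trend: {trend_command(service, partition)}")
```

The default thresholds may not suit every partition, so review the numeric arguments for each generated service before using it.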

Files

Here are the three files referenced above. The original service "ssh_disk_root" and the trend service "ssh_disk_root_trend" are included, as are their respective performance data definitions. This pair of services should be a good template to work from when setting up other services. Happy trending!

Name                                 Size  Creator          Creation Date       Comment
service-profile-ssh_disk_trends.xml  4 kB  Thomas Stocking  Apr 20, 2011 22:12
perfconfig_ssh_disk.xml              2 kB  Thomas Stocking  Apr 20, 2011 22:12
perfconfig_ssh_disk_root_trend.xml   1 kB  Thomas Stocking  Apr 20, 2011 22:12