GroundWork includes a plugin called check_rrd_ls.pl for calculating, checking, and displaying trends on RRD-based performance data. This article describes how to use the plugin and provides a small sample profile. It also describes how to set up thresholds so that you get alarms when the predicted value falls outside a specified range, which can help you predict and avoid outages.
Here is a sample of the graphing output:
Here is the plugin help text, for reference:
# /usr/local/groundwork/nagios/libexec/check_rrd_ls.pl --help
check_rrd_ls v$Revision: 1.0 $ (nagios-plugins 1.4.15)
The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute
copies of the plugins under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
Copyright (c) 2004 Thomas Stocking

Perl RRD Trend check plugin for Nagios

Usage: check_rrd_ls [-r ] (Reference RRD name and path)
       [-R ] (Reference data store name)
       [-w <low:high>] (actual warning threshold)
       [-W ] (Trend Warning - minutes)
       [-c <low:high>] (actual critical threshold - must be outside -w range)
       [-C] (Trend Critical - minutes. Must be higher than -W)
       [-p] (constrain to positive real values)
       [-i] (Trend interval to consider - minutes)
       [-G] (Graphing RRD toggle)
       [-D] (debug)
       [-h] (help)
       [-V] (Version)

-r, --reference_rrd
   The RRD which you want to check for trends (required)
-R, --reference_data_store
   Data store in reference rrd to check (required)
-i, --interval
   Time in minutes to consider for trending (default 60)
-w, --warning
   Actual warning threshold <low:high>
-W, --trend_warning
   Time threshold for predicted warnings (in minutes).
   Set this to the amount of time within which a predicted warning
   will result in a warning state.
-c, --critical
   Actual critical threshold <low:high>
-C, --trend_critical
   Time threshold for predicted critical errors (in minutes).
   Set this to the amount of time within which a predicted critical error
   will result in a critical state.
-p, --positive
   Set if you want to restrict data range considered to positive real values.
-D, --debug
   Turn on debugging. (Verbose)
-h, --help
   Print help
-V, --Version
   Print version of plugin
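To make the trend options in the help text more concrete, here is a minimal Python sketch of the general idea: fit a least-squares line to recent samples, estimate how many minutes remain until that line reaches a threshold, and compare the estimate with the -W and -C time windows. This is not the plugin's code (the plugin is written in Perl, and its internals are not shown in this article). The function names, the sample data, and the choice to handle only the upper bound of a <low:high> range are all assumptions made for illustration.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values against times; returns (slope, intercept)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def minutes_until_high(threshold, times, values):
    """Minutes from the last sample until the fitted line reaches an upper threshold.

    Returns 0.0 if the fitted value is already at or above the threshold,
    and None if the trend is flat or falling (the threshold is never reached).
    Times are assumed to be expressed in minutes.
    """
    slope, intercept = fit_line(times, values)
    current = slope * times[-1] + intercept
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope


def trend_state(times, values, warn_high, crit_high, trend_warn_min, trend_crit_min):
    """Hypothetical classification: a predicted crossing inside the window raises the state."""
    eta_crit = minutes_until_high(crit_high, times, values)
    if eta_crit is not None and eta_crit <= trend_crit_min:
        return "CRITICAL"
    eta_warn = minutes_until_high(warn_high, times, values)
    if eta_warn is not None and eta_warn <= trend_warn_min:
        return "WARNING"
    return "OK"


# Illustrative samples: one reading every 30 minutes, rising by 100 each time.
times = [0, 30, 60, 90, 120]
values = [3000, 3100, 3200, 3300, 3400]
print(trend_state(times, values, 3500, 6000, 1440, 1800))  # prints CRITICAL
```

In this example the fitted line reaches 6000 in about 780 minutes, which is inside the 1800-minute critical window, so the predicted state is critical even though the current value is still below the actual warning threshold.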
The plugin is included in the folder /usr/local/groundwork/nagios/libexec.
Import the profile XML using the Profile Importer. You can use it to upload the XML as well as import it, and we suggest you also upload the perfconfig.xml files, as the graphs are a bit tricky to set up the first time. To do this, download the attached XML files to your hard disk, then navigate to Configuration > Profiles > Profile Importer. At the bottom of the right-hand panel, select the XML files one by one and upload them:
Select the service-profile-ssh_trends.xml and import:
Now you can trend any RRD that exists on the disk. Usually, this means looking at an RRD that has been set up by an existing service, like "ssh_disk_root", which uses a plugin to check disk space on the root partition in Linux systems. The main service does the check and creates the RRD, with the datasource name "root". The supplied profile specifies this as an argument. Some default thresholds are supplied as well, which may or may not make sense for your disk check. Let's break the arguments down so you can adjust them. Here's the command line:
check_rrd_least_squares!ssh_disk_root!root!120!3500!6000!1440!1800
/usr/local/groundwork/rrd/$HOSTNAME$_$ARG1$.rrd
Note: This means you can check any RRD on the disk, even one created by another tool. Just adjust the command definition to look at an RRD in another location.
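As an illustration of how the RRD path pattern relates to the command line, the following Python sketch splits a Nagios-style check command on "!" and substitutes the host name and the first argument into the path. The function names and the host name "myhost" are hypothetical. The remaining arguments are returned as-is without interpretation, since their meaning depends on the command definition.

```python
RRD_DIR = "/usr/local/groundwork/rrd"  # directory taken from the path pattern above


def split_check_command(check_command):
    """Split 'name!arg1!arg2...' into the command name and its argument list."""
    name, *args = check_command.split("!")
    return name, args


def reference_rrd_path(hostname, check_command, rrd_dir=RRD_DIR):
    """Expand $HOSTNAME$_$ARG1$.rrd for the given host and check command."""
    _, args = split_check_command(check_command)
    if not args:
        raise ValueError("check command has no $ARG1$")
    return f"{rrd_dir}/{hostname}_{args[0]}.rrd"


command = "check_rrd_least_squares!ssh_disk_root!root!120!3500!6000!1440!1800"
print(reference_rrd_path("myhost", command))
# prints /usr/local/groundwork/rrd/myhost_ssh_disk_root.rrd
```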
rrdtool info <rrdname>
Don't forget to load the GroundWork environment before running this command at the command line on your GroundWork server (source /usr/local/groundwork/scripts/setenv.sh).
Note: These numbers are relative to the metric as stored in the DS. They may or may not match the numbers used as thresholds in the original service, since performance data may be collected and graphed in different units of measure than those used for thresholds.
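If you want to pull the data source names out of the rrdtool info output programmatically, a short Python sketch like the one below may help. It relies on rrdtool info printing per-data-source lines of the form ds[NAME].field = value. The helper names are hypothetical, and the sample text is hand-written for illustration rather than captured from a real server.

```python
import re
import subprocess

# Matches lines such as: ds[root].type = "GAUGE"
DS_LINE = re.compile(r"^ds\[([^\]]+)\]\.")


def data_source_names(info_text):
    """Return the distinct DS names found in rrdtool info output, in order."""
    names = []
    for line in info_text.splitlines():
        match = DS_LINE.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def rrdtool_info(rrd_path):
    """Run 'rrdtool info' on a file (requires the GroundWork environment to be loaded)."""
    result = subprocess.run(
        ["rrdtool", "info", rrd_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hand-written sample resembling rrdtool info output.
sample = """filename = "myhost_ssh_disk_root.rrd"
step = 300
ds[root].type = "GAUGE"
ds[root].minimal_heartbeat = 900
"""
print(data_source_names(sample))  # prints ['root']
```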
Note that the plugin will take two more arguments:
-p This restricts the metric to positive numbers. It keeps the thresholds meaningful in this case, since otherwise the trending would need to accept negative thresholds.
-G This makes the plugin generate a temporary rrd, with the same name as the reference rrd with _graph.rrd appended to the file name. This ancillary RRD is used in graphing.
The RRD that the plugin produces with the -G option can be used for graphing, but the graph command needs to specify this special RRD. If you are not familiar with the custom RRD graph commands, you might want to review the bookshelf section called "Home > USING APPLICATIONS > Configuration > Configuration Scenarios > Creating Performance Graphs". It explains the macros and substitutions available in graph commands. This example will work as imported, but you will probably want to extend it. Here are exact instructions:
This is what the perfdata configuration looks like in our example:
Once you have this working, you can copy the configuration and create more trending services, adjusting the "ssh_disk_root" and "root" strings to match the new services you make. For example, say you want to trend disk usage on the "var" partition. You will need a source service like "ssh_disk_var", which creates an RRD with the DS named "var".
Then you can create a service "ssh_disk_var_trend", with a command line like:
check_rrd_least_squares!ssh_disk_var!var!120!3500!6000!1440!1800
Then copy the ssh_disk_root_trend performance config, and make it look like this:
That should get you trending on the usage of the var partition. The same approach can be used for any data you want to make trends for.
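If you need trend services for several partitions, the naming pattern can be generated rather than typed by hand. The Python sketch below is only an illustration: the helper name is hypothetical, it assumes each source service is named ssh_disk_<partition> with a DS named after the partition, and it reuses the sample thresholds unchanged, which you would normally adjust per partition.

```python
def trend_service(partition, tail="120!3500!6000!1440!1800"):
    """Build the trend service name and check command for one partition.

    Assumes the source service is 'ssh_disk_<partition>' and that its RRD
    stores the metric in a DS named '<partition>'.
    """
    source = f"ssh_disk_{partition}"
    return {
        "service": f"{source}_trend",
        "check_command": f"check_rrd_least_squares!{source}!{partition}!{tail}",
    }


for part in ("root", "var"):
    definition = trend_service(part)
    print(definition["service"], definition["check_command"])
```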
Here are the 3 files referenced above. The original service "ssh_disk_root" and the trend service "ssh_disk_root_trend" are included, as are their respective performance data definitions. This pair of services should be a good template for you to work from in setting up other services. Happy trending!