Creating Performance Graphs

1.0 About Performance Graphs

1.1 Performance Graphs Process
1.2 Performance Data Handling in GroundWork Monitor
1.3 Performance Data Handling Parameters
1.4 Performance Process Data Flow
1.5 Implementing String Lists in Performance Configuration
1.6 Performance Testing and Debugging
1.7 Importing and Exporting Performance Configuration
1.8 Importing Exported File

2.0 Creating Performance Graphs

2.1 Creating and modifying graphs
2.2 Customizing RRDtool Graph Command

3.0 Creating Remote RRD Graphs

3.1 Background
3.2 Requirements
3.3 Configuration Steps
3.4 Considerations
3.5 Maintenance
3.6 References

1.0 About Performance Graphs

The Performance option in GroundWork Monitor (Configuration > Performance) enables users to generate performance graphs with data gathered from the Nagios monitoring system.

1.1 Performance Graphs Process

Nagios is configured to pass service check performance data to a special event handler. The event handler gets chart parameters from a configuration database, interprets the performance data, then uses RRDtool to create or update RRD (Round Robin Database) data files each time the service check is executed. CGI programs are provided to display the graphs. A default configuration database that matches installed GroundWork service profiles is delivered with the GroundWork Monitor package. This data can be modified by accessing the Performance option.

Once RRD databases are created, there are several methods for displaying this data. In the Status page you can view performance graphs under a service if an RRD is associated with that host and service. The figure below displays a performance graph in Status.

No additional configuration beyond the procedure in this section is required. You can also show the graphs as links from the Nagios service detail pages. To do this, create Nagios extended service information links and install graphing CGI programs. Generic versions of these are included with GroundWork Monitor.

Figure: Performance Graphs as seen in Status

1.2 Performance Data Handling in GroundWork Monitor

Any check that is processed by Nagios may return performance data. The Nagios Plugin Development Guidelines define the format for plugin performance data.

1.2.1 Performance Data Handling Process

In the GroundWork Monitor package, Nagios writes all plugin output, including performance data, to the service-perfdata.dat file. Every 300 seconds, Nagios runs the launch_perfdata_process command. That command runs the launch_perf_data_processing script, which starts the process_service_perfdata_file script if it is not already running. The process_service_perfdata_file script then reads a renamed copy of the service-perfdata.dat file.

The service performance data file (/usr/local/groundwork/nagios/var/service-perfdata.dat), the service performance data file processing interval (300), and the service performance data file processing command (launch_perfdata_process) are all configurable under Control > Nagios Main Configuration (on page 3).

The launch_perfdata_process command ultimately invokes the process_service_perfdata_file script, which writes performance data to two places:

RRD Files - The script creates an RRD file whose name is a concatenation of the host name and the service name. The data in these RRDs are presented graphically in both the Status and Performance applications.
Foundation Database - A summary of the performance data is also sent to Foundation which has a listener for performance data. How the data is persisted is described in detail below.

At the end of processing, Nagios reopens the service-perfdata.dat file in append mode. It either continues to collect data in the existing unprocessed file, or starts a new file if the previous file was renamed for processing by the launch_perf_data_processing script.

1.2.2 RRD Files for Performance Data

Performance data is stored into RRD files. Format and data aggregation information can be found at http://oss.oetiker.ch/rrdtool/doc/rrdtool.en.html.

1.2.3 Performance Data in Foundation

Performance data is sent to Foundation as XML in a Web Services call, for efficient bulk-data transfer.

The process_service_perfdata_file script includes the host name, service description, label, timestamp and performance value in the post to Foundation.

By default, the business object in Foundation that handles incoming performance data averages the performance data values for each check over a day. Along with the daily average, the maximum and minimum values for the day are also stored. The averaging interval can be changed from a day to an hour through configuration. For details about changing the interval, refer to GroundWork Foundation in the Developer Reference section.

The performance data values are stored in the LogPerformanceData table. For each service that provides performance data an entry per day is created.
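
The day-level summarization described above can be pictured with a short Python sketch. It is illustrative only: the function name, the tuple layout of the samples, the grouping key, and the use of UTC day boundaries are assumptions made for the example, not Foundation's actual implementation or the LogPerformanceData schema.

```python
from collections import defaultdict
from datetime import datetime, timezone


def summarize_by_day(samples):
    """Group samples by (host, service, label, day) and compute avg/min/max.

    samples: iterable of (host, service, label, unix_timestamp, value).
    Day boundaries are taken in UTC here; that choice is an assumption.
    """
    buckets = defaultdict(list)
    for host, service, label, timestamp, value in samples:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
        buckets[(host, service, label, day)].append(value)
    return {
        key: {
            "average": sum(values) / len(values),
            "minimum": min(values),
            "maximum": max(values),
        }
        for key, values in buckets.items()
    }


# Hypothetical samples: two on the first day, one on the next day.
samples = [
    ("web1", "local_load", "load1", 0, 1.0),
    ("web1", "local_load", "load1", 3600, 3.0),
    ("web1", "local_load", "load1", 86400, 5.0),
]
summary = summarize_by_day(samples)
```

Each key in the result corresponds to one per-day entry for a host, service, and label, which mirrors the "one entry per day" behavior described above.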

1.2.4 Reporting on Performance Data

The Reports option in GroundWork Monitor includes several reports on performance data stored in Foundation, organized by host group or by host. The reports allow drill-down to performance data by individual services. These reports are located under Reports > BIRT Report Viewer > Performance.

Performance Report by Host (epr-host): This report shows the performance indicators identified by the Label value across a selected Host and Time Range.
Performance Report by Host Multi Variable (epr-host multi variable): This report charts up to two individually selected Hosts, units, and performance indicators present in the selected Hosts.
Performance Report by Hostgroup (epr-hostgroup): This report shows the performance indicators identified by the Label value across a selected Host Group and Time Range.
Performance Report by Hostgroup Multi Variable (epr-hostgroup multi variable): This report charts long-term performance trends for performance data for a selected Host Group. This report can help identify areas where additional capacity is needed due to steady increases in load or demand.
Performance Report by Hostgroup Top Five (epr-hostgroup topfive): This report charts a selected performance indicator present in the selected Host Group.

1.3 Performance Data Handling Parameters

Nagios has the ability to process performance data from both hosts and services. Service checks are executed at regular intervals. Host checks, on the other hand, may never be executed at all. Nagios only executes host checks when it is doing dependency calculations. Therefore the sporadic nature of host checks renders Host performance data unsuitable for graphing. This is why, in GroundWork Monitor, we only concern ourselves with service performance data.

The Configuration page can be used to configure the Nagios Main Configuration file to enable performance data handling. The GroundWork installer should already have set this up, but these are the crucial configuration parameters. The image shows the parameters in the Nagios Main Configuration screen that enable performance data handling.

Most of the plugins in the GroundWork Monitor distribution output formatted performance data. The standard that defines how this data should be formatted is in the Nagios Plugin Development Guidelines.

Figure: Configuring Performance Data

1.4 Performance Process Data Flow

When Nagios schedules a plugin to execute, the plugin returns two fields of data on a single line of standard output, separated by the pipe operator "|". Nagios treats everything before the pipe operator as status text and inserts it in the status field of the Nagios (and Status) user interface. The status text is also inserted into the Nagios macro $SERVICEOUTPUT$. The text that follows the pipe operator is inserted into the macro $SERVICEPERFDATA$ and is also written into the service-perfdata.dat file.

A typical plugin output should look something like this:

OK - load average: 0.35, 0.29, 0.20 | load1=0.350;5.000;10.000;0; load5=0.290;4.000;6.000;0; load15=0.200;3.000;4.000;0;

Everything before the pipe operator is status text and everything after it is formatted performance data.
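
As a rough illustration of that split, the Python sketch below separates the status text from the performance data and breaks each metric into its semicolon-separated fields. The function name and the dictionary keys are made up for this example, and the parsing is simplified: it does not handle quoted labels that contain spaces, and it leaves any unit suffix attached to the value.

```python
def parse_plugin_output(line):
    """Split one line of plugin output into status text and perfdata metrics."""
    status_text, _, perfdata = line.partition("|")
    metrics = []
    for token in perfdata.split():
        label, _, fields = token.partition("=")
        # The value is followed by optional warn;crit;min;max fields.
        parts = fields.split(";")
        parts += [""] * (5 - len(parts))  # pad so missing fields become ""
        value, warn, crit, minimum, maximum = parts[:5]
        metrics.append(
            {
                "label": label,
                "value": value,
                "warn": warn,
                "crit": crit,
                "min": minimum,
                "max": maximum,
            }
        )
    return status_text.strip(), metrics


status, metrics = parse_plugin_output(
    "OK - load average: 0.35, 0.29, 0.20 | load1=0.350;5.000;10.000;0; "
    "load5=0.290;4.000;6.000;0; load15=0.200;3.000;4.000;0;"
)
```

For the sample line, `status` holds the text before the pipe and `metrics` holds three entries labeled load1, load5, and load15.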

If we properly configured the service_perfdata configuration directives, Nagios takes this plugin output and records it in a log file:

/usr/local/groundwork/nagios/var/service-perfdata.dat

At 5-minute intervals (adjustable using the service_perfdata_file_processing_interval directive), Nagios runs the performance event handler command launch_perfdata_process. The process_service_perfdata_file script that it eventually launches performs several tasks. It reads the service-perfdata.dat file to extract the performance data Nagios has written there. For each service check result it finds, it looks up the service name in the performanceconfig table in the Monarch database. This table (indexed by service name) contains the RRD create commands and RRD update commands appropriate for the data returned by that particular plugin.

The process_service_perfdata_file script uses this information to create the RRDs in the first instance, then to update the data in them on subsequent executions of the service.

Those RRDs are read by the CGI specified in the performanceconfig entry (these can be customized) and then presented for viewing in the Status application.

There is also a graphical user interface on the performanceconfig table, so the operator can adjust RRD create and update strings, or even specify which CGI will be used to graph them.

The process_service_perfdata_file script does more, however. Whenever it has to create a new RRD, it writes the path and filename of that RRD into the datatype table in the Monarch database, and makes a corresponding entry in the host_service table. These tables are used by the Performance application to locate the various RRDs in the system. Performance is able to read in the data from multiple RRDs and consolidate that data into a single graph.

This event handler also does a Web Services post to Foundation, which inserts summary performance data into the GWCollageDB for use by the EPR reports.

Finally, process_service_perfdata_file has the ability to generate a debug log file which is very helpful in diagnosing RRD problems in the system. The file is named process_service_perfdata_file.log and logging to it can be turned on and off using the debug_level in the perfdata.properties file. To increase debug logging, edit perfdata.properties and change this line:

debug_level=1

to this:

debug_level=3

The logging is quite voluminous and this file can get to be very large in a relatively short period of time. Remember to turn this off (by setting debug_level=1) when you are finished troubleshooting your RRD problem and then kill the process_service_perfdata_file script so it gets restarted and picks up the new value.

The new value is also picked up automatically the next time Nagios is restarted, which happens during a Commit, or you can force a restart manually with the following command:

service groundwork restart nagios

Figure: Performance Process Data Flow

1.5 Implementing String Lists in Performance Configuration

Under Configuration > Performance, set up one or more service-host entries for the passive services you defined. You may create these in any manner you like, but ensure that the RRD Create Command entry is of the following form:

$RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr $LISTSTART$DS:$LABEL#$:GAUGE:900:U:U$LISTEND$ RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480

Basically, everything between $LISTSTART$ and $LISTEND$ will be replicated for each label=value pair in the performance data. You may, of course, change the DS type from GAUGE to any supported value, or change any of the RRA parameters. Similarly, ensure that the RRD Update Command is of the following form:

$RRDTOOL$ update $RRDNAME$ -t $LABELLIST$ $LASTCHECK$:$VALUELIST$ 2>&1

The $LABELLIST$ and $VALUELIST$ macros will be expanded to the derived lists of labels and values parsed from the performance data.
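
The expansion can be sketched in Python as follows. This is a minimal illustration, not the code used by process_service_perfdata_file: the function name is hypothetical, and joining the repeated DS definitions with spaces and the label and value lists with colons are assumptions chosen to match the rrdtool create and update syntax. The $RRDTOOL$ and $RRDNAME$ macros are deliberately left unexpanded.

```python
import re


def expand_list_macros(template, labels, values=None, last_check=None):
    """Expand $LISTSTART$...$LISTEND$, $LABELLIST$, $VALUELIST$, $LASTCHECK$."""

    def repeat(match):
        body = match.group(1)
        # Replicate the enclosed text once per label (space-joined: assumption).
        return " ".join(body.replace("$LABEL#$", label) for label in labels)

    command = re.sub(r"\$LISTSTART\$(.*?)\$LISTEND\$", repeat, template, flags=re.S)
    command = command.replace("$LABELLIST$", ":".join(labels))
    if values is not None:
        command = command.replace("$VALUELIST$", ":".join(values))
    if last_check is not None:
        command = command.replace("$LASTCHECK$", str(last_check))
    return command


labels = ["load1", "load5", "load15"]
values = ["0.350", "0.290", "0.200"]

create_command = expand_list_macros(
    "$RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr "
    "$LISTSTART$DS:$LABEL#$:GAUGE:900:U:U$LISTEND$ "
    "RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480",
    labels,
)
update_command = expand_list_macros(
    "$RRDTOOL$ update $RRDNAME$ -t $LABELLIST$ $LASTCHECK$:$VALUELIST$ 2>&1",
    labels,
    values,
    last_check=1195002756,
)
```

With the three load labels, the create command gains one DS definition per label, and the update command receives a matching template list and value list in the same order.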

1.6 Performance Testing and Debugging

Use the following information to verify that the performance handler is working as expected. The performance handler log file is /usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log, as configured in the perfdata.properties file. At a high debug_level setting, the following information is entered in the log for each plugin execution:

Performance eventhandler execution time stamp
Host name
Service name
Last check time in UTC format
Status Text
Performance data string
Parsing results
Interpreted RRD create command string
Interpreted RRD update command string
RRD command results
Execution time

1.6.1 Service Entry Log Results

To debug a performance handler problem, look at the log results for your Service entry and verify each of the following:

Service is being parsed properly
The configuration entry information is correct
Performance or status information is being parsed correctly
The correct entry of the performanceconfig database is used
RRD commands are properly interpreted
RRD commands are executing without an error message

1.6.2 Chart Generation Error

To debug a chart generation error, check the following:

Make sure the RRD is being generated for your Host/Service. RRDs are stored in the directory: /usr/local/groundwork/rrd
Check to make sure you have the correct CGI program referenced in the Service extended information template
Make sure the browser is opening the referenced CGI program when you click on the graph icon
Make sure the CGI program references the correct data set names defined in the RRD creation command

Debug logging is quite voluminous, and process_service_perfdata_file.log can become very large in a relatively short period of time. When you have finished troubleshooting your RRD problem, turn debug logging off (by setting debug_level=1 in perfdata.properties) and then kill the process_service_perfdata_file script so that it gets restarted and picks up the new value.

The new value is also picked up automatically the next time Nagios is restarted, which happens during a Commit, or you can force a restart manually with the following command:

service groundwork restart nagios

1.7 Importing and Exporting Performance Configuration

Definitions in the performance configuration database may be exported to transfer to another system or for backup purposes. To export the entire performance configuration database, select the Export All button at the top of the Performance Configuration utility page. To export a specific performance configuration entry, select the Export button for that entry. The exported file is placed by default in the /tmp directory. This is an XML file describing each field entry. A sample file is shown below.

<groundwork_performance_configuration>
<service_profile name="gwsp-service_ping">
<graph name="Ping response time ">
<host>*</host>
<service regx="1"><![CDATA[Host Alive]]></service>
<type>nagios</type>
<enable>1</enable>
<label>Ping Response Time</label>
<rrdname><![CDATA[/usr/local/groundwork/rrd/$HOST$_$SERVICE$.rrd]]></rrdname>
<rrdcreatestring><![CDATA[$RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr DS:number:GAUGE:900:U:U RRA:AVERAGE:0.5:1:2880 RRA:AVERAGE:0.5:5:4032 RRA:AVERAGE:0.5:15:5760 RRA:AVERAGE:0.5:60:8640]]></rrdcreatestring>
<rrdupdatestring><![CDATA[$RRDTOOL$ update $RRDNAME$ $LASTSERVICECHECK$:$VALUE1$ 2>&1]]></rrdupdatestring>
<graphcgi><![CDATA[/nagios/cgi-bin/number_graph.cgi]]></graphcgi>
<parseregx first="0"><![CDATA[]]></parseregx>
<perfidstring></perfidstring>
</graph>
</service_profile>
</groundwork_performance_configuration>
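
Before importing an exported file on another system, it can be useful to inspect its contents. The Python sketch below reads a shortened version of the sample above with the standard library's xml.etree.ElementTree. The function name and the dictionary keys are hypothetical, and treating regx="1" as "Use Service as a Regular Expression" is an assumption based on the field descriptions in this document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample export; CDATA content is returned as plain text.
SAMPLE = """<groundwork_performance_configuration>
<service_profile name="gwsp-service_ping">
<graph name="Ping response time ">
<host>*</host>
<service regx="1"><![CDATA[Host Alive]]></service>
<enable>1</enable>
<rrdname><![CDATA[/usr/local/groundwork/rrd/$HOST$_$SERVICE$.rrd]]></rrdname>
</graph>
</service_profile>
</groundwork_performance_configuration>"""


def read_perfconfig(xml_text):
    """Return one dictionary per <graph> entry in an exported configuration."""
    root = ET.fromstring(xml_text)
    entries = []
    for profile in root.iter("service_profile"):
        for graph in profile.iter("graph"):
            service = graph.find("service")
            entries.append(
                {
                    "profile": profile.get("name"),
                    "graph": graph.get("name"),
                    "host": graph.findtext("host"),
                    "service": service.text,
                    # Assumption: regx="1" means the service is a regex.
                    "service_is_regex": service.get("regx") == "1",
                    "rrdname": graph.findtext("rrdname"),
                    "enabled": graph.findtext("enable") == "1",
                }
            )
    return entries


entries = read_perfconfig(SAMPLE)
```

Listing the entries this way makes it easy to confirm which services and hosts an export covers before running the import script.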

1.8 Importing Exported File

To import an exported file, execute the following script which will read the exported XML file and insert the entry into the performance configuration database.

/usr/local/groundwork/tools/profile_scripts/import_perfconfig.pl <xml_file_name>

2.0 Creating Performance Graphs

This section describes how to create your own performance graphs and how to modify existing graphs.

2.1 Creating and modifying graphs

Select Configuration > Performance.
Select Create New Entry, or select the Copy option to copy an existing service definition that comes close to what you want. Continue with the steps below to enter or edit the service definition properties.
In most cases, you will want to graph either a number or a percent. The easiest way to do this is to make a copy of an existing configuration entry:
- Copy the GENERIC_NUMBER or GENERIC_PERCENT entry in the performance configuration database, then change the service name in the copy to the name of your service.
- If performance data is already being generated, this is all you need to do to create the RRD.
- If performance data is not being generated, enter the status parsing regular expression to parse the number or percent from the output text.
- Specify either number_graph.cgi or percent_graph.cgi as the CGI graphing program in the Configuration extended information service template. These graphing programs are installed with GroundWork Monitor. If you wish, you can also specify the graph.gif icon. You will need to commit the changes to Nagios in order for the service CGI to appear on the Nagios interface.

Table: Definition Properties

Export File	The entry for this field is the name of a file in the `/tmp` directory into which the exported performance config entry will be written. This is just the name of the file itself, not including any preceding path.
Graph Label	Enter a graph label to define the heading for this graph's window in Status viewer.
Service	Enter a service to define the service name for Performance Graphs. The service is a string or expression which must match the name of the service in order for this performance config entry to be applied during performance-data processing. Unless you are also specifying "Use Service as a Regular Expression", enter the exact service name to which this entry applies. The service name is case-sensitive. If you have entries for both a specific literal service name and a regular expression that matches the service name, the entry for the specific service name will take precedence.
Use Service as a Regular Expression	If you want this performance config entry to match multiple service names (e.g., `snmp_if_interface_1`, `snmp_if_interface_2`, ...), check the "Use Service as a Regular Expression" option. You can then include regular-expression matching syntax in the Service field, and it will be used as a regular expression instead of a simple literal string for matching purposes. Except for service names that match a separate literal-string Service entry, all service names that match this entry's Service field will use this entry to create and update RRDs, and to produce graphs. Be careful with this; if a service name matches the Service field in more than one regular-expression performance-config entry, the system might pick the wrong one to use.
Host	Enter a host name. The host is either a simple literal hostname to match for this entry to be applicable to a service, or a single asterisk (`*`) character to match all hostnames. If you have an entry for both a specific hostname and a wildcarded (`*`) hostname for the same service-name matching, the entry for the specific hostname will take precedence.
Status Text Parsing Regular Expression	This field is used when you are working with a plugin that does not return properly formatted performance data. This field is used in conjunction with the next field "Use Status Text Parsing instead of Performance Data" to enable Perl regular-expression-based parsing of the plugin-output status text to find performance metrics of interest. For example, using the regular expression "`(\d+) (\d+)`" (without the enclosing quotes) will parse through the status text looking for the occurrence of two single- or multiple-digit numbers separated by a single space character. These numbers would be captured as $VALUE1$ and $VALUE2$ and could be passed to the RRD create and/or update commands using those variable names. The end result would be that numbers were extracted from the status text field of the plugin output and inserted into performance graphs despite the fact that the plugin returned no performance data in the standard plugin-output format for such data. Note: Parentheses in a regular expression are needed to specify that the string or value that matches the enclosed part of the regular expression is to be captured into a variable. In the example shown, those variables would be $VALUE1$ and $VALUE2$.
Use Status Text Parsing instead of Performance Data	This field enables (1) or disables (0) the status text parsing function which is defined by the "Status Text Parsing Regular Expression" described above.
RRD Name	This field defines the absolute pathname of the RRD file that stores accumulated performance data, for each host-service that matches this entry. The following macros may be used as part of the path or filename, to make the RRD file unique to the host-service: $HOST$ (name of the host whose service output is being handled by the performance-data processor) and $SERVICE$ (name of the service whose output is being handled by the performance-data processor). For example, the following string will create an RRD with the Host and Service name in the RRD file name: `/usr/local/groundwork/rrd/$HOST$_$SERVICE$.rrd`. The performance-data processor will automatically make certain adjustments to substituted values in order to guarantee that a valid filename is produced, so the final result might be slightly different from what you specify here.
RRD Create Command	Enter this command to define the RRD creation string. The command is used to create a new RRD file if performance data comes in for a host-service for which an RRD file does not already exist. You can reference the RRDtool documentation for RRD file creation options. The following macros may be used: $RRDTOOL$ (the RRDtool program, including file location) and $RRDNAME$ (name of the RRD file, as defined in this configuration tool). Example of an RRD Create Command: `$RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr DS:number:GAUGE:900:U:U RRA:AVERAGE:0.5:1:2880 RRA:AVERAGE:0.5:5:4032 RRA:AVERAGE:0.5:15:5760 RRA:AVERAGE:0.5:60:8640`
RRD Update Command	Enter this command to define the RRD update string. The RRD Update Command is used to insert a new set of performance-data values into the RRD file. The command must include an associated timestamp used to position the new data in the RRD file. In each update, the timestamp value must always be larger than the timestamp given in any previous update, or an error will result. Such errors can therefore occur if a performance-data file is reprocessed, but in that case they can be ignored. See the RRDtool documentation for RRD file update options. In addition to the macros mentioned in the help messages for earlier items, the following macro may be used: $LASTCHECK$ (service-check time that the plugin executed, in UTC format, as whole seconds since the system time epoch). Example of an RRD Update Command, which updates the RRD file with the first value from the performance data string or status text parse: `$RRDTOOL$ update $RRDNAME$ $LASTCHECK$:$VALUE1$ 2>&1`
Custom RRDtool Graph Command	This command defines how the graph for this service is drawn in the Status viewer. If no graph command is specified here, the graph command defined for the DEFAULT service will be used instead. This setting also affects graphing in the Reports > Performance View application. There are three host-view options in that application, namely "Expanded", "Consolidated by host", and "Consolidated". The Custom RRDtool Graph Command you specify here only affects the appearance of the RRD graph when using the Expanded view. To change the appearance of the graph, see the full documentation available through the Help button on this page, under "Custom RRDtool Graph Command".
Enable	The Enable option, if checked, enables this performance-config entry. If disabled (unchecked), RRD creation and updating will not be executed for this entry.

Figure: Creating Performance Graphs
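
To see how the Status Text Parsing Regular Expression from the table above yields $VALUE1$ and $VALUE2$, consider the following Python sketch. It uses Python's re module as a stand-in for the Perl regular-expression engine (the pattern shown behaves the same in both), and the function name and the sample status text are invented for the example.

```python
import re


def parse_status_text(pattern, status_text):
    """Map each captured group to a $VALUEn$ macro name (1-based)."""
    match = re.search(pattern, status_text)
    if match is None:
        return {}
    return {f"$VALUE{i}$": text for i, text in enumerate(match.groups(), start=1)}


# Hypothetical status text from a plugin that returns no performance data.
values = parse_status_text(r"(\d+) (\d+)", "Queue depth 12 340 messages")

# Substitute the captured values into an RRD update command template.
update_command = "$RRDTOOL$ update $RRDNAME$ $LASTCHECK$:$VALUE1$ 2>&1"
for macro, text in values.items():
    update_command = update_command.replace(macro, text)
```

Here the two parenthesized groups capture 12 and 340, and only $VALUE1$ is used by the update template. If the pattern does not match the status text, the sketch returns no values and the template is left unchanged.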

2.2 Customizing RRDtool Graph Command

To change the appearance of the graph, paste a command that produces a graph when run from the command line into the Custom RRDtool Graph Command field. Any valid command will work, and any accessible RRD will produce a graph, even an RRD unrelated to this service. Since that is rarely what is wanted, certain strings in the command are substituted to produce the desired effect. Substitution is triggered if the command contains the string "rrdtool graph".

Substitutions:

rrd_source is replaced by the rrd selected by the cgi for the host and service
ds_source_0 is replaced by the first DS
ds_source_N is replaced by the Nth DS in the RRD, where N is an integer

This allows a fair amount of flexibility in specifying what is to be graphed and how.

In addition, if you place a $LISTSTART$ ... $LISTEND$ pair in the rrdtool graph command, the text between them is repeated once for each DS in the RRD, and the following values are substituted:

$DEFLABEL#$ will become the RRD:DS string, repeated in the supplied context as many times as there are DS in the RRD,
$CDEFLABEL#$ will be a short string, to be used to serialize the CDEFS. The strings are taken from the sequence a, b, c, ..., z, aa, ab, ac, ..., az, ba, bb, bc, ..., and so forth.
$DSLABEL#$ will be the DS name, repeated in context as above.
$COLORLABEL#$ will be a color selected from the @colors array, the same one used to select colors in the default graphs. Use it if you don't know (or don't care) what colors get shown.

What this means is that a custom command like this:

/usr/local/groundwork/common/bin/rrdtool graph - \
--imgformat=PNG \
--title="All Disk Partitions" \
--rigid \
--base=1000 \
--height=120 \
--width=500 \
--alt-autoscale-max \
--lower-limit=0 \
--vertical-label="Kilobytes" \
--slope-mode \
$LISTSTART$ \
DEF:$DEFLABEL#$:AVERAGE \
CDEF:cdef$CDEFLABEL#$=$CDEFLABEL#$ \
LINE:cdef$CDEFLABEL#$$COLORLABEL#$:"$DSLABEL#$" \
GPRINT:cdef$CDEFLABEL#$:LAST:" Current\:%8.2lf %s" \
GPRINT:cdef$CDEFLABEL#$:AVERAGE:" Average\:%8.2lf %s" \
GPRINT:cdef$CDEFLABEL#$:MAX:" Maximum\:%8.2lf %s" \
$LISTEND$

Ends up looking something like this:

/usr/local/groundwork/common/bin/rrdtool graph
/usr/local/groundwork/apache2/htdocs/performance/rrd_img/view_1193523936_localhost_All-Partitions_h_1.png
--imgformat=PNG --title="All Disk Partitions" --rigid --base=1000
--height=120 --width=500 --alt-autoscale-max --lower-limit=0
--vertical-label="Kilobytes" --slope-mode
DEF:a=/usr/local/groundwork/rrd/localhost_All_Partitions.rrd:_boot:AVERAGE
CDEF:cdefa=a,8,* LINE:cdefa#8DD9E0:"_boot" GPRINT:cdefa:LAST:"
Current\:%8.2lf %s" GPRINT:cdefa:AVERAGE:" Average\:%8.2lf %s"
GPRINT:cdefa:MAX:" Maximum\:%8.2lf %s"
DEF:b=/usr/local/groundwork/rrd/localhost_All_Partitions.rrd:_dev_shm:AVERAGE
CDEF:cdefb=b,8,* LINE:cdefb#64A2B8:"_dev_shm" GPRINT:cdefb:LAST:"
Current\:%8.2lf %s" GPRINT:cdefb:AVERAGE:" Average\:%8.2lf %s"
GPRINT:cdefb:MAX:" Maximum\:%8.2lf %s"
DEF:c=/usr/local/groundwork/rrd/localhost_All_Partitions.rrd:root:AVERAGE
CDEF:cdefc=c,8,* LINE:cdefc#D3DB00:"root" GPRINT:cdefc:LAST:"
Current\:%8.2lf %s" GPRINT:cdefc:AVERAGE:" Average\:%8.2lf %s"
GPRINT:cdefc:MAX:" Maximum\:%8.2lf %s" --start 1194999156 --end
1195002756 --height 200 --width 600
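
The per-DS repetition and the a, b, ..., z, aa, ab, ... label sequence can be sketched in Python as shown below. This is an illustration under stated assumptions, not the graphing CGI's own code: the function names are hypothetical, the form of $DEFLABEL#$ (label, RRD path, and DS name) is inferred from the sample output above, and the colors are simply the three that appear in that output rather than the real @colors array.

```python
import re


def cdef_label(index):
    """Return the label for a 0-based index: a..z, then aa, ab, ..., az, ba, ..."""
    label = ""
    n = index + 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        label = chr(ord("a") + remainder) + label
    return label


def expand_graph_list(template, rrd_path, ds_names, colors):
    """Repeat the $LISTSTART$...$LISTEND$ body once per DS with substitutions."""

    def repeat(match):
        body = match.group(1)
        chunks = []
        for i, ds in enumerate(ds_names):
            label = cdef_label(i)
            chunk = (
                body.replace("$DEFLABEL#$", f"{label}={rrd_path}:{ds}")
                .replace("$CDEFLABEL#$", label)
                .replace("$DSLABEL#$", ds)
                .replace("$COLORLABEL#$", colors[i % len(colors)])
            )
            chunks.append(chunk.strip())
        return " ".join(chunks)

    return re.sub(r"\$LISTSTART\$(.*?)\$LISTEND\$", repeat, template, flags=re.S)


TEMPLATE = (
    "rrdtool graph - $LISTSTART$ DEF:$DEFLABEL#$:AVERAGE "
    "CDEF:cdef$CDEFLABEL#$=$CDEFLABEL#$ "
    'LINE:cdef$CDEFLABEL#$$COLORLABEL#$:"$DSLABEL#$" $LISTEND$'
)
expanded = expand_graph_list(
    TEMPLATE,
    "/usr/local/groundwork/rrd/localhost_All_Partitions.rrd",
    ["_boot", "_dev_shm", "root"],
    ["#8DD9E0", "#64A2B8", "#D3DB00"],
)
```

Each DS in the RRD contributes its own DEF, CDEF, and LINE group, labeled a, b, and c in turn, which is the pattern visible in the expanded command above.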

3.0 Creating Remote RRD Graphs

3.1 Background

GroundWork Monitor includes a feature known as the Remote RRD Graph Web Service (RRGWS). This section describes this web service, its uses, and configuration.

The RRGWS was developed to support large configurations spanning multiple GroundWork monitoring servers. In large configurations, the volume of data transferred can become an issue. The largest volumes of data are those associated with performance measures, as this information is useful only if collected regularly. Status information, by contrast, is generated only when the state of the object changes. In the context of multiple GroundWork servers, we can leverage this dynamic to greatly reduce the data transfers between child (polling) servers and parent systems.

Figure: Polling Servers

In this scenario, the polling servers are doing all the active checks, and are collecting performance data. When a host or service changes state, this information is forwarded to the GroundWork Monitor server, but the results of individual checks that do not contain a state change (the vast majority) are not forwarded. All the performance graphs are created and hosted on the child server, yet the user primarily (or even exclusively) uses the GroundWork server to interact with the system. The graphs are displayed from the child server on the GroundWork server by means of the RRGWS.

This approach entails several components. The child must publish the location of the RRD graphs to the GroundWork server for display. This is done by configuring the child server to forward the location of the RRDs, and the graph commands to the GroundWork server. The GroundWork server will similarly be configured, for those services whose graphs are hosted on a child server, to not store the local RRD graph commands or path locations. It will store this information only for checks it performs locally, such as the local GroundWork server checks. This is done automatically.

The child server must be configured to send only changes of state to the GroundWork server. This is done using the state changes detected by the status feeder, and by periodically sending heartbeat messages from the child to the GroundWork server, posting the current state of all objects. The heartbeat portion of this operation is expensive, but the interval and rate of transmission can be tuned, so it is deemed acceptable considering the benefit. No one wants to be looking at stale data, so the system should be able to accommodate the trade-off between performance and data age. In any case, state changes will always be forwarded immediately, so critical data will remain up-to-date.

Figure: Remote RRD Graph Web Service

3.2 Requirements

You must have GroundWork Monitor set up and operating as a child server, accepting configuration files from a GroundWork server. It is helpful to configure the child server to forward results to the GroundWork server, as well, although we will not cover this in detail, as the commands are changed.

The child server must be able to contact the GroundWork server on the following network ports:

5667/tcp (for posting results)
4913/tcp (for posting RRD graph locations)

The GroundWork server must be able to contact the child server on the following network ports:

22/tcp (ssh for configuration transfers)
80/tcp or 443/tcp for web services
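
A quick way to confirm these requirements is to attempt a TCP connection to each port from the appropriate server. The Python sketch below shows one way to do that with the standard socket module. The function names are hypothetical, the host names in the usage comments are placeholders, and a successful connection only shows that the port is reachable, not that the expected service is listening on it.

```python
import socket

# Ports taken from the lists above.
PORTS_ON_PARENT = {5667: "posting results", 4913: "posting RRD graph locations"}
PORTS_ON_CHILD = {22: "ssh for configuration transfers", 80: "web services", 443: "web services"}


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports):
    """Return {port: reachable} for every port in the given mapping."""
    return {port: port_is_reachable(host, port) for port in ports}


# Example usage (placeholder host names):
#   On the child server:  check_ports("MYPARENTHOST", PORTS_ON_PARENT)
#   On the parent server: check_ports("MYCHILDHOST", PORTS_ON_CHILD)
```

Only one of ports 80 and 443 needs to be reachable on the child server, depending on whether web services are served over HTTP or HTTPS.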

3.3 Configuration Steps

3.3.1 GroundWork Server

There is no special configuration to be done on the GroundWork server.

3.3.2 Set Up Forwarding of RRD Locations

The child server is configured with the following procedure:

Edit the file:

/usr/local/groundwork/config/perfdata.properties

Un-comment the following section and edit its values, changing it from:

#   <foundation_host MYPARENTHOST>
#       foundation_port = 4913
#       child_host      = MYHOST
#       send_RRD_data   = true
#       send_perf_data  = false
#   </foundation_host>

to:

    <foundation_host MYPARENTHOST>
        foundation_port = 4913
        child_host      = MYCHILDHOST
        send_RRD_data   = true
        send_perf_data  = true
    </foundation_host>

where:
MYPARENTHOST is the DNS name of the parent server (must be resolvable from the child server).
MYCHILDHOST is the DNS name of the child server.

MYCHILDHOST CANNOT be localhost or 127.0.0.1. It must be resolvable from the parent server.

Optionally, you may decide not to send the perf_data to the parent. If you send it, this data is posted directly to Foundation, and is used in the performance reports under the Reports tab. These are the EPR reports found under Reports > BIRT Report Viewer > Performance Reports: epr host, epr host multi variable, epr hostgroup, epr hostgroup multi variable, and epr hostgroup topfive. In contrast, the Reports > Performance View tool does not support the Remote RRD configuration, as it relies on direct local access to RRD files. Sending the detailed perf_data introduces a performance load on the parent; however, this is not anticipated to be a large load, since the data is bundled. The advantage is that reports run on the parent will have all performance data from the child. You may also choose to keep the performance data on the child, or not to post it at all.

The recommended configuration is to send the perf_data to the parent if you have one or two child servers. If you have more than two child servers, you should probably consider another configuration where this data is not sent, as the load on the parent will likely be significant.

To stop sending the perf_data to the parent, change the following line in the section above from

send_perf_data   = true

to

send_perf_data   = false

DO NOT remove the following section, as it is required for operation. You can, however, set send_perf_data = false in it if you do not want the perf data to be sent to Foundation on the server on which that data originates.

    <foundation_host localhost>
        foundation_port = 4913
        child_host      = ""
        send_RRD_data   = true
        send_perf_data  = true
    </foundation_host>

To have these changes take effect, do one of the following:

Perform a configuration change (build instance for the group for this child server on the GroundWork server), or

Kill the process_service_perfdata_file process. To find it, type the following at the command prompt:

ps -ef | grep process_service_perfdata_file

You will see output similar to the following:

nagios   23260    1  0  Jun14 ?        00:00:42  /groundwork/perl/bin/.perl.bin -I/usr/local/groundwork/perl/lib/5.8.8 -I/usr/local/groundwork/perl/lib/site_perl/5.8.8 -I/usr/local/groundwork/nagios/libexec -I/usr/local/groundwork/perl/custom/lib/5.8.8 -I/usr/local/groundwork/perl/custom/lib/site_perl/5.8.8 -w -- /usr/local/groundwork/nagios/eventhandlers/process_service_perfdata_file
root     28917  4118  0 14:06 pts/1    00:00:00 grep process_service_perfdata_file

In this case, the PID of the process is 23260. Kill it by typing:

kill 23260

The process will automatically restart in a few minutes. After approximately 10 minutes, the GroundWork server interface will begin to show you graphs generated on the child server.
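
The PID lookup above can also be scripted. The Python sketch below is an illustration, not a GroundWork tool: it assumes the Linux ps -ef column layout shown in the sample output (UID, PID, PPID, C, STIME, TTY, TIME, then the command) and skips the grep process itself.

```python
import os


def find_perfdata_pids(ps_output, program="process_service_perfdata_file"):
    """Return the PIDs in `ps -ef` output whose command line mentions program."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # header fragments or blank lines
        command = fields[7:]  # everything after the TIME column
        if program not in " ".join(command):
            continue
        if os.path.basename(command[0]) == "grep":
            continue  # the grep used to search is not the daemon
        pids.append(int(fields[1]))
    return pids
```

The output of ps -ef can be captured with subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout, and each returned PID can be passed to os.kill(pid, signal.SIGTERM), which is the equivalent of the kill command shown above.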

3.3.2 Set Up Heartbeat Operation

The child server must be set up to forward periodic updates for all hosts and services. This is done with the following procedure:
Edit the file:

/usr/local/groundwork/config/status-feeder.properties

Change the following lines:
```
send_state_changes_by_nsca=false
```
to
```
send_state_changes_by_nsca=true
```
```
primary_parent=""
```
to
```
primary_parent="MYPARENTHOST"
```
where MYPARENTHOST is the DNS name of the parent that will be receiving this data. DO NOT neglect the quotes.
Optionally set up any secondary servers by changing the appropriate lines. If you are not using secondary servers, just leave these alone. You do not need to change any of the remaining parameters, but of course, you can tune them to fit your installation.
The defaults are:
Send heartbeats every hour:
```
nsca_heartbeat_interval = 60 * 60
```
Send full dumps every 8 hours:
```
nsca_full_dump_interval = 8 * 60 * 60
```
Send a maximum of 100 messages at a time:
```
max_messages_per_send_nsca = 100
```
Wait for 2 seconds between batches of results:
```
nsca_batch_delay = 2
```
You may want to send these less frequently (for large configurations) or more frequently, depending on the available bandwidth and the load on the GroundWork server. You can also elect to send the heartbeat in batches smaller than the default of 100 results, and to open a larger gap between the batches. Be advised that the feeding of results to the database on the child server will be affected if you make the sending of heartbeats too frequent or too long (with many batches and long batch delays), but you may not be concerned about this, as child servers are often not accessed at all by users.
Save the file when you are finished editing, then restart gwservices on the child server by typing the following command:
```
/etc/init.d/groundwork restart gwservices
```
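
When tuning the batch size and batch delay described above, a rough estimate of how long one heartbeat takes to send can be useful. The Python sketch below is a back-of-the-envelope calculation under stated assumptions: it assumes the delay is applied once between each pair of consecutive batches, and it ignores the time spent actually transmitting the results. The function name is illustrative.

```python
import math


def heartbeat_batching(result_count, max_messages_per_send=100, batch_delay=2):
    """Return (number of batches, total seconds of delay between batches)."""
    batches = math.ceil(result_count / max_messages_per_send)
    # A delay separates consecutive batches, so there is one fewer delay than batches.
    total_delay = max(batches - 1, 0) * batch_delay
    return batches, total_delay
```

With the defaults, a child server reporting 5000 host and service results sends 50 batches and spends 98 seconds waiting between them. Halving the batch size to 50 doubles the number of batches and raises the waiting time to 198 seconds.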

3.3.3 Set Up Forwarding of State Changes and Heartbeats Via Spooler - (Optional)

The child server can optionally spool results to be forwarded to the parent. This can be useful if, for example, the network link from the child to the parent is intermittent. It also provides a small amount of enhanced reliability for the data transfers.

The spooler works in the same way as the GDMA spooler code. It will keep a programmable number of results for a programmable interval, and will transmit the saved results when the parent becomes available after an interval of downtime.

The spooler is actually a method of sending results that is separate from the NSCA method. You probably do not want to use both methods to send results to the same server, although you can send results to one server with NSCA and to another with the spooler. Generally, though, if you use the spooler, you will be disabling the NSCA method.

If you monitor the child server from the parent, you should consider setting up a passive service named gdma_spooler on the child host. This will show you the spooler statistics. There's no harm if you do not set it up, but it is sometimes useful to know how much data is flowing in from a given child server.

To configure the spooler option, make the following additional changes to the file:

/usr/local/groundwork/config/status-feeder.properties

Change:
```
send_state_changes_by_gdma=false
```
to:
```
send_state_changes_by_gdma=true
```
Optionally change the defaults for gdma_heartbeat_interval, gdma_full_dump_interval, and max_unspooled_results_to_save. These values are explained in the comments, and are analogous to those for the similarly named NSCA settings.
Next, find the file called gwmon_HOSTNAME.cfg in /usr/local/groundwork/gdma/config, where HOSTNAME is the name of this child server host. Edit this file and change:
```
Target_Server="http://gdma-autohost"
```
to:
```
Target_Server="http://PARENTHOSTNAME"
```
where PARENTHOSTNAME is the name of the parent host to send to. Note that you can specify more than one, in a comma-separated list. Each name must be specified as a URL. Be sure to uncomment the Target_Server line.

Adjustment of other parameters is optional, and should be done only if necessary. Refer to the GDMA documentation for explanations of the parameters in this file that control the spooler.
Save the file.
Restart gwservices on the child to make this configuration active:
```
service groundwork restart gwservices
```
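
When more than one parent is used, the Target_Server value described above becomes a comma-separated list of URLs. The Python sketch below shows one way to build that line. It is an illustration only: the function name and host names are hypothetical, and it assumes that the whole list sits inside a single pair of quotes with no spaces after the commas, which should be confirmed against the GDMA documentation.

```python
def target_server_line(parent_hosts, scheme="http"):
    """Build a Target_Server line from one or more parent host names."""
    if not parent_hosts:
        raise ValueError("at least one parent host is required")
    # Each parent must be given as a URL, so prefix every name with the scheme.
    urls = ",".join(f"{scheme}://{host}" for host in parent_hosts)
    return f'Target_Server="{urls}"'
```

For a single parent, target_server_line(["PARENTHOSTNAME"]) yields the same line shown in the procedure above.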

3.4 Considerations

3.4.1 perfdata.properties

The perfdata.properties file contains a mixture of scalar-value settings and some XML-like sections. It must be edited manually to preserve this structure. It is listed in the GroundWork Administration > Foundation > Manage Configuration screen as a file that can be edited there, but attempting to do so will effectively destroy the content of this file. A bug report (GWMON-10097) has been filed to remove this filename from the list of editable files in this screen. A copy of the original file is included in the reference section of this document.

Editing of the perfdata.properties file should be done on the child server, not on the parent server, as it is the child server that needs to know what data to send and where to send it. To have these changes take effect, you will need to either do a configuration push to the child (build instance for the child group on the GroundWork server), or restart the process_service_perfdata_file process on the child, as described in the configuration steps above.

3.4.2 Encryption

Both the NSCA and GDMA spooler methods make use of the NSCA program and the Bronx event broker. These programs are capable of sending and receiving encrypted data, but are set up by default not to do so. Also, the Bronx event broker (which processes the data received at the parent) can support what is known as wide packets, an enhancement to NSCA that makes it much faster in high-load configurations. Wide packets support is enabled by default.

If you set up encryption, you should note that:

Encryption must be the same on the parent and all child servers, as well as on any system that sends data to the parent via NSCA (for example, a GDMA system).
Encryption adds overhead, and may slow down the data transfer process.
Encryption is more secure, which can be important in some environments.

3.5 Maintenance

There is no special maintenance for this feature. However, keeping track of performance on the GroundWork server is a good idea. If things seem to be slow, it may be a good idea to consider adjusting the heartbeat frequency to a less frequent interval. If you do so, ensure that any freshness checks on the GroundWork server hosts and services are synchronized with this interval. Freshness intervals for passive checks should always be kept longer than the update cycles to avoid false positive results.
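
The freshness rule above can be checked mechanically whenever the heartbeat interval is changed. The Python sketch below is a minimal illustration; the function name and the service names in the example are hypothetical, and the freshness thresholds are assumed to have been collected into a dictionary by some other means.

```python
def services_at_risk(freshness_thresholds, heartbeat_interval):
    """Return the names whose freshness threshold is not longer than the interval.

    freshness_thresholds maps a service name to its threshold in seconds;
    heartbeat_interval is the child's update cycle in seconds.
    """
    return sorted(
        name
        for name, threshold in freshness_thresholds.items()
        if threshold <= heartbeat_interval
    )
```

With the default hourly heartbeat (60 * 60 seconds), a call such as services_at_risk({"cpu": 7200, "disk": 3600, "ping": 900}, 3600) reports disk and ping, since their thresholds would expire before or exactly when the next heartbeat arrives.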

3.6 References

perfdata.properties

These are the default file contents (with comments).

# perfdata.properties
#
# Copyright 2010-2013 GroundWork Open Source, Inc. ("GroundWork")
# All rights reserved.  This program is free software; you can
# redistribute it and/or modify it under the terms of the GNU
# General Public License version 2 as published by the Free
# Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.
######################################################################
## GroundWork Performance Data Processing Configuration Properties
######################################################################
# The values specified here are used to control the behavior of the
# process_service_perfdata_file script.
# Possible debug_level values:
# 0 = no info of any kind printed, except for startup/shutdown
#     messages and major errors
# 1 = print just error info and summary statistical data
# 2 = also print basic debug info
# 3 = print detailed debug info
debug_level = 1
# Create and update RRD files.  [true/false]
process_rrd_updates = true
# Use the newer XML web-service API to post performance data to the
# Foundation databases configured below.  Highly recommended, for
# efficiency.  [true/false]
post_performance_using_xml = true
# How many performance data updates to bundle together in a single
# message to Foundation, when the XML web-service API is used.  This
# is a loose limit; it is only checked after adding all the data for
# a {host, service}, which might contain multiple performance values.
max_performance_xml_bundle_size = 20
# A limit on the number of items sent to Foundation in a single
# packet.
max_bulk_send = 200
# Timeout, specified in seconds, if the older HTTP API is used
# to post performance data to the Foundation database.
foundation_http_submission_timeout = 2
# Timeout, specified in seconds, to address GWMON-7407.
# The usual value is 30; set to 0 to disable.
socket_send_timeout = 30
# Specify whether to use a shared library to implement RRD file
# access, or to fork an external process for such work (the legacy
# implementation).  Set to true (recommended) for high performance,
# to false only as an emergency fallback or for special purposes.
# [true/false]
use_shared_rrd_module_for_create = true
use_shared_rrd_module_for_update = true
use_shared_rrd_module_for_info   = true
# Where the rrdtool binary lives.
rrdtool = /usr/local/groundwork/common/bin/rrdtool
# What files to read for results to be processed.  The perfdata_file paths
# are defined by external scripts, such as launch_perf_data_processing
# for the service-perfdata.dat.being_processed pathname, so they cannot
# be changed here arbitrarily.  The perfdata_source labels must be unique,
# and each such label must reflect the name of the Application Type in
# Foundation for the corresponding data stream.  The seek_file path for
# each source must also name a unique file, so there is no confusion as to
# what its contents represent.
#
# Each upstream provider is responsible for atomically renaming the file
# it uses to collect the performance data into the perfdata_file pathname
# listed here, at a point in time when the upstream provider is no longer
# writing (and will no longer write, if the rename happens) into that file.
# That way, there can never be any confusion as to whether the file is
# ready for processing here.
<service_perfdata_files>
    # Nagios performance data.
    <perfdata_source NAGIOS>
	perfdata_file = "/usr/local/groundwork/nagios/var/service-perfdata.dat.being_processed"
	seek_file     = "/usr/local/groundwork/nagios/var/service-perfdata.dat.seek"
    </perfdata_source>
    # Virtual Environments Monitoring Agent performance data.
    <perfdata_source VEMA>
	perfdata_file = "/usr/local/groundwork/core/vema/var/vema-perfdata.dat.being_processed"
	seek_file     = "/usr/local/groundwork/core/vema/var/vema-perfdata.dat.seek"
    </perfdata_source>
    # Cloud Hub for Red Hat Virtualization performance data.
    <perfdata_source CHRHEV>
	perfdata_file = "/usr/local/groundwork/core/vema/var/chrhev-perfdata.dat.being_processed"
	seek_file     = "/usr/local/groundwork/core/vema/var/chrhev-perfdata.dat.seek"
    </perfdata_source>
    # My-Application performance data.
#    <perfdata_source my_app>
#	perfdata_file = "/usr/local/groundwork/my_app/var/my_app-perfdata.dat.being_processed"
#	seek_file     = "/usr/local/groundwork/my_app/var/my_app-perfdata.dat.seek"
#    </perfdata_source>
</service_perfdata_files>
# What sequence the perfdata sources should be processed in, within a given
# processing cycle.  In general, a round-robin check of sources is made to
# see which of them are ready for processing.  The choices are:
#
# process_each_ready   Process sources in order in each cycle, checking
#                      each at most once per cycle to see if it is ready.
# process_every_ready  Process as many sources as possible in each cycle,
#                      but only process each source at most once per cycle.
# process_all_ready    Keep processing in each cycle until no more sources
#                      are ready.
#
# The usual choice is "process_every_ready", which provides robust behavior
# while always allowing a brief rest between effective cycles.  More detail
# on this option is provided in comments in the code, if it matters to anyone.
source_selection_model = process_every_ready
# How often to update a seek file as the corresponding perfdata file is
# read.  This many lines of a perfdata file are processed at a time before
# the corresponding seek file is updated with the current position.  This
# limits the amount of data that will be reprocessed if the perf script
# dies catastrophically (without first updating the seek file based on the
# update_seek_file_on_failure option).  There is a tradeoff here between
# the i/o needed to update the seek file periodically and the reprocessing
# of some number of lines from the perfdata file in a (presumably rare)
# catastrophe-recovery situation.  Setting this value to 0 will disable
# periodically updating the seek file, which would only be useful in a
# development situation.
seek_file_update_interval = 1000
# Whether to update a seek file if a processing failure or termination request
# is sensed, before the perf script shuts down.  Normally this will be left
# as "true", so the current line will be skipped when the script starts up
# again.  (This presumes that the slight possible data loss during an ordinary
# termination request is tolerable.)  In some debugging situations, or to
# ensure that the last line of possibly partially-processed data is re-read
# on startup, you may want to set this to "false", so an input failure can
# be easily replicated (or that data that was in-progress at the time of a
# termination request is not lost) by having it be reprocessed on startup.
update_seek_file_on_failure = true
# Where the log file is to be written.
debuglog = /usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log
# The wait time between cycles of the process_service_perfdata_file
# script, which runs as a daemon.  Specified in seconds.
loop_wait_time = 15
# Whether to emit a log message to Foundation at the end of every processing
# cycle where errors or warnings were detected.  This is disabled by default
# because it can generate a large number of messages when the setup is broken.
# But it can be valuable to provide very visible notice that processing problems
# are occurring, so you know to look in the debug log for details.  [true/false]
emit_status_message = false
# Specify whether to log messages that tell exactly what the script is doing
# at the moment that a termination signal is received.  We don't enable
# these messages by default because logging i/o routines are not necessarily
# re-entrant, which could cause difficulties.  But the messages can be enabled
# during troubleshooting trials to identify which areas of the script need
# improvement in the speed of handling termination signals.  [true/false]
spill_current_action = false
# This section contains the configuration for all access to Foundation
# databases.  It must include one group of lines for the Foundation
# associated with this server (with the child_host value set to an
# empty string).  Additional groups of lines are needed for parent
# servers if you want RRD graphs generated on this child server
# (where the process_service_perfdata_file script is running) to be
# integrated into Status Viewer on a parent server, or if you want
# EPR reports to be created on a server.
#
# The foundation_host value, specified inside the angle-brackets, is
# a qualified or unqualified hostname, or IP address, for a network
# interface on which the Foundation of the respective standalone,
# child, parent, parent-standby, or report server can be accessed.
# Substitute for MYPARENTHOST or MYSTANDBYHOST in the lines below
# as needed.  The foundation_port is the port number on that network
# interface through which Foundation can be contacted.
#
# The child_host value is a qualified or unqualified hostname, or
# IP address, of the machine on which the performance data handling
# script (process_service_perfdata_file) is running, as seen by that
# particular Foundation server.  The specified value must not be
# 127.0.0.1 or localhost, and it may be different for access from
# different Foundation servers (substitute for MYHOST in the lines
# below as needed).  This value must be left empty for the child
# (or standalone) server's own Foundation.
#
# The send_RRD_data value [true/false] specifies whether this
# Foundation should receive information about RRD graphs.
# If child_host is empty, this information will include details
# on RRD filenames and graph commands, so graphs can be directly
# generated as needed.  If child_host is non-empty, this information
# will instead include just the child_host value, so this copy of
# Foundation will know where to reach to obtain the graph.
#
# The send_perf_data value [true/false] specifies whether this
# Foundation should receive a copy of the detailed performance data.
# It should be enabled if and only if this Foundation may be used to
# produce EPR reports.
#
# Lines in this section may be commented out with a leading "#"
# character.  Uncomment and customize groups of lines here as needed.
<foundation>
    # Local Foundation.  It is not a parent server for this data,
    # so the child_host is set to an empty string to distinguish
    # this case.  send_RRD_data must be true for this entry.
    <foundation_host localhost>
	foundation_port = 4913
	child_host      = ""
	send_RRD_data   = true
	send_perf_data  = false
    </foundation_host>
    # Parent-server Foundation, if any.
#    <foundation_host MYPARENTHOST>
#	foundation_port = 4913
#	child_host      = "MYHOST"
#	send_RRD_data   = true
#	send_perf_data  = false
#    </foundation_host>
    # Parent-standby-server Foundation, if any.
#    <foundation_host MYSTANDBYHOST>
#	foundation_port = 4913
#	child_host      = "MYHOST"
#	send_RRD_data   = true
#	send_perf_data  = false
#    </foundation_host>
</foundation>
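
Because this file must be edited by hand, a quick sanity check of the foundation section after editing can catch common mistakes. The Python sketch below is an illustration and not the parser GroundWork uses: it reads only un-commented foundation_host blocks in the layout shown above, does not handle trailing comments on setting lines, and checks just the two rules stated in this document (the localhost block must exist with an empty child_host, and a parent block must not use an empty or loopback child_host).

```python
import re

BLOCK_OPEN = re.compile(r"^<foundation_host\s+(\S+)>$")
SETTING = re.compile(r"^(\w+)\s*=\s*(.*)$")


def parse_foundation_hosts(text):
    """Return {host name: {setting: value}} for each active foundation_host block."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out line
        opened = BLOCK_OPEN.match(line)
        if opened:
            current = hosts.setdefault(opened.group(1), {})
        elif line == "</foundation_host>":
            current = None
        elif current is not None:
            setting = SETTING.match(line)
            if setting:
                current[setting.group(1)] = setting.group(2).strip().strip('"')
    return hosts


def check_foundation_hosts(hosts):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    local = hosts.get("localhost")
    if local is None:
        problems.append("missing <foundation_host localhost> section")
    elif local.get("child_host", "") != "":
        problems.append("localhost: child_host must be empty")
    for name, settings in hosts.items():
        child = settings.get("child_host", "").lower()
        if name != "localhost" and child in ("", "localhost", "127.0.0.1"):
            problems.append(f"{name}: child_host must be a resolvable host name")
    return problems
```

Reading the file with open("/usr/local/groundwork/config/perfdata.properties").read() and passing the text through both functions prints nothing alarming for the default contents listed above, since only the localhost block is active there.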