You can also use the procedures and information here to prepare debugging data for opening a support case, or to try to determine exactly why your graphs do not function normally. If all of your graphs have stopped working, please see "My performance graphs all stopped working. What's wrong?".
Support usually needs the same data you would gather in the procedure above. Here are the steps to quickly gather it for support to analyze.
sed -e '/debug_level/s/1/3/' -i /usr/local/groundwork/config/perfdata.properties
pkill -f process_service_perfdata_file
tail -f /usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log
tar czvf performance_log.tar.gz /usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log
sed -e '/debug_level/s/3/1/' -i /usr/local/groundwork/config/perfdata.properties
pkill -f process_service_perfdata_file
The data you need can be collected with the above procedure. Once you are generating the debug info, you can analyze it yourself.
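The two sed steps in the procedure can be wrapped in one small helper, so that raising and restoring the debug level use the same code. This is only a sketch: the function name set_debug_level is hypothetical, it assumes GNU sed and a single line in the properties file that contains debug_level followed by a number, and the demonstration runs on a scratch file rather than on the real perfdata.properties.

```shell
# Hypothetical helper: set debug_level in a properties file to the given value.
# Assumes one line containing "debug_level" followed by a numeric value.
set_debug_level() {
    props_file=$1
    level=$2
    # Replace the first run of digits on the debug_level line.
    # -i edits the file in place (GNU sed).
    sed -i -e "/debug_level/s/[0-9][0-9]*/${level}/" "$props_file"
}

# Demonstration on a scratch file, so nothing under /usr/local/groundwork is touched.
demo_file=$(mktemp)
printf 'debug_level = 1\n' > "$demo_file"
set_debug_level "$demo_file" 3
cat "$demo_file"
rm -f "$demo_file"
```

On a real system you would pass /usr/local/groundwork/config/perfdata.properties as the first argument and then restart the daemon with pkill as in the procedure.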
There are several types of entries in the debug log, but the typical entry for processing performance data for a service on a host looks like this:
[]
Host: localhost
Svcdesc: local_users
Lastcheck: 1281471736
Statustext: USERS OK - 1 users currently logged in
Perfdata:users=1;5;20;0
Adding label=users,value=1,warn=5,crit=20,min=0,max=0
Table host_service, host=localhost, service=local_users already has an existing entry for location /usr/local/groundwork/rrd/localhost_local_users.rrd. New entry not added.
Graph RRD command: rrdtool graph - --imgformat=PNG --slope-mode DEF:a=/usr/local/groundwork/rrd/localhost_local_users.rrd:users:AVERAGE CDEF:cdefa=a AREA:cdefa#0033CC:"Number of logged in users" -c BACK#FFFFFF -c CANVAS#FFFFFF -c GRID#C0C0C0 -c MGRID#404040 -c ARROW#FFFFFF -Y --height 120
Nothing changed for localhost local_users
Update RRD command: /usr/local/groundwork/common/bin/rrdtool update /usr/local/groundwork/rrd/localhost_local_users.rrd 1281471736:1 2>&1
Posting data to Foundation
performancedatalabel=users
performancevalue=1
Elapsed Execution Time = 3914.383 seconds
In this example, everything is working normally. The key fields are:
Host: localhost
Svcdesc: local_users
Together, these uniquely identify the host and service.
Lastcheck: 1281471736
This is the Unix timestamp at which the data was produced. The data will be graphed as occurring at this time.
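To check when a given entry was produced, the timestamp can be converted to a readable date. A minimal sketch, assuming GNU date (the -d "@..." form is not available in every date implementation); the function name epoch_to_utc is hypothetical:

```shell
# Hypothetical helper: print a Unix timestamp as a UTC date (GNU date syntax).
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1281471736
# prints: 2010-08-10 20:22:16 UTC
```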
Statustext: USERS OK - 1 users currently logged in
This is the actual plugin output that precedes the performance data. Note that it can be parsed with the Status Text Regex in Configuration -> Performance to extract perf data (e.g. the number "1", here).
Perfdata:users=1;5;20;0
This is the perfdata from the plugin, in standard Nagios Plugin format. See http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN201
Adding label=users,value=1,warn=5,crit=20,min=0,max=0
This is the result of the parsing we do on the perf data. Loosely, the label and value arrays are used in the RRD create command as $VALUE1$, $VALUE2$... and $LABEL1$, $LABEL2$..., to create the RRD with the appropriate DS names (typically labels) and populate the RRD with the data (typically values). The $WARN1$... and $CRIT1$... series are also available, and can be useful for putting the thresholds on the graphs. Similarly, the $MAX1$... and $MIN1$... series can be used.
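The split of one perfdata item into its fields can be sketched in a few lines of shell. This is an illustration only, not the code the perfdata daemon runs: the function name parse_perfdata is hypothetical, and printing missing trailing fields as 0 is an assumption made to mirror the "Adding label=..." line in the log.

```shell
# Hypothetical helper: split one perfdata item (label=value;warn;crit;min;max)
# into named fields. The body runs in a subshell, so the IFS and globbing
# changes do not leak into the caller.
parse_perfdata() (
    item=$1
    label=${item%%=*}   # text before the first "="
    rest=${item#*=}     # text after the first "="
    IFS=';'             # split the remainder on semicolons
    set -f              # do not expand wildcard characters while splitting
    set -- $rest        # $1=value $2=warn $3=crit $4=min $5=max
    # Assumption: fields that are absent are shown as 0, as in the debug log.
    printf 'label=%s,value=%s,warn=%s,crit=%s,min=%s,max=%s\n' \
        "$label" "${1:-0}" "${2:-0}" "${3:-0}" "${4:-0}" "${5:-0}"
)

parse_perfdata 'users=1;5;20;0'
# prints: label=users,value=1,warn=5,crit=20,min=0,max=0
```

The sketch handles a single item and leaves any unit of measure attached to the value; plugin output that carries several space-separated items would need an outer loop.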
Table host_service, host=localhost, service=local_users already has an existing entry for location /usr/local/groundwork/rrd/localhost_local_users.rrd. New entry not added.
This just tells you the result of the lookup of the performance definition, matching the host and service to a particular configuration.
Graph RRD command: rrdtool graph - --imgformat=PNG --slope-mode DEF:a=/usr/local/groundwork/rrd/localhost_local_users.rrd:users:AVERAGE CDEF:cdefa=a AREA:cdefa#0033CC:"Number of logged in users" -c BACK#FFFFFF -c CANVAS#FFFFFF -c GRID#C0C0C0 -c MGRID#404040 -c ARROW#FFFFFF -Y --height 120
This is the result of substituting the values from the service and perf data into the RRD Graph Command entered in the performance configuration. If you type this command at a command line and redirect its output to a .png file, you can test the actual graph generation for this host and service. The command is stored in the Foundation database for this host and service, and is updated only when it changes, thus the message:
"Nothing changed for localhost local_users"
occurs when no change is needed for this command.
Update RRD command: /usr/local/groundwork/common/bin/rrdtool update /usr/local/groundwork/rrd/localhost_local_users.rrd 1281471736:1 2>&1
This is the actual command used to insert data into the RRD for this run of the check of this service on this host. Note that you will probably get an error if you run this exact command at the command line, because data cannot be inserted into an RRD twice for the same timestamp. If no data is reaching your RRD, this command may be malformed, so you may want to try it at the command line to diagnose why it fails. To change it, you will need to modify the Performance configuration in Configuration -> Performance for this service.
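When the debug log is large, it helps to pull out just the most recent update command for the RRD you are investigating. The following is a sketch under stated assumptions: the function name last_update_cmd is hypothetical, and it assumes each command is logged on one line containing the text "Update RRD command: " exactly as in the example entry.

```shell
# Hypothetical helper: print the most recent "Update RRD command:" entry for one
# RRD file, with the log prefix stripped so the command can be pasted into a shell.
last_update_cmd() {
    log_file=$1
    rrd_path=$2
    grep -F "Update RRD command:" "$log_file" \
        | grep -F "$rrd_path" \
        | tail -n 1 \
        | sed -e 's/^.*Update RRD command: //'
}

# Example call, using the paths from this article:
# last_update_cmd /usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log \
#     /usr/local/groundwork/rrd/localhost_local_users.rrd
```

The function prints nothing if the log holds no matching entry, which is itself a hint that the service's performance data is not reaching the update step.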
Posting data to Foundation
performancedatalabel=users
performancevalue=1
This is the record of posting data for this service to Foundation for use in the enterprise performance reports. Note that as of 6.2, this data is batched and sent as XML, so you will not see it at this position in the file after an upgrade to 6.2 or above.
The first time a service check's performance data is processed, the process attempts to create the RRD automatically. The RRD Create command that is echoed in this file in that case is often informative. If you do not see the RRD for your host and service, that command may be malformed. You can try it at the command line to see what error message you get, and then correct it in the Performance configuration for that service.
Other information in the debug log lists the attempts to post the graph commands to Foundation, as well as summary data for each run of the process.
The default is to run the process every 5 minutes, by placing the data in a file called /usr/local/groundwork/nagios/var/service-perfdata.dat.being_processed, which is picked up by the process_service_perfdata_file daemon. These processes are launched by Nagios with the launch_perfdata_process script, called as the "Service performance data file processing command", as defined in Configuration -> Control -> Nagios Main Configuration -> Page 3. See "My performance graphs all stopped working. What's wrong?" for an example.