Overview
GroundWork Monitor includes a feature known as the Remote RRD Graph Web Service (RRGWS). This section describes this web service, its uses, and configuration.
The RRGWS was developed to support large configurations spanning multiple GroundWork monitoring servers. In large configurations the volume of data transferred can become an issue. The largest volumes of data are those associated with performance measures, as this information is useful only if collected regularly. Status information, by contrast, is generated only when the state of the object changes. In the context of multiple GroundWork servers, we can leverage this dynamic to greatly reduce the data transfers between child (polling) servers and parent systems.
Figure: Polling Servers
In this scenario, the polling servers are doing all the active checks, and are collecting performance data. When a host or service changes state, this information is forwarded to the GroundWork server, but the results of individual checks that do not contain a state change (the vast majority) are not forwarded. All the performance graphs are created and hosted on the child server, yet the user primarily (or even exclusively) uses the GroundWork server to interact with the system. The graphs are displayed from the child server on the GroundWork server by means of the RRGWS.
This approach entails several components. The child must publish the location of the RRD graphs to the GroundWork server for display. This is done by configuring the child server to forward the location of the RRDs and the graph commands to the GroundWork server. The GroundWork server will similarly be configured, for those services whose graphs are hosted on a child server, to not store the local RRD graph commands or path locations. It will store this information only for checks it performs locally, such as the local GroundWork server checks. This is done automatically.
The child server must be configured to send only changes of state to the GroundWork server. This is done using the state changes detected by the status feeder, and by periodically sending heartbeat messages from the child to the GroundWork server, posting the current state of all objects. The heartbeat portion of this operation is expensive, but the interval and rate of transmission can be tuned, so it is deemed acceptable considering the benefit. No one wants to be looking at stale data, so the system should be able to accommodate the trade-off between performance and data age. In any case, state changes will always be forwarded immediately, so critical data will remain up-to-date.
Figure: Remote RRD Graph Web Service
Requirements
You must have GroundWork Monitor set up and operating as a child server, accepting configuration files from a GroundWork server. It is also helpful to configure the child server to forward results to the GroundWork server, although that setup is not covered in detail here, because the commands involved are changed.
The child server must be able to contact the GroundWork server on the following network ports:
- 5667/tcp (for posting results)
- 4913/tcp (for posting RRD graph locations)
The GroundWork server must be able to contact the child server on the following network ports:
- 22/tcp (ssh for configuration transfers)
- 80/tcp or 443/tcp (for web services)
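Before starting the configuration, it can be worth confirming that these ports are actually reachable in each direction. The following is a minimal sketch, assuming the nc (netcat) utility is installed and supports the -z and -w options. The function names port_open and check_ports are illustrative only, and MYPARENTHOST and MYCHILDHOST stand for your own host names.

```shell
#!/bin/sh
# Sketch only: port_open and check_ports are hypothetical helper names.
# Assumes nc (netcat) is installed and supports -z (probe without sending
# data) and -w (timeout in seconds).

# Succeed if a TCP connection to host ($1) on port ($2) can be opened.
# An optional third argument sets the timeout in seconds (default 3).
port_open() {
    nc -z -w "${3:-3}" "$1" "$2" >/dev/null 2>&1
}

# Report the state of each listed port on a host: check_ports HOST PORT...
# Returns non-zero if any port could not be reached.
check_ports() {
    host=$1
    shift
    failed=0
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}

# On the child server:   check_ports MYPARENTHOST 5667 4913
# On the parent server:  check_ports MYCHILDHOST 22 80
```

A successful probe shows only that something accepts connections on the port, not that the expected GroundWork service is the one listening.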
Configuration Steps
- GroundWork Server - There is no special configuration to be done on the GroundWork server.
- Set Up Forwarding of RRD Locations - The child server is configured with the following procedure:
- Edit the file:
/usr/local/groundwork/config/perfdata.properties
- Un-comment the section and edit it, changing it from:
# <foundation_host MYPARENTHOST>
#     foundation_port = 4913
#     child_host = MYHOST
#     send_RRD_data = true
#     send_perf_data = false
# </foundation_host>
to:
<foundation_host MYPARENTHOST>
    foundation_port = 4913
    child_host = MYCHILDHOST
    send_RRD_data = true
    send_perf_data = true
</foundation_host>
where:
MYPARENTHOST is the DNS name of the parent server (must be resolvable from the child server).
MYCHILDHOST is the DNS name of the child server. This CANNOT be localhost or 127.0.0.1. It must be resolvable from the parent server.
- Optionally, you may decide not to send the perf_data to the parent. If you send it, this data is posted directly to Foundation, and is used in the performance reports under the Reports tab. These are the EPR reports found under Reports > BIRT Report Viewer > Performance Reports: epr host, epr host multi variable, epr hostgroup, epr hostgroup multi variable, and epr hostgroup topfive. In contrast, the Reports > Performance View tool does not support the Remote RRD configuration, as it relies on direct local access to RRD files. Sending the detailed perf_data introduces a performance load on the parent; however, this is not anticipated to be a large load, since the data is bundled. The advantage is that reports run on the parent will have all performance data from the child. You may also choose to keep the performance data on the child, or not to post it at all.
- The recommended configuration is to send the perf_data to the parent if you have one or two child servers. If you have more than two child servers, you should probably consider another configuration where this data is not sent, as the load on the parent will likely be significant.
- To not send the perf_data to the parent, change the line in the section above:
from:
send_perf_data = true
to:
send_perf_data = false
DO NOT remove the following section, as it is required for operation. You can, however, choose to set send_perf_data = false if you do not want the perf data to be sent to Foundation on the server on which that data originates.
<foundation_host localhost>
    foundation_port = 4913
    child_host = ""
    send_RRD_data = true
    send_perf_data = true
</foundation_host>
- To have these changes take effect, you can:
- Perform a configuration change (build instance for the group for this child server on the GroundWork server) or,
- Kill the process_service_perfdata_file process. At the command prompt, type:
ps -ef | grep process_service_perfdata_file
You will see output similar to the following:
nagios 23260 1 0 Jun14 ? 00:00:42 /groundwork/perl/bin/.perl.bin -I/usr/local/groundwork/perl/lib/5.8.8 -I/usr/local/groundwork/perl/lib/site_perl/5.8.8 -I/usr/local/groundwork/nagios/libexec -I/usr/local/groundwork/perl/custom/lib/5.8.8 -I/usr/local/groundwork/perl/custom/lib/site_perl/5.8.8 -w -- /usr/local/groundwork/nagios/eventhandlers/process_service_perfdata_file
root 28917 4118 0 14:06 pts/1 00:00:00 grep process_service_perfdata_file
In this case, the PID is 23260 for the process. Kill it by typing:
kill 23260
The process will automatically restart in a few minutes. After approximately 10 minutes, the GroundWork server interface will begin to show you graphs generated on the child server.
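To confirm that the edit to perfdata.properties left the intended sections active, you can list the foundation_host sections that are not commented out. This is a minimal sketch; active_foundation_hosts is a hypothetical helper name, and it assumes each section opens on its own line in the form shown in this procedure.

```shell
#!/bin/sh
# Sketch only: active_foundation_hosts is a hypothetical helper name.
# Print the name of every <foundation_host NAME> section that is not
# commented out in the given perfdata.properties file.
active_foundation_hosts() {
    awk '/^[ \t]*<foundation_host[ \t]/ {
             gsub(/[<>]/, "")   # drop the angle brackets
             print $2           # second field is the host name
         }' "$1"
}

# Example (on the child server):
#   active_foundation_hosts /usr/local/groundwork/config/perfdata.properties
```

After following the steps in this procedure, the output would be expected to list localhost and the name you substituted for MYPARENTHOST.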
- Set Up Heartbeat Operation - The child server must be set up to forward periodic updates for all hosts and services. This is done with the following procedure:
- Edit the file:
/usr/local/groundwork/config/status-feeder.properties
- Change the following lines:
send_state_changes_by_nsca=false
to:
send_state_changes_by_nsca=true
and:
primary_parent=""
to:
primary_parent="MYPARENTHOST"
where MYPARENTHOST is the DNS name of the parent that will be receiving this data. DO NOT neglect the quotes.
- Optionally set up any secondary servers by changing the appropriate lines. If you are not using secondary servers, just leave these alone. You do not need to change any of the remaining parameters, but of course, you can tune them to fit your installation.
The defaults are:
Send heartbeats every hour:
nsca_heartbeat_interval = 60 * 60
Send full dumps every 8 hours:
nsca_full_dump_interval = 8 * 60 * 60
Send a maximum of 100 messages at a time:
max_messages_per_send_nsca = 100
Wait for 2 seconds between batches of results:
nsca_batch_delay = 2
You may want these less frequently (for large configurations) or more frequently. It depends on the bandwidth available, and the load on the GroundWork server. You can also elect to send the heartbeat in small batches, rather than the default of 100 results, and to open a larger gap between the batches. Be advised that the feeding of results to the database on the child server will be affected if you make the sending of heartbeats too frequent or too long (with many batches and long batch delays), but you may not be concerned about this, as child servers are often not accessed at all by users.
- Save the file when you are finished editing, then restart gwservices on the child server by typing the following command:
/etc/init.d/groundwork restart gwservices
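When tuning the batch size and batch delay, a rough estimate of how long one heartbeat takes to send can help. The sketch below uses a simplified model that is an assumption, not a description of the status feeder's internals: results go out in batches of at most max_messages_per_send_nsca, the nsca_batch_delay is counted only between batches, and transmission time is ignored. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, and the model is a
# simplification (delay is counted only between batches; transmission
# time is ignored).

# Number of batches needed to send N results with at most MAX per batch.
# Arguments: N  MAX
heartbeat_batches() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Total seconds spent waiting between batches for N results.
# Arguments: N  max_messages_per_send_nsca  nsca_batch_delay
heartbeat_delay_seconds() {
    batches=$(heartbeat_batches "$1" "$2")
    if [ "$batches" -le 1 ]; then
        echo 0
    else
        echo $(( (batches - 1) * $3 ))
    fi
}

# Example: 5000 results with the defaults (100 per batch, 2 second delay)
heartbeat_batches 5000 100          # prints 50
heartbeat_delay_seconds 5000 100 2  # prints 98
```

Under this model, a child server posting 5000 host and service results would spend about 98 seconds in batch delays per heartbeat, which is small compared with the default one-hour interval.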
- Set Up Forwarding of State Changes and Heartbeats Via Spooler - (Optional)
The child server can optionally spool results to be forwarded to the parent. This can be useful if, for example, the network link from the child to the parent is intermittent. It also provides a small amount of enhanced reliability for the data transfers.
The spooler works in the same way as the GDMA spooler code. It will keep a programmable number of results for a programmable interval, and will transmit the saved results when the parent becomes available after an interval of downtime.
The spooler is actually a separate method of sending results from the NSCA method. You probably do not want to use both methods to send results to the same server. You can also send results to one server with NSCA and another with the spooler. It's up to you. Generally, though, if you use the spooler, you will be disabling the NSCA method.
If you monitor the child server from the parent, you should consider setting up a passive service named gdma_spooler on the child host. This will show you the spooler statistics. There's no harm if you do not set it up, but it is sometimes useful to know how much data is flowing in from a given child server.
To configure the spooler option, make the following additional changes to the file:
/usr/local/groundwork/config/status-feeder.properties
- Change:
send_state_changes_by_gdma=false
to:
send_state_changes_by_gdma=true
Optionally change the defaults for gdma_heartbeat_interval, gdma_full_dump_interval, and max_unspooled_results_to_save. These values are explained in the comments, and are analogous to those for the similarly named NSCA settings.
- Next, find the file called gwmon_HOSTNAME.cfg in /usr/local/groundwork/gdma/config, where HOSTNAME is the name of this child server host. Edit this file and change:
Target_Server="http://gdma-autohost"
to:
Target_Server="http://PARENTHOSTNAME"
where PARENTHOSTNAME is the name of the parent host to send to. Note that you can specify more than one, in a comma separated list. The name must be specified as a URL. Be sure to uncomment the Target_Server line.
Adjustment of other parameters is optional, and should be done only if necessary. Refer to the GDMA documentation for explanations of the parameters in this file that control the spooler.
- Save the file.
- Restart gwservices on the child to make this configuration active:
service groundwork restart gwservices
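Since the Target_Server line must be uncommented and each entry must be a URL, a quick check of the edited file can catch mistakes before they show up as missing data. This is a minimal sketch; target_servers and targets_look_valid are hypothetical helper names, and the sketch assumes the line has the form shown in this procedure, Target_Server="URL[,URL...]".

```shell
#!/bin/sh
# Sketch only: target_servers and targets_look_valid are hypothetical
# helper names. Assumes the line has the form Target_Server="URL[,URL...]".

# Print each URL from any uncommented Target_Server line, one per line.
target_servers() {
    sed -n 's/^[[:space:]]*Target_Server[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" |
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//'
}

# Succeed only if at least one target is set and every target starts
# with http:// or https://.
targets_look_valid() {
    urls=$(target_servers "$1")
    [ -n "$urls" ] || return 1
    if echo "$urls" | grep -Eqv '^https?://'; then
        return 1
    fi
    return 0
}

# Example (HOSTNAME is the name of this child server host):
#   target_servers /usr/local/groundwork/gdma/config/gwmon_HOSTNAME.cfg
```

A line that is still commented out produces no output from target_servers, so targets_look_valid fails in that case as well.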
Considerations
- perfdata.properties - The perfdata.properties file contains a mixture of scalar-value settings and some XML-like sections. It must be edited manually to preserve this structure. It is listed in the GroundWork Administration > Foundation > Manage Configuration screen as a file that can be edited there, but attempting to do so will effectively destroy the content of this file. A bug report (GWMON-10097) has been filed to remove this filename from the list of editable files in this screen. A copy of the original file is included in the reference section of this document.
Editing of the perfdata.properties file should be done on the child server, not on the parent server, as it is the child server that needs to know what data to send and where to send it. You will need to either do a configuration push to the child (build instance for the child group on the GroundWork server), or restart the process_service_perfdata_file process on the child to have these changes take effect, as described in the configuration steps above.
- Encryption - Both the NSCA and GDMA spooler methods make use of the NSCA program and the Bronx event broker. These programs are capable of sending and receiving encrypted data, but are set up by default not to do so. The Bronx event broker (which processes the data received at the parent) can also support what is known as wide packets, an enhancement to NSCA that makes it much faster in high-load configurations. Wide packets support is enabled by default.
If you set up encryption, you should note that:
- Encryption must be the same on the parent and all child servers, as well as any system that sends data to the parent via NSCA (for example a GDMA system)
- Encryption adds overhead, and may slow down the data transfer process
- Encryption is more secure, which can be important in some environments
Maintenance
There is no special maintenance for this feature. However, keeping track of performance on the GroundWork server is a good idea. If things seem to be slow, consider sending heartbeats less frequently. If you do so, ensure that any freshness checks on the GroundWork server hosts and services are synchronized with this interval. Freshness intervals for passive checks should always be kept longer than the update cycles to avoid false positive results.
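The rule that a freshness interval must be longer than the update cycle can be expressed as a simple comparison. The sketch below is illustrative only: freshness_ok is a hypothetical helper name, and the optional safety margin is an assumption added to allow for the time a heartbeat takes to arrive.

```shell
#!/bin/sh
# Sketch only: freshness_ok is a hypothetical helper name.
# Succeed if a freshness threshold (seconds) is longer than the heartbeat
# interval (seconds) plus an optional safety margin (seconds, default 0).
# Arguments: THRESHOLD  HEARTBEAT_INTERVAL  [MARGIN]
freshness_ok() {
    [ "$1" -gt $(( $2 + ${3:-0} )) ]
}

# The default heartbeat interval is 60 * 60 = 3600 seconds.
freshness_ok 3600 3600     || echo "3600 is not longer than a 3600 s heartbeat"
freshness_ok 4500 3600 600 && echo "4500 clears 3600 plus a 600 s margin"
```

If you lengthen nsca_heartbeat_interval on the child, the same comparison shows which freshness thresholds on the GroundWork server need to be raised with it.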
perfdata.properties
These are the default file contents (with comments).
# perfdata.properties
#
# Copyright 2010-2013 GroundWork Open Source, Inc. ("GroundWork")
# All rights reserved. This program is free software; you can
# redistribute it and/or modify it under the terms of the GNU
# General Public License version 2 as published by the Free
# Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.

######################################################################
## GroundWork Performance Data Processing Configuration Properties
######################################################################

# The values specified here are used to control the behavior of the
# process_service_perfdata_file script.

# Possible debug_level values:
# 0 = no info of any kind printed, except for startup/shutdown
#     messages and major errors
# 1 = print just error info and summary statistical data
# 2 = also print basic debug info
# 3 = print detailed debug info
debug_level = 1

# Create and update RRD files. []
process_rrd_updates = true

# Use the newer XML web-service API to post performance data to the
# Foundation databases configured below. Highly recommended, for
# efficiency. []
post_performance_using_xml = true

# How many performance data updates to bundle together in a single
# message to Foundation, when the XML web-service API is used. This
# is a loose limit; it is only checked after adding all the data for
# a {host, service}, which might contain multiple performance values.
max_performance_xml_bundle_size = 20

# A limit on the number of items sent to Foundation in a single
# packet.
max_bulk_send = 200

# Timeout, specified in seconds, if the older HTTP API is used
# to post performance data to the Foundation database.
foundation_http_submission_timeout = 2

# Timeout, specified in seconds, to address GWMON-7407.
# The usual value is 30; set to 0 to disable.
socket_send_timeout = 30

# Specify whether to use a shared library to implement RRD file
# access, or to fork an external process for such work (the legacy
# implementation). Set to true (recommended) for high performance,
# to false only as an emergency fallback or for special purposes.
# []
use_shared_rrd_module_for_create = true
use_shared_rrd_module_for_update = true
use_shared_rrd_module_for_info = true

# Where the rrdtool binary lives.
rrdtool = /usr/local/groundwork/common/bin/rrdtool

# What files to read for results to be processed. The perfdata_file paths
# are defined by external scripts, such as launch_perf_data_processing
# for the service-perfdata.dat.being_processed pathname, so they cannot
# be changed here arbitrarily. The perfdata_source labels must be unique,
# and each such label must reflect the name of the Application Type in
# Foundation for the corresponding data stream. The seek_file path for
# each source must also name a unique file, so there is no confusion as to
# what its contents represent.
#
# Each upstream provider is responsible for atomically renaming the file
# it uses to collect the performance data into the perfdata_file pathname
# listed here, at a point in time when the upstream provider is no longer
# writing (and will no longer write, if the rename happens) into that file.
# That way, there can never be any confusion as to whether the file is
# ready for processing here.
<service_perfdata_files>

    # Nagios performance data.
    <perfdata_source NAGIOS>
        perfdata_file = "/usr/local/groundwork/nagios/var/service-perfdata.dat.being_processed"
        seek_file = "/usr/local/groundwork/nagios/var/service-perfdata.dat.seek"
    </perfdata_source>

    # Virtual Environments Monitoring Agent performance data.
    <perfdata_source VEMA>
        perfdata_file = "/usr/local/groundwork/core/vema/var/vema-perfdata.dat.being_processed"
        seek_file = "/usr/local/groundwork/core/vema/var/vema-perfdata.dat.seek"
    </perfdata_source>

    # Cloud Hub for Red Hat Virtualization performance data.
    <perfdata_source CHRHEV>
        perfdata_file = "/usr/local/groundwork/core/vema/var/chrhev-perfdata.dat.being_processed"
        seek_file = "/usr/local/groundwork/core/vema/var/chrhev-perfdata.dat.seek"
    </perfdata_source>

    # My-Application performance data.
    # <perfdata_source my_app>
    #     perfdata_file = "/usr/local/groundwork/my_app/var/my_app-perfdata.dat.being_processed"
    #     seek_file = "/usr/local/groundwork/my_app/var/my_app-perfdata.dat.seek"
    # </perfdata_source>

</service_perfdata_files>

# What sequence the perfdata sources should be processed in, within a given
# processing cycle. In general, a round-robin check of sources is made to
# see which of them are ready for processing. The choices are:
#
#     process_each_ready    Process sources in order in each cycle, checking
#                           each at most once per cycle to see if it is ready.
#     process_every_ready   Process as many sources as possible in each cycle,
#                           but only process each source at most once per cycle.
#     process_all_ready     Keep processing in each cycle until no more sources
#                           are ready.
#
# The usual choice is "process_every_ready", which provides robust behavior
# while always allowing a brief rest between effective cycles. More detail
# on this option is provided in comments in the code, if it matters to anyone.
source_selection_model = process_every_ready

# How often to update a seek file as the corresponding perfdata file is
# read. This many lines of a perfdata file are processed at a time before
# the corresponding seek file is updated with the current position. This
# limits the amount of data that will be reprocessed if the perf script
# dies catastrophically (without first updating the seek file based on the
# update_seek_file_on_failure option). There is a tradeoff here between
# the i/o needed to update the seek file periodically and the reprocessing
# of some number of lines from the perfdata file in a (presumably rare)
# catastrophe-recovery situation. Setting this value to 0 will disable
# periodically updating the seeek file, which would only be useful in a
# development situation.
seek_file_update_interval = 1000

# Whether to update a seek file if a processing failure or termination request
# is sensed, before the perf script shuts down. Normally this will be left
# as "true", so the current line will be skipped when the script starts up
# again. (This presumes that the slight possible data loss during an ordinary
# termination request is tolerable.) In some debugging situations, or to
# ensure that the last line of possibly partially-processed data is re-read
# on startup, you may want to set this to "false", so an input failure can
# be easily replicated (or that data that was in-progress at the time of a
# termination request is not lost) by having it be reprocessed on startup.
update_seek_file_on_failure = true

# Where the log file is to be written.
debuglog = /usr/local/groundwork/nagios/var/log/process_service_perfdata_file.log

# The wait time between cycles of the process_service_perfdata_file
# script, which runs as a daemon. Specified in seconds.
loop_wait_time = 15

# Whether to emit a log message to Foundation at the end of every processing
# cycle where errors or warnings were detected. This is disabled by default
# because it can generate a large number of messages when the setup is broken.
# But it can be valuable to provide very visible notice that processing problems
# are occurring, so you know to look in the debug log for details. []
emit_status_message = false

# Specify whether to log messages that tell exactly what the script is doing
# at the moment that a termination signal is received. We don't enable
# these messages by default because logging i/o routines are not necessarily
# re-entrant, which could cause difficulties. But the messages can be enabled
# during troubleshooting trials to identify which areas of the script need
# improvement in the speed of handling termination signals. []
spill_current_action = false

# This section contains the configuration for all access to Foundation
# databases. It must include one group of lines for the Foundation
# associated with this server (with the child_host value set to an
# empty string). Additional groups of lines are needed for parent
# servers if you want RRD graphs generated on this child server
# (where the process_service_perfdata_file script is running) to be
# integrated into Status Viewer on a parent server, or if you want
# EPR reports to be created on a server.
#
# The foundation_host value, specified inside the angle-brackets, is
# a qualified or unqualified hostname, or IP address, for a network
# interface on which the Foundation of the respective standalone,
# child, parent, parent-standby, or report server can be accessed.
# Substitute for MYPARENTHOST or MYSTANDBYHOST in the lines below
# as needed. The foundation_port is the port number on that network
# interface through which Foundation can be contacted.
#
# The child_host value is a qualified or unqualified hostname, or
# IP address, of the machine on which the performance data handling
# script (process_service_perfdata_file) is running, as seen by that
# particular Foundation server. The specified value must not be
# 127.0.0.1 or localhost, and it may be different for access from
# different Foundation servers (substitute for MYHOST in the lines
# below as needed). This value must be left empty for the child
# (or standalone) server's own Foundation.
#
# The send_RRD_data value [] specifies whether this
# Foundation should receive information about RRD graphs.
# If child_host is empty, this information will include details
# on RRD filenames and graph commands, so graphs can be directly
# generated as needed. If child_host is non-empty, this information
# will instead include just the child_host value, so this copy of
# Foundation will know where to reach to obtain the graph.
#
# The send_perf_data value [] specifies whether this
# Foundation should receive a copy of the detailed performance data.
# It should be enabled if and only if this Foundation may be used to
# produce EPR reports.
#
# Lines in this section may be commented out with a leading "#"
# character. Uncomment and customize groups of lines here as needed.
<foundation>

    # Local Foundation. It is not a parent server for this data,
    # so the child_host is set to an empty string to distinguish
    # this case. send_RRD_data must be true for this entry.
    <foundation_host localhost>
        foundation_port = 4913
        child_host = ""
        send_RRD_data = true
        send_perf_data = false
    </foundation_host>

    # Parent-server Foundation, if any.
    # <foundation_host MYPARENTHOST>
    #     foundation_port = 4913
    #     child_host = "MYHOST"
    #     send_RRD_data = true
    #     send_perf_data = false
    # </foundation_host>

    # Parent-standby-server Foundation, if any.
    # <foundation_host MYSTANDBYHOST>
    #     foundation_port = 4913
    #     child_host = "MYHOST"
    #     send_RRD_data = true
    #     send_perf_data = false
    # </foundation_host>

</foundation>