Overview
Currently, a set of Nagios commands is provided for remotely monitoring the reachability, health and resource usage of the appliance.
The privateserver.commands.cfg file can be saved anywhere, but the suggested locations will put it next to other Nagios configuration files.
Include privateserver.commands.cfg from nagios.cfg with a line like the following:
# replace with the full path to your privateserver.commands.cfg
cfg_file=/etc/nagios/objects/privateserver.commands.cfg
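The include step can also be scripted. The following is a minimal sketch, not part of the distributed files: add_cfg_file is a hypothetical helper that appends a cfg_file line to the main configuration file only when an identical line is not already present, so it can be run more than once.

```shell
# Hypothetical helper: append a cfg_file line to the main Nagios
# configuration file unless an identical line already exists.
add_cfg_file() {
    main_cfg=$1   # e.g. /etc/nagios/nagios.cfg
    included=$2   # full path to the file to include

    # -x matches whole lines, -F treats the pattern as a fixed string
    grep -qxF -- "cfg_file=$included" "$main_cfg" ||
        printf 'cfg_file=%s\n' "$included" >> "$main_cfg"
}

# Example (paths are assumptions; adjust to your installation):
# add_cfg_file /etc/nagios/nagios.cfg /etc/nagios/objects/privateserver.commands.cfg
```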
Download privateserver.host.cfg.template. For each of your appliances:

1. Save a copy of the template as appliance hostname.cfg (that is, the hostname of the appliance followed by .cfg), under the /etc/nagios/objects directory (or /usr/local/nagios/etc/objects, if Nagios was installed from source code).
2. Open appliance hostname.cfg in an editor and:
   - replace localhost.localdomain with the hostname of the appliance
   - replace 127.0.0.1 with the IP address (recommended) or the hostname of the appliance
3. Include appliance hostname.cfg from another configuration file (e.g. /etc/nagios/nagios.cfg) with a line like the following:

# replace with the full path to the configuration file
cfg_file=/etc/nagios/objects/host1.privateserver.test.cfg
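When there are many appliances, the per-appliance steps above can be scripted. This is only a sketch under stated assumptions: make_host_cfg is a hypothetical helper, and it assumes the template contains the literal placeholders localhost.localdomain and 127.0.0.1 as described above.

```shell
# Hypothetical helper: create <hostname>.cfg from the host template by
# substituting the two placeholders mentioned in the steps above.
make_host_cfg() {
    template=$1      # path to privateserver.host.cfg.template
    host=$2          # hostname of the appliance
    address=$3       # IP address (recommended) or hostname
    objects_dir=$4   # e.g. /etc/nagios/objects

    # Dots are escaped so the placeholders are matched literally
    sed -e "s/localhost\.localdomain/$host/g" \
        -e "s/127\.0\.0\.1/$address/g" \
        "$template" > "$objects_dir/$host.cfg"
}

# Example (hostname, address and paths are illustrative):
# make_host_cfg privateserver.host.cfg.template \
#     host1.privateserver.test 192.0.2.10 /etc/nagios/objects
```

The generated file still has to be included from the main configuration with a cfg_file line.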
Configuration files created from the template require the following definitions to be already present in your Nagios configuration:

- the linux-server host definition
- the generic-service service definition

All of the above definitions are present in the default Nagios configuration, but they may be absent in your installation of Nagios.
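One way to check whether these definitions are present is to search the configuration directory for them. The sketch below assumes the definitions are declared as Nagios object templates with a name directive, as in the default configuration; has_definition is a hypothetical helper.

```shell
# Hypothetical helper: succeed if some file under config_dir declares
# a template with the given name (a line of the form "name <value>").
has_definition() {
    config_dir=$1   # e.g. /etc/nagios
    name=$2         # e.g. linux-server

    grep -rEq "^[[:space:]]*name[[:space:]]+${name}([[:space:]]|;|\$)" "$config_dir"
}

# Example (the directory is an assumption; adjust to your installation):
# for name in linux-server generic-service; do
#     has_definition /etc/nagios "$name" || echo "missing definition: $name"
# done
```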
Run the following command:
nagios -v /etc/nagios/nagios.cfg
# if you installed Nagios from source code, you might need to run this instead:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Review the output, and correct any errors reported by Nagios. Nagios only reports the first error it finds, so you will need to verify the configuration after every change, until it reports no errors.
Always verify the configuration before starting or restarting Nagios.
Run the following command:
systemctl restart nagios
Nagios will restart with the new configuration. The changes should be immediately visible in the web interface.
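To make sure the verification is never skipped, the two steps can be chained so that the restart only happens when the check succeeds. This is a minimal sketch: verify_and_restart and the NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD variables are hypothetical names, with defaults taken from the commands shown above.

```shell
# Hypothetical helper: restart Nagios only if the configuration check passes.
verify_and_restart() {
    nagios_bin=${NAGIOS_BIN:-nagios}
    nagios_cfg=${NAGIOS_CFG:-/etc/nagios/nagios.cfg}
    restart_cmd=${RESTART_CMD:-systemctl restart nagios}

    if "$nagios_bin" -v "$nagios_cfg"; then
        # Intentionally unquoted so the command and its arguments are split
        $restart_cmd
    else
        echo "configuration check failed; Nagios was not restarted" >&2
        return 1
    fi
}

# Example for an installation from source code:
# NAGIOS_BIN=/usr/local/nagios/bin/nagios
# NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg
# verify_and_restart
```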
For remote monitoring to work, the appliance must be reachable from the monitoring host. Additionally, most commands need a specific appliance service to be assigned to the network interface used for management (for each command, it will be documented whether this is the case, and which service affects it). The service assignment UI can be found in the web console, under Server Configuration → Applications.
Edit /etc/nagios/nrpe.cfg and set allowed_hosts to the IP address of the Nagios monitoring server.
Start the nrpe service and enable it at boot:
systemctl start nrpe
systemctl enable nrpe
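The edit to nrpe.cfg can be scripted as well. In this sketch, set_allowed_hosts is a hypothetical helper; it assumes the file already contains an uncommented allowed_hosts= line (the directive is spelled allowed_hosts in the NRPE configuration file) and keeps a backup copy with a .bak suffix.

```shell
# Hypothetical helper: point the existing allowed_hosts line at the
# monitoring server. A backup is written to <file>.bak.
set_allowed_hosts() {
    nrpe_cfg=$1     # e.g. /etc/nagios/nrpe.cfg
    monitor_ip=$2   # IP address of the Nagios monitoring server

    sed -i.bak "s/^allowed_hosts=.*/allowed_hosts=$monitor_ip/" "$nrpe_cfg"
}

# Example (the address is illustrative):
# set_allowed_hosts /etc/nagios/nrpe.cfg 192.0.2.5
# systemctl restart nrpe
```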
The commands in this category check for the reachability of the appliance's administration interfaces.
check_privateserver_ping
Checks whether the appliance responds to ping requests. For more information, see the documentation for the Nagios check_ping plugin.
This command sends 5 ping requests.
This command can fail with 100% packet loss if ICMP pings are blocked between the monitoring host and the appliance.
Status | Meaning |
---|---|
OK | The appliance is alive |
WARNING | Average RTT is larger than 3 seconds, or packet loss is 80% or more |
CRITICAL | Average RTT is larger than 5 seconds, or packet loss is 100% |
A typical, healthy output is similar to:
PING OK - Packet loss = 0%, RTA = 33.22 ms
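If the output needs to be processed by a script, the two figures can be extracted from a line of this shape. The sketch below only assumes the output format shown above; parse_ping_output is a hypothetical helper.

```shell
# Hypothetical helper: print "<packet loss %> <RTA in ms>" for a
# check_ping status line, or nothing if the line has another shape.
parse_ping_output() {
    printf '%s\n' "$1" |
        sed -n 's/^PING [A-Z]* - Packet loss = \([0-9]*\)%, RTA = \([0-9.]*\) ms.*$/\1 \2/p'
}

# Example:
# parse_ping_output 'PING OK - Packet loss = 0%, RTA = 33.22 ms'
```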
The commands in this category check whether the appliance's services, internal or external, are up and working correctly.
check_privateserver_sip
Performs a test call on the appliance to ensure that the SIP service can handle calls correctly. Requires the nrpe service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | The SIP service is up and running normally, and can currently handle calls correctly. |
WARNING | Both participants in the test call completed the call successfully, but the call was hung up immediately. This can mean the SIP service is responding too slowly, or that the appliance is low on resources. See the output for more information. |
CRITICAL | One or both participants in the test call encountered an error. See the output for more information. |
If the status is CRITICAL, the output contains the exit code of both participants to the test call. At least one will be non-zero, indicating an error. ...
check_privateserver_web_console
Checks that the web-based administration interface of the appliance is reachable and running correctly. Requires the http service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | The web console is reachable and appears to be running correctly. |
WARNING | |
CRITICAL | |
check_privateserver_ssh_console
Checks that the appliance is reachable through SSH. Requires the ssh service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | The SSH server is reachable and appears to be running correctly. |
WARNING | Should never happen. |
CRITICAL | Fatal error connecting to the SSH server, or malformed response from the SSH server. |
check_privateserver_db_status
Checks that the database service on the appliance is running correctly. Requires the nrpe service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | The database is up and running correctly. |
WARNING | Non-fatal error connecting to the server, or no server status available. |
CRITICAL | Fatal error connecting to the server, or error querying server status. |
check_privateserver_db_data
Checks that the database service on the appliance is responding to queries. Requires the nrpe service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | The database is up and running correctly and responding to simple queries. |
WARNING | Non-fatal error connecting to the server. |
CRITICAL | Fatal error connecting to the server, or error executing the query. |
The commands in this category monitor the usage of the appliance's finite resources (CPU, memory, etc.).
check_privateserver_cpu
Checks the CPU usage on the appliance. Requires the nrpe service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | CPU usage normal. |
WARNING | CPU usage between 90% and 95%. |
CRITICAL | CPU usage 95% or above. |
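The thresholds above map onto the usual Nagios plugin exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL). The following sketch illustrates that mapping and is not the appliance's actual check. cpu_status is a hypothetical helper, and it assumes an integer percentage with the 90% boundary counted as WARNING.

```shell
# Hypothetical helper: classify an integer CPU usage percentage using
# the thresholds from the table above.
cpu_status() {
    usage=$1

    if [ "$usage" -ge 95 ]; then
        echo CRITICAL
        return 2
    elif [ "$usage" -ge 90 ]; then   # assumption: 90% itself is WARNING
        echo WARNING
        return 1
    else
        echo OK
        return 0
    fi
}

# Example:
# cpu_status 92
```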
check_privateserver_memory
Checks the user and swap memory usage on the appliance. User memory is calculated as total memory usage minus buffers and cache. Requires the nrpe service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | Memory usage normal. |
WARNING | User memory or swap usage between 90% and 95%. |
CRITICAL | User memory or swap usage above 95%. |
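The user memory calculation described above can be illustrated on a Linux host. This is a sketch that assumes the figures come from a file in /proc/meminfo format, which may not be how the appliance collects them. user_memory_percent is a hypothetical helper, and swap usage is not covered.

```shell
# Hypothetical helper: print user memory as an integer percentage of
# total memory, where user memory = (total - free) - buffers - cache.
user_memory_percent() {
    meminfo=${1:-/proc/meminfo}

    awk '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached = $2 }
        END {
            if (total == 0) exit 1
            used = total - free              # total memory usage
            user = used - buffers - cached   # minus buffers and cache
            printf "%d\n", user * 100 / total
        }' "$meminfo"
}

# Example:
# user_memory_percent /proc/meminfo
```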
check_privateserver_disk
Checks the disk space usage on the appliance. Requires the nrpe service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | Free disk space normal. |
WARNING | Free disk space is 5% or less on any filesystem. |
CRITICAL | Free disk space is 0% on any filesystem. |
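The "on any filesystem" rule means that the worst filesystem decides the status. The sketch below applies the thresholds from the table to output in the format of df -P. disk_status is a hypothetical helper, free space is taken as 100 minus the Capacity column, and filesystem names containing spaces are not handled.

```shell
# Hypothetical helper: read `df -P` output on standard input and print
# the worst status across all filesystems.
disk_status() {
    awk '
        NR > 1 {
            sub(/%/, "", $5)    # Capacity column: percentage used
            free = 100 - $5
            if (free <= 0)      critical = 1
            else if (free <= 5) warning = 1
        }
        END {
            if (critical)     print "CRITICAL"
            else if (warning) print "WARNING"
            else              print "OK"
        }'
}

# Example:
# df -P | disk_status
```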
check_privateserver_bandwidth
Checks the network bandwidth usage on the appliance. Requires the nrpe service to be enabled on the management network interface.
Status | Meaning |
---|---|
OK | Network bandwidth usage normal. |
WARNING | Network bandwidth usage between 20 Mb/s and 100 Mb/s on any network interface. |
CRITICAL | Network bandwidth usage above 100 Mb/s on any network interface. |
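As an illustration of these thresholds, bandwidth can be estimated from two readings of an interface byte counter taken a known number of seconds apart. This sketch does not describe how the appliance measures bandwidth. bandwidth_status is a hypothetical helper, and it assumes 1 Mb/s means 1,000,000 bits per second and that exactly 20 Mb/s counts as WARNING.

```shell
# Hypothetical helper: classify the bandwidth between two byte-counter
# readings taken `seconds` apart. Prints "<status> <Mb/s>".
bandwidth_status() {
    bytes_start=$1
    bytes_end=$2
    seconds=$3

    awk -v a="$bytes_start" -v b="$bytes_end" -v s="$seconds" 'BEGIN {
        mbps = (b - a) * 8 / s / 1000000
        if (mbps > 100)      status = "CRITICAL"
        else if (mbps >= 20) status = "WARNING"
        else                 status = "OK"
        printf "%s %.1f\n", status, mbps
    }'
}

# Example on Linux (the interface name is illustrative):
# a=$(cat /sys/class/net/eth0/statistics/rx_bytes); sleep 10
# b=$(cat /sys/class/net/eth0/statistics/rx_bytes)
# bandwidth_status "$a" "$b" 10
```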