Overview

Appliances can be monitored remotely.

Currently, a set of Nagios commands is provided to check the reachability, health and resource usage of appliances.

Nagios commands for appliances

Remotely monitoring appliances is best done with the appliance-specific Nagios commands. This section describes how to add these commands to your monitoring host, how to enable them on the appliance, and what their outputs mean.

Configuring the monitoring host for appliance monitoring

On the monitoring host, the Nagios configuration has to be modified. This takes four steps:

1. define the appliance-specific commands
2. list the appliances on your network
3. verify the configuration
4. restart Nagios

Defining appliance-specific commands

Download privateserver.commands.cfg and save it to your monitoring host as:

/etc/nagios/objects/privateserver.commands.cfg
/usr/local/nagios/etc/objects/privateserver.commands.cfg (if Nagios was installed from source code)

The file can be saved anywhere, but the suggested locations will put it next to other Nagios configuration files.

Include privateserver.commands.cfg from nagios.cfg with a line like the following:

# replace with the full path to your privateserver.commands.cfg
cfg_file=/etc/nagios/objects/privateserver.commands.cfg
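Adding the include line can be automated so that it is never added twice. The following is a minimal Python sketch, not part of the product: the function name `ensure_cfg_file` is hypothetical, and it works on the text of nagios.cfg rather than on the file itself, so reading and writing the file is left to the caller.

```python
def ensure_cfg_file(nagios_cfg_text: str, object_file: str) -> str:
    """Return the nagios.cfg text with a cfg_file line for object_file.

    The text is returned unchanged if the directive is already present.
    """
    directive = f"cfg_file={object_file}"
    if any(line.strip() == directive for line in nagios_cfg_text.splitlines()):
        return nagios_cfg_text
    # Make sure the new directive starts on its own line.
    if nagios_cfg_text and not nagios_cfg_text.endswith("\n"):
        nagios_cfg_text += "\n"
    return nagios_cfg_text + directive + "\n"
```

Calling the function a second time with the same path leaves the text unchanged, which makes it safe to run repeatedly.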

Listing appliances on your network

Download privateserver.host.cfg.template. For each of your appliances:

- Make a copy of the template as appliance hostname.cfg (for example, host1.privateserver.test.cfg) under the /etc/nagios/objects directory (or /usr/local/nagios/etc/objects, if Nagios was installed from source code).
- Open appliance hostname.cfg in an editor and:
  1. replace all occurrences of localhost.localdomain with the hostname of the appliance
  2. replace all occurrences of 127.0.0.1 with the IP address (recommended) or the hostname of the appliance
  3. save the file
- Include appliance hostname.cfg from another configuration file (e.g. /etc/nagios/nagios.cfg) with a line like the following:
```
# replace with the full path to the configuration file
cfg_file=/etc/nagios/objects/host1.privateserver.test.cfg
```
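With many appliances, the copy-and-replace steps above can be scripted. The following is a minimal Python sketch, assuming the template contains only the two literal placeholders described above (localhost.localdomain and 127.0.0.1). The function name `render_host_cfg`, the sample template fragment and the address 192.0.2.10 are illustrative and do not come from the real template.

```python
def render_host_cfg(template_text: str, hostname: str, address: str) -> str:
    """Return the template text with both placeholders replaced."""
    return (
        template_text
        .replace("localhost.localdomain", hostname)
        .replace("127.0.0.1", address)
    )


if __name__ == "__main__":
    # Illustrative stand-in for privateserver.host.cfg.template;
    # the real template contains full host and service definitions.
    sample = "host_name localhost.localdomain\naddress 127.0.0.1\n"
    print(render_host_cfg(sample, "host1.privateserver.test", "192.0.2.10"))
```

The result would then be written to a file named after the appliance and included from nagios.cfg as shown above.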

Configuration files created from the template require the following definitions to be already present in your Nagios configuration:

linux-server host definition
generic-service service definition

Both definitions are present in the default Nagios configuration, but they may be absent from your installation of Nagios.

Verifying the Nagios configuration

Run the following command:

nagios -v /etc/nagios/nagios.cfg

# if you installed Nagios from source code, you might need to run this instead:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Review the output, and correct any errors reported by Nagios. Nagios only reports the first error it finds, so you will need to verify the configuration after every change, until it reports no errors.

Restarting Nagios

Always verify the configuration before starting or restarting Nagios.

Run the following command:

systemctl restart nagios

Nagios will restart with the new configuration. The changes should be immediately visible in the web interface.
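The "verify first, then restart" rule can be enforced by a small wrapper. The following is a minimal Python sketch, assuming that `nagios -v` exits with a non-zero status when it finds an error and that Nagios runs as the systemd unit `nagios`. The function name `verify_then_restart` and the injectable `run` parameter (used here only to make the function testable) are hypothetical.

```python
import subprocess
from typing import Callable


def verify_then_restart(
    cfg_path: str = "/etc/nagios/nagios.cfg",
    nagios_bin: str = "nagios",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Restart Nagios only if the configuration check passes.

    Returns True if the restart command ran and succeeded, False otherwise.
    """
    check = run([nagios_bin, "-v", cfg_path])
    if check.returncode != 0:
        # Verification failed: leave the running Nagios instance untouched.
        return False
    restart = run(["systemctl", "restart", "nagios"])
    return restart.returncode == 0
```

For a source installation, pass the full paths to the binary and to nagios.cfg instead of the defaults.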

Configuring the appliance for Nagios monitoring

For remote monitoring to work, the appliance must be reachable from the monitoring host. Additionally, most commands need a specific appliance service to be assigned to the network interface used for management (for each command, it will be documented whether this is the case, and which service affects it). The service assignment UI can be found in the web console, under Server Configuration → Applications.

Edit /etc/nagios/nrpe.cfg on the appliance and set allowed_hosts to the IP address of the Nagios monitoring host.

Start the nrpe service and enable it at boot:

systemctl start nrpe

systemctl enable nrpe
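Once the service is running, a quick check from the monitoring host is whether the NRPE port accepts TCP connections. The following is a minimal Python sketch, assuming NRPE listens on its default port 5666. The function name `nrpe_port_open` and the injectable `connect` parameter are hypothetical. A successful connection only shows that the port is reachable, not that the monitoring host is listed in the NRPE configuration.

```python
import socket


def nrpe_port_open(host, port=5666, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    conn.close()
    return True
```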

Reachability checks

The commands in this category check for the reachability of the appliance's administration interfaces.

`check_privateserver_ping`

Checks whether the appliance responds to ping requests. For more information, see the documentation for the Nagios check_ping plugin.

This command sends 5 ping requests.

This command can fail with 100% packet loss if ICMP pings are blocked between the monitoring host and the appliance.

Status

Status	Meaning
OK	The appliance is alive.
WARNING	Average RTT is larger than 3 seconds, or packet loss is 80% or more.
CRITICAL	Average RTT is larger than 5 seconds, or packet loss is 100%.
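The thresholds in the table can be read as a simple decision rule. The following is a minimal Python sketch of that rule, assuming the three rows correspond to OK, WARNING and CRITICAL in that order. It illustrates the table and is not the code of the check_ping plugin; the names `Status` and `classify_ping` are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    # Values follow the conventional Nagios plugin exit codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2


def classify_ping(avg_rtt_ms: float, packet_loss_pct: float) -> Status:
    """Map average round-trip time and packet loss to a status."""
    if avg_rtt_ms > 5000 or packet_loss_pct >= 100:
        return Status.CRITICAL
    if avg_rtt_ms > 3000 or packet_loss_pct >= 80:
        return Status.WARNING
    return Status.OK
```

The critical condition is tested first so that a result meeting both thresholds is reported with the more severe status.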

Output

A typical, healthy output is similar to:

PING OK - Packet loss = 0%, RTA = 33.22 ms

Service health checks

The commands in this category check whether the appliance's services, internal or external, are up and working correctly.

`check_privateserver_sip`

Performs a test call on the appliance to ensure that the SIP service can handle calls correctly. Requires the nrpe service to be enabled on the management network interface.

Status

Status	Meaning
OK	The SIP service is up and running normally, and can currently handle calls correctly.
WARNING	Both participants in the test call completed the call successfully, but the call was hung up immediately. This can mean the SIP service is responding too slowly, or that the appliance is low on resources. See the output for more information.
CRITICAL	One or both participants in the test call encountered an error. See the output for more information.

Output

If the status is CRITICAL, the output contains the exit codes of both participants in the test call. At least one will be non-zero, indicating an error. ...

`check_privateserver_web_console`

Checks that the web-based administration interface of the appliance is reachable and running correctly. Requires the http service to be enabled on the management network interface.

Status

Status	Meaning
OK	The web console is reachable and appears to be running correctly.
WARNING	Web console reported a client error (HTTP status in the 400 range); HTTPS certificate is about to expire.
CRITICAL	Fatal error connecting to the web console; I/O error during the request; syntax error in the response; web console reported a server error (HTTP status in the 500 range); invalid or expired HTTPS certificate.
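The HTTP status part of the table can be expressed as a small mapping. The following is a minimal Python sketch, assuming the three rows correspond to OK, WARNING and CRITICAL in that order, and that any status outside the 400 and 500 ranges counts as healthy. It does not model connection errors or certificate checks, and the function name `classify_http_status` is hypothetical.

```python
def classify_http_status(code: int) -> str:
    """Map an HTTP status code to a status name."""
    if 500 <= code <= 599:
        return "CRITICAL"  # server error
    if 400 <= code <= 499:
        return "WARNING"   # client error
    return "OK"
```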

`check_privateserver_ssh_console`

Checks that the appliance is reachable through SSH. Requires the ssh service to be enabled on the management network interface.

Status

Status	Meaning
OK	The SSH server is reachable and appears to be running correctly.
WARNING	Should never happen.
CRITICAL	Fatal error connecting to the SSH server, or malformed response from the SSH server.

`check_privateserver_db_status`

Checks that the database service on the appliance is running correctly. Requires the nrpe service to be enabled on the management network interface.

Status

Status	Meaning
OK	The database is up and running correctly.
WARNING	Non-fatal error connecting to the server, or no server status available.
CRITICAL	Fatal error connecting to the server, or error querying server status.

`check_privateserver_db_data`

Checks that the database service on the appliance is responding to queries. Requires the nrpe service to be enabled on the management network interface.

Status

Status	Meaning
OK	The database is up and running correctly and responding to simple queries.
WARNING	Non-fatal error connecting to the server.
CRITICAL	Fatal error connecting to the server, or error executing the query.

Resource usage checks

The commands in this category monitor the usage of the appliance's finite resources (CPU, memory, etc.).

`check_privateserver_cpu`

Checks the CPU usage on the appliance. Requires the nrpe service to be enabled on the management network interface.

Status

Status	Meaning
OK	CPU usage normal.
WARNING	CPU usage between 90% and 95%.
CRITICAL	CPU usage 95% or above.

`check_privateserver_memory`

Checks the user and swap memory usage on the appliance. User memory is calculated as total memory usage minus buffers and cache. Requires the nrpe service to be enabled on the management network interface.

Status

Status	Meaning
OK	Memory usage normal.
WARNING	User memory or swap usage between 90% and 95%.
CRITICAL	User memory or swap usage above 95%.
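The user memory calculation and the thresholds can be worked through in a few lines. The following is a minimal Python sketch, assuming the three rows correspond to OK, WARNING and CRITICAL in that order, that all values use the same unit (kB here), and that exactly 95% still counts as WARNING. The function names are hypothetical and this is not the code the appliance runs.

```python
def user_memory_percent(total_kb: int, free_kb: int, buffers_kb: int, cached_kb: int) -> float:
    """User memory = total usage minus buffers and cache, as a percentage of total."""
    used_kb = total_kb - free_kb
    user_kb = used_kb - buffers_kb - cached_kb
    return 100.0 * user_kb / total_kb


def classify_memory(user_pct: float, swap_pct: float) -> str:
    """Apply the thresholds to whichever of the two usages is higher."""
    worst = max(user_pct, swap_pct)
    if worst > 95:
        return "CRITICAL"
    if worst >= 90:
        return "WARNING"
    return "OK"
```

For example, with 1000 kB total, 100 kB free, 50 kB of buffers and 50 kB of cache, user memory is 800 kB, or 80% of the total.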

`check_privateserver_disk`

Checks the disk space usage on the appliance. Requires the nrpe service to be enabled on the management network interface.

Status

Status	Meaning
OK	Free disk space normal.
WARNING	Free disk space is 5% or less on any filesystem.
CRITICAL	Free disk space is 0% on any filesystem.

`check_privateserver_bandwidth`

Checks the network bandwidth usage on the appliance. Requires the nrpe service to be enabled on the management network interface.

Status

Status	Meaning
OK	Network bandwidth usage normal.
WARNING	Network bandwidth usage between 20 Mb/s and 100 Mb/s on any network interface.
CRITICAL	Network bandwidth usage above 100 Mb/s on any network interface.
