Resource Monitors: Nagios, Server Monitoring, & Ganglia
There are three primary monitoring tools for the Science Analysis Systems software infrastructure: Nagios (currently available only at SLAC), the Tomcat Server Monitoring tool, and Ganglia (for monitoring the glast domains, including glast-oracle, glastlnx, nfs-glast, and glast-xrootd).
Note: If you are at SLAC and want a quick overview of the hosts and machines monitored by Nagios, view Service Status Details for all Service Groups.
Nagios (available only if you are at SLAC)
- Launch Nagios.
- From the menu, select: Servicegroup Overview to see an overview (similar to the following) for all service groups:
Note: If you prefer to see an overview of Host Groups by machine type (e.g., RHEL3, RHEL4, Windows, isoc machines, isoc servers, isoc workstations, etc.), click on: Hostgroup Overview in the Nagios menu.

Things to Note:
- Service Groups monitored are one of two types: Server or Space.
- Server groups monitored include:
  - FTP Servers
  - Oracle Servers
  - MySQL Servers
  - Web Servers
- Space groups monitored include:
  - CVS Space
  - AFS Space
  - FTP Space
  - User Space
  - Groups Space
- Drilldown links are provided in both the "Host" and "Services" columns; links in the upper left corner lead to the Host "History" and "Notifications" pages, as well as to "Service Status Detail for all Hosts".
Note: If you prefer, you can monitor the Servicegroup Summary page or the Hostgroup Summary page instead of the respective Overview pages.
Troubleshooting Tips: Nagios
Problem conditions are color-coded:
- Red = Outage
- Yellow = Warning
If there is an Outage, the Host Status "Up" box on the left will be red.
Notes:
- If the host is on a local network (i.e., not connected to Nagios via a router), Nagios simply issues a "host check" command and assumes the host is up if the command returns an OK state.
If the host check returns any other status level, Nagios assumes the host is down.
- If the host is on a remote network, Nagios indicates whether the host is (from Nagios' point of view) Up, Down, or Unreachable (e.g., one or more routers connecting to the host are down).
See: Determining Status and Reachability of Network Hosts.
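The status rules in the notes above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Nagios code: the names `HostState` and `classify_host` are hypothetical, the exit code 0 for an OK result follows the usual plugin convention, and the rule that a failed remote host counts as Unreachable only when none of its routers is up is an assumption based on the reachability document referenced above.

```python
from enum import Enum


class HostState(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNREACHABLE = "UNREACHABLE"


OK = 0  # conventional exit code for an OK check result (assumption)


def classify_host(check_result, router_states=()):
    """Classify a host from its host-check result.

    check_result:  exit code of the host check (0 means OK).
    router_states: HostState of each router between Nagios and the host;
                   leave empty for a host on the local network.
    """
    # An OK result always means the host is up.
    if check_result == OK:
        return HostState.UP
    # Local host (no routers in between): any other result means down.
    if not router_states:
        return HostState.DOWN
    # Remote host: if at least one router is up, the host itself is down.
    if any(state is HostState.UP for state in router_states):
        return HostState.DOWN
    # No router is up, so the host is blocked from Nagios' point of view.
    return HostState.UNREACHABLE


# Example: a remote host whose only router is down.
print(classify_host(2, [HostState.DOWN]).value)
```

A host reported as Unreachable in this sketch is not necessarily broken itself; the first thing to check is the router path leading to it.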
If there is a Warning, a yellow warning link will be displayed in the host's Services column; click on this link to view the Status Information:

After reading the Status Information, click on the Host link in the left column (glastlnx02.slac.stanford.edu in this example) to view a list of all Services and disk space provided by the affected host, together with the Status, Last Check, Duration, Attempt, and Status Information for each.
Server Monitoring (Tomcat Servers)
- To view the applications running on Tomcat servers, click on: Server Monitoring
The Server Monitoring page will be displayed.
- In the upper right corner, click on: Configuration
The "Select Server Name" pane will be displayed.

- Hold down the "Shift" key, scroll through the list, and select all servers; then, in the upper right corner, click on: Applications
A page similar to the following will be displayed (note the "Show all servers/Show prod servers" toggle in the top left corner of this page):

Note: The applications shown were those running on the day this was written. If an application has failed, refer to the Tomcat page in Confluence.
Troubleshooting Resources