Resource Monitors: Nagios, Server Monitoring, & Ganglia

There are three primary monitoring tools for the Science Analysis Systems software infrastructure: Nagios (currently available only at SLAC), the Tomcat Server Monitoring tool, and Ganglia (for monitoring glast domains, including glast-oracle, glastlnx, nfs-glast, and glast-xrootd).

Note: If you are at SLAC and want a quick overview of the hosts and machines monitored by Nagios, view Service Status Details for all Service Groups.

Nagios (available only if you are at SLAC)

  1. Launch Nagios.
  2. From the menu, select: Servicegroup Overview to see an overview (similar to the following) for all service groups:

Note: If you prefer to see an overview of Host Groups by Machine Types (e.g., RHEL3, RHEL4, Windows, isoc machines, isoc servers, isoc workstations, etc.), click on: Hostgroup Overview in the Nagios menu.

Things to Note:

  • Service Groups monitored are one of two types: Server or Space.
    • Server groups monitored include:
      • FTP Servers
      • Oracle Servers
      • MySQL Servers
      • Web Servers
    • Space groups monitored include:
      • CVS Space
      • AFS Space
      • FTP Space
      • User Space
      • Groups Space
  • Drilldown links are provided in both the "Host" and "Services" columns; links are also provided in the upper left corner to Host "History" and "Notifications", as well as to "Service Status Detail for all Hosts".
Note: If you prefer, you can monitor the Servicegroup Summary page or the Hostgroup Summary page instead of the respective Overview pages.

Troubleshooting Tips: Nagios

Problem conditions are color-coded:

  • Red = Outage
  • Yellow = Warning

If there is an Outage, the Host Status "Up" box on the left will be red.

Notes:

  • If the host is on a local network (i.e., not connected to Nagios via a router), Nagios simply issues a "host check" command and assumes the host is up if the command returns an OK state.

If the host returns any other status level, Nagios assumes the host is down.

  • If the host is on a remote network, Nagios indicates whether the Host is (from Nagios' point of view) Up, Down, or Unreachable (e.g., one or more routers connecting to the host are down).

See: Determining Status and Reachability of Network Hosts.
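
The host-state rules in the notes above can be sketched as a small function. This is a simplified illustration only, not how Nagios is actually implemented: the names HostState, determine_host_state, and parents_up are hypothetical, and the sketch assumes a remote host is reported as Unreachable only when none of the routers leading to it is up.

```python
from enum import Enum
from typing import Sequence


class HostState(Enum):
    UP = "Up"
    DOWN = "Down"
    UNREACHABLE = "Unreachable"


def determine_host_state(check_ok: bool, parents_up: Sequence[bool] = ()) -> HostState:
    """Simplified sketch of the host-state rules (hypothetical helper).

    check_ok   -- True if the host check command returned an OK state.
    parents_up -- one flag per router between Nagios and the host;
                  leave empty for a host on the local network.
    """
    if check_ok:
        return HostState.UP
    if not parents_up:
        # Local network: any non-OK result means the host is considered down.
        return HostState.DOWN
    if any(parents_up):
        # Assumption: a route to the host still exists, so the host itself is down.
        return HostState.DOWN
    # Assumption: every router on the way is down, so the host cannot be reached.
    return HostState.UNREACHABLE
```

For example, determine_host_state(False) describes a failed check on a local host, while determine_host_state(False, [False, False]) describes a remote host whose two connecting routers are both down.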

If there is a Warning, a yellow warning link will be displayed in the host's Services column; click on this link to view the Status Information:

After reading the Status Information, click on the Host link in the left column (glastlnx02.slac.stanford.edu) to view a list of all Services and disk space provided by the affected host, together with the Status, Last Check, Duration, Attempt, and Status Information for each.
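
When scanning a long service list for a host, it can help to think of the color coding as a filter over the rows. The sketch below is illustrative only: the ServiceRow class, its field names, and the sample rows are made up for the example (they are not real monitoring output), and it assumes that the red "Outage" color corresponds to the Nagios CRITICAL service state.

```python
from dataclasses import dataclass
from typing import List

# Color coding as described in the troubleshooting tips.
# Assumption: "Outage" (red) corresponds to the Nagios CRITICAL state.
COLOR_BY_STATUS = {"CRITICAL": "red", "WARNING": "yellow"}


@dataclass
class ServiceRow:
    """One row of the per-host service list (field names are hypothetical)."""
    service: str
    status: str
    last_check: str
    duration: str
    attempt: str
    status_information: str


def problem_rows(rows: List[ServiceRow]) -> List[ServiceRow]:
    """Return only the color-coded rows, red (outage) before yellow (warning)."""
    flagged = [r for r in rows if r.status in COLOR_BY_STATUS]
    # False sorts before True, so CRITICAL rows come first; the sort is stable.
    return sorted(flagged, key=lambda r: r.status != "CRITICAL")


# Made-up sample rows for illustration only.
rows = [
    ServiceRow("service-a", "OK", "-", "-", "1/3", "placeholder"),
    ServiceRow("service-b", "WARNING", "-", "-", "3/3", "placeholder"),
    ServiceRow("service-c", "CRITICAL", "-", "-", "3/3", "placeholder"),
]
for row in problem_rows(rows):
    print(COLOR_BY_STATUS[row.status], row.service)
```

Sorting outages ahead of warnings mirrors the order in which the conditions are usually handled; rows with an OK status are dropped entirely.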

Server Monitoring (Tomcat Servers)

  1. To view the applications running on Tomcat servers, click on: Server Monitoring

The Server Monitoring page will be displayed.

  2. In the upper right corner, click on: Configuration

The "Select Server Name" pane will be displayed.

  3. With the "Shift" key held down, scroll through the list and select all servers; then, in the upper right corner, click on: Applications

A page similar to the following will be displayed (note that there is a "Show all servers/Show prod servers" toggle in the top left corner of this page):

Note: The applications shown are those that were running on the day this was written. If an application has failed, refer to the Tomcat page in Confluence.

Troubleshooting Resources