
Resource Monitors: Nagios, Server Monitoring, & Ganglia

There are three primary monitoring tools for the Science Analysis Systems software infrastructure: Nagios (currently available only at SLAC), the Tomcat Server Monitoring tool, and Ganglia (for monitoring GLAST domains, including glast-oracle, glastlnx, nfs-glast, and glast-xrootd).

Note: If you are at SLAC and want a quick overview of the hosts and machines monitored by Nagios, view Service Status Details for all Service Groups.

Nagios (available only if you are at SLAC)

  1. Launch Nagios.
  2. From the menu, select Servicegroup Overview to see an overview (similar to the following) for all service groups:

Note: If you prefer to see an overview of Host Groups by Machine Type (e.g., RHEL3, RHEL4, Windows, isoc machines, isoc servers, isoc workstations, etc.), click on Hostgroup Overview in the Nagios menu.

Things to Note:

  • Service Groups monitored are one of two types: Server or Space.
    • Server groups monitored include:
      • FTP Servers
      • Oracle Servers
      • MySQL Servers
      • Web Servers
    • Space groups monitored include:
      • CVS Space
      • AFS Space
      • FTP Space
      • User Space
      • Groups Space
  • Drilldown links are provided in both the "Host" and "Services" columns; links in the upper left corner also lead to Host "History", "Notifications", and "Service Status Detail for all Hosts".

Note: You may prefer to monitor the Servicegroup Summary page or the Hostgroup Summary page instead of the respective Overview pages.

Troubleshooting Tips: Nagios

Problem conditions are color-coded:

  • Red = Outage
  • Yellow = Warning

If there is an Outage, the Host Status "Up" box on the left will be red.

Notes:

  • If the host is on a local network (i.e., not connected to Nagios via a router), Nagios simply issues a "host check" command and assumes the host is up if the command returns an OK state. If the command returns any other status level, Nagios assumes the host is down.

  • If the host is on a remote network, Nagios indicates whether the host is (from Nagios' point of view) Up, Down, or Unreachable (e.g., one or more routers connecting to the host are down).

See: Determining Status and Reachability of Network Hosts.
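
The two notes above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not Nagios code: the names HostState, PLUGIN_OK, and classify_host are hypothetical, and the rule for remote hosts (Down if at least one parent router is up, Unreachable otherwise) is an assumption based on the Nagios reachability documentation referenced above.

```python
from enum import Enum


class HostState(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNREACHABLE = "UNREACHABLE"


# Exit code 0 is the standard Nagios plugin return code for an OK state.
PLUGIN_OK = 0


def classify_host(check_exit_code, parent_states=()):
    """Classify a host from its check result and its parents' states.

    parent_states is empty for a host on the local network. For a remote
    host it lists the states of the routers (parents) that lead to it.
    """
    if check_exit_code == PLUGIN_OK:
        return HostState.UP
    # Local host: any non-OK result means the host is treated as down.
    if not parent_states:
        return HostState.DOWN
    # Remote host (assumed rule): if at least one parent is up, the path
    # is intact and the host itself is down; otherwise it is unreachable.
    if any(state is HostState.UP for state in parent_states):
        return HostState.DOWN
    return HostState.UNREACHABLE


if __name__ == "__main__":
    print(classify_host(0).value)                      # local host, check OK
    print(classify_host(2).value)                      # local host, check failed
    print(classify_host(2, [HostState.DOWN]).value)    # remote host, router down
```

In this sketch an Unreachable result points at the network path rather than the host itself, which is why the router (parent) states are checked before a remote host is declared Down.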