#1 How to Fix: keep in Confluence, except for "how to fix Confluence itself"
    - Get list of services that may fail
        o Pipeline
        o AFS
        o etc.
    - Get recipe of how to fix each

???Tomcat Servers??? for developers/testing only???
???xrootd???


Tony's List (Confluence: How to Fix)

  • Nagios
  • FastCopy
  • Pipeline
  • AFS
  • xrootd
  • MySQL
  • CVS
  • Java CVS
  • SSH
  • Release Manager???
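
One way to track which services on the list above still lack a "How to Fix" recipe is a small lookup, sketched here in Python. The recipe page titles are hypothetical placeholders rather than real Confluence pages, and Release Manager is left out because it is still marked as uncertain.

```python
# Services from the list above (Release Manager omitted: still uncertain).
SERVICES = ["Nagios", "FastCopy", "Pipeline", "AFS", "xrootd",
            "MySQL", "CVS", "Java CVS", "SSH"]


def missing_recipes(services, recipes):
    """Return the services that have no "How to Fix" recipe recorded yet."""
    return [name for name in services if not recipes.get(name)]


# Hypothetical progress so far: the page titles are placeholders only.
recipes = {"Pipeline": "How to Fix: Pipeline", "AFS": "How to Fix: AFS"}

for name in missing_recipes(SERVICES, recipes):
    print("recipe still needed:", name)
```

Keeping the service list and the recipe table separate means a new service shows up as "still needed" until someone records a page for it.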

Nagios (Service Overview for all Service Groups - live)

  • FTP Servers
  • FTP Space
  • CVS Space
  • Oracle Servers
  • AFS Space
  • User Space
  • Groups Space
  • MySQL Servers
  • Web Servers (confluence.slac.stanford.edu, glast-ground.slac.stanford.edu, glastlnx02.slac.stanford.edu, glast.stanford.edu)
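
When Nagios itself is not reachable, a manual spot check of the web servers can help. The following is a minimal sketch, not the check Nagios runs: it assumes the servers answer on TCP port 80, and the function names `tcp_probe` and `unreachable` are made up for illustration.

```python
import socket


def tcp_probe(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def unreachable(hosts, probe=tcp_probe):
    """Return the hosts for which the probe reports a failure."""
    return [host for host in hosts if not probe(host)]


# Example use (port 80 is an assumption):
#   unreachable(["glast-ground.slac.stanford.edu", "glast.stanford.edu"])
```

The probe is passed in as a parameter so that a different check (another port, or an HTTP request) can be swapped in without changing the loop.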

Note: On Service Overview for all Host Groups, you will find the following:

  • "isoc-machines", "isoc-servers", and "isoc-workstations"
    • "all" machines, "RHEL3" (Red Hat Enterprise Linux 3) machines, "RHEL4" machines, and "Windows" (Windows 2003) machines

Troubleshooting Resources:

Nagios (available only if you are at SLAC)

  1. Launch Nagios.
  1. From the menu, select: Servicegroup Overview to see an overview for all service groups (e.g., servers, including: FTP, Oracle, MySQL, and Web; and space, including: CVS, AFS, FTP, User, and Groups).

Drilldown links are provided in both the "Host" and "Services" columns; the upper left corner also has links to Host "History" and "Notifications", as well as to "Service Status Detail for all Hosts".

Note: If you prefer to see an overview of Host Groups by Machine Types (e.g., RHEL3, RHEL4, Windows, isoc machines, isoc servers, isoc workstations, etc.), click on: Hostgroup Overview in the Nagios menu.

Troubleshooting Tip:

  1. If a service is down, the "Up" box in the Host Status Totals summary (top, left of center) will be red.
    1. In the upper left corner, click on:

    View Service Status Detail for All Service Groups

OR

    From the menu, click on: Service Detail

    1. Scroll down the page; the ailing Host will be flagged (red = down; yellow = warning). All affected services are also listed in the "Service" column.
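
The colour rule in the tip above (red = down; yellow = warning) can be sketched as a small triage helper. This assumes the rows have been copied by hand from the Service Status Detail page; the tuple layout, the host names, and the "green" value for healthy rows are hypothetical.

```python
# Severity order for the colours in the tip: red (down) before yellow (warning).
SEVERITY = {"red": 0, "yellow": 1}


def flagged_services(rows):
    """Return the (host, service, colour) rows that are flagged, red first."""
    flagged = [row for row in rows if row[2] in SEVERITY]
    return sorted(flagged, key=lambda row: (SEVERITY[row[2]], row[0], row[1]))


# Hypothetical rows; any colour other than red or yellow counts as healthy.
rows = [
    ("host-a", "AFS Space", "green"),
    ("host-b", "MySQL Servers", "yellow"),
    ("host-c", "FTP Servers", "red"),
]

for host, service, colour in flagged_services(rows):
    print(colour, host, service)
```

Sorting red ahead of yellow puts hosts that are down at the top of the list, ahead of warnings.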

 

  1. Click on Servicegroup Grid.

The Status Grid for ALL Service Groups will be displayed.

If there is a failure, refer to: ????????????

 

Tomcat Servers

  1. To view the applications running on Tomcat servers, click on: Server Monitoring

The Server Monitoring page will be displayed.

  1. In the upper right corner, click on: Configuration

The "Select Server Name" pane will be displayed.

  1. Hold down the "Shift" key, scroll through the list, and select all servers; then, in the upper right corner, click on: Applications

A page similar to the following will be displayed (note that there is a "Show all servers/Show prod servers" toggle in the top left corner of this page):

Note: Applications shown were those running the day this was written. If an application has failed, refer to the Tomcat page in Confluence??????????????