This page last changed on May 28, 2008 by chuckp.

How to Fix Xrootd

Owned by: Wilko Kroeger 

Note:  xrootd is the daemon for the Xrootd data server.

The xrootd cluster consists of redirectors and data servers. A redirector is the central entry point for clients. It does not serve any files itself; it directs each client to an appropriate data server.

Type          Machines                                      Comment
data server   sulky02, sulky07, sulky08, sulky47, sulky48   sulky47/48 are read-only
redirector    glastlnx04, glastlnx05                        The DNS alias glast-rdr points to these two machines.

Relevant directories on each Xrootd server are:

Installation dir            /opt/xrootd
Xrootd log and core files   /var/adm/xrootd/logs   /var/adm/xrootd/core
Olbd log and core files     /var/adm/xrootd/logs   /var/adm/xrootd/core
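
When a daemon misbehaves, a quick first step is to see whether anything was written to these directories recently. The following is a minimal sketch, not part of the xrootd installation: the helper name recent_files and the one-day default window are assumptions made for illustration, and only the two directory paths come from the table above.

```shell
#!/bin/sh
# Sketch: list files changed recently in the xrootd core and log directories.
# recent_files is a hypothetical helper, not an xrootd-provided script.

# recent_files DIR [DAYS]
# Print regular files under DIR modified within the last DAYS days (default 1).
recent_files() {
    dir=$1
    days=${2:-1}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 0
    fi
    # -mtime -N selects files modified less than N*24 hours ago.
    find "$dir" -type f -mtime "-$days"
}

echo "Recent core files:"
recent_files /var/adm/xrootd/core
echo "Recently written logs:"
recent_files /var/adm/xrootd/logs
```

A fresh core file usually means a daemon crashed since the last check. Passing a larger second argument, for example recent_files /var/adm/xrootd/core 7, widens the window to a week.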

 Monitoring Tools

Free disk space on the glast xrootd servers: http://www.slac.stanford.edu/~wilko/glastmon/xrddisk.html

Nagios01 xrootd Service Status:  http://nagios01.slac.stanford.edu/nagios/cgi-bin/status.cgi?hostgroup=GLAST+xrootd+Servers&style=detail

Ganglia glast-xrootd: http://ganglia01.slac.stanford.edu:8080/ganglia/glast/?c=glast-xrootd&m=&r=hour&s=descending&hc=4

 Start/Stop daemons

The glast xrootd servers (data servers and redirectors) are configured so that ranger automatically restarts an xrootd or olbd daemon that is not running. Ranger runs every 15 minutes.

Notes:

  • If you don't want xrootd/olbd to run, remove the link /opt/xrootd/prod. Otherwise ranger will restart the daemons.
  • The prod/etc directory contains scripts to stop and start the daemons. Most of the time it is easiest to restart both the xrootd and olbd daemons using the RestartALL script:
    > cd /opt/xrootd
    > ./RestartALL
  • Because of a configuration issue, ./RestartALL does not yet work on glastlnx04, glastlnx05 and sulky02.
    On these machines, use:
    > ./RestartAll_glast

    This issue should be resolved soon (by May 19).

  • If an individual daemon should be started or stopped, use:
    > ./prod/etc/StopXRD    # stop xrootd
    or
    > ./prod/etc/StartXRD   # start xrootd
  • Similar commands are used for the olbd (i.e., ./prod/etc/StopOLB and ./prod/etc/StartOLB).
  • For sulky47/48, use: > ./RestartAll
  • For glastlnx04/05 and sulky02, use: > ./RestartAll_glast
  • After a restart, check that the daemons are running and that only one instance exists:
    > ps -ef | grep xrootd
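
The check after a restart can also be scripted. The following is a rough sketch under stated assumptions: the function names are made up for illustration, the process names are assumed to be exactly xrootd and olbd, and pgrep is assumed to be available on the server. It reports whether the /opt/xrootd/prod link is present (which, per the notes above, determines whether ranger restarts the daemons) and how many instances of each daemon are running.

```shell
#!/bin/sh
# Sketch of a post-restart sanity check; not part of the xrootd distribution.

PROD_LINK=/opt/xrootd/prod   # link named in the notes above

# instance_status COUNT
# Classify an instance count: exactly one process is the healthy case.
instance_status() {
    case "$1" in
        0) echo "not running" ;;
        1) echo "ok" ;;
        *) echo "duplicate instances" ;;
    esac
}

# count_instances NAME
# Count processes whose name is exactly NAME (pgrep -x), so the check
# does not count itself the way "ps -ef | grep" counts the grep process.
count_instances() {
    echo $(( $(pgrep -x "$1" | wc -l) ))
}

if [ -L "$PROD_LINK" ]; then
    echo "prod link present: ranger is expected to restart stopped daemons"
else
    echo "prod link missing: ranger is not expected to restart the daemons"
fi

# Assumption: the daemons run under the process names xrootd and olbd.
for daemon in xrootd olbd; do
    n=$(count_instances "$daemon")
    echo "$daemon: $n instance(s), $(instance_status "$n")"
done
```

If a daemon is reported as "duplicate instances", stop it with the matching script in prod/etc and start it again before relying on the server.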

 Background Notes:

Xrootd (also referred to as SCALLA, for "Scalable Cluster Architecture for Low Latency Access") handles both disk and tape, which provides the option of writing data for which demand has been low to tape and retrieving it upon request. In addition to scalability and rapid response times, Xrootd is able to handle concurrent requests for data located in the same directory. AFS also handles concurrent requests, but it replicates requested files on additional servers as needed to keep pace with demand, which makes it much less efficient in its use of disk space. NFS is not as good as AFS at handling concurrent requests for data from the same directory, and its use is therefore being minimized.

Document generated by Confluence on Jan 21, 2010 11:37