This page last changed on Jul 29, 2009 by panetta.

How to Fix FASTCopy Incoming

Owned by: Philip Hart 

How to Fix: 
1. Ingest Failure
  • The FC_Incoming script writes to the Oracle table FCOPY_INCOMING (one row per tarball) and untars the incoming file to:
    • /nfs/farm/g/glast/u23/ISOC-flight/Archive/fcopy/yyyy/mm/day.mm.dd.wkd/utchh/hh.mm.ss
  • Under that one sees (for example):
    • LISOC_2008096171823 LISOC_2008096171823.tar LISOC_2008096171823.tar.fastcopy.log
  • The fc sender gets the return code from FC_Incoming as status and resends if needed.
  • A cron job running as root (cron@glastlnx11) runs flightops-ingest, which writes to:
    • FASTCOPY_RAWARCHIVE, _DATAGRAM, _PACKET, _L0GAP [we calculate this last] tables.
  • It also writes to:
    • /nfs/farm/g/glast/u23/ISOC-flight/Archive/level0/srcmnop/yyyy/.../smnopa[apid]t[something].ICDFILEkey
      For example:
      /nfs/farm/g/glast/u23/ISOC-flight/Archive/level0/src0077/2008/04/108.04.17.Thu/utc22/s0077a0958t1208469600.0000062745
  • Afterwards, cron@glastlnx06 runs the fcopy_dispatch job, which, e.g., launches:
    • ISOC/bin/L0Dispatcher.py which runs (from ISOC/bin)
    • ProcessHSK.py - Trending ingest/limit checking -> (only) Oracle
    • ProcessCMD - logging/some MP reconciliation -> (only) Oracle
    • ProcessSCI -> Oracle
      • -> dirs, files for NonEventReporting
      • -> halfpipe -> L1proc
        Note: See /afs/slac/g/glast/isoc/flightOps/offline/halfPipe/v6r0p2/config (<-prod) for control of what it does.
  • L0Dispatcher queries FCOPY_INCOMING, ICDFILE tables for tarballs/files ingested but not dispatched -> FC_L0DISPATCH
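
When hunting for a level0 file by hand, it can help to derive its expected directory from the file name. The following is a minimal sketch, not the ingest code itself. It assumes, based only on the single example path above, that the t[something] field is a Unix timestamp in UTC and that the directory is built from that timestamp. The function names parse_level0_name and level0_subdir are hypothetical.

```python
import re
from datetime import datetime, timezone

# Weekday abbreviations spelled out so the result does not depend on the locale.
_WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# s<source>a<apid>t<timestamp>.<ICDFILE key>
# ASSUMPTION: field widths and meaning are inferred from the one example path.
_LEVEL0_NAME = re.compile(r"^s(\d{4})a(\d{4})t(\d+)\.(\d+)$")


def parse_level0_name(name):
    """Split a level0 file name into its fields (hypothetical helper)."""
    match = _LEVEL0_NAME.match(name)
    if match is None:
        raise ValueError("not a level0 file name: %r" % name)
    source, apid, stamp, key = match.groups()
    return {
        "source": source,          # kept as text to preserve leading zeros
        "apid": apid,
        "timestamp": int(stamp),   # ASSUMPTION: Unix seconds, UTC
        "icdfile_key": int(key),
    }


def level0_subdir(name):
    """Return the expected path below Archive/level0 for a level0 file name."""
    fields = parse_level0_name(name)
    when = datetime.fromtimestamp(fields["timestamp"], tz=timezone.utc)
    # Directory component: day-of-year.month.day.weekday
    day = "{:03d}.{:02d}.{:02d}.{}".format(
        when.timetuple().tm_yday, when.month, when.day, _WEEKDAYS[when.weekday()]
    )
    return "src{}/{:04d}/{:02d}/{}/utc{:02d}".format(
        fields["source"], when.year, when.month, day, when.hour
    )
```

For the example file s0077a0958t1208469600.0000062745, level0_subdir returns src0077/2008/04/108.04.17.Thu/utc22, which matches the example path. If the real naming rule differs from this inference, adjust the pattern.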

1. Ingest Failure

To Test: Launch FASTCopy Monitoring (FCWebView), select Incoming, and check the Status column.

  • If there is an INGESTFAILED message, you will need to reset the "submitted" flag to "new", so the cron job
    will pick it up and resubmit the job.
    • First, hover over the Filename of the failed package.
    • From the status bar at the bottom of the page, copy the icdfile_pk number
      (e.g., icdfile_pk=63677).
  • From an ISOC environment terminal, you can access the relevant Oracle instance via a wrapper, e.g.,
    rlwrap sqlplus /@isocnightly
    

    or:

    rlwrap sqlplus /@isocflight
    

    Note: For other instances, see $TNS_ADMIN/tnsnames.ora
  • /var/log/flightops/ingest.log on glastlnx11 may provide clues.
  • It may be useful to inspect the table setup via:

desc fcopy_icdfile;
select * from fcopy_jobstate;
  • To Fix:
    • To change the table status, run:
      update fcopy_icdfile set jobstate_fk = 1 where icdfile_pk in (1234, 234, 545);
      
    • Or, to reingest an entire tarball:
      ... where icdfile pk = 123456789;
      

      Note: To test ingest, send a tarball via FC_send.sh. (This is unlikely to be needed in production.)
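
When several packages have failed, typing the key list by hand is error-prone. The following minimal sketch only builds the text of the update statement shown above, for pasting into sqlplus. It does not connect to Oracle. The helper names extract_icdfile_pk and build_reset_statement are hypothetical. The default jobstate of 1 is taken from the statement above and is assumed to mean "new" (confirm with select * from fcopy_jobstate).

```python
import re

# Matches the "icdfile_pk=<number>" fragment copied from the browser status bar.
_PK_PATTERN = re.compile(r"icdfile_pk=(\d+)")


def extract_icdfile_pk(status_text):
    """Pull the numeric icdfile_pk out of copied status-bar text."""
    match = _PK_PATTERN.search(status_text)
    if match is None:
        raise ValueError("no icdfile_pk found in %r" % status_text)
    return int(match.group(1))


def build_reset_statement(pks, jobstate=1):
    """Build the UPDATE text that resets the given ICD files.

    ASSUMPTION: jobstate 1 is the "new" state, as in the statement above.
    Keys are forced to int, de-duplicated, and sorted so the output is
    predictable and cannot carry stray text into the SQL.
    """
    keys = sorted({int(pk) for pk in pks})
    if not keys:
        raise ValueError("at least one icdfile_pk is required")
    return "update fcopy_icdfile set jobstate_fk = {} where icdfile_pk in ({});".format(
        int(jobstate), ", ".join(str(key) for key in keys)
    )
```

For example, build_reset_statement([1234, 234, 545]) produces the same statement as above with the keys in ascending order. Review the generated text before running it against isocflight.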

2. Redispatching runs

At times (such as when a primary disk controller pines for the fjords), the ingest will succeed, but the dispatch to the HalfPipe will fail. When this happens, the delivery needs to be resent through ProcessSCI.py. To do so, log into glastlnx11 as glastops (maintainers will be part of the netgroup that allows this), and execute:

ProcessSCI.py -k <incoming_pk>

The key <incoming_pk> is the incoming key of the FASTCopy delivery, as found in fcopy_incoming. This may be determined from the Data Processing web page.
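
If several deliveries need to be redispatched, the command above can be repeated per key. The following is a minimal sketch of such a loop, assuming ProcessSCI.py is on the PATH of the glastops account and accepts only the -k option shown above. The function names are hypothetical, and the default is a dry run that only returns the commands it would execute.

```python
import subprocess


def redispatch_command(incoming_pk):
    """Return the argument list for redispatching one FASTCopy delivery."""
    # int() rejects anything that is not a plain numeric key.
    return ["ProcessSCI.py", "-k", str(int(incoming_pk))]


def redispatch(incoming_pks, dry_run=True):
    """Redispatch each delivery in order and return the commands used.

    With dry_run=True (the default) nothing is executed. With dry_run=False
    each command runs in turn, and check=True stops the loop at the first
    command that exits with a non-zero status.
    """
    commands = [redispatch_command(pk) for pk in incoming_pks]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Calling redispatch([101, 102]) returns the two command lists without running anything. The keys 101 and 102 are placeholders, not real incoming keys. Pass dry_run=False only after checking the list.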


Document generated by Confluence on Jan 21, 2010 11:37