Pipeline Jython Scriptlets

Jython scriptlet processes within the pipeline have access to the full Java API.  Access to the Data Catalog is provided via an object named "datacatalog".

Example: Dataset registration is performed by calling:

datacatalog.registerDataset(DATA_TYPE, DATA_CATALOG_LOCATION,
DISK_LOCATION [, META_DATA])

where:

  • DATA_TYPE is the type of data within the file.
    • Typical values are merit, mc, recon, ...

     

    For example:

    Data Type   Description                                   Format
    ---------   -------------------------------------------   ------
    L0dataT     Raw data; post trigger                        LDF
    L0dataF     Raw data; post OBF (on-board filter)          LDF
    cal         Calorimeter ROOT Tree                         ROOT
    digi        Digitization ROOT Tree                        ROOT
    fes         Pointing history (exposure)                   EBF
    gcr         Galactic cosmic ray (heavy ion ROOT ntuple)   ROOT
    mc          Monte Carlo ROOT Tree                         ROOT
    merit       "Analysis" ROOT ntuple                        ROOT
    recon       Reconstruction ROOT Tree                      ROOT
    relation    ROOT relational table                         ROOT
    meta        ROOT file for cel (composite event list)      ROOT
    svac        SVAC ROOT ntuple                              ROOT
    svachist    SVAC histogram                                ROOT
  • DATA_CATALOG_LOCATION has the following form:  <logical folder path>[<dataset group name>:]<dataset name>
    • <logical folder path> is required and has the form: /folder1/sub-folder/.../
      • It denotes the location within the Data Catalog folder-tree where the dataset will be registered.
      • The folder need not exist; it will be created if necessary.
    • <dataset group name> is optional.
      • If present, it must be followed by a ":" (colon) character.
      • The name is a simple alphanumeric string (spaces are not permitted).
      • A dataset group is used to bundle together datasets which are fragments of a larger dataset.
      • For example, all merit files of a large Monte Carlo task are generally cataloged together using a dataset group.
    • <dataset name> is required.
      • It is simply the name of the dataset.
      • It is an alphanumeric string (spaces are not permitted).
      • It must be unique within the folder or group where it will be placed.
  • DISK_LOCATION has the following form: <disk file path>[@<site name>]
    • <disk file path> is required.
      • It is the full path on disk (or in XRootd, etc.) to the file that is being registered.
    • <site name> is optional.
      • If specified, it must be preceded by an "@" (at sign) character.
      • The site name tells the data catalog where to find the physical file.
      • Currently it may be one of: SLAC, SLAC_XROOT, IN2P3, IN2P3_HPSS, UW
      • If no site name is specified, a default of "SLAC" is assumed.
  • META_DATA is optional.  If specified, the supplied meta-data will be attached to the dataset upon registration.  Meta-data provide a basis for searching the Data Catalog for datasets.  A META_DATA expression has the following form: <name>=<value>[:<name2>=<value2>[...]]
    • <name> is required.
      • It is simply the name of the meta-data object, but its form is significant because it denotes the object type of the <value> parameter. The Data Catalog will perform a type conversion and store the <value> parameter internally based on the type specified by the name:
        • n[A-Z]+.* (ex: nEvents, nSecondsMET) indicates a numeric value
        • t[A-Z]+.* (ex: tStartDate, tEndDate) indicates a timestamp value
        • Anything else (ex: RunStatus, myDogsName) indicates a string value
    • <value> is required and must be separated from <name> by a single '=' (equals) character.
      • The value must reflect the type specified by <name> or an error will be thrown, and the registration will fail.
        • Numeric values have 38 decimal digits of precision for integers and 18 for floats. Leading and trailing zeros will be removed during conversion.
        • Timestamp values must be supplied in the following format: yyyy-mm-dd hh:mm:ss.[fff...]
          (fff... is an optional, fractional seconds component with nanosecond precision.)
        • String values are simply ASCII strings. Put whatever you want in there, even numbers.
    • Multiple <name>=<value> pairs may be supplied if separated by ":" (colon) characters.
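
The meta-data naming convention described above can be sketched in a few lines of Python. This is only an illustration of the documented name patterns, not the Data Catalog's own conversion code: the helper names metadata_type and build_meta_data are hypothetical, and the values nEvents=5000 and RunStatus=OK are invented for the example. The sketch also does not cover timestamp values, since they contain ":" characters and this page does not say how those interact with the ":" pair separator.

```python
import re

# Name patterns copied from the list above. re.match anchors at the start
# of the string, so only a leading "n" or "t" followed by a capital counts.
_NUMERIC = re.compile(r"n[A-Z]+.*")
_TIMESTAMP = re.compile(r"t[A-Z]+.*")


def metadata_type(name):
    """Return the value type implied by a meta-data name (hypothetical helper)."""
    if _NUMERIC.match(name):
        return "numeric"
    if _TIMESTAMP.match(name):
        return "timestamp"
    return "string"


def build_meta_data(pairs):
    """Join (name, value) pairs as <name>=<value>[:<name2>=<value2>[...]]."""
    return ":".join(["%s=%s" % (name, value) for name, value in pairs])


# Invented example values, for illustration only.
meta = build_meta_data([("nEvents", 5000), ("RunStatus", "OK")])
```

Checking names with a helper like this before calling registerDataset may catch a mistyped prefix (for example "nevents", which would be stored as a string) before the registration is attempted.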

Below is an example. The parameters are interpreted as follows:

  • It registers a "merit" type dataset. 
  • The dataset is placed under the Data Catalog folder "/ServiceChallenge/Interleave3h-GR-v11r17/runs/", in the group "merit", with a name of  "000002". 
  • The file is found on disk at "/nfs/farm/g/glast/u43/MC-tasks/Interleave3h-GR-v11r17/data/merit/Interleave3h-GR-v11r17-000002-merit.root" and is assumed to be located at SLAC (because no site name was specified).

datacatalog.registerDataset(
    "merit",
    "/ServiceChallenge/Interleave3h-GR-v11r17/runs/merit:000002",
    # Adjacent string literals are joined into one path.
    "/nfs/farm/g/glast/u43/MC-tasks/Interleave3h-GR-v11r17/"
    "data/merit/Interleave3h-GR-v11r17-000002-merit.root")
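
As a further sketch, the same registration can be assembled from its parts, this time with an explicit site name and a META_DATA argument. Several things here are assumptions rather than documented behavior: the FakeDataCatalog class is a hypothetical stand-in so the snippet runs outside the pipeline (inside a scriptlet the real "datacatalog" object is already provided), the helpers catalog_location and disk_location are invented names, META_DATA is assumed to be passed as a single string in the form described above, and the meta-data values are made up.

```python
class FakeDataCatalog(object):
    """Hypothetical stand-in for the pipeline-provided datacatalog object."""

    def __init__(self):
        self.calls = []

    def registerDataset(self, data_type, catalog_loc, disk_loc, meta_data=None):
        # Only records the arguments; the real object performs the registration.
        self.calls.append((data_type, catalog_loc, disk_loc, meta_data))


def catalog_location(folder, name, group=None):
    """Build <logical folder path>[<dataset group name>:]<dataset name>."""
    if not folder.endswith("/"):
        folder = folder + "/"
    if group:
        return folder + group + ":" + name
    return folder + name


def disk_location(path, site=None):
    """Build <disk file path>[@<site name>]."""
    if site:
        return path + "@" + site
    return path


datacatalog = FakeDataCatalog()  # omit this line inside a real scriptlet

task = "Interleave3h-GR-v11r17"
run = "000002"
root_file = ("/nfs/farm/g/glast/u43/MC-tasks/" + task +
             "/data/merit/" + task + "-" + run + "-merit.root")

datacatalog.registerDataset(
    "merit",
    catalog_location("/ServiceChallenge/" + task + "/runs", run, group="merit"),
    disk_location(root_file, site="SLAC"),
    "nEvents=5000:RunStatus=OK")  # invented meta-data values
```

Naming the site explicitly as "SLAC" should be equivalent to omitting it, since that is the stated default; building the strings from task and run variables simply avoids retyping the long paths for each run.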


 

Owned by: Dan Flath
Last updated by: Chuck Patterson 11/24/2008