Untitled Document

On this page:

Overview: Performing the actual fit with gtlikelihood

Functional Form of the Likelihood

Choosing the Data to Analyze – Regions of Interest and Source Regions

Model Selection

Precomputation of Likelihood Quantities (Livetime Cube)

Model Fitting with gtlikelihood

Source Detection and Localization

Likelihood

The likelihood that is applicable to the LAT data is constructed and then used to find the best fit model parameters, which includes the description of a source's spectrum, its position, and even whether it exists.

The likelihood (L) is the probability of obtaining the data given an input model. For the purposes of this discussion, the input model is the distribution of gamma-ray sources on the sky, and includes their intensity and spectra.

Overview: Performing the actual fit with gtlikelihood

When performing the actual fit with gtlikelihood, the parameter space can be quite large – the spectral parameters from a number of sources must be fit simultaneously; therefore, the SAE provides a choice of three 'optimizers' to maximize the likelihood efficiently. Fitting requires repeatedly calculating the likelihood for different trial parameter sets until a value sufficiently near the maximum is found; the optimizers guide the choice of new trial parameter sets to converge efficiently on the best set. The variation of the likelihood in the vicinity of the maximum can be related to the uncertainties on the parameters, and therefore these optimizers estimate the parameter uncertainties.

Thus likelihood spectral fitting provides the best fit parameter values and their uncertainties. But is this a good fit? When χ2 is a valid statistic, then we know that the value of χ2 is drawn from a known distribution, and we can use the probability of obtaining the observed value as a goodness-of-fit measure. When there are many degrees of freedom (i.e., the number of energy channels minus the number of fitted parameters) then we expect the χ2 per degree of freedom to be ~1 for a good fit. However, when χ2 is not a valid statistic, we usually do not know the distribution from which the maximum likelihood value is drawn, and therefore we do not have a goodness-of-fit measure.

Model Fitting

When it is known that a source is present, the objective is to determine the best value of the spectral model parameters. To determine the best model (i.e., the one having highest probability of resulting in the data), vary the spectral parameters until the likelihood is maximized.

Note: χ2 is -2 times the logarithm of the likelihood in the limit of a large number of counts in each bin, and therefore where χ2 is a valid statistic, minimizing χ2 is equivalent to maximizing the likelihood.

A number of steps are necessary to fit a source's spectra:

Select the data. Data must be used from a substantial spatial region around the source(s) being analyzed due to the overlapping of the point spread functions of nearby sources.

Select the model. This model selected must include:
- the position of the source(s) being analyzed;
- the position of nearby sources;
- a model of the diffuse emission;
- the functional form of the source spectra; and
- values of the spectral parameters.
In fitting the source(s) of interest, let the parameters for these sources vary, but because the region around these sources includes counts from nearby sources that are not of interest, you might also let the parameters from these nearby sources vary.
Precompute a number of quantities that are part of the likelihood computation. As parameter values are varied in searching for the best fit, the likelihood is calculated many times. Therefore, precomputing a number of computation-intensive quantities will greatly speed up the fitting process.

Perform the actual fit. The parameter space can be quite large – the spectral parameters from a number of sources must be fit simultaneously; therefore, the SAE provides a choice of three optimizers to maximize the likelihood efficiently.

Fitting requires repeatedly calculating the likelihood for different trial parameter sets until a value sufficiently near the maximum is found, and the optimizers guide the choice of new trial parameter sets to converge efficiently on the best set.

The variation of the likelihood in the vicinity of the maximum can be related to the uncertainties on the parameters; therefore, these optimizers estimate the parameter uncertainties.

While likelihood spectral fitting provides the best fit parameter values and their uncertainties, the question remains, "Is this a good fit?":

When χ2 is a valid statistic, we know that the value of χ2 is drawn from a known distribution, and we can use the probability of obtaining the observed value as a goodness-of-fit measure. When there are many degrees of freedom (i.e., the number of energy channels minus the number of fitted parameters), we expect the χ2 per degree of freedom to be ~1 for a good fit.

However, when χ2 is not a valid statistic, we usually do not know the distribution from which the maximum likelihood value is drawn; therefore we do not have a goodness-of-fit measure.

Source Localization

The optimizers find the best fit spectral parameters, but not the location. In other words, the fitting tool does not fit the source coordinates. However, a tool is provided that performs a grid search – mapping out the maximum likelihood value over a grid of locations. As explained below (see Source Detection), it is convenient to use a quantity called the 'Test Statistic' TS that is maximized when the likelihood is maximized.

Source Detection

The Test Statistic is defined as TS=-2ln(Lmax,0/Lmax,1), where Lmax,0 is the maximum likelihood value for a model without an additional source (the 'null hypothesis') and Lmax,1 is the maximum likelihood value for a model with the additional source at a specified location. As can be seen, TS is a monotonically increasing function of Lmax,1, which is why maximizing TS on a grid is equivalent to maximizing the likelihood on a grid. In the limit of a large number of counts, Wilkes Theorem states that the TS for the null hypothesis is asymptotically distributed as χ²x (here χ2 is the distribution, not a value of the statistic), where x is the number of parameters characterizing the additional source. This means that TS is drawn from this distribution if no source is present, and an apparent source results from a fluctuation. Thus, a larger TS indicates that the null hypothesis is incorrect (i.e., a source really is present), which can be quantified. Whether the LAT data are in the limit where this assumption applies is under investigation.

Functional Form of the Likelihood

LAT data is binned into a great many bins because the counts are characterized by many variables; therefore, even with many counts, each bin will contain a small number of counts. The observed number of counts in each bin is characterized by the Poisson distribution, and with a small number of counts per bin, the Poisson distribution cannot be approximated by a normal distribution.

The likelihood L is the product of the probabilities of observing the detected counts in each bin. Assume that the expected number of counts in the ith bin is m_i. Note that m_i is a function of the source model, and will differ for different models. The probability of detecting n_i counts in this bin is p_i=m_i^n_i exp[-m_i]/n_i!. The likelihood L is the product of pi for all i. But notice that this product factors into the product of the m_i^n_i/n_i!, which depends on the data (i.e., the values of n_i) and the product of the exp[-m_i]. The product of exp[-m_i] for all i is equal to the exponential of minus the sum of m_i. The sum of m_i is just the total number N_exp of counts that the source model predicts should have been detected.

Therefore, the likelihood L can be factored into exp[-N_exp], which is purely a function of the source model, and the product of m_i^n_i/n_i!, which is a function of both the source model and the data:

L = exp[-N_exp] ∏i m_i^n_i/n_i!

This likelihood, with finite size bins and n_i that may be greater than 1, is the basis for binned likelihood. Since binning destroys information (i.e., the precise values of the quantities describing a count), there is a tradeoff between the number of bins (and thus the bin size) and the accuracy; smaller bins result in a more accurate likelihood.

If we let the bin sizes get infinitesmally small, then n_i=0 or 1. The likelihood is now the product of exp[-N_exp], as before, and the product of m_i where i is now the index over the counts.

L = exp[-N_exp] ∏i m_i

Since m_i is calculated using the precise values for each count, and not an average over a finite size bin, this unbinned likelihood is the most accurate.

For a small number of counts, the unbinned likelihood can be calculated rapidly; but as the number of counts increases, the time to calculate the likelihood becomes prohibitive and the binned likelihood must be used.

Choosing the Data to Analyze – Regions of Interest and Source Regions

Analyzing the Spectrum of a Point Source

The data from a substantial spatial region around the source(s) being analyzed must be used because of the overlapping of the point spread functions of nearby sources.

For the greatest accuracy possible in modeling a single source, it would be necessary to model the entire sky. In reality, the influence of sources a very great distance away from your source will be greatly attenuated (i.e., at 100 MeV, 68% of the counts will be within 3.5 degrees due to the large point spread function at low energies). Thus, it is recommended that you include sources from a large 'Source Region', and counts from a smaller 'Region of Interest' (ROI).

Obtain the positions and spectra of sources in the Source Region outside of the ROI from a catalog; these sources are included for their contribution to the counts in the ROI. You may elect to:

fix the parameters of the sources other than the one you are studying at their catalog values; or

perform a fit on the parameters of all these sources.

All sources in the source region will be used, and the size of the Source Region appropriate for your needs is determined by your experience and prior experimentation. In the meantime, the recommended default values are ROI+10 and ROI+5 degrees for sources dominated by ~100 MeV and ~1 GeV events, respectively.

All counts in the ROI will also be included and you will also determine the appropriate ROI size based on your experience and prior experimentation. In the meantime, the recommended default values are 20 and 15 degrees for sources dominated by ~100 MeV and ~1 GeV events, respectively.

When you extract the LAT data from the online database, you will be able to choose the time range, spatial region and energy range from which the counts are extracted. The result is a photon file (FT1) with these counts, and a spacecraft file (FT2) describing the position of the spacecraft and the orientation of the LAT over the chosen time range. The spatial region is the Region of Interest (ROI). Further selections can be performed using the gtselect tool.

Note: Data selection criteria are recorded in the photon file with 'Data Sub-Space' (DSS) keywords.

Model Selection

Models used by the SAE are stored in XML files. XML is a language to define and store data; HTML used to define webpages is a form of XML. XML files are ASCII files full of 'tags' delimited by the < and > symbols.

Note: A model includes the position of the source(s) being analyzed, the position of nearby sources, a model of the diffuse emission, the functional form of the source spectra, and values of the spectral parameters. In fitting the source(s) of interest, you will let the parameters for these sources vary, but because the region around these sources includes counts from nearby sources in which you are not interested, you might also let the parameters from these nearby sources vary.

For historic reasons the SAE uses two XML formats, one used for parameter fitting (e.g., by the likelihood tool) and the other for source simulation. The likelihood XML format includes parameter uncertainties but does not allow time dependence, whereas the simulation format does. The simulation format also includes source models that are not in the likelihood format.

Creating and Editing Source Models

A GUI-driven tool called ModelEditor is included in the SAE tools, and is invoked at the command line; its use is fairly intuitive.

While you might want to define each source in your Source Region, it is easier to start with a catalog for the region. After opening a catalog, choose, choose File --> Extract.... A GUI will open that allows you to define your:

source Region (RA, Dec and radius in degrees), and the
flux limit,
file with the catalog, and the
output file.

Note: Remember, the Source Region should be larger than the Region of Interest.

A GUI with a list of selected sources will then be displayed. When you select a source, you will be presented with the source properties (i.e., a position and a spectral model), which can be edited. For example, the catalog may have assumed the spectrum was a simple power law, but you would like to try a power law with an exponential cutoff.

Note: To save the model for a particular source, click on the Set components button.

Using the pull-down menu, you can:

delete a selected source,
add a diffuse, or
add a point source.

Multiple xml files. The entire source model need not be in the same XML file. The tools accept lists of source XML files, both at the command line and by reading in simple ASCII files listing the XML files.

The model parameters (in the likelihood XML format) have a number of attributes:

value	The parameter's value; may be an initial guess, or the result of a fit.
scale	A scale factor for the parameter.
name	Name given to the parameter.
max	Parameter's maximum value.
min	Parameter's minimum value.
free	Whether the parameter should be fit: 0 means the parameter value should be fixed, 1 that it should be fit.

Therefore, if the 'value' is '4.3' and the 'scale' is '1e^-9,' the actual parameter value is 4.3x10^-9.

Spatial Models

Select from four models:

SkyDirFunction	A point source.
ConstantValue	A diffuse source with a constant flux per steradian.
SpatialMap	A spatially varying diffuse source. The map of the source is provided by a file. The default is the map of the Galactic Diffuse Emission.
MapCubeFunction	A 3 dimensional FITS map (two sky coordinates and energy) used to map diffuse emission, thereby allowing arbitrary spectral variation as a function of sky position

Spectral Models

The units for the spectral models are cm^-2 s^-1 MeV^-1 for point sources and cm^-2 s^-1 MeV^-1 sr^-1 for diffuse sources. All energies are in MeV.

See: Likelihood Tutorial: 4. Create a Source Model XML File.

Precomputation of Likelihood Quantities (Livetime Cube)

The computation of the likelihood usually occurs many times as parameter values are varied in searching for the best fit. While not strictly necessary, precomputing a number of computation-intensive quantities will greatly speed up the fitting process. Fitting involves varying model parameters until the best values are found (the methodology is described elsewhere). Fits are done with various model parameters fixed, or with different sources present or absent. Certain quantities need be calculated only once, speeding up the repeated computation of the likelihood.

Livetime Cube

Because livetime cube calculation is computationally intensive, the livetime cubes are provided on different timescales. Thus to analyze data from a given time period, download livetime cubes spanning most of the period, run gtltcube for the time not covered by these pre-packaged livetime cubes, and then sum all these livetime cubes with gtltsum.

Pre-packaged Livetime Cubes. LAT instrument response functions (IRFs) are a function of the inclination angle, the angle between the direction to a source and the LAT normal.

The number of counts that a source should produce should therefore depend on the amount of time that the source spent at a given inclination angle during an observation. This livetime quantity, the time that the LAT observed a given position on the sky at a given inclination angle, depends only on the history of the LAT's orientation during the observation, and not on the source model.

The array of these livetimes at all points on the sky is called the 'livetime cube' and, as a practical matter, the livetime cannot be provided as a continuous function of inclination angle or position on the sky.

Thus, the SAE livetimecubes are provided on a healpix grid on the sky, and in inclination angle bins.

Livetime Cubes Calculated by gtltcube. Livetime cubes are calculated by the gtltcube tool. The inputs are:

Size of the spatial grid (in degrees).
The inclination angle binning (in cos τ steps).
The spacecraft file (FT2).

gtltcube calculates the livetime cube for the entire sky for the time range covered by the spacecraft file; therefore, the same output file can be used for analyzing different regions of the sky over the same time range.

Note: The livetime cube is calculated on a spatial healpix grid.

Livetime Cubes Are Additive. The livetime cube for a time range that is the sum of two time subranges is the sum of the livetime cube for each of the time subranges.

Thus, the livetime cube for five calendar days can be calculated by adding the livetime cubes for each of the five calendar days. Livetime cubes can be added by gtltsum.

Exposure Maps

Exposure maps are used for extended sources such as the diffuse Galactic and Extragalactic backgrounds, not for individual sources.

Remember, a likelihood analysis considers the counts in the Region of Interest resulting from all sources in a Source Region that is a superset of the Region of Interest. Therefore the exposure map must extend over the entire Source Region, and is specific to the Region of Interest.

The likelihood consists of two factors:

The first is dependent on the detected counts and differs between binned and unbinned likelihood calculations.

The second is equal to the exponential of minus the expected total number of counts Nexp for the source model.

The exposure map is the total exposure – area multiplied by time – for a given position on the sky, producing counts in the Region of Interest. Since the response function is a function of the photon energy, the exposure map is also a function of this energy.

Thus, the counts produced by a source at a given position on the sky is the integral of the source flux and the exposure map (a function of energy) at that position.

gtexpmap

The exposure map is calculated by the gtexpmap tool, which derives the Region of Interest from the observation's photon file (FT1); remember that the region from which counts were selected is recorded by keywords in the photon file's FITS header.

Inputs. The Source Region is assumed to be centered on the Region of Interest, but the size of the Source Region must be input. The spatial and energy gridding for the Source Region must also be provided.

The tool also needs the livetime spent at each inclination angle at every point in the Source Region; this can be provided by the livetime cube (as described in the previous subsection); or, if the livetime cube file is not provided, the tool will calculate these livetimes from the spacecraft file (FT2).

Note: The exposure map is calculated on a longitude-latitude grid, while the livetime cube is calculated on a spatial healpix grid.

Diffuse Emission and Individual Counts

The term in the likelihood that is dependent on the counts is a function of the probability of observing each count. This probability is the sum of the probability from all sources, both point and diffuse.

The gtdiffrsp tool calculates the contribution of diffuse sources to this probability.

Notes:

The photon file (FT1) lists the position of all counts.

The spacecraft file (FT2) provides the history of the inclination angle for all positions from which the diffuse emission originates.

The source model describes the diffuse emission.

Model Fitting with gtlikelihood

Fitting involves finding the set of parameters that maximizes the likelihood. (See Overview: Performing the actual fit with gtlikelihood.

The maximum is found by iteratively calculating the function for different sets of trial parameters, and estimating derivatives of the function with respect to the parameters. The algorithms choose new trial parameters that are progressively closer to the set that maximizes the function.

Notes:

The function is calculated for new sets of trial parameters, until the change in the function value between iterations is sufficiently small (or the number of iterations reaches a maximum value).

While iterating, these algorithms map out the dependence of the function on the parameters, particularly near the function's maximum. The uncertainties on the best fit parameters are related to this dependence.

Different algorithms vary in how rapidly they converge to the function maximum, the amount of computer memory they require, and the accuracy with which they map out the dependence of the function on the parameters near the maximum (and thus estimate the uncertainty).

Inputs necessary for fitting parameters with gtlikelihood:

An event file with the counts to be fit. (See Choosing the Data to Analyze - Regions of Interest (ROIs) and Source Regions.)

A spacecraft file that covers the time range over which the counts in the event file were extracted.

A source model. (See Model Selection.)

Precomputed quantities, such as the livetime cube and the exposure map. (See Precomputation of Likelihood Quantities (Livetime Cube.)

When running gtlikelihood,you have a choice of algorithms, called optimizers, for maximizing the likelihood. (See Overview: Performing the actual fit with gtlikelihood.)

A reasonable strategy is to run gtlikelhood with the DRMNFB optimizer until convergence, and then to run gtlikelhood with NEWMINUIT with the best fit parameter values from this first run in order to calculate the uncertainties on the parameter values.

DRMNFB is efficient at finding the maximum likelihood but approximates the parameter dependence near this maximum; consequently, the uncertainties provided by this optimizer may not be reliable.

On the other hand, NEWMINUIT is a conservative optimizer that converges more slowly than these other methods; indeed, it may exhaust the number of permitted iterations before convergence. However, NEWMINUIT more accurately maps out the parameter space near the likelihood maximum, thus providing more reliable uncertainty estimates.

Convergence criteria is controlled by the hidden variable, fit_tolerance, whose definition depends on the particular optimizer, but is approximately the fractional change in the logarithm of the likelihood.

Source Detection and Localization

Currently, the likelihood tool, gtlike does not fit the source position parameters. Instead, the gttsmap tool runs gtlike at each gridpoint on a rectangular grid. The user provides a model of all the other sources in the Source Region, and gtssmap calculates the Test Statistic (TS) for adding an additional source at each gridpoint. A livetimecube and exposure map precomputed for running gtlike can also be used for gttsmap.

If you have a good approximate source location, such as the gridpoint with the maximum TS from gttsmap or a candidate source identification, then you can refine the location by running gtfindsrc, which maximizes the TS in continuous space.

The TS is -2 times the logarithm of the ratio of the likelihood for the model without the additional source (the null hypothesis) to the likelihood for the model with the additional source. Thus the TS is maximized when the likelihood for the model with the source is maximized; therefore, the location with the maximum TS is the best fit source position.

But is the addition of this source significant? By Wilks' Theorem, if there is no additional source then the TS should be drawn from an χ²n distribution, where n is the difference in the degree of freedom between the models with and without the additional source. In the case under consideration, the additional source is characterized by a source intensity and spectral index (the spectrum is assumed to be a power law); thus n=2. Wilks' Theorem is valid asymptotically as the number of counts increases. At the time this was written, studies were underway to determine if the number of LAT counts for a typical analysis is sufficiently large. If Wilks' Theorem is valid, then integrating χ²2 from the observed TS value to infinity gives the probability that the apparent source is a fluctuation. The resulting significance is ~(TS)^1/2σ, and thus TS=25 is equivalent to 5σ.

Owned by:

David Band

Last updated by: Chuck Patterson 10/30/2008