Assessing Fit Results

The physiologic behavior of PET tracers is usually quite complex. A comprehensive model accurately describing their kinetics therefore requires many compartments and exchange parameters. However, as the PET signal is limited in quality and only represents the sum of all tracer radioactivity, the model must be simplified to a degree that only a few parameters remain. When estimating a model with many parameters, the variance of the parameter estimates tends to be very high, so that a reliable interpretation of the results becomes impossible. A simplified model with fewer parameters will provide more precise results, but these parameters may be biased. Therefore, the optimal trade-off between bias and uncertainty has to be sought by testing models of decreasing complexity.

After every model fit, detailed information about the parameter estimates and the goodness-of-fit is available on the Details tab.

PKIN Details Tab

Parameter Confidence Intervals

Nonlinear regression using the Marquardt-Levenberg optimization reports a standard error for each fitted parameter value. If the inherent fitting assumptions are true, a 95% confidence interval can be approximated by the fitted parameter value plus or minus two standard errors. This confidence interval is displayed for each fitted parameter (Conf.low, Conf.high).

Under these assumptions, there is a 95% chance that the confidence interval contains the true parameter value. A sufficiently narrow confidence interval indicates that the parameter could be determined with reasonable certainty, whereas a wide interval makes it necessary to revise the configuration of the model used, or to look for a more appropriate model.
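The approximation described above can be sketched in a few lines of Python. The function name and the example numbers are hypothetical and only illustrate the "plus or minus two standard errors" rule; they are not taken from the software.

```python
def approx_confidence_interval(estimate, std_error, n_se=2.0):
    """Return (low, high) as the estimate minus/plus n_se standard errors."""
    half_width = n_se * std_error
    return estimate - half_width, estimate + half_width


# Hypothetical fitted parameter value and its reported standard error.
conf_low, conf_high = approx_confidence_interval(0.25, 0.02)
print(f"Conf.low={conf_low:.3f}  Conf.high={conf_high:.3f}")
```

A relative measure such as the half-width divided by the estimate can help to judge whether an interval is "sufficiently narrow" for a given application.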


The following measures allow a direct or indirect assessment of the goodness-of-fit:


Degrees of freedom, defined as the number of valid measurements minus the number of fitted parameters.


Sum of squared (unweighted) residuals.


Reduced chi-square: the sum of squared, weighted residuals divided by the degrees of freedom.

The reduced chi-square provides a useful measure of goodness-of-fit. If the model describes the measured data, the reduced chi-square will mostly represent the variance of the data and will be close to 1.0 (when weighting is appropriate).
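As a minimal sketch, the degrees of freedom, the unweighted sum of squared residuals and the reduced chi-square could be computed as follows. The function name and the data are hypothetical, and the weights are assumed to be the inverse measurement variances (1/sigma²), which is what makes a value near 1.0 the expected outcome for an adequate model.

```python
def fit_summary(measured, predicted, weights, n_fitted):
    """Return (degrees of freedom, unweighted SSR, reduced chi-square)."""
    residuals = [y - f for y, f in zip(measured, predicted)]
    dof = len(measured) - n_fitted
    if dof <= 0:
        raise ValueError("need more measurements than fitted parameters")
    ssr = sum(r * r for r in residuals)
    # Weighted sum of squares; weights assumed to be 1 / variance.
    weighted_ssr = sum(w * r * r for w, r in zip(weights, residuals))
    return dof, ssr, weighted_ssr / dof


# Hypothetical measurements, model values, and weights for sigma = 0.1.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]
weights = [1.0 / 0.1 ** 2] * len(measured)
dof, ssr, reduced_chi2 = fit_summary(measured, predicted, weights, n_fitted=2)
print(dof, ssr, reduced_chi2)
```

A reduced chi-square far above 1.0 suggests either a poor model or underestimated measurement variances; a value far below 1.0 suggests overestimated variances.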


Akaike Information Criterion. The AIC methodology attempts to find the model that best explains the data with a minimum of free parameters. The AIC is calculated with the second order correction for small sample size (<40).

The preferred model is the one with the lowest AIC value.
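The formula itself is not reproduced in this section. The sketch below assumes the common least-squares form AIC = N·ln(WSS/N) + 2P with the second-order correction 2P(P+1)/(N−P−1), where N is the number of measurements, P the number of fitted parameters and WSS the weighted sum of squared residuals. Some texts count the error variance as an additional parameter, so absolute values may differ from those reported by a given program; only differences between models fitted to the same data are meaningful.

```python
import math


def aic_corrected(weighted_ss, n_points, n_params):
    """AIC with second-order correction (assumed least-squares form)."""
    if n_points - n_params - 1 <= 0:
        raise ValueError("correction undefined: too few measurements")
    aic = n_points * math.log(weighted_ss / n_points) + 2 * n_params
    correction = 2 * n_params * (n_params + 1) / (n_points - n_params - 1)
    return aic + correction


# Hypothetical comparison: the 4-parameter model fits only slightly better,
# so the penalty for the extra parameters outweighs the gain.
aic_simple = aic_corrected(weighted_ss=10.0, n_points=20, n_params=2)
aic_complex = aic_corrected(weighted_ss=9.5, n_points=20, n_params=4)
print("preferred:", "simple" if aic_simple < aic_complex else "complex")
```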


Schwarz Criterion (SC), also called Bayesian Information Criterion (BIC).

The preferred model is the one with the lowest SC value.
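The formula is not reproduced in this section. As an illustration, the sketch below assumes the usual least-squares form SC = N·ln(WSS/N) + P·ln(N), with N measurements, P fitted parameters and WSS the weighted sum of squared residuals. The function name and the numbers are hypothetical.

```python
import math


def schwarz_criterion(weighted_ss, n_points, n_params):
    """Schwarz criterion / BIC in its assumed least-squares form."""
    return n_points * math.log(weighted_ss / n_points) + n_params * math.log(n_points)


# Hypothetical comparison of a 2-parameter and a 4-parameter model.
sc_simple = schwarz_criterion(weighted_ss=10.0, n_points=20, n_params=2)
sc_complex = schwarz_criterion(weighted_ss=9.5, n_points=20, n_params=4)
print("preferred:", "simple" if sc_simple < sc_complex else "complex")
```

Because the penalty term grows with ln(N), this criterion penalizes additional parameters more strongly than the uncorrected AIC once N exceeds about 7.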


Another criterion, used in the Scientist software (MicroMath, Saint Louis, Missouri, USA), is the Model Selection Criterion (MSC).

The preferred model is the one with the highest MSC value.
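The formula is not reproduced in this section. The sketch below assumes the form in which the MSC is commonly documented: the logarithm of the ratio between the weighted sum of squares about the weighted mean and the weighted sum of squared residuals, minus 2P/N. This should be treated as an assumption and checked against the software's own definition.

```python
import math


def model_selection_criterion(measured, predicted, weights, n_params):
    """MSC in its assumed form: ln(SS about mean / SS residual) - 2P/N."""
    n = len(measured)
    w_sum = sum(weights)
    w_mean = sum(w * y for w, y in zip(weights, measured)) / w_sum
    ss_total = sum(w * (y - w_mean) ** 2 for w, y in zip(weights, measured))
    ss_resid = sum(w * (y - f) ** 2
                   for w, y, f in zip(weights, measured, predicted))
    return math.log(ss_total / ss_resid) - 2.0 * n_params / n


# Hypothetical data with unit weights and a 2-parameter model.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]
msc = model_selection_criterion(measured, predicted, [1.0] * 5, n_params=2)
print(msc)
```

In this form the value does not depend on the scaling of the data, which is why larger values, unlike for AIC and SC, indicate the better model.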


There is also a measure of the goodness-of-fit, the coefficient of determination R², a number between 0 and 1. A value of 0 means that the fit is not better than a horizontal line through the mean of all measurements, whereas a value of 1 means that all measurements lie exactly on the curve. High R² values indicate that the model curve is close to the measurements.
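A minimal sketch of this calculation, assuming the usual unweighted definition R² = 1 − SS_residual / SS_total (the function name and data are hypothetical):

```python
def r_squared(measured, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    # Scatter of the measurements about the horizontal line through the mean.
    ss_total = sum((y - mean) ** 2 for y in measured)
    # Scatter of the measurements about the model curve.
    ss_resid = sum((y - f) ** 2 for y, f in zip(measured, predicted))
    return 1.0 - ss_resid / ss_total


measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]
print(r_squared(measured, predicted))
```

With this definition, a model curve that fits worse than the horizontal line through the mean yields a negative value, which can occur in nonlinear regression when parameters are constrained or fixed.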


Further information about the residuals is provided by the root mean square value Sy,x. It is defined as the standard deviation of the residuals and can be used to generate synthetic measurements in Monte Carlo simulations, provided all measurements have the same variability.
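The following sketch illustrates both uses. It assumes that Sy,x is computed as the square root of the sum of squared residuals divided by the degrees of freedom, and that the synthetic noise is Gaussian with equal standard deviation at every time point. The function names and data are hypothetical.

```python
import math
import random


def sy_x(measured, predicted, n_fitted):
    """Standard deviation of the residuals (assumed: SSR divided by DOF)."""
    ssr = sum((y - f) ** 2 for y, f in zip(measured, predicted))
    return math.sqrt(ssr / (len(measured) - n_fitted))


def synthetic_measurements(predicted, sd, rng):
    """Add Gaussian noise with equal standard deviation to the model curve."""
    return [f + rng.gauss(0.0, sd) for f in predicted]


measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]
sd = sy_x(measured, predicted, n_fitted=2)

rng = random.Random(42)  # fixed seed so that the simulation is reproducible
noisy_curves = [synthetic_measurements(predicted, sd, rng) for _ in range(100)]
```

Each synthetic curve would then be fitted again with the same model, and the spread of the resulting parameter values gives an estimate of the parameter uncertainty.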

Runs test

The runs test is a statistical test to decide whether the model curve deviates systematically from the data. It is based on the number of runs resulting from the fit. A run is a set of consecutive measurements which are all above (positive residuals) or all below (negative residuals) the model curve. Given the assumption that the residuals are randomly distributed, the probability p of the occurrence of a number of runs can be calculated. If p is small (e.g. p<0.05), the measurements systematically deviate from the model curve. Such a finding signals that most likely an inadequate model was fitted and further investigations of the result are not sensible. The test is only applicable for a sufficient number of positive and negative runs (>8).

A value of 1 means that there is a systematic deviation between model and measurement (p<0.05). A value of 0 means that there is no significant deviation.
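The section does not state how p is computed. As an illustration, the sketch below uses the exact distribution of the number of runs for a random ordering of the positive and negative residuals, and reports the one-sided probability of observing as few runs or fewer, since too few runs is the signature of a systematic deviation. Function names are hypothetical, residuals that are exactly zero are dropped, and `math.comb` requires Python 3.8 or later.

```python
import math


def count_runs(residuals):
    """Return (number of runs, number of positive, number of negative)."""
    signs = [r > 0 for r in residuals if r != 0]  # exact zeros are dropped
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    runs = (1 + changes) if signs else 0
    n_pos = sum(signs)
    return runs, n_pos, len(signs) - n_pos


def prob_runs_exactly(r, n_pos, n_neg):
    """Exact probability of r runs under random ordering of the signs."""
    total = math.comb(n_pos + n_neg, n_pos)
    k = r // 2
    if r % 2 == 0:
        ways = 2 * math.comb(n_pos - 1, k - 1) * math.comb(n_neg - 1, k - 1)
    else:
        ways = (math.comb(n_pos - 1, k) * math.comb(n_neg - 1, k - 1)
                + math.comb(n_pos - 1, k - 1) * math.comb(n_neg - 1, k))
    return ways / total


def runs_test_p(residuals):
    """One-sided p: probability of observing this few runs or fewer."""
    runs, n_pos, n_neg = count_runs(residuals)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative residuals")
    return sum(prob_runs_exactly(r, n_pos, n_neg) for r in range(2, runs + 1))


# Hypothetical residuals: the model lies below the first half of the data
# and above the second half, which gives only two runs.
residuals = [0.3, 0.2, 0.4, 0.1, 0.2, -0.1, -0.3, -0.2, -0.4, -0.1]
p = runs_test_p(residuals)
flag = 1 if p < 0.05 else 0  # 1: systematic deviation, 0: none detected
print(p, flag)
```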