USGS - science for a changing world

Northern Prairie Wildlife Research Center


Suggestions for Presenting the Results of Data Analyses

Information-Theoretic Methods

These methods date back only to the mid-1970s. They are based on theory published in the early 1950s and are just beginning to see use in theoretical and applied ecology. A synthesis of this general approach is given by Burnham and Anderson (1998). Much of classical frequentist statistics (except the null hypothesis testing methods) underlies and is part of the information-theoretic approach; however, the philosophies of the two paradigms are substantially different.

As part of the Methods section of a paper, describe and justify the a priori hypotheses and models in the set and how these relate specifically to the study objectives. Avoid routinely including a trivial null hypothesis or model in the model set; all models considered should have some reasonable level of interest and scientific support (Chamberlin's [1965] concept of "multiple working hypotheses"). The number of models (R) should be small in most cases. If the study is only exploratory, the number of models might be larger, but this situation can lead to inferential problems (e.g., inferred effects that are actually spurious; Anderson et al. 2001). Situations with more models than samples (i.e., R > n) should be avoided, except in the earliest phases of an exploratory investigation. Models with many parameters (e.g., K ≈ 30-200) often find little support unless sample sizes or effect sizes are large or the residual variance is quite small.

A common mistake is to use Akaike's Information Criterion (AIC) rather than the second-order criterion, AICc. Use AICc (a generally appropriate small-sample version of AIC) unless the number of observations is at least 40 times the number of estimated parameters (i.e., n/K > 40 for the biggest K over all R models). If using count data, provide some detail on how goodness of fit was assessed and, if necessary, an estimate of the variance inflation factor (ĉ) and its degrees of freedom. If evidence of overdispersion (Liang and McCullagh 1993) is found, the log-likelihood must be computed as logₑ(ℒ)/ĉ and used in QAICc, a selection criterion based on quasi-likelihood theory (Anderson et al. 1994). Once the appropriate criterion (AIC, AICc, or QAICc) has been identified, it should be used for all models in the set.
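As a minimal sketch in Python (our choice; the page itself gives no code), the three criteria can be computed from the maximized log-likelihood log(ℒ), the number of estimated parameters K, the sample size n and, for QAICc, ĉ, using the standard forms AIC = −2 log(ℒ) + 2K and AICc = AIC + 2K(K + 1)/(n − K − 1), with QAICc substituting −2 log(ℒ)/ĉ for −2 log(ℒ). The function names, and the rule in the hypothetical helper `choose_criterion` that treats any ĉ > 1 as overdispersion, are illustrative assumptions rather than a published procedure.

```python
def aic(log_lik: float, k: int) -> float:
    """Akaike's Information Criterion: -2 log(L) + 2K."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik: float, k: int, n: int) -> float:
    """Second-order (small-sample) AIC; undefined when n <= K + 1."""
    if n <= k + 1:
        raise ValueError("AICc is undefined when n <= K + 1")
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def qaicc(log_lik: float, k: int, n: int, c_hat: float) -> float:
    """Quasi-likelihood AICc: the log-likelihood is divided by c-hat."""
    if n <= k + 1:
        raise ValueError("QAICc is undefined when n <= K + 1")
    return -2.0 * log_lik / c_hat + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def choose_criterion(n: int, k_max: int, c_hat: float = 1.0) -> str:
    """Hypothetical helper: pick ONE criterion for the whole model set.

    k_max is the largest K over all R models. Treating any c_hat > 1 as
    overdispersion is a simplifying assumption; in practice that judgment
    should rest on the goodness-of-fit evidence.
    """
    if c_hat > 1.0:
        return "QAICc"
    if n / k_max > 40:  # the n/K > 40 rule of thumb
        return "AIC"
    return "AICc"
```

For example, `choose_criterion(50, 4)` returns "AICc" because n/K = 12.5, well below 40. Whatever the helper returns, the same criterion is then applied to every model in the set.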

Discuss or reference the use of other aspects of the information-theoretic approach, such as model averaging, a confidence set on models, and examination of the relative importance of variables. Define or reference the notation used (e.g., K, δi, and wi). Ideally, the variance component due to model selection uncertainty should be included in estimates of precision (i.e., unconditional vs. conditional standard errors) unless there is strong evidence favoring the best model, such as an Akaike weight (wi) greater than about 0.9.
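Model averaging and unconditional precision can be sketched as follows, assuming every model in the set estimates the same parameter θ and that the Akaike weights wi are already available. The sketch uses the form described by Burnham and Anderson (1998): the model-averaged estimate is the weight-averaged estimate, and each model's conditional variance is inflated by its squared deviation from that average. The function name `model_average` is hypothetical.

```python
import math


def model_average(estimates, std_errors, weights):
    """Model-averaged estimate of theta and its unconditional standard error.

    estimates:  estimate of the same parameter theta from each model
    std_errors: conditional (within-model) standard errors of those estimates
    weights:    Akaike weights w_i for the same models, summing to 1
    """
    theta_bar = sum(w * t for w, t in zip(weights, estimates))
    # Adding (theta_i - theta_bar)^2 to each conditional variance brings in
    # the variance component due to model selection uncertainty.
    se_unconditional = sum(
        w * math.sqrt(se ** 2 + (t - theta_bar) ** 2)
        for w, t, se in zip(weights, estimates, std_errors)
    )
    return theta_bar, se_unconditional
```

With two equally weighted models estimating 1.0 and 3.0, each with a conditional SE of 0.2, the averaged estimate is 2.0 and the unconditional SE is √1.04 ≈ 1.02, far larger than either conditional SE because the models disagree. When one model carries nearly all the weight, the result approaches that model's own estimate and conditional SE.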

For well-designed, true experiments in which the number of effects or factors is small and the factors are orthogonal, use of the full model will often suffice (rather than considering more parsimonious models). If an objective is to assess the relative importance of variables, inference can be based, for each variable, on the sum of the Akaike weights of all models that include that variable; these sums should be reported (Burnham and Anderson 1998:140-141). Avoid implying that variables not in the selected (estimated best) model are unimportant.
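Relative variable importance, as described above, is the sum of wi over the models that contain a given variable. A minimal sketch, assuming each model is represented (hypothetically) as a set of variable names and that its Akaike weight has already been computed:

```python
def variable_importance(models, weights):
    """Sum of Akaike weights over the models that include each variable.

    models:  one set of variable names per model (hypothetical input format)
    weights: Akaike weights w_i for the same models, summing to 1
    Assumes a balanced model set (each variable appears in the same number
    of models) so that the sums can be compared across variables.
    """
    importance = {}
    for variables, w in zip(models, weights):
        for v in variables:
            importance[v] = importance.get(v, 0.0) + w
    return importance
```

With models {x1, x2}, {x1}, and {x2} weighted 0.5, 0.3, and 0.2, the sums are 0.8 for x1 and 0.7 for x2, so x1 has somewhat more support even though both variables appear in the best model.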

The results should be easy to report if the Methods section convincingly outlines the science hypotheses and associated models of interest. Show a table of the value of the maximized log-likelihood function (log(ℒ)), the number of estimated parameters (K), the appropriate selection criterion (AIC, AICc, or QAICc), the simple differences from the best model's criterion value (δi), and the Akaike weights (wi) for models in the set (or at least the models with some reasonable level of support, such as those with δi < 10). Interpret and report the evidence for the various science hypotheses by ranking the models from best to worst based on the differences (δi) and the Akaike weights (wi). Provide quantities of interest from the best model or others in the set (e.g., σ², coefficients of determination, estimates of model parameters and their standard errors). Those using the Bayesian Information Criterion (BIC; Schwarz 1978) for model selection should justify, in the Methods section, the existence of a true model in the set of candidate models.
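The results table described above can be assembled mechanically. The sketch below assumes AICc is the chosen criterion, computes δi as each model's AICc minus the smallest AICc in the set, and computes wi = exp(−δi/2) / Σ exp(−δj/2) over all R models; rows with δi ≥ 10 are dropped from the display only after the weights have been normalized over the full set. The `Fit` record and `selection_table` function are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Fit:
    name: str       # hypothetical model label
    log_lik: float  # maximized log-likelihood, log(L)
    k: int          # number of estimated parameters, K


def selection_table(fits, n, max_delta=10.0):
    """Rows of (name, log(L), K, AICc, delta_i, w_i), best model first."""
    scores = []
    for f in fits:
        if n <= f.k + 1:
            raise ValueError(f"AICc is undefined for {f.name}: n <= K + 1")
        scores.append(-2.0 * f.log_lik + 2.0 * f.k
                      + 2.0 * f.k * (f.k + 1) / (n - f.k - 1))
    best = min(scores)
    deltas = [s - best for s in scores]
    relative = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative)  # normalize over ALL models, before any filtering
    rows = [
        (f.name, f.log_lik, f.k, s, d, r / total)
        for f, s, d, r in zip(fits, scores, deltas, relative)
    ]
    rows.sort(key=lambda row: row[4])  # rank from best to worst by delta_i
    return [row for row in rows if row[4] < max_delta]
```

For three hypothetical fits with n = 60, A (log(ℒ) = −100, K = 3), B (−99, 4), and C (−110, 2), model A ranks first, B follows with δ ≈ 0.30, and C (δ ≈ 17.8) is omitted from the displayed table; the weights shown for A and B therefore sum to slightly less than 1.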

Do not include test statistics and P-values when using the information-theoretic approach, since this inappropriately mixes differing analysis paradigms. For example, do not use AICc to rank models in the set and then test whether the best model is significantly better than the second-best model (no such test is valid). Do not imply that the information-theoretic approaches are a test in any sense. Avoid terms such as significant and not significant, or rejected and not rejected; instead, view the results in a strength-of-evidence context (Royall 1997).

If some analysis and modeling were done after the a priori effort (often called data dredging), make sure this procedure is clearly explained when such results are mentioned in the Discussion section. Give estimates of important parameters (e.g., effect size) and measures of precision (preferably confidence intervals).


U.S. Department of the Interior | U.S. Geological Survey
Page Last Modified: Saturday, 02-Feb-2013 06:03:00 EST
Reston, VA