Northern Prairie Wildlife Research Center

What should we do instead of testing hypotheses? As Quinn and Dunham (1983) pointed out, it is more fruitful to determine the relative importance of the contributions of, and interactions between, a number of processes. For this purpose, estimation is far more appropriate than hypothesis testing (Campbell 1992). For certain other situations, decision theory is an appropriate tool. For either of these applications, as well as for hypothesis testing itself, the Bayesian approach offers some distinct advantages over the traditional methods. These alternatives are briefly outlined below. Although the alternatives will not meet all potential needs, they do offer attractive choices in many frequently encountered situations.

Four decades ago, Anscombe (1956) observed that statistical hypothesis tests
were totally irrelevant, and that what was needed were estimates of magnitudes
of effects, with standard errors. Yates (1964) indicated that "The most commonly
occurring weakness in the application of Fisherian methods is undue emphasis
on tests of significance, and failure to recognize that in many types of experimental
work estimates of the treatment effects, together with estimates of the errors
to which they are subject, are the quantities of primary interest." Further,
because wildlife ecologists want to influence management practices, Johnson
(1995) noted that, "If ecologists are to be taken seriously by decision makers,
they must provide information useful for deciding on a course of action, as
opposed to addressing purely academic questions." To reinforce that point, several
education and psychological journals have adopted editorial policies requiring
that parameter estimates accompany any *P*-values presented (McLean and
Ernest 1998).

Ordinary confidence intervals provide more information than do *P*-values.
Knowing that a 95% confidence interval includes zero tells one that, if a
test of the hypothesis that the parameter equals zero is conducted, the resulting
*P*-value will be greater than 0.05. A confidence interval provides both
an estimate of the effect size and a measure of its uncertainty. A 95% confidence
interval of, say, (-50, 300) suggests that the parameter is estimated far less
precisely than does an interval of (120, 130). Perhaps surprisingly, confidence
intervals have a longer history than statistical hypothesis tests (Schmidt
and Hunter 1997).
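The duality between intervals and tests can be sketched numerically. The following Python fragment is illustrative only: it assumes a normal-theory z-interval, and the estimates and standard errors are hypothetical numbers chosen to reproduce intervals like those above.

```python
from statistics import NormalDist

def estimate_summary(estimate, se, alpha=0.05):
    """Two-sided z-test of H0: parameter = 0, plus the matching (1 - alpha) CI."""
    nd = NormalDist()
    p_value = 2 * (1 - nd.cdf(abs(estimate / se)))
    half_width = nd.inv_cdf(1 - alpha / 2) * se
    return p_value, (estimate - half_width, estimate + half_width)

# Two hypothetical studies with the same point estimate but very different
# precision: the first yields a CI of roughly (-50, 300), the second (120, 130).
p_wide, ci_wide = estimate_summary(125.0, 89.3)
p_narrow, ci_narrow = estimate_summary(125.0, 2.55)
```

The wide interval includes zero, so the corresponding test yields *P* > 0.05; the narrow interval excludes zero, so *P* < 0.05. The interval conveys both facts at once, plus the precision of the estimate.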

With its advantages and longer history, why have confidence intervals not been used more than they have? Steiger and Fouladi (1997) and Reichardt and Gollob (1997) posited several explanations: (1) hypothesis testing has become a tradition; (2) the advantages of confidence intervals are not recognized; (3) there is some ignorance of the procedures available; (4) major statistical packages do not include many confidence interval estimates; (5) sizes of parameter estimates are often disappointingly small even though they may be very significantly different from zero; (6) the wide confidence intervals that often result from a study are embarrassing; (7) some hypothesis tests (e.g., chi-square contingency table) have no uniquely defined parameter associated with them; and (8) recommendations to use confidence intervals often are accompanied by recommendations to abandon statistical tests altogether, which is unwelcome advice. None of these reasons justifies retaining hypothesis tests in lieu of confidence intervals in situations for which parameter estimation is the objective.

Often experiments or surveys are conducted in order to help make some decision, such as what limits to set on hunting seasons, whether a forest stand should be logged, or whether a pesticide should be approved. In those cases, hypothesis testing is inadequate, for it does not take into consideration the costs of alternative actions. Here a useful tool is statistical decision theory: the theory of acting rationally with respect to anticipated gains and losses, in the face of uncertainty. Hypothesis testing generally limits the probability of a Type I error (rejecting a true null hypothesis), often arbitrarily set at α = 0.05, while letting the probability of a Type II error (accepting a false null hypothesis) fall where it may. In ecological situations, however, a Type II error may be far more costly than a Type I error (Toft and Shea 1983). As an example, approving a pesticide that reduces the survival rate of an endangered species by 5% may be disastrous to that species, even if that change is not statistically detectable. As another, continued overharvest in marine fisheries may result in the collapse of the ecosystem even while statistical tests are unable to reject the null hypothesis that fishing has no effect (Dayton 1998). Details on decision theory can be found in DeGroot (1970), Berger (1985), and Pratt et al. (1995).
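The decision-theoretic alternative can be sketched with a toy calculation in the spirit of the pesticide example; the loss values and the 30% probability of harm below are hypothetical, not taken from any real assessment.

```python
# Hypothetical loss table for the pesticide example: keys pair an action
# with a state of nature; values are costs in arbitrary units.
losses = {
    ("approve", "harmful"): 100.0,  # endangered species declines: severe cost
    ("approve", "benign"):    0.0,
    ("reject",  "harmful"):   0.0,
    ("reject",  "benign"):    5.0,  # benefits of the pesticide forgone
}

def expected_loss(action, p_harmful):
    """Average the loss of an action over the uncertain state of nature."""
    return (p_harmful * losses[(action, "harmful")]
            + (1.0 - p_harmful) * losses[(action, "benign")])

# Even a modest 30% chance of harm dominates the decision here, because the
# consequences of the two errors are so unequal:
p_harmful = 0.30
best = min(["approve", "reject"], key=lambda a: expected_loss(a, p_harmful))
```

Note that the decision turns on probabilities and consequences jointly, not on whether some null hypothesis is rejected at α = 0.05.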

Statistical tests can play a useful role in diagnostic checks and evaluations of tentative statistical models (Box 1980). But even for this application, competing tools are superior. Information criteria, such as Akaike's, provide objective measures for selecting among different models fitted to a data set. Burnham and Anderson (1998) provided a detailed overview of model selection procedures based on information criteria. In addition, for many applications it is not advisable to select a "best" model and then proceed as if that model were correct. There may be a group of models entertained, and the data will provide different strengths of evidence for each model. Rather than basing decisions or conclusions on the single model most strongly supported by the data, one should acknowledge the uncertainty about the model by considering the entire set of models, each perhaps weighted by its own strength of evidence (Buckland et al. 1997).
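One common way to express such model weights is through Akaike weights, computed from the AIC values of the candidate models; a minimal sketch, with hypothetical AIC scores:

```python
import math

def akaike_weights(aic_values):
    """Convert AIC scores to Akaike weights: each model's relative support,
    normalized so the weights sum to one."""
    best = min(aic_values)
    relative_likelihoods = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]

# Three candidate models fitted to the same data set (hypothetical AIC scores);
# lower AIC means stronger support, so the first model gets the largest weight.
weights = akaike_weights([310.2, 312.4, 318.9])
```

Predictions or conclusions can then be averaged across the model set using these weights, rather than resting on the single best model.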

Bayesian approaches offer some alternatives preferable to the ordinary (often called frequentist, because they invoke the idea of the long-term frequency of outcomes in imagined repeats of experiments or samples) methods for hypothesis testing as well as for estimation and decision-making. Space limitations preclude a detailed review of the approach here; see Box and Tiao (1973), Berger (1985), and Carlin and Louis (1996) for longer expositions, and Schmitt (1969) for an elementary introduction.

Sometimes the value of a parameter is predicted from theory, and it is more
reasonable to test whether or not that value is consistent with the observed
data than to calculate a confidence interval (Berger and Delampady 1987, Zellner
1987). For testing such hypotheses, what is usually desired (and what is sometimes
believed to be provided by a statistical hypothesis test) is Pr[H_{0}
| data]. What is obtained, as pointed out earlier, is *P* = Pr[observed
or more extreme data | H_{0}]. Bayes' theorem offers a formula for
converting between them:

Pr[H_{0} | data] = Pr[data | H_{0}] Pr[H_{0}] / Pr[data].

This is an old (Bayes 1763) and well-known theorem in probability. Its use
in the present situation does not follow from the frequentist view of statistics,
which considers Pr[H_{0}] as unknown, but either zero or 1. In the
Bayesian approach, Pr[H_{0}] is determined before data are gathered;
it is therefore called the prior probability of H_{0}. Pr[H_{0}]
can be determined either subjectively (what is your prior belief about the
truth of the null hypothesis?) or by a variety of objective means (e.g., Box
and Tiao 1973, Carlin and Louis 1996). The use of subjective probabilities
is a major reason that Bayesian approaches fell out of favor: science must
be objective! (The other main reason is that Bayesian calculations tend to
get fairly heavy, but modern computer capabilities can largely overcome this
obstacle.)
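A small numerical sketch of this conversion, with made-up prior and likelihood values for two competing hypotheses, shows how different Pr[H_{0} | data] can be from the probability of the data under the null:

```python
def posterior_h0(prior_h0, like_h0, like_h1):
    """Pr[H0 | data] via Bayes' theorem, for two competing hypotheses H0 and H1.
    like_h0 and like_h1 are the probabilities of the observed data under each."""
    marginal = prior_h0 * like_h0 + (1.0 - prior_h0) * like_h1
    return prior_h0 * like_h0 / marginal

# Hypothetical inputs: even prior odds, and data five times as probable
# under the alternative as under the null.
post = posterior_h0(prior_h0=0.5, like_h0=0.02, like_h1=0.10)  # = 1/6
```

Here the data have probability only 0.02 under H_{0}, yet the posterior probability of H_{0} is 1/6, about eight times larger: the two quantities answer different questions.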

Briefly consider parameter estimation. Suppose you want to estimate a parameter
θ. Then replacing H_{0}
by θ in the above formula yields

Pr[θ | data] = Pr[data | θ] Pr[θ] / Pr[data],

which provides an expression that shows how initial knowledge about the value of a parameter, reflected in the prior probability function Pr[θ], is modified by data obtained from a study, through Pr[data | θ], to yield a final (posterior) probability function, Pr[θ | data]. This process of updating beliefs leads in a natural way to adaptive resource management (Holling 1978, Walters 1986), a recent favorite topic in our field (e.g., Walters and Green 1997).
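This updating cycle can be sketched with a conjugate beta-binomial example, a standard textbook device rather than anything prescribed in the text; the survival data below are hypothetical:

```python
def update_beta(a, b, survived, died):
    """Beta(a, b) prior on a survival rate, updated with binomial data:
    the posterior is simply Beta(a + survived, b + died)."""
    return a + survived, b + died

# Vague Beta(1, 1) prior; hypothetical data: 18 of 25 marked animals survive.
a, b = update_beta(1.0, 1.0, survived=18, died=7)
posterior_mean = a / (a + b)  # 19/27, about 0.70
```

Next season's data would be fed through the same function with this posterior as the new prior, which is the updating loop at the heart of adaptive management.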

Bayesian confidence intervals are much more natural than their frequentist
counterparts. A frequentist 95% confidence interval for a parameter θ, denoted (θ_{L},
θ_{U}), is interpreted
as follows: if the study were repeated an infinite number of times, 95% of
the confidence intervals that resulted would contain the true value θ. It says nothing about the particular study that was actually conducted,
which led Howson and Urbach (1991:373) to comment that "statisticians regularly
say that one can be '95 per cent confident' that the parameter lies in the
confidence interval. They never say why." In contrast, a Bayesian confidence
interval, sometimes called a credible interval, is interpreted to mean that
the probability that the true value of the parameter lies in the interval
is 95%. That statement is much more natural, and is what people think a confidence
interval is, until they get the notion drummed out of their heads in statistics
courses.
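The credible-interval interpretation can be illustrated by sampling directly from a posterior distribution and reading off central quantiles. The sketch below assumes, purely for illustration, a Beta(19, 8) posterior for a survival rate (e.g., a flat prior updated with 18 survivors among 25 marked animals):

```python
import random

def credible_interval(a, b, level=0.95, n=100_000, seed=1):
    """Equal-tailed credible interval for a Beta(a, b) posterior,
    approximated by Monte Carlo sampling from the posterior itself."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n))
    lo = draws[int(n * (1.0 - level) / 2.0)]
    hi = draws[int(n * (1.0 + level) / 2.0) - 1]
    return lo, hi

lo, hi = credible_interval(19.0, 8.0)
# By construction, about 95% of the posterior probability lies in (lo, hi),
# which is exactly the direct statement the credible interval licenses.
```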

For decision analysis, Bayes' theorem offers a very logical way to make decisions in the face of uncertainty. It allows for incorporating beliefs, data, and the gains or losses expected from possible consequences of decisions. See Wolfson et al. (1996) and Ellison (1996) for recent overviews of Bayesian methods with an ecological orientation.
