
Northern Prairie Wildlife Research Center


The Insignificance of Statistical Significance Testing

Why are Hypothesis Tests Used?

With all the deficiencies of statistical hypothesis tests, it is reasonable to wonder why they remain so widely used. Nester (1996) suggested several reasons: (1) they appear to be objective and exact; (2) they are readily available and easily invoked in many commercial statistics packages; (3) everyone else seems to use them; (4) students, statisticians, and scientists are taught to use them; and (5) some journal editors and thesis supervisors demand them. Carver (1978) recognized that statistical significance is generally interpreted as having some relationship to replication, which is the cornerstone of science. More cynically, Carver (1978) suggested that complicated mathematical procedures lend an air of scientific objectivity to conclusions. Shaver (1993) noted that social scientists equate being quantitative with being scientific. D. V. Lindley (quoted in Matthews 1997) observed that "People like conventional hypothesis tests because it's so easy to get significant results from them."

I attribute the heavy use of statistical hypothesis testing, not just in the wildlife field but in other "soft" sciences such as psychology, sociology, and education, to "physics envy." Physicists and other researchers in the "hard" sciences are widely respected for their ability to learn things about the real world (and universe) that are solid and incontrovertible, and to produce results that translate into products we see daily. Psychologists, to take one group, have difficulty developing tests that can distinguish between two competing theories.

In the hard sciences, hypotheses are tested; that process is an integral component of the hypothetico–deductive scientific method. Under that method, a theory is postulated, which generates several predictions. These predictions are treated as scientific hypotheses, and an experiment is conducted to try to falsify each hypothesis. If the results of the experiment refute the hypothesis, that outcome implies that the theory is incorrect and should be modified or scrapped. If the results do not refute the hypothesis, the theory stands and may gain support, depending on how critical the experiment was.

In contrast, the hypotheses usually tested by wildlife ecologists do not devolve from general theories about how the real world operates. More typically they are statistical hypotheses (i.e., statements about properties of populations; Simberloff 1990). Unlike scientific hypotheses, the truth of which is truly in question, most statistical hypotheses are known a priori to be false. The confusion of the 2 types of hypotheses has been attributed to the pervasive influence of R. A. Fisher, who did not distinguish them (Schmidt and Hunter 1997).

Scientific hypothesis testing dates back at least to the 17th century: in 1620, Francis Bacon discussed the role of proposing alternative explanations and conducting explicit tests to distinguish between them as the most direct route to scientific understanding (Quinn and Dunham 1983). This concept is related to Popperian inference, which seeks to develop and test hypotheses that can clearly be falsified (Popper 1959), because a falsified hypothesis provides a greater advance in understanding than does a hypothesis that is supported. Also similar is Platt's (1964) notion of strong inference, which emphasizes developing alternative hypotheses that lead to different predictions. In such a case, results inconsistent with predictions from a hypothesis cast doubt on its validity.

Examples of scientific hypotheses, which were considered credible, include Copernicus' notion H_A: the Earth revolves around the sun, versus the conventional wisdom of the time, H_0: the sun revolves around the Earth. Another example is Fermat's last theorem, which states that for positive integers n, X, Y, and Z, X^n + Y^n = Z^n implies n ≤ 2. Alternatively, a physicist may make specific predictions about a parameter based on a theory, and the theory is provisionally accepted only if the outcomes are within measurement error of the predicted value, and no other theories make predictions that also fall within that range (Mulaik et al. 1997). Contrast these hypotheses, which involve phenomena in nature, with the statistical hypotheses presented in The Journal of Wildlife Management, which were mentioned above, and which involve properties of populations.

Rejection of a statistical hypothesis would constitute a piece of evidence to be considered in deciding whether or not to reject a scientific hypothesis (Simberloff 1990). For example, a scientific hypothesis might state that clutch sizes of birds increase with the age of the bird, up to some plateau. That idea would generate a hypothesis that could be tested statistically within a particular population of birds. A single such test, regardless of its P-value, would little affect the credibility of the scientific hypothesis, which is far more general. A related distinction is that scientific hypotheses are global, applying to all of nature, while statistical hypotheses are local, applying to particular systems (Simberloff 1990).

Why do we wildlife ecologists rarely test scientific hypotheses? My view is that we are dealing with systems more complex than those faced by physicists. A saying in ecology is that everything is connected to everything else. (In psychology, "everything correlates with everything," giving rise to what David Lykken called the "crud factor" for such ambient correlation noise [Meehl 1997].) This saying implies that all variables in an ecological system are intercorrelated, and that any null hypothesis postulating no effect of one variable on another will in fact be false; a statistical test will reject that hypothesis, as long as the sample is sufficiently large. This line of reasoning does not denigrate the value of experimentation in real systems; ecologists should seek situations in which variables thought to be influential can be manipulated and the results carefully monitored (Underwood 1997). Too often, however, experimentation in natural systems is very difficult if not impossible.
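
The claim that a false null hypothesis will be rejected once the sample is large enough can be illustrated with a small calculation. The sketch below is not from the original article; it assumes a two-sided, two-sample z-test with a known common standard deviation, and the function name, the effect size of 0.01 standard deviations, and the sample sizes are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def rejection_probability(effect_size, n_per_group, alpha=0.05):
    """Probability that a two-sided, two-sample z-test rejects H0: no difference.

    effect_size is the true difference in means divided by the common
    standard deviation (assumed known here, which is a simplification).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true effect is effect_size.
    shift = effect_size * sqrt(n_per_group / 2)
    # Chance that the statistic lands in the upper or the lower rejection region.
    return STD_NORMAL.cdf(-z_crit + shift) + STD_NORMAL.cdf(-z_crit - shift)


if __name__ == "__main__":
    tiny_effect = 0.01  # hypothetical: 1% of a standard deviation
    for n in (100, 10_000, 1_000_000, 100_000_000):
        p = rejection_probability(tiny_effect, n)
        print(f"n per group = {n:>11,}: P(reject H0) = {p:.3f}")
```

Under these assumptions, the probability of rejection starts near the nominal alpha for small samples and approaches 1 as the sample grows, even though the effect itself is biologically negligible and never changes.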



U.S. Department of the Interior | U.S. Geological Survey
Page Last Modified: Saturday, 02-Feb-2013 06:03:43 EST