Northern Prairie Wildlife Research Center

How often have you read something like, "Our data were not normally distributed, so we used nonparametric methods."? The reasoning is that nonparametric methods require few, if any, assumptions. In a recent Special Feature article in

The situations for which nonparametric methods are commended vary, and include
correlation, regression, and more, but I concentrate on the comparison of
two means. The usual (parametric) method is the *t* test, which can be
employed either when variances within the two groups are the same (ordinary
Student's *t*) or when they differ (Welch-Satterthwaite modification).
The nonparametric counterpart is the Wilcoxon rank sum or the equivalent Mann-Whitney
test (WMW test).

Several points bear emphasis. First, data do not need to be normally distributed
in order to apply the *t* test. Only the means need to be, and that property
is assured by the Central Limit Theorem, even for relatively small samples,
for all but the most perverse data. This is exemplified in Fig. 1, which shows
at the upper left a very nonnormal (in fact, a uniform) distribution of original
data. Random samples of size *N* = 2, 4, and 8 demonstrate that the distribution
of averages based on even those small sample sizes rapidly approaches normality.
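The behavior shown in Fig. 1 can be checked with a small simulation. The following is a minimal sketch, not the procedure used to draw the figure: the function names, the 20,000 replicates, and the seed are all illustrative assumptions. It uses excess kurtosis as a rough index of nonnormality; the value is 0 for a normal distribution and -1.2 for a uniform one, and for means of *N* independent uniform values it is -1.2/*N* in theory.

```python
import random
from statistics import fmean


def excess_kurtosis(values):
    """Excess kurtosis: 0 for a normal distribution, -1.2 for a uniform one."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m4 = fmean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0


def kurtosis_of_means(sample_size, reps=20000, seed=0):
    """Draw `reps` samples of `sample_size` uniform(0, 1) values and
    return the excess kurtosis of the resulting sample means.
    `reps` and `seed` are arbitrary choices made for this sketch."""
    rng = random.Random(seed)
    means = [
        fmean([rng.random() for _ in range(sample_size)])
        for _ in range(reps)
    ]
    return excess_kurtosis(means)


# N = 1 reproduces the original (uniform) data; larger N gives sample means.
results = {n: kurtosis_of_means(n) for n in (1, 2, 4, 8)}
for n, k in results.items():
    print(f"N = {n}: excess kurtosis of sample means = {k:+.3f}")
```

The printed values should move from about -1.2 toward 0 as *N* grows, which is the numerical counterpart of the histograms in Fig. 1.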

Second, statements are often made about means of distributions differing,
based on nonparametric tests such as WMW, although Potvin and Roff did not
make this mistake. The WMW test actually tests the hypothesis that the two
distributions are *identical*, not that they have the same mean (e.g.,
Gibbons 1985). In particular, variances must be the same if the test is to
compare means; as Hollander and Wolfe (1973:71) stated, "we assume that the
two populations do not differ in dispersion." To compare means, the WMW test
requires the assumption that the two distributions are identical in shape
and scale, differing only in their means. This assumption can be harder to
justify than the asymptotic normality demanded by the *t* test, and is
rarely evaluated (Petranka 1990). A significant test statistic from, say,
the WMW procedure indicates that the two distributions differ in some way;
it does not suggest how they differ—mean, variance, shape, etc. Petranka
(1990) provided an example of two distributions that had identical means and
medians: the *t* test indicated no difference between means, whereas
the WMW test was significant. If the distributions have different variances,
the Welch-Satterthwaite version of the *t* test performs well (Wang 1971)
and is more valid than the WMW test (Fligner and Policello 1981, Stewart-Oaten
1995).
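A simulation can illustrate why a significant WMW result need not reflect a difference in means. The sketch below is an illustration under stated assumptions, not an analysis from any of the cited papers: both populations are normal with mean 0, the standard deviations (4 and 1), the sample sizes (10 and 40), the number of replicates, and the seed are arbitrary choices, and the WMW test uses the usual large-sample normal approximation without a continuity correction. The *t* distribution is evaluated by numerical integration only to keep the example free of third-party libraries.

```python
import math
import random
from statistics import NormalDist, fmean, variance

STD_NORMAL = NormalDist()


def t_two_sided_p(t, df, steps=400):
    """Two-sided p-value for Student's t with (possibly fractional) df,
    obtained by Simpson integration of the density from 0 to |t|."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def welch_p(x, y):
    """Welch-Satterthwaite t test: two-sided p-value for equal means."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (fmean(x) - fmean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_two_sided_p(t, df)


def wmw_p(x, y):
    """WMW test via the normal approximation to U (continuous data, no ties)."""
    n1, n2 = len(x), len(y)
    u = sum(1 for a in x for b in y if a > b)  # pairs with x above y
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def rejection_rates(reps=2000, alpha=0.05, seed=1):
    """Fraction of replicates in which each test rejects at level alpha,
    although the two population means are identical by construction."""
    rng = random.Random(seed)
    welch = wmw = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 4.0) for _ in range(10)]  # small sample, large variance
        y = [rng.gauss(0.0, 1.0) for _ in range(40)]  # large sample, small variance
        welch += welch_p(x, y) < alpha
        wmw += wmw_p(x, y) < alpha
    return welch / reps, wmw / reps


welch_rate, wmw_rate = rejection_rates()
print(f"Welch t rejection rate: {welch_rate:.3f}")
print(f"WMW rejection rate:     {wmw_rate:.3f}")
```

Under these assumptions the Welch rate should stay near the nominal 5%, while the WMW rate should be clearly higher, because the WMW test is responding to the difference in dispersion rather than to any difference in means. In practice one would probably rely on an established statistics library rather than hand-written tests such as these.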

FIG. 1. Uniform distribution of values, and distributions
of means based on random samples of size N = 2, 4, and 8 from
that population, indicating how means from a non-normal distribution
can rapidly approach normality.

Third, although nonparametric methods can be used for estimating parameters, they are better adapted to testing hypotheses and used mostly for that purpose. By their very nature, nonparametric methods do not specify an easily interpreted parameter (Simberloff 1990). And parameter estimates are generally more useful than hypothesis tests. Almost all null hypotheses tested truly are false; the only real question is whether the sample gathered is large enough to make the test statistic significant. For example, does the density of a plant species in one study area differ from that in another? Of course. Densities might be 5000 plants/ha in one area and 4999.9 plants/ha in the other, but that is a real difference, which will be detected (the difference between sample means will become significant) once the samples are large enough. As Yoccoz (1991:106) noted, "most biologists and other users of statistical methods seem still to be unaware that significance testing by itself sheds little light on the questions they are posing." Overemphasis on statistical hypothesis testing may be due to confusing that activity as an "inductive or even descriptive procedure" with the deductive logic involved in hypothesis testing in "strong inference" (Quinn and Dunham 1983).

The emphasis on hypothesis tests raises the issue of biological significance, as contrasted with statistical significance. Biological significance implies importance in some sense. Statistical significance means that the result was unlikely to be due to chance alone; if the null hypothesis is true, an improbable event has occurred. Differences of certain magnitudes are said to be not biologically significant, although they were shown to be statistically significant (does that mean that the samples were too large?). And some authors talk of differences that are biologically significant, even though they did not meet usual α criteria (does that mean that the differences are important but perhaps not real?). It is rarely sufficient to know that two parameters differ; estimates of their values are needed for useful application.

More meaningful than a test comparing two means are estimates of the difference between means, along with an assessment of one's confidence in that difference. If ecologists are to be taken seriously by decision makers, they must provide information useful for deciding on a course of action, as opposed to addressing purely academic questions. What Roberts (1990:382) said about business applies equally well to ecology: "[S]ignificance tests are irrelevant to the manager who must make the business decision."

Returning to the plant density example, estimates of the difference between
the two areas would approach 0.1 plants/ha, the true value, as sample sizes
grow large. The estimated difference, along with a confidence interval for
it, can be brought to bear on a decision. Neither the *t* statistic—the
ratio of the estimate to its estimated standard error—nor the significance
level of the *t* value is useful or even interesting.
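The plant density example can be sketched numerically. The true densities (5000 and 4999.9 plants/ha) come from the text; everything else here is an assumption made for illustration: normally distributed plot-level densities, a within-area standard deviation of 5 plants/ha, equal sample sizes in the two areas, the seed, and a large-sample normal approximation for the confidence interval (which is rough for the smallest sample size shown).

```python
import math
import random
from statistics import NormalDist, fmean


def sample_variance(values):
    """Unbiased sample variance (two-pass formula)."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def difference_with_ci(x, y, level=0.95):
    """Estimated difference in means with an approximate confidence interval.
    Uses a normal critical value, so it is a large-sample approximation."""
    est = fmean(x) - fmean(y)
    se = math.sqrt(sample_variance(x) / len(x) + sample_variance(y) / len(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est, est - z * se, est + z * se


def simulate(n, seed=0, sd=5.0):
    """Sample n plots from each area; sd is a hypothetical value."""
    rng = random.Random(seed)
    area_a = [rng.gauss(5000.0, sd) for _ in range(n)]
    area_b = [rng.gauss(4999.9, sd) for _ in range(n)]
    return difference_with_ci(area_a, area_b)


results = {n: simulate(n) for n in (20, 2000, 200000)}
for n, (est, lo, hi) in results.items():
    print(f"n = {n:>6} per area: difference = {est:+.3f} "
          f"plants/ha, 95% CI ({lo:+.3f}, {hi:+.3f})")
```

As the sample size grows, the interval narrows around the true difference of 0.1 plants/ha and eventually excludes zero. The difference becomes statistically significant, yet the interval itself shows how small it is, which is the information a decision maker would need.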

Nonparametric methods have an important role to play, especially in the analysis of ordinal data. My concern is only that they are too freely adopted for inappropriate purposes. Glass et al. (1972:237) referred to "a largely unnecessary hegira to non-parametric statistics" that took place in education and the social sciences during the 1950s and that ecology now seems in danger of repeating. Parameters are generally of most interest, so we should provide estimates of those parameters that are meaningful and applicable to making real decisions. If the data we have do not meet assumptions underlying the standard techniques, and those assumptions are in fact necessary, then alternatives should be considered, such as transforming the data to better meet the assumptions (Green 1979:43-54, Atkinson and Cox 1988) or using robust parametric methods (Huber 1981, Bickel 1988), which are less sensitive to violations of the assumptions.

If ecologists are careful about randomly sampling the populations about which they want to draw inferences, standard parametric methods will ordinarily be adequate; if they are not, nonparametric methods will not protect them from sailing off course (Box et al. 1978; Stewart-Oaten 1995).
