USGS - science for a changing world

Northern Prairie Wildlife Research Center


Statistical Sirens: The Allure of Nonparametrics

Main Text

How often have you read something like "Our data were not normally distributed, so we used nonparametric methods"? The reasoning is that nonparametric methods require few, if any, assumptions. In a recent Special Feature article in Ecology, Potvin and Roff (1993:1619) made the point explicit: "The main advantage of nonparametric methods over their parametric counterparts is the absence of assumptions regarding the distribution underlying the observations" (emphasis added). Numerous authors have made similar statements, but I focus on the Potvin and Roff article because it was intended as an update for ecologists. My purpose here is to indicate that their characterization is incorrect and the implied advice is misleading.

The situations for which nonparametric methods are commended vary, and include correlation, regression, and more, but I concentrate on the comparison of two means. The usual (parametric) method is the t test, which can be employed either when variances within the two groups are the same (ordinary Student's t) or when they differ (Welch-Satterthwaite modification). The nonparametric counterpart is the Wilcoxon rank sum or the equivalent Mann-Whitney test (WMW test).
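As a rough sketch of the two parametric variants named above, the following computes the pooled-variance (Student) and unequal-variance (Welch-Satterthwaite) t statistics with their degrees of freedom. The function names and the measurements are hypothetical, invented for illustration; only the Python standard library is assumed.

```python
import math
import statistics

def student_t(x, y):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2

def welch_t(x, y):
    """Unequal-variance (Welch-Satterthwaite) t statistic and approximate df."""
    nx, ny = len(x), len(y)
    a = statistics.variance(x) / nx  # squared standard error of mean(x)
    b = statistics.variance(y) / ny  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (nx - 1) + b ** 2 / (ny - 1))
    return t, df

# Hypothetical measurements, invented for illustration only.
group_a = [12.1, 14.3, 11.8, 13.5, 12.9, 13.1]
group_b = [15.2, 18.9, 12.4, 21.0, 16.7, 14.1, 19.5, 17.3]

t_s, df_s = student_t(group_a, group_b)
t_w, df_w = welch_t(group_a, group_b)
print(f"Student: t = {t_s:.3f}, df = {df_s}")
print(f"Welch:   t = {t_w:.3f}, df = {df_w:.2f}")
```

When the two groups have equal sample sizes the two statistics coincide and only the degrees of freedom differ; with unequal sizes and unequal variances the two versions can diverge noticeably.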

Several points bear emphasis. First, data do not need to be normally distributed in order to apply the t test. Only the means need to be, and that property is assured by the Central Limit Theorem, even for relatively small samples, for all but the most perverse data. This is exemplified in Fig. 1, which shows at the upper left a very nonnormal (in fact, a uniform) distribution of original data. Random samples of size N = 2, 4, and 8 demonstrate that the distribution of averages based on even those small sample sizes rapidly approaches normality.
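The behavior described for the uniform distribution can be checked with a small simulation. This is only a sketch: the seed, the number of replicates, and the use of excess kurtosis as a crude yardstick of nonnormality are assumptions of this example, not part of the original analysis. Excess kurtosis is 0 for a normal distribution and about -1.2 for a uniform one.

```python
import random

def excess_kurtosis(values):
    """Sample excess kurtosis: 0 for a normal distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

def simulated_means(n, replicates, rng):
    """Means of `replicates` random samples of size n from Uniform(0, 1)."""
    return [sum(rng.random() for _ in range(n)) / n for _ in range(replicates)]

rng = random.Random(1995)  # arbitrary seed, chosen only for reproducibility
kurt = {n: excess_kurtosis(simulated_means(n, 20000, rng)) for n in (1, 2, 4, 8)}

for n, k in kurt.items():
    print(f"N = {n}: excess kurtosis of sample means = {k:+.3f}")
```

Under these assumptions the excess kurtosis of the means should shrink toward zero roughly in proportion to 1/N, which is one way of quantifying how quickly the averages approach normality.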

Second, statements are often made about means of distributions differing, based on nonparametric tests such as WMW, although Potvin and Roff did not make this mistake. The WMW test actually tests the hypothesis that the two distributions are identical, not that they have the same mean (e.g., Gibbons 1985). In particular, variances must be the same if the test is to compare means; as Hollander and Wolfe (1973:71) stated, "we assume that the two populations do not differ in dispersion." To compare means, the WMW test requires the assumption that the two distributions are identical in shape and scale, differing only in their means. This assumption can be harder to justify than the asymptotic normality demanded by the t test, and is rarely evaluated (Petranka 1990). A significant test statistic from, say, the WMW procedure indicates that the two distributions differ in some way; it does not suggest how they differ—mean, variance, shape, etc. Petranka (1990) provided an example of two distributions that had identical means and medians: the t test indicated no difference between means, whereas the WMW test was significant. If the distributions have different variances, the Welch-Satterthwaite version of the t test performs well (Wang 1971) and is more valid than the WMW test (Fligner and Policello 1981, Stewart-Oaten 1995).
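A simulated illustration of this point, under assumptions of my own choosing rather than Petranka's data: one sample is drawn from a skewed (exponential) distribution and one from a narrow normal distribution, both with a true mean of 1. Unlike Petranka's example, the medians here differ. The Mann-Whitney statistic is computed directly with the large-sample normal approximation and no tie correction; the function name is hypothetical.

```python
import math
import random

def mann_whitney_z(x, y):
    """Large-sample z score for the Mann-Whitney U statistic (no tie correction).

    Also returns the proportion of (x, y) pairs in which x exceeds y.
    """
    nx, ny = len(x), len(y)
    u = sum(1 for a in x for b in y if a > b)  # number of pairs with x > y
    mean_u = nx * ny / 2
    sd_u = math.sqrt(nx * ny * (nx + ny + 1) / 12)
    return (u - mean_u) / sd_u, u / (nx * ny)

rng = random.Random(42)  # arbitrary seed
n = 500
x = [rng.expovariate(1.0) for _ in range(n)]  # skewed, true mean 1
y = [rng.gauss(1.0, 0.1) for _ in range(n)]   # symmetric, true mean 1

z, prop = mann_whitney_z(x, y)
mean_diff = sum(x) / n - sum(y) / n

print(f"difference in sample means = {mean_diff:+.3f}")
print(f"proportion of pairs with x > y = {prop:.3f}")
print(f"Mann-Whitney z = {z:.2f}")
```

The sample means should be close to each other, yet the z score should be far from zero, because the test responds to how often a value from one distribution exceeds a value from the other, not to the difference between means.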


FIG. 1. Uniform distribution of values, and distributions of means based on random samples of size N = 2, 4, and 8 from that population, indicating how means from a non-normal distribution can rapidly approach normality.

Third, although nonparametric methods can be used for estimating parameters, they are better adapted to testing hypotheses and used mostly for that purpose. By their very nature, nonparametric methods do not specify an easily interpreted parameter (Simberloff 1990). And parameter estimates are generally more useful than hypothesis tests. Almost all null hypotheses tested truly are false; the only real question is whether the sample gathered is large enough to make the test statistic significant. For example, does the density of a plant species in one study area differ from that in another? Of course. Densities might be 5000 plants/ha in one area and 4999.9 plants/ha in the other, but that is a real difference, which will be detected (the difference between sample means will become significant) once the samples are large enough. As Yoccoz (1991:106) noted, "most biologists and other users of statistical methods seem still to be unaware that significance testing by itself sheds little light on the questions they are posing." Overemphasis on statistical hypothesis testing may be due to confusing that activity as an "inductive or even descriptive procedure" with the deductive logic involved in hypothesis testing in "strong inference" (Quinn and Dunham 1983).
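The plant density example can be put in rough numbers. The sketch below assumes equal sample sizes in the two areas and a common among-plot standard deviation of 500 plants/ha; that value is an assumption made purely for illustration and does not come from the text. The function names are likewise hypothetical.

```python
import math

def expected_z(delta, sigma, n):
    """Expected z statistic for a true mean difference `delta`, n plots per area."""
    return delta / (sigma * math.sqrt(2.0 / n))

def n_for_significance(delta, sigma, z_crit=1.96):
    """Smallest per-area n at which the expected z reaches z_crit."""
    return math.ceil(2.0 * (z_crit * sigma / delta) ** 2)

delta = 0.1    # true difference from the text: 5000 vs. 4999.9 plants/ha
sigma = 500.0  # ASSUMED among-plot standard deviation (not from the article)

for n in (100, 10_000, 1_000_000, 100_000_000):
    print(f"n = {n:>11,}: expected z = {expected_z(delta, sigma, n):.4f}")

n_needed = n_for_significance(delta, sigma)
print(f"plots per area for expected z of 1.96: {n_needed:,}")
```

The expected z grows with the square root of the sample size, so under this assumed variability any nonzero difference, however trivial, eventually crosses the conventional threshold; here that would take on the order of 10^8 plots per area.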

The emphasis on hypothesis tests raises the issue of biological significance, as contrasted with statistical significance. Biological significance implies importance in some sense. Statistical significance means that the result was unlikely to be due to chance alone; if the null hypothesis is true, an improbable event has occurred. Differences of certain magnitudes are said to be not biologically significant, although they were shown to be statistically significant (does that mean that the samples were too large?). And some authors talk of differences that are biologically significant, even though they did not meet usual α criteria (does that mean that the differences are important but perhaps not real?). It is rarely sufficient to know that two parameters differ; estimates of their values are needed for useful application.

More meaningful than a test comparing two means are estimates of the difference between means, along with an assessment of one's confidence in that difference. If ecologists are to be taken seriously by decision makers, they must provide information useful for deciding on a course of action, as opposed to addressing purely academic questions. What Roberts (1990:382) said about business applies equally well to ecology: "[S]ignificance tests are irrelevant to the manager who must make the business decision."

Returning to the plant density example, estimates of the difference between the two areas would approach 0.1 plants/ha, the true value, as sample sizes grow large. The estimated difference, along with a confidence interval for it, can be brought to bear on a decision. Neither the t statistic—the ratio of the estimate to its estimated standard error—nor the significance level of the t value is useful or even interesting.
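One way to report an estimated difference with its confidence interval is sketched below. The density values are hypothetical, invented for illustration, and the function name is my own. The interval uses a normal quantile so that the example needs only the standard library; with samples this small a t quantile would give a somewhat wider, more appropriate interval.

```python
import math
import statistics

def diff_ci(x, y, level=0.95):
    """Estimate of mean(x) - mean(y) with a large-sample confidence interval.

    Variances are estimated separately for each group (no pooling).
    """
    est = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return est, est - z * se, est + z * se

# Hypothetical plot densities (plants/ha), invented for illustration only.
area_1 = [5120, 4875, 5010, 4960, 5230, 4790, 5055, 4980]
area_2 = [4990, 5105, 4820, 5060, 4935, 5175, 4890, 5025]

est, lo, hi = diff_ci(area_1, area_2)
print(f"estimated difference = {est:.1f} plants/ha")
print(f"approximate 95% interval = ({lo:.1f}, {hi:.1f})")
```

The estimate and the width of the interval are stated in plants/ha, the units in which a decision would actually be made, which is what makes them more informative than a t value or its significance level.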

Nonparametric methods have an important role to play, especially in the analysis of ordinal data. My concern is only that they are too freely adopted for inappropriate purposes. Glass et al. (1972:237) referred to "a largely unnecessary hegira to non-parametric statistics" that took place in education and the social sciences during the 1950s and that ecology now seems in danger of repeating. Parameters are generally of most interest, so we should provide estimates of those parameters that are meaningful and applicable to making real decisions. If the data we have do not meet assumptions underlying the standard techniques, and those assumptions are in fact necessary, then alternatives should be considered, such as transforming the data to better meet the assumptions (Green 1979:43-54, Atkinson and Cox 1988) or using robust parametric methods (Huber 1981, Bickel 1988), which are less sensitive to violations of the assumptions.
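As a small illustration of the transformation option, the sketch below compares the skewness of a strongly right-skewed set of values before and after a logarithmic transformation. The counts are hypothetical, and the log is only one of many possible transformations; whether it is appropriate depends on the data at hand.

```python
import math

def skewness(values):
    """Sample skewness (third standardized moment): 0 for a symmetric sample."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed counts, invented for illustration only.
counts = [1, 2, 2, 3, 4, 5, 8, 13, 40, 120]
log_counts = [math.log(c) for c in counts]

skew_raw = skewness(counts)
skew_log = skewness(log_counts)
print(f"skewness of raw counts: {skew_raw:.2f}")
print(f"skewness of log counts: {skew_log:.2f}")
```

For values like these the transformed data are much closer to symmetric, although estimates made on the log scale then describe the transformed quantity and need to be interpreted accordingly.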

If ecologists are careful about randomly sampling the populations about which they want to draw inferences, standard parametric methods will ordinarily be adequate; if they are not, nonparametric methods will not protect them from sailing off course (Box et al. 1978; Stewart-Oaten 1995).

Page Last Modified: Saturday, 02-Feb-2013 06:02:30 EST