Northern Prairie Wildlife Research Center

Sampling Plans and Estimators

North Dakota Game and Fish Department personnel counted pronghorn in the 2 study areas by flying east-west linear strip transects that extended the length of the study area and were 0.8 km apart. Transects were searched 0.4 km on each side of the aircraft. Observers and pilots were experienced in surveys of pronghorn. Transects were 2.4-41.6 km long in the Bowman area and 1.6-64.0 km long in the Slope area. A Piper Super Cub was flown 96-128 km/hour at an altitude of 100-115 m. When the pilot or an observer sighted pronghorn, the aircraft circled the herd so that all pronghorn in the herd could be counted. Where pronghorn detectability might be lower due to heterogeneous habitat, areas were searched thoroughly at an altitude of 25 m. We recorded the number of pronghorn counted in each quarter section (0.65 km²) on field maps.

We used 2 surveys of the Bowman area, 1 in July 1979 and the other in July 1987, in which 201 and 630 pronghorn were seen, respectively, and a single July 1986 survey of the Slope area, in which 350 pronghorn were seen. We believe that counts were virtually exact, because of open terrain, narrow transect width, high visibility of pronghorn, and careful searching methods (Pojar et al. 1995). Nonetheless, because we could not determine visibility bias for the surveys, our results are conditional on observed distribution of pronghorn.

A sampling plan involves defining and selecting the sampling unit, choosing a sample size, and deciding on stratification. In addition, a population estimator must be selected. We selected combinations of sampling plans and estimators on the basis of previous use, suggestions by other researchers, or potential for producing valid estimates.

The sampling unit was a 0.8-km-wide linear transect variable in length (Table
1) according to size and shape of the study area or stratum. We examined 3
methods for selecting sampling units: (1) simple random sampling without replacement
(SRS) (Cochran 1977:18), (2) sampling with probability proportional to size, with replacement
(PPS), and (3) systematic sampling (SYS). Under SRS, each sampling unit had
an equal chance of being selected. With PPS sampling, the probability of choosing
a sampling unit was proportional to the area of the sampling unit. With SYS,
units were numbered 1 to *M*, where the total number of sampling units
was *M = mp*, *m* was the sample size selected from *M* units,
and *p* was the number of possible systematic samples. The first unit
was randomly chosen from among the first *p* units, and then every *p*th
unit thereafter was selected.
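
The three selection methods can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function names, the seed, and the transect areas are hypothetical, and units are indexed from 0 rather than from 1.

```python
import random

def select_srs(M, m, rng):
    """SRS: m distinct units, each with an equal chance of selection."""
    return sorted(rng.sample(range(M), m))

def select_pps(areas, m, rng):
    """PPS with replacement: unit i is drawn with probability area_i / sum(areas)."""
    return rng.choices(range(len(areas)), weights=areas, k=m)

def select_sys(M, m, rng):
    """SYS: random start among the first p units, then every p-th unit after it."""
    if M % m != 0:
        raise ValueError("this sketch assumes M = m * p exactly")
    p = M // m  # number of possible systematic samples
    start = rng.randrange(p)
    return list(range(start, M, p))

rng = random.Random(1979)  # arbitrary seed, chosen only for repeatability
areas = [rng.uniform(2.0, 40.0) for _ in range(48)]  # made-up transect areas
print(select_srs(48, 16, rng))
print(select_pps(areas, 16, rng))
print(select_sys(48, 16, rng))
```

Because PPS draws are made with replacement, the same transect can appear more than once in a PPS sample, whereas SRS and SYS samples never repeat a unit.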

Table 1. Size of study areas, number of sampling units (*M*), total count (*N*) of pronghorn, and variance of *N* for study areas with and without stratification in Bowman (1979 and 1987) and Slope counties (1986), North Dakota.

| Study area | Area (km²) | *M* | Transect lengths (km) | First survey^{a} *N* | First survey^{a} variance^{c} | Second survey^{b} *N* | Second survey^{b} variance |
|---|---|---|---|---|---|---|---|
| Bowman | | | | | | | |
| Total | 1,242 | 48 | 2.4-41.6 | 201 | 51.4 | 630 | 373.5 |
| Grassland stratum | 486 | 30 | 7.2-27.3 | 185 | 61.4 | 355 | 311.1 |
| Mixed stratum | 756 | 48 | 2.4-36.9 | 16 | 2.0 | 275 | 95.8 |
| Slope | | | | | | | |
| Total | 2,387 | 62 | 1.6-64.0 | 350 | 62.5 | | |
| Grassland stratum | 1,690 | 48 | 18.0-48.9 | 343 | 69.5 | | |
| Mixed stratum | 697 | 76 | 1.3-28.9 | 7 | 0.1 | | |

^{a} Jul 1979 for Bowman area, Jul 1986 for Slope area.

^{b} Jul 1987 for Bowman area only.

^{c} Population variance, $\sigma^2 = \sum_i (n_i - \mu)^2 / M$, where $n_i$ is the count on unit *i*, $\mu$ is the population mean, and *M* is the total no. of transects.

We considered 3 levels of sampling intensity: 16, 33, and 50% of the total number of sampling units. Except in the stratified Slope area, the percentage of the area sampled was within 2% of sampling intensity.

We considered stratification and no stratification of study areas. On the basis of 1974 LANDSAT data, we stratified each study area into 2 vegetational types, a grassland stratum and a mixed stratum, thought to correspond to areas of high and low use, respectively, by pronghorn. The grassland stratum contained extensive grassland; the mixed stratum was composed of cultivated lands, badlands, and a small amount (10-14%) of grassland. We used the same stratification for both years in the Bowman area. The grassland stratum was smaller than the mixed stratum in the Bowman area, but the reverse was true for the Slope area (Table 1).

Depending on the selection method, we evaluated 1-4 estimators of abundance:
simple (Cochran 1977:22-26, 207, 224), probability proportional to size (pps;
note use of lower case to distinguish the estimator from PPS sampling) (Cochran
1977:253-254), separate ratio, and combined ratio estimators (Cochran 1977:150-162).
We used the area of the sampling unit as the auxiliary variable for the pps
and ratio estimators. When the surveyed area was stratified, an abundance
estimate ($\hat{N}_j$) and its
variance were calculated independently in each stratum. Estimated overall
abundance ($\hat{N}$) and its variance were obtained
by summing estimates across strata.
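
As a rough illustration of how these point estimators differ, the sketch below uses the textbook forms: the expansion estimator for SRS, the with-replacement pps estimator, and the ratio estimator with area as the auxiliary variable. The function names and the toy data are hypothetical, and the variance formulas are omitted for brevity.

```python
def simple_estimate(counts, M):
    """Expansion estimator under SRS: M times the mean count per sampled unit."""
    return M * sum(counts) / len(counts)

def pps_estimate(counts, areas, total_area):
    """Mean of count_i / z_i over the draws, with z_i = area_i / total_area."""
    return sum(y * total_area / a for y, a in zip(counts, areas)) / len(counts)

def ratio_estimate(counts, areas, total_area):
    """Sampled density (animals per unit area) scaled up to the whole area."""
    return total_area * sum(counts) / sum(areas)

def stratified_total(stratum_estimates, stratum_variances):
    """Overall abundance and its variance as sums of independent stratum values."""
    return sum(stratum_estimates), sum(stratum_variances)

# Toy sample: 3 of 6 transects, with made-up counts and areas (km^2).
counts = [2, 0, 6]
areas = [10.0, 20.0, 30.0]
print(simple_estimate(counts, M=6))
print(pps_estimate(counts, areas, total_area=120.0))
print(ratio_estimate(counts, areas, total_area=120.0))
```

Under these assumptions, a separate ratio estimate would apply `ratio_estimate` within each stratum and then sum the results with `stratified_total`. A combined ratio estimate instead forms a single ratio from the stratum-weighted totals of counts and areas; that variant is not shown here.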

Once a sample size, *m*, had been selected, the number of sampling units
chosen from each stratum could be determined in many ways. Stratum sample
sizes, *m*_{j}, may be allocated in a way that yields
the minimum variance of the estimate, but this optimal allocation depended
on the selection method and estimator used and on unknown population parameters
(Cochran 1977:172). Optimal allocation with SRS using the simple estimator
required that population variance of the count in each stratum be known (Cochran
1977:97-98). We tested an approximation of an optimal allocation:

$$m_j = m \, \frac{M_j \sqrt{\hat{p}_j}}{\sum_k M_k \sqrt{\hat{p}_k}},$$

where $\hat{p}_j$ was the estimated proportion
of pronghorn in stratum *j*, and *M*_{j} was the
total number of sampling units in stratum *j*. This method was optimal
if sampling was SRS with the simple estimator and $\hat{p}_j$
(or equivalently $\hat{N}_j$)
was proportional to the population variance of the count in the *j*th
stratum. The method was similar to that used by Siniff and Skoog (1964) and
placed greater sampling intensity where abundance was thought to be greater.
For our evaluations, we asked a biologist familiar with western North Dakota,
but who had not seen the pronghorn data, to estimate the proportion of pronghorn
in each stratum. We used the same allocation method for all combinations of
sampling plans and estimators and were able to compare our calculated sample
sizes with the true optimal sample sizes because we had a known distribution
of counts.
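
One way to compute such an allocation is sketched below. It assumes the Neyman-style weights $M_j \sqrt{\hat{p}_j}$, which are optimal under SRS with the simple estimator when the stratum variance is proportional to $\hat{p}_j$. The function name and the guessed proportions are hypothetical, and rounding to whole transects is left to the reader.

```python
import math

def approximate_allocation(m, units_per_stratum, guessed_proportions):
    """Split a total sample size m across strata in proportion to M_j * sqrt(p_j).

    Assumption: the stratum variance of the count is proportional to p_j, so the
    stratum standard deviation is proportional to sqrt(p_j).
    """
    weights = [M_j * math.sqrt(p_j)
               for M_j, p_j in zip(units_per_stratum, guessed_proportions)]
    total = sum(weights)
    return [m * w / total for w in weights]  # unrounded stratum sample sizes

# Hypothetical example: 2 strata with 30 and 48 units, and a guess that 80% of
# the animals are in the first stratum.
print(approximate_allocation(16, [30, 48], [0.8, 0.2]))
```

In practice the unrounded values would be rounded so that they sum to *m*, with at least 2 units per stratum so that a within-stratum variance can be estimated.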

For each of the 3 known population distributions (Bowman area, 1979, 1987; Slope area, 1986), we drew 1,000 random samples of the specified size according to the specified selection method. For example, there were 48 transects in the Bowman area; for a simple random sample of 33% intensity, we randomly drew 16 transects with equal probability and without replacement. For systematic sampling, we drew all possible samples.
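
The resampling step might look like the following sketch for SRS and SYS with the simple estimator. The counts are made up rather than taken from the survey data, the function names are hypothetical, and the systematic case assumes *M = mp* exactly.

```python
import random
import statistics

def simulate_srs(counts, m, reps, rng):
    """Draw `reps` simple random samples of m units and return the simple estimates."""
    M = len(counts)
    return [M * sum(rng.sample(counts, m)) / m for _ in range(reps)]

def all_systematic(counts, m):
    """Return the simple estimate from every possible systematic sample of size m."""
    M = len(counts)
    p = M // m  # number of possible systematic samples; assumes M = m * p
    return [M * sum(counts[start::p]) / m for start in range(p)]

rng = random.Random(1987)  # arbitrary seed
counts = [rng.randint(0, 20) for _ in range(48)]  # made-up counts on 48 transects
srs_estimates = simulate_srs(counts, m=16, reps=1000, rng=rng)
sys_estimates = all_systematic(counts, m=16)
print(sum(counts), statistics.fmean(srs_estimates), sys_estimates)
```

With 48 transects and a sample size of 16 there are only 3 possible systematic samples, which is why enumerating them all is feasible where random repetition is needed for SRS and PPS.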

We compared combinations of sampling plans and estimators on the basis of
3 criteria: accuracy of the estimator, confidence interval coverage, and cost.
Accuracy of the estimator, $\hat{N}$, was of
primary importance for estimating abundance, *N*. A useful measurement
of accuracy is the mean square error (MSE), which is the variance of the estimator
plus the squared bias. For all simulations, the percent difference between
MSE and variance was <1%, so MSE approximated variance. If variance
was equal to MSE, then there was no bias and accuracy was the same as precision.
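We used the CV

$$\mathrm{CV}(\hat{N}) = \frac{\sqrt{\mathrm{var}(\hat{N})}}{N}$$

as a measure of precision, facilitating comparisons across study areas and years. The smaller the CV, the more precise the estimator. For the simple and pps estimators, we could calculate the exact CV, but for the ratio estimators we used the estimated CV

$$\widehat{\mathrm{CV}}(\hat{N}) = \frac{1}{N}\sqrt{\frac{1}{r}\sum_{i=1}^{r}\widehat{\mathrm{var}}(\hat{N}_i)},$$

where *r* was the number of repetitions of the simulation, and
$\widehat{\mathrm{var}}(\hat{N}_i)$ was the estimated variance of the population estimate for the *i*th
simulation.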
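
The relation between MSE, variance, bias, and CV can be checked empirically from a set of simulated estimates. The sketch below is illustrative: it computes the quantities from replicate estimates rather than analytically, and the function name and example values are hypothetical.

```python
import math
import statistics

def accuracy_summary(estimates, true_total):
    """Summarize simulated estimates of a known total N."""
    variance = statistics.pvariance(estimates)  # variance across replicates
    bias = statistics.fmean(estimates) - true_total
    mse = variance + bias ** 2  # MSE = variance + squared bias
    cv = math.sqrt(variance) / true_total  # CV as a fraction of N
    return {"variance": variance, "bias": bias, "mse": mse, "cv": cv}

# Made-up replicate estimates of a population whose true size is 100.
print(accuracy_summary([90, 110, 95, 105], true_total=100))
```

When the bias is zero, `mse` and `variance` coincide, which is the situation described above in which accuracy and precision are the same.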

The coverage of the usual 95% confidence intervals was an important criterion to consider. For each simulation, we constructed nominal 95% confidence intervals:

$$\hat{N} \pm t \sqrt{\widehat{\mathrm{var}}(\hat{N})},$$

where *t* was the 0.975 quantile of Student's *t* distribution with
*m* - 1 df with no stratification and *m*_{1} + *m*_{2}
- 2 df with stratification. For each combination of sampling plan and estimator,
we calculated the confidence interval coverage as the percentage of confidence
intervals containing *N*.
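
A coverage calculation for the unstratified SRS case with the simple estimator could be sketched as follows. The variance formula is the standard SRS form with a finite population correction. The counts are made up, the function names are hypothetical, and the critical value is passed in by hand (2.131 is the 0.975 quantile of Student's *t* with 15 df) to keep the example free of external libraries.

```python
import math
import random
import statistics

def srs_interval(sample, M, t_crit):
    """Nominal confidence interval for the total from a simple random sample."""
    m = len(sample)
    estimate = M * statistics.fmean(sample)
    s2 = statistics.variance(sample)  # sample variance, m - 1 denominator
    var_hat = M ** 2 * (1 - m / M) * s2 / m  # includes finite population correction
    half_width = t_crit * math.sqrt(var_hat)
    return estimate - half_width, estimate + half_width

def coverage(counts, m, t_crit, reps, rng):
    """Percentage of simulated intervals that contain the true total N."""
    M, N = len(counts), sum(counts)
    hits = 0
    for _ in range(reps):
        low, high = srs_interval(rng.sample(counts, m), M, t_crit)
        hits += low <= N <= high
    return 100 * hits / reps

rng = random.Random(1986)  # arbitrary seed
counts = [rng.randint(0, 20) for _ in range(48)]  # made-up counts
print(coverage(counts, m=16, t_crit=2.131, reps=1000, rng=rng))
```

Real count data are often far more skewed than these uniform made-up counts, so the coverage printed here says nothing about the coverage achieved in the study.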

For simplicity, we calculated cost for each simulated survey as the sum of the lengths of the transects and the travel distances between transects. These costs were averaged across simulations under a particular sampling plan to get the cost for that plan.
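
A minimal sketch of this cost measure is given below. It makes simplifying assumptions that are not stated in the text: transects are flown in north-south order, travel between consecutive selected transects is taken as the north-south gap only (0.8 km per intervening transect spacing), and a transect selected more than once is flown once. The function name and the example lengths are hypothetical.

```python
def survey_cost(selected, lengths_km, spacing_km=0.8):
    """Distance flown on the selected transects plus travel between them.

    Assumptions: transects are indexed in north-south order, and travel between
    two selected transects equals the number of spacings between them times
    spacing_km, ignoring any east-west offset of their end points.
    """
    order = sorted(set(selected))  # fly each selected transect once, in order
    on_transect = sum(lengths_km[i] for i in order)
    between = sum((b - a) * spacing_km for a, b in zip(order, order[1:]))
    return on_transect + between

# Made-up transect lengths (km) for 6 transects, of which 3 are selected.
lengths = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
print(survey_cost([0, 2, 5], lengths))
```

Averaging this value over all simulated samples drawn under one sampling plan would give the cost for that plan.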

The large number of simulations we used ensured repeatability of results. To measure the performance of simulations, we calculated the CV of estimates of CV, coverage, and cost for a number of sampling plans and estimators. We did not perform significance tests because all comparisons would have been significant (P < 0.001) due to the large number of simulations.
