a LaFrance Consulting Services™ publication
TwoOldGuys™ Study Guides
Plant Ecology Text

# Introduction: ii. Sampling Theory

The first principle of sampling is that it is possible to estimate the values for a population by looking at only some “representative” examples rather than looking at all members of the population. The population is “the entire statistical universe, or all of the values which exist for the phenomenon under investigation.” We can, and usually do, restrict the statistical universe to only those values of the phenomenon that occur under the conditions of the investigation. Additional [statistical] definitions are required for this discussion (but are not included in the glossary). “Observed” refers to the sample data; “parametric” refers to the population (or statistical universe). Although we need some statistical concepts to understand sampling, this text will not cover statistics beyond the minimum needed to understand the principles of sampling.
All measured observations are estimates of the underlying parametric value. We cannot determine the actual, parametric value because all observations include “sampling error.” The average of several observations (absolute minimum = 5; preferred minimum = 25) is a better estimate of the parametric value than is any one observation.

## observed vs parametric values:

There are statistical methods to determine the nature of the observations. Each set of observations exhibits precision and accuracy. “Precision” is an estimate of how ‘repeatable’ the measurements are. We can repeat the same measurement taken by the same observer, or by different, equally trained observers. The mean (average) value over the repeated measurements is the estimate of the parametric value, and the statistical variance about the mean is the estimate of the precision of that estimate. The variance is essentially the average of the squared deviations, where “deviation” is defined as the observed value minus the expected value (here, the mean; in hypothesis testing, the value expected from the hypothesis). “Accuracy” is an estimate of how close the estimates come to the actual, underlying parametric value. The grand mean (average of several averages) is a better estimate of the parametric value than are any of the individual means [a mathematically provable result, closely related to the Central Limit Theorem], and the statistical variance of the sample means about the grand mean is the estimate of the accuracy of the individual estimates. A data set is said to be “biased” if the means consistently differ from the parametric value in the same direction [detecting bias requires estimates from different observers, or from different measurement techniques].
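As a small illustration of the mean, the variance about the mean, and the grand mean, here is a minimal sketch in Python. The function name `summarize` and the measurements are hypothetical, invented only for this example. Note that the standard library's `variance` divides by n - 1 rather than n, which is the usual correction when working from a sample.

```python
from statistics import mean, variance


def summarize(samples):
    """Return the mean of each sample, the grand mean, and the
    variance of the sample means about the grand mean."""
    sample_means = [mean(s) for s in samples]
    grand_mean = mean(sample_means)
    # Sample variance (n - 1 denominator) of the means about the grand mean.
    variance_of_means = variance(sample_means)
    return sample_means, grand_mean, variance_of_means


# Hypothetical data: three observers each measure the same stem (cm) five times.
observers = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.4, 10.6, 10.5, 10.3, 10.7],
    [9.6, 9.4, 9.5, 9.7, 9.3],
]

sample_means, grand_mean, variance_of_means = summarize(observers)

# Variance of each observer's repeated measurements about that observer's mean.
within_variances = [variance(s) for s in observers]

print("sample means:     ", sample_means)
print("grand mean:       ", grand_mean)
print("variance of means:", variance_of_means)
print("within variances: ", within_variances)
```

In this made-up data set each observer is individually very repeatable (small variance within an observer), but the observers disagree with each other, so the variance of the means about the grand mean is much larger.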

## area based vs point methods:

An “area based” sampling regime (in Plant Ecology) uses “quadrats” of a defined size (area), such as a square one meter on each side, although any shape (used consistently) and size (also used consistently) can be used. To determine the appropriate quadrat size for the community (or communities) under investigation we prepare a “species area curve.” This involves laying out a series of nested quadrats of increasing area (so the larger quadrats include the smaller ones), and recording the number of species present in each. The number of species observed plotted against the area of the quadrats yields a curve which rises toward the parametric value (the total number of species present), rapidly at first, then progressively more slowly with increasing quadrat size. The optimum quadrat size is usually described as ‘on the shoulder’ of the curve (where it has leveled out), or sometimes at an arbitrary percent of the maximum value observed. If the study will cover more than one community, the optimum quadrat size is the largest of the optima from all of the communities involved. In actual field studies, many researchers will select an arbitrary quadrat size (based on typical species area curves seen while a graduate student) which they use for the rest of their field career.
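The “arbitrary percent of the maximum” rule can be sketched in a few lines of Python. This is only one possible reading of the rule: the function name `optimum_quadrat_area`, the 90% default, and the species counts are all assumptions made for the example, not values from any field study.

```python
def optimum_quadrat_area(areas, species_counts, fraction=0.9):
    """Return the smallest quadrat area whose species count reaches
    `fraction` of the maximum species count observed."""
    if not areas or len(areas) != len(species_counts):
        raise ValueError("areas and species_counts must be non-empty and equal in length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = fraction * max(species_counts)
    # Walk the curve from the smallest quadrat to the largest.
    for area, count in sorted(zip(areas, species_counts)):
        if count >= threshold:
            return area


# Hypothetical nested-quadrat data for two communities (areas in square meters).
areas = [0.25, 0.5, 1, 2, 4, 8, 16]
meadow_counts = [4, 7, 11, 14, 16, 17, 17]
woodland_counts = [3, 5, 8, 12, 15, 19, 20]

meadow_optimum = optimum_quadrat_area(areas, meadow_counts)
woodland_optimum = optimum_quadrat_area(areas, woodland_counts)

# For a study covering both communities, use the largest of the optima.
study_optimum = max(meadow_optimum, woodland_optimum)
print(meadow_optimum, woodland_optimum, study_optimum)
```

A fixed percentage is easy to compute but remains arbitrary; judging the ‘shoulder’ by eye from the plotted curve is the more common practice.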
Point methods (more common in Forestry or other studies restricted to trees) involve selecting random points, then laying out north-south and east-west lines crossing on the point. The distance to the nearest individual of each species in each of the four quadrants is recorded. The center of the crossed lines may be on the random point, or on the tree nearest to the random point. I, personally, prefer the area-based methods, but the point-based methods work very well for forest trees.
The remaining discussion of sampling will reflect my bias toward area-based samples. When the study can be completed with qualitative data in area-based samples, we typically record “presence” of each species in each quadrat. “Frequency” (of each species) is defined as the number of quadrats in which the species is present divided by the total number of quadrats in the sample, and is considered to be an estimate of the probability of finding at least one individual of the species in the next sample. A group of graduate students (of which I was a member) had lengthy discussions of ways to estimate quantitative “absence,” or the probability of the species being present although it did not appear in the data [needless to say, we were not able to come up with any definition acceptable to the professor in charge of our seminar class].
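The definition of frequency translates directly into code. The sketch below assumes presence data are stored as one set of species names per quadrat; the function name `frequency` and the field data are hypothetical.

```python
def frequency(quadrats):
    """Given a list of sets (the species present in each quadrat), return
    a dict mapping each species to the fraction of quadrats containing it."""
    if not quadrats:
        raise ValueError("at least one quadrat is required")
    total = len(quadrats)
    all_species = set().union(*quadrats)
    return {
        species: sum(species in q for q in quadrats) / total
        for species in sorted(all_species)
    }


# Hypothetical presence data from four quadrats.
quadrats = [
    {"Acer", "Trillium"},
    {"Acer"},
    {"Trillium", "Quercus"},
    {"Acer"},
]

freq = frequency(quadrats)
print(freq)  # {'Acer': 0.75, 'Quercus': 0.25, 'Trillium': 0.5}
```

A species that never appears in any quadrat simply has no entry, which is exactly the “absence” problem described above: the data alone cannot distinguish a species that is truly absent from one that was missed.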
When the study requires quantitative data, we typically record “abundance” (a count of the individuals in the quadrat), “cover” (the percent of the quadrat in the ‘shadow’ of the individuals in the quadrat), “importance” (a qualitative estimate of the contribution of the individuals of the species to the community [I prefer to avoid calling qualitative estimates ‘quantitative’]), or “biomass” (the dry weight of the harvested individuals [often limited to the above ground parts] from the quadrat).
For animal studies, a question often discussed is inventory versus census data. An “inventory” requires capture and removal of all of the individuals in the [ecological] population. A “census” requires marking each captured animal, releasing them and recording the number of individuals recaptured and of those captured for the first time during each successive sampling effort. There are mathematical algorithms to estimate the total ecological population from these data, which can be found in discussions of “marked release recapture methods.”
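The text above does not say which algorithm is intended, so as an illustration only, here is a sketch of the simplest two-sample estimator, usually called the Lincoln-Petersen index, together with Chapman's small-sample correction. It assumes a closed population (no births, deaths, or migration between the two sampling efforts) and that marked and unmarked animals are equally likely to be caught. The function names and the counts are hypothetical.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Estimate total population size N as M * C / R, where
    M = animals marked and released in the first sampling effort,
    C = animals caught in the second sampling effort,
    R = animals in the second catch that already carried a mark."""
    if recaptured <= 0:
        raise ValueError("the estimate is undefined when no marked animals are recaptured")
    return marked_first * caught_second / recaptured


def chapman(marked_first, caught_second, recaptured):
    """Chapman's correction, which is less biased for small samples
    and remains defined when there are no recaptures."""
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


# Hypothetical counts: 50 marked, 40 caught later, 10 of those already marked.
print(lincoln_petersen(50, 40, 10))  # 200.0
print(chapman(50, 40, 10))
```

The reasoning is a simple proportion: if 10 of the 40 animals in the second catch are marked, then about one quarter of the population carries a mark, so the 50 marked animals represent about one quarter of the total.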