The size of a population can be estimated by capturing individuals from the population, marking those individuals and returning them to the population, extracting another sample from the population, and determining the fraction of individuals in the second sample that had the mark from the first sample. This idea was first used to estimate the population of humans in London in 1662 (Krebs 1999), but was popularized in fisheries by C.G.J. Petersen in 1896 (Ricker 1975). The method was used by Lincoln (1930) for estimating duck populations and by Jackson (1933) for estimating tsetse fly populations. Thus, this method is variously referred to as the Petersen method, the Lincoln-Petersen method, or the Lincoln-Petersen-Jackson method. “Petersen method” will be used here

The Petersen method, which relies on only one sample of potentially marked fish, can be extended to a series of samples. These extended methods, called the Schnabel and Schumacher-Eschmeyer methods, require at least one sample where fish are marked and then multiple samples of potentially marked fish. However, these methods are most often used when the unmarked fish in the subsequent samples are marked and returned, along with the previously marked fish, to the population. Thus, these methods can be used with multiple marking and multiple “recapturing” samples.

Single Census Mark-Recapture Methods

Petersen Method

The Petersen method consists of two samples from a closed population and is, thus, the simplest of the broad array of mark-recapture techniques for estimating animal abundance. The population and sampling scheme for the two samples (\(i=1,2\)) of the Petersen method are represented in Figure 1. The large squares just to the left of the “i=” vertical lines represent the population just before the \(i\)th sample is taken. The large square just after the “i=1” vertical line represents the population just after the first sample was taken. Thus, under the assumption that the population is closed, the population immediately after the first sample is the same as the population immediately before the second sample. The samples are represented by the small grey boxes just above each “i=” vertical line.

Figure 1: Schematic representation of the two samples (\(i=1,2\)) in a Petersen mark-recapture study. See the text for detailed description and Table 2 for definitions of notation.

The population initially consists of all unmarked animals. All fish in the first sample are then marked and returned to the population. The first sample does not necessarily need to be a random sample of the population although the assumptions for the second sample are more likely to be met if it is. In addition, the fish can receive a batch mark, i.e., each fish does not need to be uniquely identified.¹ The second sample must be a random sample from the entire population of marked and unmarked individuals such that each fish, whether marked or unmarked, has the same chance of being captured. This also implies that the marked fish mix randomly with the unmarked fish in the population (Figure 1). Each fish in this second sample is examined to determine if it has the mark from the first sample or not.

In capture history format a 0 is used to show that a fish was not captured during that sampling event and a 1 is used if the fish was captured in that event. For example, \(n_{11}\) is the number of fish captured in both sampling events whereas \(n_{01}\) is the number of fish captured in the second sample event that were NOT captured in the first sampling event. In addition, \(n_{1\cdot}\) is the number of fish caught in the first sampling event, regardless of whether they were or were not caught in the second sampling event. In other words, the \(\cdot\) is a place holder for either a 0 or a 1. With this notation the summary data for a Petersen estimate can be shown in the format of capture histories (Table 1).

Table 1: Summary data matrix for the two samples in a Petersen mark-recapture framework. Note that a zero indicates the fish were not captured in that sample and a one indicates that the fish were captured in that sample.

The more traditional notation of of a Petersen estimate is to replace the following capture history symbols with \(N=n_{\cdot\cdot}\), \(M=n_{1\cdot}\), \(n=n_{\cdot 1}\), and \(m=n_{11}\). For ease of comparison to other sources these symbols will be used throughout these notes.² Remember that the capitalized symbols refer to the population (i.e., \(N\) is number of fish in the population, \(M\) is the number of marked fish in the population) whereas the lower-case letters refer to the second sample (i.e., \(n\) is the number of fish captured in the second sample, \(m\) is the number of marked fish captured in the second sample). The meaning of the symbols used in the Petersen method are shown in Table 2.

Table 2: Summary of notation used in the Petersen method.

Symbol	Meaning
\(N\)	The unknown size of the population just prior to the first sample.
\(M\)	The number of fish from the first sample that were marked and returned to the population.
\(n\)	The number of fish in the second sample.
\(m\)	The number of marked fish in the second sample.
\(\widehat{N}\)	The estimated size of the population just prior to the first sample.

The Petersen estimate of abundance can be derived from an assumption that if the second sample is a random sample of the population of marked and unmarked animals, then the proportion of marked animals in the second sample should equal the proportion of marked animals in the population, i.e.,

\[ \frac{m}{n} = \frac{M}{N} \]

A rearrangement of this equality gives an estimate of the size of the population, i.e.,

\[ \hat{N} = \frac{Mn}{m} \quad \quad \text{(1)} \]

Thus, the total population size can be estimated from the number of marked animals, the number of animals observed in the second sample, and the number of animals in the second sample that had the mark.

Approximate confidence intervals for \(N\) are often used, with the specific form of the approximation depending on characteristics of the number of marked and recaptured fish.³ Seber (1982) suggested the following sequential “rules” (i.e., stop at the step where you answer “yes”) for choosing the method for approximating the confidence interval for \(N\) in the Petersen method:

Is the fraction of marked fish in the second sample “large” (i.e., \(\frac{m}{n}>0.10\))? – Use the binomial approximation?
Is the number of marked fish in the second sample “large” (i.e., \(m>50\))? – Use the normal approximation?
Use the Poisson approximation.

Both the binomial and normal approximation methods identify a confidence interval for the ratio of marked fish in the second sample (i.e., \(\frac{m}{n}\)) and then the endpoints of these intervals are substituted into the modified Petersen equation,

\[ \hat{N} = \frac{Mn}{m} = \frac{M}{\frac{m}{n}} \]

to derive endpoints of the confidence interval for \(N\). The binomial approximation method constructs the interval for \(\frac{m}{n}\) with computer algorithms of the binomial distribution. The normal approximation is considered a large-sample method where the standard error for \(\frac{m}{n}\) is estimated with

\[ SE_{\frac{m}{n}} = \sqrt{\left(1-\frac{m}{M}\right)\frac{\frac{m}{n}\left(1-\frac{m}{n}\right)}{n-1}} + \frac{1}{2n} \]

The confidence interval is then constructed in the usual way with \(\frac{m}{n}\pm Z^{*}SE_{\frac{m}{n}}\).

The Poisson approximation operates similarly, except that it uses computer algorithms of the Poisson distribution to construct a confidence interval for \(m\), the endpoints of which are then substituted back into Equation 1 to derive the confidence interval for \(N\).

Modifications of the Petersen Method

The Petersen estimator of \(N\) is, unfortunately, biased, especially for small samples. Chapman (1951) showed that when \(M+n \geq N\) that

\[ \hat{N} = \frac{(M+1)(n+1)}{(m+1)}-1 \quad \quad \text{(2)} \]

is an exactly unbiased estimator of \(N\).⁴ If \(M+n < N\), then Robson and Regier (1964) showed that the bias of Equation 2 is less than 2% if \(\frac{Mn}{N}>4\). Unfortunately, \(N\) is usually unknown. However, Robson and Regier (1964) note that if \(m\geq7\) then there is a 95% chance that \(\frac{Mn}{N}>4\) and the bias of Equation 2 can be considered negligible (Seber 1982). Thus, a given study should be designed so that enough fish are marked and the second sample is large enough to ensure that more than seven marked fish are recaptured.

A final modification was proposed by Bailey (1951) and Bailey (1952). Bailey’s method is appropriate if the second sample of fish is collected with replacement (i.e., an individual may be counted more than once). This type of sampling happens most often if the tagged fish are simply observed rather than captured. Bailey showed that Equation 1 is biased under these conditions and that his modified estimator,

\[ \hat{N} = \frac{(M)(n+1)}{(m+1)} \quad \quad \text{(3)} \]

is nearly unbiased if \(m\geq7\).

Confidence intervals for \(N\) from these modified estimators use techniques similar to those described for the Petersen method. However, the binomial and normal confidence intervals for the ratio \(\frac{m}{n}\) must be converted to a confidence interval for \(m\) by multiplying the endpoints by \(n\). The endpoints of the confidence interval for \(m\) are then substituted into Equation 2 or Equation 3 to obtain confidence intervals for \(N\).

Reproducibility Information

Compiled Date: Mon Jan 17 2022
Compiled Time: 10:18:16 AM
R Version: R version 4.1.2 (2021-11-01)
System: Windows, x86_64-w64-mingw32/x64 (64-bit)
Base Packages: base, datasets, graphics, grDevices, methods, stats, utils
Required Packages: FSA, captioner, knitr and their dependencies (car, dplyr, dunn.test, evaluate, graphics, grDevices, highr, lmtest, methods, plotrix, sciplot, stats, stringr, tools, utils, withr, xfun, yaml)
Other Packages: captioner_2.2.3.9000, FSA_0.9.1.9000, knitr_1.36
Loaded-Only Packages: bslib_0.3.1, compiler_4.1.2, digest_0.6.28, evaluate_0.14, fastmap_1.1.0, htmltools_0.5.2, jquerylib_0.1.4, jsonlite_1.7.2, magrittr_2.0.1, R6_2.5.1, rlang_0.4.12, rmarkdown_2.11, sass_0.4.0, stringi_1.7.5, stringr_1.4.0, tools_4.1.2, xfun_0.28, yaml_2.2.1
Links: Script / RMarkdown

References

Bailey, N. T. J. 1951. On estimating the size of mobile populations from capture-recapture data. Biometrika 38:293–306.

Bailey, N. T. J. 1952. Improvements in the interpretation of recapture data. Journal of Animal Ecology 21:120–127.

Chapman, D. G. 1951. Some properties of the hypergeometric distribution with applications to zoological censuses. University of California Publications on Statistics 1:131–160.

Jackson, C. H. N. 1933. On the true density of tsetse flies. Journal of Animal Ecology 2:204–209.

Krebs, C. J. 1999. Ecological methodologySecond. Addison-Welsey Educational Publishing, Menlo Park, CA.

Lincoln, F. C. 1930. Calculating waterfowl abundance on the basis of banding returns. U.S. Department of Agriculture Circular 118:1–4.

Ricker, W. E. 1975. Computation and interpretation of biological statistics of fish populations. Bulletin of the Fisheries Research Board of Canada.

Robson, D. S., and H. A. Regier. 1964. Sample size in Petersen mark-recapture experiments. Transactions of the American Fisheries Society 93:215–226.

Seber, G. A. F. 1982. The estimation of animal abundanceSecond. Edward Arnold.

However, it is generally beneficial to uniquely tag each fish.↩︎
The most traditional notation uses \(C=n_{\cdot 1}\), and \(R=n_{11}\). We do not use this notation in order to maintain continuity with subsequent mark-recapture methods.↩︎
Exact confidence intervals for \(N\) can generally be derived from the hypergeometric distribution underlying the likelihood function. Unfortunately, there is no simple cumulative distribution for the hypergeometric distribution.↩︎
Ricker (1975) modified the Chapman estimator by ignoring the “-1” at the end of the right-hand-side of Equation 2. His argument was that subtracting one is of no practical importance in the estimate. While this argument is understandable, subtracting one to get the exact result proposed by Chapman is not onerous. Therefore, I suggest using the full estimator as proposed by Chapman (1951).↩︎