library(FSA) # for peek(), headtail(), sumtable()
library(dplyr) # for filter(), select(), mutate(), arrange(), group_by(), summarize()
Backcalculation of Previous Lengths
Francis (1990) defined back-calculation as “… a technique that uses a set of measurements made on a fish at one time to infer its length at an earlier time or times. Specifically, the dimensions of one or more marks in some hard part of the fish, together with its current body length, are used to estimate its length at the time of formation of each of the marks. … The marks are generally annual rings associated with growth checks, … .” Thus, backcalculation is the reconstruction of the length of a fish at previous ages from measurements made on calcified structures.
Setup
Packages
Functions used in this supplement require the following packages.
Also note that single functions from the tidyr
and stringr
packages are used with ::
so that the whole packages are not loaded here.
Data
All analyses in this supplement use the West Bearskin Lake (MN) Smallmouth Bass (Micropterus dolomieu) data from Weisberg (1993) found in SMBassWB.1 The analysis in this supplement will focus on those fish captured in 1990. Three variables that were constant and not used in this analysis and three measurement variables that corresponded to ages that did not exist after reducing to only fish captured in 1990 were removed to save space.
1 Download data with CSV link on the linked page.
<- read.csv("https://raw.githubusercontent.com/fishR-Core-Team/FSA/main/data-raw/SMBassWB.csv") SMBassWB
<- SMBassWB |>
wb90 filter(yearcap==1990) |>
select(-(species:gear),-(anu10:anu12))
peek(wb90,n=5)
#R| yearcap fish agecap lencap anu1 anu2 anu3 anu4 anu5 anu6
#R| 1 1990 482 1 75 1.51076 NA NA NA NA NA
#R| 45 1990 431 3 130 1.01867 2.04110 3.22882 NA NA NA
#R| 90 1990 453 3 165 1.63068 2.96180 3.96131 NA NA NA
#R| 136 1990 382 7 225 1.33285 2.42372 3.04270 3.79372 4.86226 5.74919
#R| 181 1990 389 9 329 1.05913 2.18769 3.55137 4.40766 5.78634 7.58178
#R| anu7 anu8 anu9 radcap
#R| 1 NA NA NA 1.51076
#R| 45 NA NA NA 3.22882
#R| 90 NA NA NA 3.96131
#R| 136 6.47536 NA NA 6.47536
#R| 181 8.32094 9.46362 10.43491 10.43491
Terminology
Two types of measurements can be made on calcified structures. A radial measurement is the total distance from the center of the structure (e.g., focus of scale or nucleus of otolith) to the edge of an annulus. An incremental measurement is the distance between two successive annuli. Radial measurements are required for back-calculation of fish length.
Back-calculation models estimate length at previous age \(i\) (i.e., \(L_{i}\)) from known values of length at time of capture (\(L_{C}\)), scale radius to the \(i\)th annulus (\(S_{i}\)), and scale radius at time of capture (\(S_{C}\)). Several back-calculation models rely on the relationship between \(S_{C}\) and \(L_{C}\). Depending on the model, a function of mean \(S_{C}\) for a given \(L_{C}\) (i.e., \(E(S_{C}|L_{C})\) ) or mean \(L_{C}\) for a given \(S_{C}\) (i.e., \(E(L_{C}|S_{C})\)) is used. These functions are not required to be linear, but often are, and in their linear form are represented as
\[ E(S_{C}|L_{C})=a+bL_{C} \tag{1}\]
\[ E(L_{C}|S_{C})=c+dS_{C} \tag{2}\]
Fitting these regressions is demonstrated below.
Common Back-Calculation Models
The first back-calculation model was jointly developed by Knut Dahl and Einar Lea and appeared in Lea (1910). The underlying concept of the Dahl-Lea model is that growth of the calcified structure is in exact proportion to growth in length of the fish. With this, the ratio of \(S_{i}\) to \(S_{C}\) is the same as the ratio of \(L_{i}\) to \(L_{C}\). Rearrangement of this equality yields the Dahl-Lea back-calculation model
\[ L_{i}=\frac{S_{i}}{S_{C}}L_{C} \tag{3}\]
The Dahl-Lea model describes a family of straight lines that pass through the origin and each observed (\(S_{C},L_{C}\)) point. Visually (Figure 1), the estimated \(L_{i}\) for a particular fish is found by locating \(S_{i}\) along the x-axis, moving vertically until the straight line for that fish is met, and then moving horizontally to the point on the y-axis.
Fraser (1916) was the first to describe, but Lee (1920) was the first to formally derive, the back-calculation model from the concept that “the growth increment of the scale is, on the average …, a constant proportion of the growth increment of the fish” (Francis 1990). In practice, the Fraser-Lee model modified the Dahl-Lea model by adjusting for the length of the fish when the calcified structure forms (i.e., \(L=c\) when \(S=0\)), that is,
\[ L_{i}=\frac{S_{i}}{S_{C}}(L_{C}-c)+c \tag{4}\]
where \(c\) comes from the length of the fish at the time of scale formation, the intercept of the length-scale relationship regression (e.g., from Equation 2), or from published “standards” for a species (Carlander 1982). The Fraser-Lee model describes a family of lines with an intercept of \(c\) that pass through the (\(S_{C},L_{C}\)) point (Francis (1990); Figure 1).
The scale proportional hypothesis (SPH) model was named by Francis (1990), but was first recognized by Whitney and Carlander (1956) when they said “{i}f the scale was 10 per cent larger when the fish was caught than the average scale for that size of fish, [then] the scale would be 10 per cent larger than normal throughout the life.” If “average” and “normal” are considered to be expected values, then this hypothesis can be written as
\[ \frac{S_{i}}{E[S|L_{i}]}=\frac{S_{C}}{E[S|L_{C}]} \]
If it is assumed that the scale-length relationship is linear, then the two expected values in these ratios are computed by plugging \(L_{i}\) and \(L_{C}\), respectively, into the scale-length relationship (i.e., Equation 1) to produce
\[ \frac{S_{i}}{a+bL_{i}}=\frac{S_{C}}{a+bL_{C}} \]
which can be solved for \(L_{i}\) to yield the general SPH back-calculation model
\[ L_{i} = \frac{S_{i}}{S_{C}}\left(L_{C}+\frac{a}{b}\right) - \frac{a}{b} \tag{5}\]
The linear SPH model produces a family of lines that all have an intercept of \(-\frac{a}{b}\) and pass through each observed (\(S_{C},L_{C}\)) point (Figure 1). The SPH model is the same as the Fraser-Lee model except that the intercept from Equation 2 is replaced with \(-\frac{a}{b}\). Further note that the SPH model is the same as the Dahl-Lea model if \(a=0\).
The body proportional hypothesis (BPH) model was also named by Francis (1990) and was also first recognized by Whitney and Carlander (1956) when they said “{i}f a fish at time of capture were 10 per cent smaller than the average fish with that size of scale, [then] the fish would be 10 per cent smaller than the expected length for the size of that scale throughout life.” This hypothesis can be written as
\[ \frac{L_{i}}{E[L|S_{i}]}=\frac{L_{C}}{E[L|S_{C}]} \]
If the length-scale relationship is linear then the expected values can be found by plugging \(S_{i}\) and \(S_{c}\) into Equation 2 to get
\[ \frac{L_{i}}{c+dS_{i}}=\frac{L_{C}}{c+dS_{C}} \]
which can be solved for \(L_{i}\) to yield the general BPH back-calculation model
\[ L_{i}=L_{C}\frac{c+dS_{i}}{c+dS_{c}} \tag{6}\]
The linear BPH model produces a family of lines that have an intercept of \(\frac{cL_{C}}{c+dS_{C}}\) and pass through each observed (\(S_{C},L_{C}\)) point (Figure 1). In contrast to the other back-calculation models, the BPH model uses lines with a different intercept for each fish. The linear BPH model is the same as the Dahl-Lea model if \(c=0\).
Vigliola and Meekan (2009) described 18 other models for the back-calculation of fish length.2
2 Functions for each of these models can be created with bcFuns()
from RFishBC
.
Data Organization & Manipulation
In wide or one-fish-per-line format, all information about a single fish, including all of the measurements from the calcified structure, is in one row of the data frame. wb90
contains radial measurements in the wide format.3
3 See the data frame snippet above.
For back-calculation of fish length, these data must be radial, and not incremental, measurements. If wb90
had contained incremental measurements, then it could be converted to radial measurements with gConvert()
from RFishBC
. The gConvert()
function requires the data frame with the incremental measurements as the first argument, the prefix (in quotes) for the columns that contain the incremental measurements in in.pre=
, and the type of measurement to convert to in out.type=
(the options are "rad"
(the default) or "inc"
). For example, the code below would create a new data frame from wb90
with radial measurements (IF wb90
had incremental measurements).
# NOT RUN because wb90 is already radial measurements
<- RFishBC::gConvert(wb90,in.pre="anu",out.type="rad") wb90A
For efficient back-calculation, the data must also be converted to long or one-measurement-per-line format. Wide data may be converted to long data with pivot_longer()
from tidyr
.4 For this use, the variables that contain the radial measurement data are given to cols=
. The names for these variables will form a new column in the resulting data frame with the name given in names_to=
. The values for these variables will be in a new column with the name given in values_to=
.
4 Ogle (2016) used gather()
from tidyr
for this same purpose. The maintainers of tidyr
suggest using pivot_longer()
rather than gather
.
<- wb90 |>
wb90r ::pivot_longer(cols=anu1:anu9,names_to="agei",values_to="radi") |>
tidyrarrange(fish,agei)
headtail(wb90r)
#R| yearcap fish agecap lencap radcap agei radi
#R| 1 1990 374 7 243 7.57375 anu1 1.01672
#R| 2 1990 374 7 243 7.57375 anu2 2.39296
#R| 3 1990 374 7 243 7.57375 anu3 3.47635
#R| 1627 1990 998 7 298 8.54805 anu7 8.54805
#R| 1628 1990 998 7 298 8.54805 anu8 NA
#R| 1629 1990 998 7 298 8.54805 anu9 NA
As noted in Ogle (2016), there are three potential problems with this result. First, the new agei
variable contains the names of the variables from the original wide format data frame (e.g., anu1
, anu2
) rather than numbers that correspond to the age that the annulus was formed. Converting these labels to numbers begins by finding the portion of the string after the “anu” prefix with str_sub()
with the vector of names as the first argument and start=4
(because “anu” is three characters long). The result from str_sub()
, however, is a character that must then be converted to a numeric with as.numeric()
.
The second problem is that several of the radial measurements contain NA
values. The non-NA
values are found and retained by using !is.na()
within filter()
.
The third problem, while not an issue with these particular data, is that “plus growth” may have been recorded. “Plus growth” is growth on the margin of the calcified structure that does not represent a complete year of growth. If “plus growth” is present, then the new agei
variable will have a value greater than the age-at-capture value. These instances should be removed from the new long format data frame.
The following code adjusts for these three issues.
<- wb90r |>
wb90r mutate(agei=as.numeric(stringr::str_sub(agei,start=4))) |>
filter(!is.na(radi)) |>
filter(agei<=agecap)
peek(wb90r,n=10)
#R| yearcap fish agecap lencap radcap agei radi
#R| 1 1990 374 7 243 7.57375 1 1.01672
#R| 85 1990 388 9 300 9.38096 8 8.08080
#R| 170 1990 408 3 136 3.12974 3 3.12974
#R| 256 1990 436 3 154 4.27702 1 1.46016
#R| 341 1990 460 2 115 2.66954 2 2.66954
#R| 426 1990 698 7 264 7.76545 3 3.62943
#R| 511 1990 714 7 290 9.31498 2 2.89488
#R| 597 1990 730 2 98 2.24291 1 1.21677
#R| 682 1990 747 7 265 7.84295 7 7.84295
#R| 767 1990 998 7 298 8.54805 7 8.54805
Computing Back-Calculated Lengths
Scale-Length Relationships
The scale-length (Equation 1) and length-scale (Equation 2) relationships required for all but the Dahl-Lea method are computed with the wide format data. Thus, the wide format data frame must contain the length of the fish (e.g., lencap
) and the total radius of the calcified structure (e.g., radcap
) at the time of capture. Both linear relationships are fit with lm()
and the coefficients should be extracted with coef()
and saved into objects.
<- lm(radcap~lencap,data=wb90)
lm.sl <- coef(lm.sl)[[1]] ) ( a
#R| [1] -1.304391
<- coef(lm.sl)[[2]] ) ( b
#R| [1] 0.03537477
<- lm(lencap~radcap,data=wb90)
lm.ls <- coef(lm.ls)[[1]] ) ( c
#R| [1] 41.65166
<- coef(lm.ls)[[2]] ) ( d
#R| [1] 27.35597
Applying the Back-Calculation Models
The \(L_{i}\) estimated with a back-calculation model are most easily added to the long format data frame with mutate()
. For example, the \(L_{i}\) computed with all four back-calculation models are added to wb90r
below.
<- wb90r |>
wb90r mutate(DL.len=(radi/radcap)*lencap,
FL.len=(radi/radcap)*(lencap-c)+c,
SPH.len=(-a/b)+(lencap+a/b)*(radi/radcap),
BPH.len=lencap*(c+d*radi)/(c+d*radcap))
headtail(wb90r,n=5)
#R| yearcap fish agecap lencap radcap agei radi DL.len FL.len
#R| 1 1990 374 7 243 7.57375 1 1.01672 32.62096 68.68119
#R| 2 1990 374 7 243 7.57375 2 2.39296 76.77693 105.26856
#R| 3 1990 374 7 243 7.57375 3 3.47635 111.53696 134.07052
#R| 4 1990 374 7 243 7.57375 4 4.37718 140.43964 158.01910
#R| 5 1990 374 7 243 7.57375 5 5.18976 166.51087 179.62156
#R| 763 1990 998 7 298 8.54805 3 3.60701 125.74669 149.82265
#R| 764 1990 998 7 298 8.54805 4 4.48222 156.25804 176.06941
#R| 765 1990 998 7 298 8.54805 5 5.17646 180.46047 196.88905
#R| 766 1990 998 7 298 8.54805 6 6.62240 230.86847 240.25149
#R| 767 1990 998 7 298 8.54805 7 8.54805 298.00000 298.00000
#R| SPH.len BPH.len
#R| 1 64.54444 67.83504
#R| 2 102.00007 104.60001
#R| 3 131.48552 133.54177
#R| 4 156.00243 157.60660
#R| 5 178.11753 179.31392
#R| 763 147.06071 151.78969
#R| 764 173.79670 177.68803
#R| 765 195.00440 198.23129
#R| 766 239.17509 241.01809
#R| 767 298.00000 298.00000
For example, the mean length-at-age may be computed from the back-calculated lengths (shown below for the Fraser-Lee results).
|>
wb90r group_by(agei) |>
summarize(n=validn(FL.len),mn=mean(FL.len),sd=sd(FL.len)) |>
as.data.frame()
#R| agei n mn sd
#R| 1 1 181 78.5663 6.472692
#R| 2 2 178 114.1527 10.453632
#R| 3 3 155 146.7669 13.898434
#R| 4 4 71 172.6512 15.339848
#R| 5 5 64 201.0405 17.479717
#R| 6 6 64 235.3834 23.350527
#R| 7 7 50 268.5969 25.286958
#R| 8 8 2 283.2237 26.912010
#R| 9 9 2 314.5000 20.506097
Additionally, the mean length at each back-calculated age computed separately for each age-at-capture may be computed with sumTable()
from FSA
, where the left side of the formula is the quantitative variable to be summarized and the right side has grouping variables presented in row*column
format.
sumTable(FL.len~agecap*agei,data=wb90r,digits=1)
#R| 1 2 3 4 5 6 7 8 9
#R| 1 73.7 NA NA NA NA NA NA NA NA
#R| 2 79.7 113.3 NA NA NA NA NA NA NA
#R| 3 77.3 112.8 148.9 NA NA NA NA NA NA
#R| 4 71.3 121.8 160.9 194.0 NA NA NA NA NA
#R| 6 79.8 107.9 136.0 169.6 198.8 229.7 NA NA NA
#R| 7 81.6 118.2 144.7 170.7 201.8 237.0 269.1 NA NA
#R| 9 71.2 99.8 135.0 166.1 197.5 236.0 256.4 283.2 314.5