Introduction
We are deprecating fitPlot()
from the next version of FSA (v0.9.0). It will likely be removed at the end of the year 2001. We are taking this action to make FSA
more focused on fisheries applications and to eliminate “black box” functions. fitPlot()
was originally designed for students to quickly visualize the results of one- and two-way ANOVAs and simple, indicator variable, and logistic regressions.1
We now feel that students are better served by learning how to create these visualizations using methods provided by ggplot2
, which require more code, but are more modern, flexible, and transparent.
The basic plots produced by fitPlot()
are recreated here using ggplot2
to provide a resource to help users that relied on fitPlot()
transition to ggplot2
.
The examples below require the following additional packages.
Most examples below use the Mirex
data set from FSA
, which contains the concentration of mirex in the tissue and the body weight of two species of salmon (chinook
and coho
) captured in six years. The year
variable is converted to a factor below for modeling purposes and a new variable is created that indicates if the mirex concentration was greater that 0.2 or not. This new variable is used to demonstrate a logistic regression.
One-Way ANOVA
The code below fits a one-way ANOVA model to examine if mean weight differs by species.
There are at least two simple ways to visualize results from a one-way ANOVA. First, summarized means of raw data with 95% confidence intervals derived from the standard deviation, sample size, and degrees-of-freedom specific to each group are shown.
Second, marginal means may be predicted or estimated from the fitted model (discussed in detail in this vignette from the emmeans
package). However, the main difference from above is that the confidence intervals for the marginal means use an “overall” standard deviation and degrees-of-freedom estimated from across all groups. The predicted or estimated marginal means may be computed with predict()
…
or, equivalently, with emmeans()
from the emmeans
package …
fitPlot()
from FSA
(before v0.9.0) uses the first method to display means with 95% confidence intervals for both species.
Using Manually Summarized Means
The summarized means saved in sumdata
above can be plotted as shown below to recreate the fitPlot()
result. Note width=0.1
in geom_errorbar()
is used to reduce the width of the “caps” at the confidence values and group=1
is needed in geom_line()
as there is only one point for each factor level. Changes (themes, colors, labels, etc) to this basic plot can be made as usual for ggplot()
s (and is illustrated further below).
Using Built-In Functions for Summarized Means
This plot can also be created using stat_summary()
from ggplot
coupled with mean_cl_normal()
and mean()
. Take note of how each geom=
in each stat_summary()
mirrors what was used above. Also note the use of width=0.1
and group=1
here as done above.
Using Marginal Means from emmeans
The estimated marginal means may be plotted similarly to the manually summarized means but by using the aov1mcs$emmeans
object created above.
Two-Way ANOVA
The code below fits a two-way ANOVA model to examine if mean weight differs by species, by year, or by the interaction between species and year.
The fitPlot()
from FSA
(before v0.9.0) shows the mean with 95% confidence interval for all combinations of species and year.
Using Built-In Functions for Summarized Means
Again, the stat_summary()
function can be used to efficiently calculate and then plot the 95% confidence intervals and means similar to what was shown above for a one-way ANOVA. However, there are three major differences.
First, in the main ggplot()
call the color of the points and lines is mapped to one of the two factor variables (species
in this case) whereas the other factor variable is mapped to x=
.2 Second, the group=
aesthetic for the line geom must be set to the factor that describes how the lines should be connected (e.g., species
in this case). Third, the intervals and (possibly) the points at each level on the x-axis will overlap if they are not “dodged” a small amount.3 The “dodge” amount should be set outside the geom()
s so that each geom uses the same amount of “dodging.” This will assure that the intervals, points, and connecting lines for each level defined by the colors align. Below this “dodge” amount is set with position_dodge()
and saved to an object called pd
which is then set equal to position=
in each geom()
.
Using Marginal Means from emmeans
The plot with marginal means is constructed similarly as shown below. Note, however, year:species
in emmeans()
is used so that the marginal means and confidence intervals are estimated for each combination of year
and species
.
Simple Linear Regression
The code below fits a simple linear regression for examining the relationship between mirex concentration and salmon weight.
fitPlot()
from FSA
(before v0.9.0) shows the best-fit line with a 95% confidence band.
Using Manually Predicted Values
One method for recreating this plot is to create a new data frame that first has the two variables of observed data and then adds on predicted values of the response at each observed value of the explanatory variable with 95% confidence intervals. The two observed variables are selected from the original data frame with dplyr::select()
. If this new data frame is given to newdata=
in predict()
with interval="confidence"
then the predicted values (and 95% confidence intervals) will be constructed at each value of the explanatory variable. The two data frames are then column-bound together with cbind()
to make one data frame for plotting (further below).
The confidence band is first plotted as a “ribbon” with the best-fit line then added followed by the observed points. In this plot, weight
is globally mapped to x=
in ggplot()
so that it will be used for each geom. The lower and upper confidence values to ymin=
and ymax=
in geom_ribbon()
, whereas the predicted or “fit”ted values are mapped to y=
geom_line()
to make the line and the observed mirex concentrations are mapped to y=
in geom_point()
to plot the observed points. Further note the use of alpha=
to make the confidence band semi-transparent and size=
to make the fitted line slightly larger than the default. Again all aspects of this plot can be changed in the usual ggplot way.
Using A Built-In Function
The best-fit line can also be added to a scatterplot with geom_smooth()
.
Indicator Variable Regression
The code below fits an indicator variable regression to examine if the relationship between mirex concentration and salmon weight differs betwen species.
The fitPlot()
from FSA
(before v0.9.0) shows the best-fit line for both species.
Using Manually Predicted Values
The process of constructing a similar plot in ggplot()
follows the same general procedure as that for a simple linear regression. First, make a data frame that has the observed variables used in the model and predicted values and confidence limits for each observation.
Then plot the data as before but making sure to map color=
and fill=
(just for the ribbon) to the species factor variable.
Using a Built-In Function
This plot can also be constructed with geom_smooth()
, again making sure to map the color=
and fill=
to the species factor variable.
Logistic Regression
The code below fits a logistic regression to examine the relationship between the probability that mirex concentration is greater than 0.2 and salmon weight.
fitPlot()
from FSA
(before v0.9.0) shows the fitted logistic regression curve.
Using Manually Predicted Values
The first method for showing the logistic regression curve follows the general methodology for simple linear regression shown above. Note, however, that predict()
does not produce confidence interval values for a logistic regression. Thus, the plot created in this way cannot have a confidence band.
Using a Built-In Function
The best-fit logistic regression curve with a confidence band can, however, be added to a scatterplot with geom_smooth()
. In this case, method=
must be changed to glm
and method.args=
must be used as shown below so that glm
will construct a logistic (rather than linear) regression.
Note that this method easily generalizes to an indicator variable logistic regression (note that color=
and fill=
are mapped to the species factor variable).
Polynomial Regression
The code below fits a quadratic (second degree polynomial) regression for the relationship between mirex concentration and salmon weight.
fitPlot()
from FSA
(before v0.9.0) shows the best-fit regression curve.
Using Manually Predicted Values
This regression can be viewed similarly to the way the simple linear regressions was viewed.
Using a Built-In Function
This type of regression can also be viewed using geom_smooth()
but the formula for the polynomial must be given to formula=
. However, note that in this formula you put y
and x
rather than the names of the variables that are mapped to y
and x
.
Nonlinear Regression
The following code fits a von Bertalanffy growth function (VBGF) to the total length and age data for spot found in the SpotVA1
data frame built into FSA
. Fitting the VBGF is described in more detail here.
fitPlot()
from FSA
(before v0.9.0) shows this model fit to the observed data.
Using Manually Predicted Values
A similar plot can be constructed from manually predicted values very similarly to what was described for previous models. However, as with the logistic regression, predict()
will not return confidence interval values, so it is not immediately possible to create a confidence band with this method.
Using a Built-In Function
You can get a very similar plot using geom_smooth()
with formula=
set equal to the same formula as in nls()
(except using y
and x
rather than the variables mapped to those aesthetics) and putting the start=
argument in the method.args=
list. Note that the algorithm used by geom_smooth()
to computed the confidence bands will err for this non-linear model; thus, set se=FALSE
to at least see the best-fit curve.
Make a similar plot but with a confidence band derived via bootstrapping is described in this previous post.
Conclusion
The fitPlot()
function in FSA
will be deprecated in v0.9.0 and will likely not exist after that. This post describes a more transparent (i.e., not a “black box”) and flexible set of methods for constructing similar plots using ggplot2
for those who will need to transition away from using fitPlot()
.
As mentioned in the examples above, each plot can be modified further using typical methods for ggplot2
. These changes were not illustrated above to minimize the amount of code shown in this post. However, as an example, the code below shows a possible modification of the IVR plot shown above.