Module 23 IVR Analysis
Modules 20-22 built a conceptual and analytical foundation for an indicator variable regresion (IVR). Three more analytical topics need to be addressed before a full IVR can be completed in Module 24. Those topics – assumptions and transformations, multiple comparisons for slopes, and multiple comparisons for intercepts – are introduced in this module.
23.1 Assumptions & Transformations
The assumptions for an IVR are the same as those for an SLR – independence, linearity, homoscedasticity, normality, and no outliers (see Module 17). Methods for assessing the validity of these assumptions are also essentially the same as those used for an SLR. In other words, residual plots will be used to examine linearity and assess homoscedasticity, an Anderson-Darling test and a histogram of the residuals will be used to assess the normality of residuals, and a residual plot and an outlier test will be used to diagnose outliers and influential points.
An important point to note, though, is that the IVR assumptions are assessed on the ultimate full model. If the assumptions are not met on the ultimate full model, then transformations for the response and the covariate99 should be considered such that a transformed version of the ultimate full model meets the assumptions.100 Once an ultimate full model is found where all assumptions are met, then it is assumed that all assumptions will be met for any subset of the ultimate full model that has non-significant explanatory variables removed.
The assumptions of an IVR are the same as the assumptions for an SLR. Thus, methods for assessing the assumptions are the same as those methods used for an SLR
IVR assumptions are assessed for the ultimate full model. Assumptions do not need to be assessed for simpler models that only have variables from the ultimate full model removed.
23.2 Multiple Comparisons for Slopes
In Section 22.6, IVR indicated that some slopes between mirex concentration in the tissue and salmon weight differed among 1992, 1996, and 1999 (Figure 23.1). An examination of Figure 23.1 suggests that the slopes for 1992, 1996, and 1999 are all likely not the same. However, an objective method is needed to determine which pairs of slopes actually differ. Parallel lines tests for IVRs taking two years at a time could be used but this is inefficient and would increase the experiment-wise error rate (recall from Section 6.1).
The emtrends()
function from the emmeans
package101 can be used to test for differences among all pairs of slopes using a Tukey-like correction. This function requires the ultimate full model as its first argument (ivrY1
here as fit in Section 22.2), a specs=
argument with pairwise~
followed by the name of the factor variable in the ultimate full model, and var=
followed by the name of the covariate in the ultimate full model (which must be in quotes).102 The results of this function should be saved to an object.
<- emtrends(ivrY1,pairwise~year,var="weight") smc
That saved object is then the first argument to summary()
, which also includes infer=TRUE
. The results are in sections labeled as $emtrends
and $contrasts
.
<- summary(smc,infer=TRUE) ) ( smcsum
#R> $emtrends
#R> year weight.trend SE df lower.CL upper.CL t.ratio p.value
#R> 1992 0.02653 0.00483 44 0.01679 0.0363 5.489 <.0001
#R> 1996 0.01225 0.00306 44 0.00609 0.0184 4.004 0.0002
#R> 1999 0.00386 0.00499 44 -0.00620 0.0139 0.773 0.4434
#R>
#R> Confidence level used: 0.95
#R>
#R> $contrasts
#R> contrast estimate SE df lower.CL upper.CL t.ratio p.value
#R> 1992 - 1996 0.01428 0.00572 44 0.000403 0.0282 2.496 0.0425
#R> 1992 - 1999 0.02267 0.00695 44 0.005815 0.0395 3.262 0.0059
#R> 1996 - 1999 0.00839 0.00586 44 -0.005813 0.0226 1.433 0.3331
#R>
#R> Confidence level used: 0.95
#R> Conf-level adjustment: tukey method for comparing a family of 3 estimates
#R> P value adjustment: tukey method for comparing a family of 3 estimates
The $contrasts
section contains the Tukey-corrected tests for each pair of slopes. The important columns in these results are
estimate
: the difference in group sample slopes,lower.CL
andupper.CL
: 95% confidence interval for the difference in slopes, andp.value
: p-value for testing that the difference in slopes is 0 or not.
For example, the sample slope for 1992 is 0.01428 greater than the sample slope for 1996, one is 95% confident that the population slope for 1992 is between 0.000403 and 0.0282 greater than the population slope for 1996, and the population slopes differ between 1992 and 1996 (p=0.0425). In contrast, the sample slope for 1996 is 0.00839 greater than the sample slope for 1999, one is 95% confident that the population slope for 1996 is between 0.005813 less than and 0.0226 greater than the population slope for 1999, and the population slopes do NOT differ between these two years (p=0.3331).
An examination of all of the $contrast
results suggests that the slope from 1992 differed significantly from the slopes of 1996 (p=0.0425) and 1999 (p=0.0059), but the slopes for 1996 and 1999 did not differ (p=0.3331).
The difference in group slopes with 95% confidence intervals and p-values are shown in the $contrasts
section of the results.
The $emtrends
section of the output contains information about the individual slopes, specifically.
XXX.trend
: the sample slope (note thatXXX
will be replaced with the name of the covariate in the model),lower.CL
andupper.CL
: 95% confidence interval for the slope, andp.value
: p-value for the test whether the slope is 0 or not (i.e., is there a relationship or not).
For example, the sample slope for 1992 is 0.02653, one is 95% confident that the population slope for 1992 is between 0.01679 and 0.0363, and there is a significant relationship between mirex in the tissue and salmon weight in 1992 (p<0.00005).
An examination of all of the results in $emtrends
shows that the only slope that was not significantly different from 0 was for 1999. Thus, there was a significant relationship between mirex concentration in the tissue and salmon weight for 1992 and 1996, but not for 1999.
The slopes with 95% confidence intervals and p-values for testing if the slope is 0 or not are shown in the $emtrends
section of the results.
23.3 Multiple Comparisons for Intercepts
In Section 22.7 of the last module, the IVR indicated that the first three years (1977, 1982, and 1986) had the same slopes and that at least some of those years had different intercepts (Figure 23.2).103 An examination of Figure 23.2 suggests that the intercept for 1977 may be greater than the intercepts for 1982 and 1986, which likely do not differ significantly. Again, an objective method is needed to determine which pairs of intercepts significantly differ.
The emmeans()
function from the emmeans
package can be used to determine which pairs of intercepts differ (using a Tukey’s correction). However, before using this function a new model with the insignificant interaction term removed should be fit.
<- lm(mirex~weight+year,data=MirexFirst3) ivrY2_noint
The use of emmeans()
here is exactly the same as it was in Section 6.3. Specifically, the model without the interaction term is the first argument and pairwise~
is followed by the name of the factor variable in the model. The results are then submitted to summary()
with infer=TRUE
and the results are given in two sections labeled $emmeans
and $contrasts
.
<- emmeans(ivrY2_noint,pairwise~year)
imc <- summary(imc,infer=TRUE) ) ( imcsum
#R> $emmeans
#R> year emmean SE df lower.CL upper.CL t.ratio p.value
#R> 1977 0.238 0.0123 68 0.213 0.262 19.392 <.0001
#R> 1982 0.184 0.0122 68 0.159 0.208 15.091 <.0001
#R> 1986 0.172 0.0123 68 0.148 0.197 14.015 <.0001
#R>
#R> Confidence level used: 0.95
#R>
#R> $contrasts
#R> contrast estimate SE df lower.CL upper.CL t.ratio p.value
#R> 1977 - 1982 0.0542 0.0173 68 0.0128 0.0956 3.139 0.0070
#R> 1977 - 1986 0.0655 0.0175 68 0.0235 0.1075 3.736 0.0011
#R> 1982 - 1986 0.0113 0.0173 68 -0.0302 0.0529 0.653 0.7913
#R>
#R> Confidence level used: 0.95
#R> Conf-level adjustment: tukey method for comparing a family of 3 estimates
#R> P value adjustment: tukey method for comparing a family of 3 estimates
The important parts of the $contrasts
portion are
estimate
: the difference in sample intercepts,lower.CL
andupper.CL
: 95% confidence intervals for the difference in intercepts, andp.value
: p-value for whether the differences in intercepts is 0 or not.
For example, the sample intercept for 1977 is 0.0542 mg/kg greater than the intercept for 1982, the population intercept for 1977 is between 0.0128 and 0.0956 greater than the population intercept for 1982, and the intercepts for these two years are significantly different (p=0.0070). Because these two years are parallel, the sample mean mirex concentration for 1977 is 0.0542 mg/kg greater than the sample mean mirex concentration for 1982 for all salmon weights.
The difference in group intercepts with 95% confidence intervals and p-values are shown in the $contrasts
section of the results.
The results in the $emmeans
portion contains information about the mean of the response variable at the mean of the covariate. In other words, this information is about the intercepts for the centered covariate (see Section 15.3). Specifically,
emmean
: sample mean of the response variable at the sample mean of the covariate,lower.CL
andupper.CL
: 95% confidence interval for the population mean of the response variable at the sample mean of the covariate, andp.value
: a p-value for testing if the population mean of the response variable at the sample mean of the covariate is equal to 0 or not.
For example, the sample mean mirex concentration in 1977 for salmon with the sample mean weight of all salmon is 0.238, one is 95% confident that the population mean mirex concentration in 1977 for a salmon with the sample mean weight of all salmon is between 0.213 and 0.262 mg/kg, and the population mean mirex concentration at the sample mean salmon weight in 1977 is not equal to 0 (p<0.00005). A visual of the sample means under emmeans
is shown in Figure 23.3.
The mean value of the response at the mean value of the covariate, along with 95% confidence intervals and p-values, are shown in the $emmeans
section of the results.
The emmeans()
function defaults to showing the mean value of the response for the mean value of the covariate because the intercept is often an extrapolation as discussed in Section 15.3. It is possible to compute the mean values of the response at other values of the covariate by using cov.reduce=
in emmeans()
. For example, the code below computes the mean values of the response at the minimum value of the covariate (and shown in Figure 23.4).104 Because the lines are parallel this does not change anything in the $contrasts
portion of the output. However, the $emmeans
portion is changed because the mean values of the response is computed at, in this example, the minimum value rather than the mean value of the covariate. Using the default calculation at the mean of covariate is probably most appropriate for most situations in general and all situations in this course.
<- emmeans(ivrY2_noint,pairwise~year,cov.reduce=min)
imc2 <- summary(imc2,infer=TRUE) ) ( imc2sum
#R> $emmeans
#R> year emmean SE df lower.CL upper.CL t.ratio p.value
#R> 1977 0.1460 0.0143 68 0.1174 0.175 10.181 <.0001
#R> 1982 0.0918 0.0151 68 0.0617 0.122 6.097 <.0001
#R> 1986 0.0805 0.0163 68 0.0479 0.113 4.928 <.0001
#R>
#R> Confidence level used: 0.95
#R>
#R> $contrasts
#R> contrast estimate SE df lower.CL upper.CL t.ratio p.value
#R> 1977 - 1982 0.0542 0.0173 68 0.0128 0.0956 3.139 0.0070
#R> 1977 - 1986 0.0655 0.0175 68 0.0235 0.1075 3.736 0.0011
#R> 1982 - 1986 0.0113 0.0173 68 -0.0302 0.0529 0.653 0.7913
#R>
#R> Confidence level used: 0.95
#R> Conf-level adjustment: tukey method for comparing a family of 3 estimates
#R> P value adjustment: tukey method for comparing a family of 3 estimates
If the covariate is transformed then any interaction terms with the covariate should be recreated using the transformed covariate.↩︎
The methods and rules for transforming these variables in an IVR are the same as those used for an SLR (see 18).↩︎
Except for
var=
this is the same as foremmeans()
described in Section 6.3↩︎Recall from last module that those three years were chosen simply to serve as an example. Choosing these years has nothing to do with the analysis from the previous section.↩︎
You can get the mean values of the response at a covariate of 0 (i.e., the actual y-intercepts) by using
cov.reduce=function(x) 0
.↩︎