- With respect to indicator variables, ultimate full model, and submodels …
- The indicator variable is named after the “1” group. The reference group will not be explicitly stated in the indicator variables, but is defined when all indicator variables are 0.
- The variables are entered into the ultimate full model in this order – covariate, all indicator variables, and all interaction variables.
- The submodel for the “Arabian Gulf” group is found by plugging in 0 for ALL indicator variables and then simplifying. Essentially everything in the ultimate full model multiplied by an indicator variable is dropped.
- The submodel for the “Indian Ocean” group is found by plugging in 1 for IO and 0 for the other indicator variables and then simplifying. Essentially everything in the ultimate full model multiplied by one of the other indicator variables is dropped. Additionally, the word IO is dropped and then the two constants (α and δ1) are lumped together and the two items multiplied by the covariate (β and γ1) are lumped together. A similar argument is made for the other groups.
- With respect to the tests …
- Be VERY careful with your interpretations of the parallel lines test. Some of you said that there is “no difference between clutch size and carapace length.” This is NOT correct … there is clearly a difference between these two things (like comparing elephants and ants) and that is not what was tested. What you found is that there is NOT a difference in slopes for the regression between clutch size and carapace length among the regions or, better yet, there is NOT a difference in the RELATIONSHIP between clutch size and carapace length among the regions.
- It is “legal” to perform the equal intercepts test because the lines were found to be parallel. It is also “legal” to examine the relationships test p-value (in the row labelled with the covariate) for the same reason. Think of the line for the covariate (relationship test) and the line with the group factor variable (equal intercepts test) as analogous to “main effects” that can be properly interpreted if the interaction variable (parallel lines test) is not significant.
- If you decided that the lines are NOT parallel (not the case here) then use compSlopes() to determine which slopes differ. If you decided that the lines are parallel but some intercepts differ then use compIntercepts() to determine which intercepts differ.
- As has been the case, there is no value to using summary() except to see the estimated parameters, which are more efficiently seen when cbind()ed together. DO NOT perform hypothesis tests with the p-values in summary() as they are not comprehensive and have not been corrected for multiple comparisons.
- The fitplot is a visual of the results of your tests. When making a conclusion, don't ignore the results from all of your tests and simply make up a conclusion based on your interpretation of the fitplot (which will not include any acknowledgment of sampling variability).
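The remark about seeing the estimated parameters "cbind()ed together" can be sketched as follows. This is only one plausible reading: it assumes the intent is to bind the coefficient estimates to their confidence intervals with base R's coef(), confint(), and cbind(). The data and object names (d, fit_demo, est) are hypothetical and simulated, not the turtle data.

```r
# Hypothetical simulated data (NOT the Hawksbill Turtle data)
set.seed(2)
d <- data.frame(x = runif(60, 0, 10),
                g = factor(rep(c("a", "b", "c"), each = 20)))
d$y <- 3 + 2 * d$x + 4 * (d$g == "b") + rnorm(60)

# Same model structure as in the homework: covariate, group factor, interaction
fit_demo <- lm(y ~ x * g, data = d)

# Estimates and 95% confidence intervals side by side
est <- cbind(Est = coef(fit_demo), confint(fit_demo))
est

# The hypothesis tests come from the ANOVA table, not from summary()
anova(fit_demo)
```

The resulting matrix has one row per parameter (intercept, covariate, indicator variables, interaction variables), in the same order that the variables enter the ultimate full model.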
Turtle Nesting Ecology
As there are five groups in this analysis and the “Arabian Gulf” turtles are to be the reference group, I created the following four indicator variables.
- IO=1 if in the “Indian Ocean” group, 0 otherwise
- RS=1 if in the “Red Sea” group, 0 otherwise
- CO=1 if in the “Caribbean” group, 0 otherwise
- WA=1 if in the “West Atlantic” group, 0 otherwise
The ultimate full model is then μCSZ = α + βCCL + δ1IO + δ2RS + δ3CO + δ4WA + γ1IO×CCL + γ2RS×CCL + γ3CO×CCL + γ4WA×CCL, where CSZ is “clutch size” and CCL is “curved carapace length” of the turtles.
The submodels for all five groups are below.
- Arabian Gulf: μCSZ = α + βCCL
- Indian Ocean: μCSZ = (α+δ1) + (β+γ1)CCL
- Red Sea: μCSZ = (α+δ2) + (β+γ2)CCL
- Caribbean: μCSZ = (α+δ3) + (β+γ3)CCL
- West Atlantic: μCSZ = (α+δ4) + (β+γ4)CCL
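The way the submodels fall out of the ultimate full model can be checked numerically. The sketch below uses simulated data (the values, and the names sim, fit_sim, ag, io, and io_alone, are hypothetical and are not the turtle data) and base R only. It shows the columns that lm() creates for the covariate, the four indicator variables, and the four interaction variables, and then "lumps together" the coefficients for the Indian Ocean group.

```r
# Hypothetical simulated data (NOT the Hawksbill Turtle data)
set.seed(1)
regions <- c("Arabian Gulf", "Indian Ocean", "Red Sea", "Caribbean", "West Atlantic")
sim <- data.frame(
  Region = factor(rep(regions, each = 20), levels = regions),  # first level is the reference
  CCL    = runif(100, 70, 100)
)
sim$CSZ <- 20 + 1.5 * sim$CCL + 10 * (as.integer(sim$Region) - 1) + rnorm(100, sd = 10)

fit_sim <- lm(CSZ ~ CCL * Region, data = sim)

# Design matrix columns: intercept, covariate, four indicators, four interactions
colnames(model.matrix(fit_sim))

b <- coef(fit_sim)

# Arabian Gulf (reference): every indicator is 0, so only alpha and beta remain
ag <- c(intercept = b[["(Intercept)"]], slope = b[["CCL"]])

# Indian Ocean: IO = 1, so delta1 joins the intercept and gamma1 joins the slope
io <- c(intercept = b[["(Intercept)"]] + b[["RegionIndian Ocean"]],
        slope     = b[["CCL"]] + b[["CCL:RegionIndian Ocean"]])

# Check: the lumped coefficients match a line fit to the Indian Ocean rows alone
io_alone <- coef(lm(CSZ ~ CCL, data = subset(sim, Region == "Indian Ocean")))
all.equal(unname(io), unname(io_alone))
```

Because the ultimate full model gives every group its own intercept and slope, each submodel reproduces the line that would be fit to that group by itself; the indicator variables are just a way of fitting all of those lines in one model so that they can be compared.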
It is difficult to assess independence without more information; however, as long as the turtles were randomly selected at a site there should not be a problem of within-group independence and among-group independence is likely given the geographic spread of the regions. The residuals appear largely linear and homoscedastic (Figure 1-Right), not normal (Anderson-Darling p=0.0292) but quite symmetric without overly long tails (Figure 1-Left), and without significant outliers (outlier test p=0.0521).
The slopes between clutch size and curved carapace length are statistically similar among the five regions (p=0.1192; Table 1); thus, the lines that describe the relationship between clutch size and curved carapace length for the separate regions are all parallel. In other words, the relationship between clutch size and curved carapace length does not differ among the regions.
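The parallel lines test p-value can be reproduced by hand from the CCL:Region and Residuals rows of Table 1. The sketch below copies the printed (and therefore rounded) sums-of-squares and degrees-of-freedom from that table; the object names are only illustrative. The results should agree with the table up to rounding.

```r
# Values copied from Table 1 as printed (rounded)
ss_int <- 3461;   df_int <- 4     # CCL:Region row
ss_res <- 172349; df_res <- 368   # Residuals row

# Mean squares are sums-of-squares divided by their degrees-of-freedom
ms_int <- ss_int / df_int
ms_res <- ss_res / df_res

# F is the ratio of the interaction mean square to the residual mean square
f_int <- ms_int / ms_res

# p-value is the upper tail of the F distribution
p_int <- pf(f_int, df_int, df_res, lower.tail = FALSE)
round(c(F = f_int, p = p_int), 4)
```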
The intercepts (assuming parallel lines) for the lines describing the relationship between clutch size and curved carapace length are statistically different among the five regions (p<0.00005; Table 1). Turtles from the Arabian Gulf had a significantly smaller intercept than turtles from all other regions except the Indian Ocean (Table 2). The intercepts for all other pairs of regions are statistically equal (Table 2).
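The pairwise conclusions can be read directly from Table 2: an intercept difference is significant when its adjusted p-value is below 0.05, which here coincides with a 95% confidence interval that excludes zero. The sketch below re-enters the printed values from Table 2 (the data frame and column names are only illustrative) and flags those comparisons.

```r
# Values copied from Table 2 as printed
tab2 <- data.frame(
  comparison = c("Indian Ocean-Arabian Gulf", "Red Sea-Arabian Gulf",
                 "Caribbean-Arabian Gulf", "West Atlantic-Arabian Gulf",
                 "Red Sea-Indian Ocean", "Caribbean-Indian Ocean",
                 "West Atlantic-Indian Ocean", "Caribbean-Red Sea",
                 "West Atlantic-Red Sea", "West Atlantic-Caribbean"),
  lci   = c(-1.684516, 9.484777, 20.865117, 20.322623, -19.689726,
            -11.381848, -9.759729, -7.000138, -6.170822, -7.895359),
  uci   = c(38.96437, 34.44748, 34.74009, 41.56110, 26.34213,
            29.70720, 34.36360, 18.67309, 24.12229, 14.17387),
  p.adj = c(0.08969, 0.00002, 0.00000, 0.00000, 0.99479,
            0.73826, 0.54455, 0.72411, 0.48269, 0.93634)
)

# Flag intervals that do not contain zero and p-values below 0.05
tab2$excludes0 <- tab2$lci > 0 | tab2$uci < 0
tab2$signif    <- tab2$p.adj < 0.05

# Show only the comparisons with a significant difference in intercepts
tab2[tab2$signif, c("comparison", "lci", "uci", "p.adj")]
```

Only the Red Sea, Caribbean, and West Atlantic comparisons with the Arabian Gulf are flagged; the Indian Ocean-Arabian Gulf interval contains zero.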
There is a significant relationship between clutch size and curved carapace length (p<0.00005; Table 1). As curved carapace length increases so does clutch size and the degree of increase is the same for all regions (because the lines are parallel as shown above).
A plot that illustrates the overall model fit is in Figure 2. Note that the slopes of these lines are statistically equal.
The results of the previous analysis show that the relationship between clutch size and curved carapace length does not differ among the five regions. In other words, clutch size increases with increasing carapace length but that increase is statistically the same across all five regions. The results above also indicate that turtles from the Arabian Gulf have significantly smaller clutch sizes, after adjusting for differences in turtle size, than turtles from all regions except the Indian Ocean. In other words, some factor other than the size of the turtle explains why turtles from the Arabian Gulf have smaller clutch sizes.
Figure 1: Histogram of residuals (left) and residual plot (right) for indicator variable regression of clutch size on curved carapace length of Hawksbill Turtles from five regions.
Table 1: ANOVA table for the indicator variable regression of clutch size on curved carapace length of Hawksbill Turtles from five regions.
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| CCL | 1 | 246757 | 246757 | 526.8775 | < 2.2e-16 |
| Region | 4 | 22045 | 5511 | 11.7675 | 5.266e-09 |
| CCL:Region | 4 | 3461 | 865 | 1.8472 | 0.1192 |
| Residuals | 368 | 172349 | 468 | | |
Table 2: Difference, 95% confidence interval, and p-value for the difference in intercepts for the indicator variable regression of clutch size on curved carapace length of Hawksbill Turtles from five regions.
| | comparison | diff | 95% LCI | 95% UCI | p.adj |
|---|---|---|---|---|---|
| 1 | Indian Ocean-Arabian Gulf | 18.639927 | -1.684516 | 38.96437 | 0.08969 |
| 2 | Red Sea-Arabian Gulf | 21.966128 | 9.484777 | 34.44748 | 0.00002 |
| 3 | Caribbean-Arabian Gulf | 27.802604 | 20.865117 | 34.74009 | 0.00000 |
| 4 | West Atlantic-Arabian Gulf | 30.941861 | 20.322623 | 41.56110 | 0.00000 |
| 5 | Red Sea-Indian Ocean | 3.326201 | -19.689726 | 26.34213 | 0.99479 |
| 6 | Caribbean-Indian Ocean | 9.162677 | -11.381848 | 29.70720 | 0.73826 |
| 7 | West Atlantic-Indian Ocean | 12.301934 | -9.759729 | 34.36360 | 0.54455 |
| 8 | Caribbean-Red Sea | 5.836476 | -7.000138 | 18.67309 | 0.72411 |
| 9 | West Atlantic-Red Sea | 8.975734 | -6.170822 | 24.12229 | 0.48269 |
| 10 | West Atlantic-Caribbean | 3.139258 | -7.895359 | 14.17387 | 0.93634 |
Figure 2: Scatterplot of clutch size on curved carapace length of Hawksbill Turtles from five regions with best-fit lines.
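library(NCStats)  # assumed source of transChooser(), compIntercepts(), and fitPlot()
ht <- read.csv("HawksbillTurtles.csv")
ht$Region <- factor(ht$Region, levels=c("Arabian Gulf","Indian Ocean","Red Sea",
                                        "Caribbean","West Atlantic"))  # first level is the reference group
lm1 <- lm(Clutch.Size~CCL*Region,data=ht)  # ultimate full model
transChooser(lm1)                          # assumption checking
aov1 <- anova(lm1)                         # parallel lines, equal intercepts, and relationship tests
compIntercepts(lm1)                        # pairwise comparisons of intercepts
fitPlot(lm1,xlab="Curved Carapace Length (cm)",ylab="Clutch Size",
        legend="topleft",col=c("red3","red3","red3","blue3","blue3"))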