Note:
  • Make sure you show an “R Code and Results” section.
  • Questions that ask about a “variance” or “variability” should be answered with an MS, as MS are “true variances” (as described in the reading).
  • Simple model results are “total” and full model results are “within.”
  • When writing the overall conclusion (last question) you can remove all of the statistical jargon (except for possibly the p-value) and simply provide an answer to the study’s question: Is the mean pH different between the two streams?
  • In your conclusion make sure that you say “MEAN” and the mean of what (i.e., pH). The test is about the summary (i.e., the mean) and not specific observations.

pH in Two Rivers

  1. The simple model in this case is that one mean describes the observations in both groups, or \(Y_{ij} = \mu + \epsilon_{ij}\), where \(Y_{ij}\) is the jth observation of the response variable in the ith group, \(\mu\) is the population grand mean, and \(\epsilon_{ij}\) is the “error” for the jth observation in the ith group. In contrast, the full model is that two means are needed to describe the observations in each group separately, or \(Y_{ij} = \mu_{i} + \epsilon_{ij}\), where \(\mu_{i}\) is the population mean for the ith group.
  2. The number of groups in the study is I=2. The total number of individuals in the study is n=20.
  3. The residual df for the simple model is dfTotal=1+18=19.
  4. The residual df for the full model is dfWithin=18.
  5. The difference in number of parameters between the two models is dfAmong=1.
  6. The variance of individuals around the simple model is MSTotal=\(\frac{25.4026+9.3719}{1+18}\)=\(\frac{34.7745}{19}\)=1.8302.
  7. The variance of individuals around the full model is MSWithin=0.5207.
  8. The variance among sample means is MSAmong=25.4026.
  9. The amount of “signal” in the data is the measurement of how different the sample means are, which is represented by MSAmong=25.4026.
  10. The amount of “noise” in the data is the measurement of how variable individuals are around the group means, which is MSWithin=0.5207.
  11. The pooled sample variance for the 2-sample t-test (i.e., \(s_{p}^{2}\)) is MSWithin=0.5207.
  12. The overall sample variance (i.e., \(s^{2}\)) is MSTotal=1.8302.
  13. The variance not explained by the full model is MSWithin=0.5207.
  14. The variance that is explained by the full model is MSAmong=25.4026.
  15. The F-ratio is the ratio of the variability that is explained by the full model to the variability that is left unexplained by the full model. In other words, it measures what the full model can explain relative to what it cannot explain. Alternatively, the F-ratio is the ratio of the reduction in lack-of-fit from the simple to the full model to the lack-of-fit of the full model. In other words, it measures how much better the full model fits relative to how poorly it still fits the data.
  16. The F-ratio from the ANOVA table (=48.789) is the square of the t test statistic from the 2-sample t-test (\(6.9849^{2} \approx 48.789\), with the two agreeing to the rounding shown).
  17. The p-value in the ANOVA table (1.599e-06) exactly equals the p-value from the 2-sample t-test (1.599e-06).
  18. This p-value (<0.00005) is less than α, which means that H0 is rejected, the full model is preferred over the simple model, and there is significant evidence for a difference in the two group means.
  19. It appears that the mean pH is greater for stream A than for stream B.
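
Answers 3 through 8 rely on the fact that sums of squares and df add across sources while mean squares do not. The worked check below uses the values printed in the ANOVA table; the SS symbols are shorthand introduced here for the “Sum Sq” column and are not notation from the original answers.

\[
\begin{aligned}
SS_{Total} &= SS_{Among} + SS_{Within} = 25.4026 + 9.3719 = 34.7745 \\
df_{Total} &= df_{Among} + df_{Within} = 1 + 18 = 19 \\
MS_{Total} &= \frac{SS_{Total}}{df_{Total}} = \frac{34.7745}{19} \approx 1.8302 \\
MS_{Among} + MS_{Within} &= 25.4026 + 0.5207 = 25.9233 \neq MS_{Total}
\end{aligned}
\]

This is why MSTotal has to be built from the summed SS and the summed df rather than by adding the two MS values.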

R Code and Results

> d <- read.csv("phLevels.csv")
> aov <- lm(pH~river,data=d)
> anova(aov)
Analysis of Variance Table

Response: pH
          Df  Sum Sq Mean Sq F value    Pr(>F)
river      1 25.4026 25.4026  48.789 1.599e-06
Residuals 18  9.3719  0.5207                  
> t.test(pH~river,data=d,var.equal=TRUE)
 Two Sample t-test with pH by river 
t = 6.9849, df = 18, p-value = 1.599e-06
alternative hypothesis: true difference in means between group A and group B is not equal to 0 
95 percent confidence interval:
 1.576042 2.931958 
sample estimates:
mean in group A mean in group B 
          8.662           6.408
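
As a rough cross-check on answers 6, 7, 15, 16, and 17, the key quantities can be recomputed by hand from the printed output. This is a minimal sketch that assumes only the rounded values shown above, so results should match to rounding rather than exactly; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the ANOVA table and t-test output above (already rounded)
ss_among  <- 25.4026   # "Sum Sq" for river
ss_within <- 9.3719    # "Sum Sq" for Residuals
df_among  <- 1         # "Df" for river
df_within <- 18        # "Df" for Residuals
t_stat    <- 6.9849    # t from the 2-sample t-test

# A mean square is a sum of squares divided by its df
ms_among  <- ss_among / df_among
ms_within <- ss_within / df_within
ms_total  <- (ss_among + ss_within) / (df_among + df_within)

# F-ratio: signal (MSAmong) relative to noise (MSWithin)
f_ratio <- ms_among / ms_within

# Upper-tail area of the F distribution with (1, 18) df gives the ANOVA p-value
p_value <- pf(f_ratio, df_among, df_within, lower.tail = FALSE)

# Compare these with MSTotal, MSWithin, the F value, and the squared t statistic
round(c(MSTotal = ms_total, MSWithin = ms_within,
        F = f_ratio, t_squared = t_stat^2), 4)

# Compare this with Pr(>F) in the ANOVA table and the t-test p-value
p_value
```

Because the inputs are rounded, f_ratio and t_stat^2 should agree to about three decimal places rather than to full precision.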