Exercise

Note: Your answers to the questions below should follow the expectations for homework found here. Questions outside of class can be asked on the Module Assignments-Questions Teams channel (see link on homepage).

pH in Two Rivers

Burke Center researchers recorded the pH at ten locations in each of two streams that were close to one another but in different watersheds with markedly different geologies. They wanted to determine if the mean pH differed between the two streams. Their data are shown in the table below.

Stream A: 8.97 9.12 9.41 8.67 9.94 8.28 7.86 7.51 9.18 7.68
Stream B: 6.67 5.83 6.84 6.86 5.89 7.42 6.56 5.99 5.33 6.69

Load these data into R and answer the following questions. Make sure to show and refer to R code and results as needed. [Note that these data were used in a previous exercise. Some questions below may refer to your work on that previous exercise.]
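One way to get the values into R is to type them in directly and stack them into a long-format data frame, as in the minimal sketch below. The object and variable names (`d`, `stream`, `ph`) are assumptions made for illustration; use whatever names and data-entry method (e.g., reading a CSV file) you used in the previous exercise.

```r
# pH values copied from the table above, one vector per stream
ph_A <- c(8.97, 9.12, 9.41, 8.67, 9.94, 8.28, 7.86, 7.51, 9.18, 7.68)
ph_B <- c(6.67, 5.83, 6.84, 6.86, 5.89, 7.42, 6.56, 5.99, 5.33, 6.69)

# Long format: one row per location, with a factor identifying the stream.
# The names d, stream, and ph are illustrative, not required by the exercise.
d <- data.frame(
  stream = factor(rep(c("A", "B"), each = 10)),
  ph     = c(ph_A, ph_B)
)

# Check that the structure is as expected before doing any analysis
str(d)
```

A long format with a separate grouping factor is the layout that R's model-fitting functions expect.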

Write the simple and full models for this situation. Make sure to define all symbols (and subscripts).
What are the values of I and n for these data?

For the following questions, you should present and refer to an ANOVA table computed from these data. Each answer should have a value taken directly from, or computed from, the ANOVA table and a proper label (e.g., df_Among, SS_Total, MS_Within).
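A possible way to produce the ANOVA table is sketched below using base R's `lm()` and `anova()`. This assumes the long-format layout with illustrative names `d`, `stream`, and `ph`; the data are rebuilt here only so the block runs on its own. If the module's reading uses a different workflow, follow that instead.

```r
# Rebuild the data so this block is self-contained (names are illustrative)
d <- data.frame(
  stream = factor(rep(c("A", "B"), each = 10)),
  ph = c(8.97, 9.12, 9.41, 8.67, 9.94, 8.28, 7.86, 7.51, 9.18, 7.68,
         6.67, 5.83, 6.84, 6.86, 5.89, 7.42, 6.56, 5.99, 5.33, 6.69)
)

# Fit the model with a separate mean for each stream
fit <- lm(ph ~ stream, data = d)

# ANOVA table: the 'stream' row and the 'Residuals' row
aov_tbl <- anova(fit)
aov_tbl
```

Note that `anova()` does not print a total row; the total df and SS can be obtained by summing the two rows of the `Df` and `Sum Sq` columns.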

What are the residual df for the simple model?
What are the residual df for the full model?
What is the difference in number of parameters between the simple and full model?
What is the variance of individuals around the simple model?
What is the variance of individuals around the full model?
What is the variance among sample means?
What value represents the amount of “signal” in the data?
What value represents the amount of remaining “noise” in the data?
What value is the same as the “pooled sample variance” from the 2-sample t-test?
What value is the same as the “sample variance” from your introductory statistics course?
What value is the variance not explained by the full model?
What value is the variance explained by the full model?

The following questions still refer to values from the ANOVA table.

Explain what the F-ratio means. Your explanation should not simply describe how it is calculated (i.e., it is not simply a formula); rather, it should include two separate explanations, one focused on variabilities (explained and unexplained) and one focused on relative model fits.
How does the F-ratio test statistic from the ANOVA table compare to the t test statistic from the 2-sample t-test? This is not a simple “equals”, “less than”, or “greater than” answer; there is a specific relationship (as described in the reading).
What three related conclusions can you make from the p-value? These should be related to models, hypotheses, and number of means.
How does the p-value from the ANOVA table compare to the p-value from the 2-sample t-test? [You may need to refer to your results from this module’s exercises.]
Write an overall conclusion from this study.