Preference for Bottled Water?
- α=0.01.
- HA: “There is a preference among the water choices (i.e., the distribution of students will NOT follow a theoretical distribution of equal proportions among the water choices)” and H0: “There is no preference among the water choices (i.e., the distribution of students will follow a theoretical distribution of equal proportions among the water choices)”.
- Goodness-of-fit test because (i) one group or population (students) was sampled and (ii) the response variable (water choice) is categorical.
- This is quasi-experimental as the students were given a choice of only certain waters. There is no indication that the students were randomly selected (they were likely part of a voluntary response survey).
- All cells in the expected frequency table (Table 1) were greater than 5; the assumptions for a goodness-of-fit test have been met. [Note that each expected value equals 107/4 = 26.75, which represents no preference among the four waters by the 107 students.]
- The observed frequency table is shown in Table 2.
- \(\chi^{2} = \frac{(51-26.75)^{2}}{26.75} + \frac{(18-26.75)^{2}}{26.75} + \frac{(17-26.75)^{2}}{26.75} + \frac{(21-26.75)^{2}}{26.75} = 29.6355\) with 3 df.
- p-value<0.00005 (more specifically 1.646e-06).
- Reject H0 because the p-value<α.
- It appears that the students have a preference among the water types. Further examination suggests that the students least preferred the tap water.
- Not required for a goodness-of-fit test.
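The arithmetic in this and the other two problems follows the standard goodness-of-fit formulas. In the notation introduced here for reference only, k is the number of categories, n the total number of observations, O_i and E_i the observed and expected counts, and p_i the proportion in category i specified by H0:

\[
E_i = n\,p_i, \qquad \chi^{2} = \sum_{i=1}^{k} \frac{(O_i - E_i)^{2}}{E_i}, \qquad \text{df} = k - 1
\]

For the bottled-water data, n = 107, k = 4, and every p_i = 1/4, so each E_i = 107/4 = 26.75 and df = 4 - 1 = 3.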
Table 1: Expected frequency of students (assuming that H0 is true) by water type choice.
tap Aquafina Fiji Sams
26.75 26.75 26.75 26.75
Table 2: Observed frequency of students by water type choice.
tap Aquafina Fiji Sams
51 18 17 21
R Appendix.
library(NCStats)
# p-value: area to the right of the observed chi-square statistic with 3 df
distrib(29.6355,distrib="chisq",df=3,lower.tail=FALSE)
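As a cross-check that does not need NCStats, the minimal base-R sketch below assumes only the counts in Table 2. It recomputes the expected counts, the statistic, and the p-value by hand and compares them with base R's `chisq.test()`. The variable names are illustrative.

```r
# Observed counts from Table 2
observed <- c(tap = 51, Aquafina = 18, Fiji = 17, Sams = 21)

# Under H0 (no preference) each of the k = 4 waters is equally likely,
# so every expected count is n / k = 107 / 4 = 26.75
n <- sum(observed)
k <- length(observed)
expected <- rep(n / k, k)

# Goodness-of-fit statistic, degrees of freedom, and upper-tail p-value
chi_sq <- sum((observed - expected)^2 / expected)
dof <- k - 1
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)
print(c(chi_sq = chi_sq, df = dof, p_value = p_value))

# Base R's built-in test with equal probabilities should agree
res <- chisq.test(observed, p = rep(1 / k, k))
print(res)
```

`pchisq(..., lower.tail = FALSE)` computes the same upper-tail area that the `distrib()` call reports, so it should return about 1.646e-06.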
Habitat Use by Wild Turkeys
- α=0.05.
- HA: “The turkeys show a preference for habitat (i.e., the distribution of turkeys will NOT follow the (theoretical) distribution of available habitats)” and H0: “The turkeys show no preference for habitat (i.e., the distribution of turkeys will follow the (theoretical) distribution of available habitats)”.
- Goodness-of-fit test because (i) one group or population (South Dakota turkeys) was sampled and (ii) the response variable (habitat use) is categorical.
- This is observational as habitat use by the turkeys was simply recorded. The turkey observations were not randomly selected (the researchers likely chose days to sample).
- All cells in the expected frequency table (Table 3) were at least 5 (the smallest, OakSpruce, is 5.0); the assumptions for a goodness-of-fit test have been met. [Note that these expected values equal the 502 total observations times the proportions of available habitat, which represents no preference among habitat types by the turkeys.]
- The observed frequency table is shown in Table 4.
- \(\chi^{2} = \frac{(32-25.9)^{2}}{25.9} + \frac{(16-51.0)^{2}}{51.0} + \frac{(449-420.1)^{2}}{420.1} + \frac{(5-5.0)^{2}}{5.0} = 27.436\) with 3 df.
- p-value<0.00005 (somewhat more specifically 4.77e-06).
- Reject H0 because the p-value<α.
- It appears that the turkeys show a habitat preference in fall. Further analysis suggests that the turkeys showed a strong avoidance of meadows and a moderate preference for aspen and pine habitats.
- Not required for a goodness-of-fit test.
Table 3: Expected frequency of turkeys (assuming that H0 is true) by habitat in the fall.
Aspen Meadow Pine OakSpruce
25.9 51.0 420.1 5.0
Table 4: Observed frequency of turkeys by habitat in the fall.
Aspen Meadow Pine OakSpruce
32 16 449 5
R Appendix.
library(NCStats)
# p-value: area to the right of the observed chi-square statistic (27.436) with 3 df
distrib(27.436,distrib="chisq",df=3,lower.tail=FALSE)
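A similar base-R sketch for the turkey data follows. It assumes only the rounded counts in Tables 3 and 4 (the unrounded habitat proportions are not given), so it confirms that both tables total 502 and gives a statistic of about 27.44 rather than the 27.436 reported above. `rescale.p = TRUE` lets `chisq.test()` take the expected counts directly by dividing them by their sum. R may also warn that the approximation may be incorrect, because the OakSpruce expected count sits right at 5. The variable names are illustrative.

```r
# Observed (Table 4) and expected (Table 3) counts; expected values are rounded
observed <- c(Aspen = 32, Meadow = 16, Pine = 449, OakSpruce = 5)
expected <- c(Aspen = 25.9, Meadow = 51.0, Pine = 420.1, OakSpruce = 5.0)

# Both tables should sum to the same total number of observations
print(c(observed = sum(observed), expected = sum(expected)))

# Assumption check: the smallest expected count
print(min(expected))

# rescale.p = TRUE divides the expected counts by their sum to form H0 proportions
res <- chisq.test(observed, p = expected, rescale.p = TRUE)
print(res)

# Per-habitat contributions to chi-square; the sign of (O - E) shows
# use above (+1) or below (-1) availability
contrib <- (observed - expected)^2 / expected
print(round(contrib, 3))
print(sign(observed - expected))
```

Meadow contributes about 24 of the roughly 27.4 total, which is consistent with the strong avoidance of meadows noted in the conclusion.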
Random Jury Pool
- α=0.10.
- HA: “The age distribution of the actual jury is not the same as the age distribution in the district as a whole” and H0: “The age distribution of the actual jury is the same as the age distribution in the district as a whole”.
- Goodness-of-fit test because (i) one group or population (the jury in this district) was sampled and (ii) the response variable (age category) is categorical.
- This is observational as the jurors were not assigned to age categories. In theory, the jury was randomly selected from the district, but that is what is being tested.
- All cells in the expected frequency table (Table 5) were greater than 5; the assumptions for a goodness-of-fit test have been met. [Note that these expected values equal the 1336 jurors times the proportion of the district in each age category (0.061, 0.150, 0.135, 0.217, 0.153, 0.182, and 0.102 for “18-19” through “65+”).]
- The observed frequency table is shown in Table 6.
- \(\chi^{2} = \frac{(23-81.496)^{2}}{81.496} + \frac{(96-200.400)^{2}}{200.400} + \frac{(134-180.360)^{2}}{180.360} + \frac{(293-289.912)^{2}}{289.912} + \frac{(297-204.408)^{2}}{204.408} + \frac{(380-243.152)^{2}}{243.152} + \frac{(113-136.272)^{2}}{136.272} = 231.2600\) with 6 df.
- p-value<0.00005 (somewhat more specifically 4.12e-47).
- Reject H0 because the p-value<α.
- It appears that the age distribution of the jury does not match the age distribution of the district as a whole. Further examination suggests that the jury has a higher percentage of older jurors (ages “40-49” and “50-64”) and a lower percentage of younger jurors (ages “18-19”, “20-24”, and “25-29”) than expected based on the age distribution in the district as a whole.
- Not required for a goodness-of-fit test.
Table 5: Expected frequency of jurors (assuming that H0 is true) by age category.
18-19 20-24 25-29 30-39 40-49 50-64 65+
81.496 200.400 180.360 289.912 204.408 243.152 136.272
Table 6: Observed frequency of jurors by age category.
18-19 20-24 25-29 30-39 40-49 50-64 65+
23 96 134 293 297 380 113
R Appendix.
library(NCStats)
# p-value: area to the right of the observed chi-square statistic with 6 df
distrib(231.2600,distrib="chisq",df=6,lower.tail=FALSE)
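For the jury data, a base-R sketch can supply the district proportions directly. These proportions are not stated in the source; they are inferred by dividing each Table 5 expected count by the 1336 jurors (for example, 81.496/1336 = 0.061). The Pearson residuals, (O - E)/sqrt(E), returned by `chisq.test()` show the direction and relative size of each category's departure. The variable names are illustrative.

```r
# Observed juror counts by age category (Table 6)
observed <- c("18-19" = 23, "20-24" = 96, "25-29" = 134, "30-39" = 293,
              "40-49" = 297, "50-64" = 380, "65+" = 113)

# District age proportions implied by Table 5 (expected count / 1336)
p_district <- c(0.061, 0.150, 0.135, 0.217, 0.153, 0.182, 0.102)

res <- chisq.test(observed, p = p_district)
print(res)

# Expected counts n * p, which reproduce Table 5
print(res$expected)

# Pearson residuals (O - E) / sqrt(E): positive means more jurors than expected
print(round(res$residuals, 2))
```

The largest positive residuals are for ages “50-64” and “40-49” and the most negative are for “20-24” and “18-19”, which matches the over- and under-representation described in the conclusion.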