Preference for Bottled Water?

  1. α=0.01.
  2. HA: “There is a preference among the water choices (i.e., the distribution of students will NOT follow a theoretical distribution of equal proportions among the water choices)” and H0: “There is no preference among the water choices (i.e., the distribution of students will follow a theoretical distribution of equal proportions among the water choices)”.
  3. Goodness-of-fit test because (i) one group or population (students) was sampled and (ii) the response variable (water choice) is categorical.
  4. This is quasi-experimental as the students were given a choice of only certain waters. There is no indication that the students were randomly selected (they were likely part of a voluntary response survey).
  5. All cells in the expected frequency table (Table 1) were greater than 5; the assumptions for a goodness-of-fit test have been met. [Note that these expected values equal 107/4 to represent no preference in choice of water.]
  6. Table 1: Expected frequency of students (assuming that H0 is true) by water type choice.

         tap Aquafina     Fiji     Sams 
       26.75    26.75    26.75    26.75 
  7. The observed frequency table is shown in Table 2.
  8. Table 2: Observed frequency of students by water type choice.

         tap Aquafina     Fiji     Sams 
          51       18       17       21 
  9. \(\chi^{2}=\frac{(51-26.75)^{2}}{26.75}+\frac{(18-26.75)^{2}}{26.75}+\frac{(17-26.75)^{2}}{26.75}+\frac{(21-26.75)^{2}}{26.75}=29.6355\) with 3 df.
  10. p-value<0.00005 (more specifically 1.646e-06).
  11. Reject H0 because the p-value<α.
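  12. It appears that there is a preference for water type by the students. Further examination suggests that the students most preferred the tap water (51 observed choices versus 26.75 expected), with each bottled water chosen less often than expected.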
  13. Not required for a goodness-of-fit test.

R Appendix.

# Upper-tail chi-square p-value for the test statistic in step 9; distrib() comes from NCStats
library(NCStats)
distrib(29.6355,distrib="chisq",df=3,lower.tail=FALSE)
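
As a cross-check on steps 5 through 10, the same arithmetic can be sketched in base R without NCStats. The variable names (obs, expd, chi_sq, p_value) are illustrative assumptions and not part of the original analysis; pchisq() is the base R chi-square distribution function.

```r
# Observed counts from Table 2 (variable names are illustrative assumptions)
obs <- c(tap = 51, Aquafina = 18, Fiji = 17, Sams = 21)

# Expected counts under H0: equal proportions among the four choices
expd <- rep(sum(obs) / length(obs), length(obs))

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((obs - expd)^2 / expd)
df <- length(obs) - 1

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(chi_sq, 4))
print(signif(p_value, 4))
```

The printed statistic and p-value should agree with the values reported in steps 9 and 10, up to rounding.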

Habitat Use by Wild Turkeys

  1. α=0.05.
  2. HA: “There is a preference by the turkeys for habitat (i.e., the distribution of turkeys will NOT follow the (theoretical) distribution of available habitats)” and H0: “There is NOT a preference by the turkeys for habitat (i.e., the distribution of turkeys will follow the (theoretical) distribution of available habitats)”.
  3. Goodness-of-fit test because (i) one group or population (South Dakota turkeys) was sampled and (ii) the response variable (habitat use) is categorical.
  4. This is observational as habitat use by the turkeys was simply recorded. The turkey observations were not randomly selected (the researchers likely chose days to sample).
  5. All cells in the expected frequency table (Table 3) were at least 5 (the smallest, OakSpruce, is 5.0 after rounding); the assumptions for a goodness-of-fit test have been met. [Note that these expected values are equal to the 502 total fall observations times the proportions of available habitat to represent no preference in choosing habitat types by the turkeys.]
  6. Table 3: Expected frequency of turkeys (assuming that H0 is true) by habitat in the fall.

        Aspen    Meadow      Pine OakSpruce 
         25.9      51.0     420.1       5.0 
  7. The observed frequency table is shown in Table 4.
  8. Table 4: Observed frequency of turkeys by habitat in the fall.

        Aspen    Meadow      Pine OakSpruce 
           32        16       449         5 
  9. \(\chi^{2}=\frac{(32-25.9)^{2}}{25.9}+\frac{(16-51.0)^{2}}{51.0}+\frac{(449-420.1)^{2}}{420.1}+\frac{(5-5.0)^{2}}{5.0}=27.436\) with 3 df.
  10. p-value<0.00005 (somewhat more specifically 4.77e-06).
  11. Reject H0 because the p-value<α.
  12. It appears that the turkeys show a habitat preference in fall. Further analysis suggests that the turkeys showed a strong avoidance of meadows and a moderate preference for aspen and pine habitats.
  13. Not required for a goodness-of-fit test.

R Appendix.

# Upper-tail chi-square p-value for the test statistic in step 9; distrib() comes from NCStats
library(NCStats)
distrib(27.436,distrib="chisq",df=3,lower.tail=FALSE)
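
The "further analysis" mentioned in step 12 is not shown in the source. One common way to carry it out, sketched here as an assumption and not necessarily the author's method, is to look at Pearson residuals, (observed - expected) / sqrt(expected), for each habitat. The variable names are illustrative, and the expected counts are the rounded values from Table 3, so the sum of squared residuals (about 27.44) differs slightly from the reported 27.436.

```r
# Observed (Table 4) and rounded expected (Table 3) counts; names are illustrative
obs  <- c(Aspen = 32, Meadow = 16, Pine = 449, OakSpruce = 5)
expd <- c(Aspen = 25.9, Meadow = 51.0, Pine = 420.1, OakSpruce = 5.0)

# Pearson residuals: negative values mean fewer observations than expected
pearson_resid <- (obs - expd) / sqrt(expd)
print(round(pearson_resid, 2))

# The squared residuals sum to the chi-square statistic
# (computed here from rounded expected counts)
chi_sq_rounded <- sum(pearson_resid^2)
print(round(chi_sq_rounded, 3))
```

A large negative residual for Meadow and smaller positive residuals for Aspen and Pine would be consistent with the conclusion in step 12.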

Random Jury Pool

  1. α=0.10.
  2. HA: “The age distribution of the actual jury is not the same as the age distribution in the district as a whole” and H0: “The age distribution of the actual jury is the same as the age distribution in the district as a whole”.
  3. Goodness-of-fit test because (i) one group or population (the jury in this district) was sampled and (ii) the response variable (age category) is categorical.
  4. This is observational as the jurors were not allocated to ages. In theory the jury was randomly selected from the district, but that is what is being tested.
  5. All cells in the expected frequency table (Table 5) were greater than 5; the assumptions for a goodness-of-fit test have been met.
  6. Table 5: Expected frequency of jurors (assuming that H0 is true) by age category.

      18-19   20-24   25-29   30-39   40-49   50-64     65+ 
     81.496 200.400 180.360 289.912 204.408 243.152 136.272 
  7. The observed frequency table is shown in Table 6.
  8. Table 6: Observed frequency of jurors by age category.

    18-19 20-24 25-29 30-39 40-49 50-64   65+ 
       23    96   134   293   297   380   113 
  9. \(\chi^{2}=\frac{(23-81.496)^{2}}{81.496}+\frac{(96-200.400)^{2}}{200.400}+\frac{(134-180.360)^{2}}{180.360}+\frac{(293-289.912)^{2}}{289.912}+\frac{(297-204.408)^{2}}{204.408}+\frac{(380-243.152)^{2}}{243.152}+\frac{(113-136.272)^{2}}{136.272}=231.2600\) with 6 df.
  10. p-value<0.00005 (somewhat more specifically 4.12e-47).
  11. Reject H0 because the p-value<α.
  12. It appears that the age distribution of the jury does not match the age distribution of the district as a whole. Further examination suggests that the jury has a higher percentage of older jurors (ages “40-49” and “50-64”) and a lower percentage of younger jurors (ages “18-19”, “20-24”, and “25-29”) than expected as based on the age distribution in the district as a whole.
  13. Not required for a goodness-of-fit test.

R Appendix.

# Upper-tail chi-square p-value for the test statistic in step 9; distrib() comes from NCStats
library(NCStats)
distrib(231.2600,distrib="chisq",df=6,lower.tail=FALSE)
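
The whole test can also be run in one call with chisq.test() from base R, shown here as a hedged sketch. Because the district age proportions are not listed in this text, the expected counts from Table 5 are passed as weights and rescaled to proportions with rescale.p=TRUE; that workaround and the variable names are assumptions made for illustration.

```r
# Observed counts from Table 6 (names are illustrative assumptions)
obs <- c("18-19" = 23, "20-24" = 96, "25-29" = 134, "30-39" = 293,
         "40-49" = 297, "50-64" = 380, "65+" = 113)

# Expected counts from Table 5, used only to recover the H0 proportions
expd <- c("18-19" = 81.496, "20-24" = 200.400, "25-29" = 180.360,
          "30-39" = 289.912, "40-49" = 204.408, "50-64" = 243.152,
          "65+" = 136.272)

# Goodness-of-fit test; rescale.p=TRUE converts expd to proportions summing to 1
res <- chisq.test(obs, p = expd, rescale.p = TRUE)

print(res$statistic)   # chi-square test statistic
print(res$parameter)   # degrees of freedom
print(res$p.value)     # upper-tail p-value
print(res$expected)    # expected counts under H0
```

The statistic, degrees of freedom, and p-value should match steps 9 and 10, and res$expected should reproduce Table 5.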