- Make sure to describe your evidence in Step 3 for why you are performing a chi-square test. It is not adequate to simply say “because there is a categorical response variable and two or more groups are being tested.” At least say what the response variable is and what the groups are.
- When testing the assumptions (Step 5), make sure to explicitly say that there are more than five in each cell of the expected table. It is not adequate to simply make the expected table.
- In hand-calculation questions (e.g., hurricanes), make sure to put lower.tail=FALSE in distrib() when calculating the p-value (Step 8) in a chi-square problem. This is true for EVERY chi-square problem.
- Your values may differ slightly from mine because I calculate everything in R, which holds many decimal places on intermediate values.
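The effect of the tail choice can be seen in a minimal sketch. It assumes base R's pchisq() as a stand-in for distrib() from NCStats (a substitution made here so that the example runs without extra packages) and uses the hurricane test statistic from this key.

```r
# Base-R sketch: pchisq() is used here in place of NCStats's distrib()
# (an assumption for illustration; the key itself uses distrib()).
chi_sq <- 0.0382   # hurricane test statistic from this key
df     <- 2

# Chi-square p-values are always the area to the RIGHT of the statistic
p_upper <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Leaving lower.tail at its default (TRUE) returns the area to the LEFT,
# which is 1 minus the p-value and therefore the wrong answer here
p_lower <- pchisq(chi_sq, df = df, lower.tail = TRUE)

round(c(upper = p_upper, lower = p_lower), 4)
```

The upper-tail value agrees with the p-value of 0.9811 reported for the hurricane problem; the lower-tail value is its complement and would lead to the opposite decision.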
Hurricane Strengths
- α=0.10.
- H0: “The distribution of hurricanes into the strength categories is the same for both time periods” versus HA: “The distribution of hurricanes into the strength categories is NOT the same for both time periods”
- A chi-square test is required because (i) a categorical response variable with three levels (strength category) from (ii) two groups or populations (time periods) was recorded.
- This is an observational study without obvious randomization.
- The test statistic computed below should reasonably follow a chi-square distribution because all cells in the expected table (Table 1) have values greater than five.
- The table of observed frequencies was given (Table 2).
- The χ2 test statistic is \(\frac{(51-51.4)^{2}}{51.4}+\frac{(26-25.4)^{2}}{25.4}+\frac{(8-8.1)^{2}}{8.1}+\frac{(44-43.6)^{2}}{43.6}+\frac{(21-21.6)^{2}}{21.6}+\frac{(7-6.9)^{2}}{6.9}\) = 0.0036+0.0121+0.0018+0.0043+0.0142+0.0021 = 0.0382 with 2 df.
- The p-value is 0.9811.
- The H0 is not rejected because the p-value>α.
- It appears that the distribution of hurricanes into the strength categories is the same for both time periods (Table 3).
- Not required for a chi-square test.
Table 1: Expected frequency table for distribution of hurricanes into the strength and time period categories.
Cat 1&2 Cat 3 Cat 4&5
1901-1950 51.4 25.4 8.1
1951-2000 43.6 21.6 6.9
Table 2: Observed frequency table for distribution of hurricanes into the strength and time period categories.
Cat 1&2 Cat 3 Cat 4&5 Sum
1901-1950 51 26 8 85
1951-2000 44 21 7 72
Sum 95 47 15 157
Table 3: Row percentage table for the observations of hurricane strengths by time period.
Cat 1&2 Cat 3 Cat 4&5 Sum
1901-1950 60.0 30.6 9.4 100
1951-2000 61.1 29.2 9.7 100
R Appendix.
library(NCStats)
distrib(0.0382,distrib="chisq",df=2,lower.tail=FALSE)
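The hand calculation in Steps 5 through 8 can also be reproduced in base R. The following is a sketch only: the object names (obs, expd, chi_sq, p_value) are chosen here for illustration, and pchisq() is used in place of distrib().

```r
# Observed hurricane counts (Table 2): rows = time period, columns = strength
obs <- matrix(c(51, 26, 8,
                44, 21, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("1901-1950", "1951-2000"),
                              c("Cat 1&2", "Cat 3", "Cat 4&5")))

# Expected count for each cell = (row total * column total) / grand total
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Assumption check (Step 5): every expected count should exceed five
all(expd > 5)

# Test statistic (Step 7): sum of (observed - expected)^2 / expected
chi_sq <- sum((obs - expd)^2 / expd)
df     <- (nrow(obs) - 1) * (ncol(obs) - 1)

# p-value (Step 8): upper-tail area, hence lower.tail = FALSE
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(expd, 1)
round(c(chi_sq = chi_sq, df = df, p_value = p_value), 4)
```

Because R carries full precision in the expected counts, the individual terms match the four-decimal values shown in Step 7 rather than what the one-decimal expected values in Table 1 would give.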
Response to Hello
- α = 0.05.
- H0: “The distribution of groups into whether they responded or not to the ‘Hello’ is the same for all three group sizes” versus HA: “The distribution of groups into whether they responded or not to the ‘Hello’ is NOT the same for all three group sizes.”
- A chi-square test is required because (i) a categorical variable with two levels (response or not) was measured on (ii) three groups/populations (group sizes).
- This study is experimental in that the researchers intervened with the subjects (i.e., said “Hello” to them), but it is observational in the sense that the “groups” (number of people together) were not created by the researcher. It is clear that the “groups” were not randomly selected.
- The test statistic below should follow a χ2 distribution because the expected number in each cell is greater than five (Table 4).
- The observed frequency table is in Table 5.
- The χ2 test statistic is \(\frac{(92-84.3)^{2}}{84.3}+\frac{(27-34.7)^{2}}{34.7}+\frac{(65-66.6)^{2}}{66.6}+\frac{(29-27.4)^{2}}{27.4}+\frac{(13-19.1)^{2}}{19.1}+\frac{(14-7.9)^{2}}{7.9}\) = 0.703+1.709+0.038+0.093+1.948+4.710 = 9.201 with (3-1)(2-1)=2 df.
- The p-value is 0.0100.
- The H0 is rejected because the p-value < α.
- There is a difference in responsiveness among the three sizes of groups. It appears that groups of four or more individuals responded less frequently than individuals and groups of two or three (Table 6).
- Not needed for a chi-square test.
Table 4: Expected frequencies of people that responded to ‘Hello’ and group size.
Responded Did Not Respond
Individual 84.3 34.7
Two or Three 66.6 27.4
Four or more 19.1 7.9
Table 5: Observed frequencies of people that responded to ‘Hello’ and group size.
Responded Did Not Respond Sum
Individual 92 27 119
Two or Three 65 29 94
Four or more 13 14 27
Sum 170 70 240
Table 6: The percentage of people that responded to ‘Hello’ by group size.
Responded Did Not Respond Sum
Individual 77.3 22.7 100
Two or Three 69.1 30.9 100
Four or more 48.1 51.9 100
R Appendix.
library(NCStats)
distrib(9.201,distrib="chisq",df=2,lower.tail=FALSE)
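As a cross-check on the hand calculation, the same test can be run from the raw counts with base R's chisq.test(). This is a sketch: the object names (obs, chi, row_pct) are chosen here for illustration, and prop.table() stands in for a percentage-table helper. Note that with unrounded expected frequencies the statistic comes out near 9.27 rather than 9.201, because the hand calculation above used expected values rounded to one decimal; the decision is unchanged.

```r
# Observed counts (Table 5): rows = group size, columns = response
obs <- matrix(c(92, 27,
                65, 29,
                13, 14),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Individual", "Two or Three", "Four or more"),
                              c("Responded", "Did Not Respond")))

# Pearson chi-square test; correct = FALSE only affects 2x2 tables,
# but is kept for consistency with the other problems in this key
chi <- chisq.test(obs, correct = FALSE)

chi$expected    # compare with Table 4
chi$statistic   # about 9.27 with unrounded expected counts
chi$parameter   # degrees of freedom: (3-1)(2-1) = 2
chi$p.value     # upper-tail p-value, still below alpha = 0.05

# Row percentages (compare with Table 6)
row_pct <- round(100 * prop.table(obs, margin = 1), 1)
row_pct
```

The row percentages make the direction of the difference visible: the share of groups that responded drops as group size increases.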
Turtle Excluder Devices
Table 7: Expected frequencies of trawl tows with at least one turtle mortality by two sizes of openings.
mortality no mortality
original 11.04294 63.95706
new 12.95706 75.04294
Table 8: Observed frequencies of trawl tows with at least one turtle mortality by two sizes of openings.
mortality no mortality
original 16 59
new 8 80
Table 9: The percentage of trawl tows with at least one turtle mortality by two sizes of openings.
mortality no mortality Sum
original 21.3 78.7 100
new 9.1 90.9 100
R Appendix
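library(NCStats)   # provides percTable()
# Observed counts of tows with and without at least one turtle mortality
obs <- matrix(c(16,75-16,8,88-8),nrow=2,byrow=TRUE)
rownames(obs) <- c("original","new")
colnames(obs) <- c("mortality","no mortality")
( chi1 <- chisq.test(obs,correct=FALSE) )   # no continuity correction
chi1$expected                               # expected frequencies (Table 7)
percTable(chi1$observed,margin=1)           # row percentages (Table 9)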