Equation of the Line I

  1. Dishwasher Example
    1. response - Height of suds (mm)
    2. explanatory - Amount of soap (g)
    3. slope is 12.4
    4. For each additional gram of soap, the height of suds increases by 12.4 mm, on average.
    5. For 0 grams of soap, the height of suds is -20.2 mm, on average. [This is an extrapolation.]
    6. Asked to predict height of suds for an amount of soap between 3.5 and 8.0 grams. Example … “What is the predicted height of suds for 5 g of soap?” Answer … -20.2+12.4×5=41.8 mm.
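    7. Asked to predict height of suds for an amount of soap not between 3.5 and 8.0 grams. Example … “What is the predicted height of suds for 10 g of soap?” Answer … this prediction should not be computed, because 10 g is outside the range of the data (i.e., this is an extrapolation).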
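
The two prediction cases above can be sketched in Python. This is only an illustrative sketch: the function name `predict_suds` and the choice to return `None` for an extrapolation are assumptions made here, while the coefficients and the 3.5 to 8.0 g range come from the example.

```python
# Coefficients and observed soap range are taken from the dishwasher example.
INTERCEPT = -20.2               # mm of suds at 0 g of soap (itself an extrapolation)
SLOPE = 12.4                    # mm of suds per additional gram of soap
SOAP_MIN, SOAP_MAX = 3.5, 8.0   # grams; range of soap amounts in the data


def predict_suds(soap_g):
    """Return predicted suds height (mm), or None when soap_g is outside the data range.

    Hypothetical helper: returning None stands in for "do not compute this prediction".
    """
    if not (SOAP_MIN <= soap_g <= SOAP_MAX):
        return None  # extrapolation: refuse to predict
    return INTERCEPT + SLOPE * soap_g


print(round(predict_suds(5), 1))   # 5 g is inside the range, so a value is returned
print(predict_suds(10))            # 10 g is outside the range, so None is returned
```

Refusing to return a number for 10 g mirrors the written answer: the line is only trusted over the range of soap amounts that were actually observed.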

Note: In the applied questions below …
  • The words “response”, “explanatory”, “y”, and “x” are not used at all. These words should be replaced with the actual variable names used in the problem.
  • The x-axis (explanatory) variable is increased by 1 unit when describing the slope. If the slope is positive then the y-axis (response) variable increases by the slope amount for each 1 unit increase of the explanatory variable. If the slope is negative then the y-axis (response) variable decreases by the slope amount (with the negative sign removed) for each 1 unit increase of the explanatory variable.
  • Interpret the y-intercept value even if it does not make sense, which will happen often as the y-intercept is often an extreme extrapolation.
  • When computing the residual, make sure to subtract the predicted y-axis (response) value from the observed y-axis (response) value (i.e., observed minus predicted). About half the time this will result in a negative value. That is okay; it just means that the observed value was less than the predicted value (i.e., the observed point was below the best-fit line).
  • The words “proportion of variability explained” always imply using r² as the answer.
  • The correlation coefficient is always the value of r. If you compute r by taking the square root of r² then remember to put a negative sign on it if the slope (i.e., direction of association) is negative (your calculator will always return a positive value).
  • When assessing whether anything “concerns” you about the regression, address whether the form looks linear and whether there is homoscedasticity. Non-linear forms will look obviously curved; heteroscedastic forms will look like a funnel (usually from narrow on the left to wider on the right). If you do not see a curve or a funnel, then explicitly say that you do not have any concerns about a lack of linearity or homoscedasticity.
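
The residual and correlation rules in the notes above can be written as two small Python helpers. This is a minimal sketch: the function names and the numbers in the usage lines are hypothetical and are not taken from any of the problems.

```python
import math


def residual(observed, predicted):
    """Residual is observed minus predicted; negative means the point is below the line."""
    return observed - predicted


def r_from_r2(r2, slope):
    """Recover r from r-squared, giving it the same sign as the slope.

    math.sqrt always returns a non-negative value (like a calculator),
    so the sign has to be copied from the slope explicitly.
    """
    return math.copysign(math.sqrt(r2), slope)


# Hypothetical values, for illustration only.
print(residual(10.0, 12.0))               # observed below predicted, so negative
print(round(r_from_r2(0.81, -2.5), 3))    # negative slope, so r is negative
print(round(r_from_r2(0.81, 2.5), 3))     # positive slope, so r is positive
```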

Beach Sand

  1. The equation of the best-fit line is SAND = 0.16+0.053×ANGLE
  2. For each 1 degree increase in beach angle, the median sand diameter increases by 0.053 mm, on average.
  3. If the beach angle is 0 degrees, then the median sand diameter is 0.16 mm, on average.
  4. This prediction should not be computed, because 15 degrees of beach angle is outside of the domain of the data used to define the relationship (i.e., this is an extrapolation).
  5. The predicted median sand diameter for a beach angle of 4 degrees is 0.372 mm.
  6. The residual for an observed median sand diameter of 0.2 mm and beach angle of 5 degrees is -0.225 mm.
  7. The correlation coefficient between median sand diameter and beach angle is 0.954 (positive because the slope is positive).
  8. The proportion of variability in median sand diameter that is explained by knowing the angle of the beach is 0.910.
  9. I would expect the median sand diameter to increase by four slopes or 0.212 mm, on average, if the beach angle increased by 4 degrees.
  10. The relationship between median sand diameter and beach angle appears to be nonlinear (i.e., curved). I am not concerned about homoscedasticity as no funnel shape is apparent.
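
The arithmetic behind the prediction, residual, and four-slope answers above can be checked with a short Python sketch. The coefficients and inputs come from the Beach Sand answers; the variable and function names are assumptions made for illustration.

```python
# Coefficients from the Beach Sand best-fit line: SAND = 0.16 + 0.053 x ANGLE
INTERCEPT = 0.16   # mm, median sand diameter at a beach angle of 0 degrees
SLOPE = 0.053      # mm of median sand diameter per degree of beach angle


def predict_sand(angle_deg):
    """Predicted median sand diameter (mm) for a beach angle in degrees."""
    return INTERCEPT + SLOPE * angle_deg


predicted_4 = predict_sand(4)        # prediction for a 4 degree beach angle
residual_5 = 0.2 - predict_sand(5)   # observed 0.2 mm minus predicted value at 5 degrees
change_4 = 4 * SLOPE                 # expected change for a 4 degree increase (four slopes)

print(round(predicted_4, 3))
print(round(residual_5, 3))
print(round(change_4, 3))
```

Note that the four-slope change uses only the slope; the intercept plays no role when the question asks about a change rather than a predicted value.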

Everest Temperatures

  1. The equation of the best-fit line is AIRTEMP = 29.8-0.0059×ALTLAPSERATE
  2. If the altitude lapse rate is 0, then the mean air temperature is 29.8°C, on average.
  3. For each 1°C/km increase in altitude lapse rate, the mean air temperature decreases by 0.0059°C, on average.
  4. The predicted mean air temperature for an altitude lapse rate of 4000°C/km is 6.20°C.
  5. The residual for an observed mean air temperature of 0°C and altitude lapse rate of 3500°C/km is -9.15°C.
  6. The correlation coefficient between mean air temperature and altitude lapse rate is -0.961 (negative because the slope is negative).
  7. I would expect the mean air temperature to decrease by 1000 slopes or 5.900°C, on average, if the altitude lapse rate increased by 1000°C/km.
  8. This prediction should not be computed, because 1000°C/km of altitude lapse rate is outside of the domain of the data used to define the relationship (i.e., this is an extrapolation).
  9. The proportion of variability in mean air temperature that is explained by knowing the altitude lapse rate is 0.923.
  10. I have no concerns as the relationship between mean air temperature and altitude lapse rate appears to be linear (no evident curve) and homoscedastic (no evident funnel).
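
The prediction, residual, and 1000-slope answers above follow from the best-fit line by direct substitution. The worked arithmetic below is a sketch of those steps; the hat notation for a predicted value is an assumption made here (it does not appear in the answers), and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
\widehat{\text{AIRTEMP}}(4000) &= 29.8 - 0.0059 \times 4000 = 29.8 - 23.6 = 6.2^\circ\mathrm{C} \\
\widehat{\text{AIRTEMP}}(3500) &= 29.8 - 0.0059 \times 3500 = 29.8 - 20.65 = 9.15^\circ\mathrm{C} \\
\text{residual} &= \text{observed} - \text{predicted} = 0 - 9.15 = -9.15^\circ\mathrm{C} \\
\Delta\,\text{AIRTEMP} &= 1000 \times (-0.0059) = -5.9^\circ\mathrm{C}
\end{align*}
```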