Note:
  • The use of qualifiers such as “strongly” or “slightly” when describing shape. The use of these qualifiers is important for deciding which measures of center and dispersion should be used (see below). However, also note that you will get some “leeway” in grading relative to their use (i.e., you and I may disagree whether something is slightly or moderately skewed, though we should not disagree on whether a distribution is slightly or strongly skewed).

Shape and Outliers I

  1. Strongly right skewed, possible outlier at maximum.
  2. Moderately left skewed, no outliers.
  3. Strongly right skewed, outlier at maximum.
  4. Approximately symmetric, no outliers.
  5. Approximately symmetric, no outliers.
  6. Strongly left skewed, no outliers.
  7. Moderately left skewed, outlier at minimum.
  8. Bimodal symmetric, no outliers.


Note (for the QUANTITATIVE EDA questions below):
  • That all four items (Shape, Outliers, Center, and Dispersion) are addressed for each question. Many students may omit outliers if no outliers exist; however, you should explicitly say “no outliers were present” (or similar) as shown below.
  • Center is always described by the mean or the median, never both and never neither one. Which one you choose depends on whether outliers are present (use median), the distribution is more than slightly skewed (use median), or the distribution is no more than slightly skewed and no outliers are present (use mean).
  • Dispersion is always described by the standard deviation or IQR, never both and never neither one. Additionally, you can use the range but never by itself (i.e., it should be used in conjunction with the standard deviation or IQR). Whether you use the standard deviation or IQR depends on whether you used the mean (then use the standard deviation) or median (then use the IQR).
  • Include a sentence in each description that describes why you used the mean and standard deviation or used the median and IQR to describe center and dispersion. See answers below for examples.
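
The rules above for choosing measures of center and dispersion can be sketched as a small decision function. This is only an illustrative sketch, not part of the course materials: the function name `choose_summaries` and the skew labels ("none", "slight", "moderate", "strong") are assumptions made for this example, and the label you assign to a distribution is still a judgment made from the histogram.

```python
def choose_summaries(skew: str, has_outliers: bool) -> tuple[str, str]:
    """Return (center, dispersion) measure names for a quantitative variable.

    skew         -- hypothetical label for the strength of skew:
                    "none", "slight", "moderate", or "strong"
    has_outliers -- True if any outliers were identified
    """
    allowed = {"none", "slight", "moderate", "strong"}
    if skew not in allowed:
        raise ValueError(f"skew must be one of {sorted(allowed)}")

    # Outliers present, or more than slightly skewed: median and IQR.
    if has_outliers or skew in {"moderate", "strong"}:
        return ("median", "IQR")

    # No more than slightly skewed and no outliers: mean and standard deviation.
    return ("mean", "standard deviation")


# Approximately symmetric, no outliers (as in the Commute Time answer).
print(choose_summaries("none", False))

# Strongly skewed with outliers (as in the Dungeness Crabs answer).
print(choose_summaries("strong", True))
```

Note that the function always returns exactly one pair, which mirrors the "never both and never neither one" rule: the dispersion measure follows directly from the center measure chosen.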

Commute Time

The distribution of commute times is approximately symmetric with no outliers (Figure 1). The center as measured by the mean is 23.75 mins and the dispersion as measured by the standard deviation is 3.54 mins (Table 1). The mean and standard deviation were used because the distribution was approximately symmetric and no outliers were present.


Dungeness Crabs

The distribution of post-molt carapace lengths for Dungeness Crabs is strongly left-skewed with two outliers between 38.8 and 60 mm (Figure 2). The center as measured by the median is 147.4 mm and the dispersion as measured by the IQR is from a Q1 of 138.0 to a Q3 of 153.4 mm (Table 2). The median and IQR were used because the distribution was strongly skewed and outliers were present.



Note (for the CATEGORICAL EDA questions below):
  • You DO NOT describe shape, outliers, center, or dispersion for categorical data.
  • You do make a conclusive statement about the main “take-away” messages found in the bar chart or frequency/percentage tables.
  • You DO NOT simply repeat the values present in the tables or bar chart (you can use values to support your conclusion, but you should not simply repeat each value in the results).

Forest Fires

Fires occurred most often in late summer or early fall (August and September) with a smaller peak in late winter or early spring (March). Fires are infrequent in most other months.


Fate of Plastics

In 2015, most plastics were discarded with approximately equal amounts incinerated and recycled. There was a substantial decrease in the percentage of plastics that were discarded from 1995 to 2015, with a concomitant increase in the percentages that were incinerated and recycled.