Definitions

Statistical Definitions

The following are definitions for general statistical words. See further below for R-specific definitions.

Word: Definition

Accuracy: The tendency of a statistic to come close to the parameter it was intended to estimate.
Alternative Hypothesis: A statistical hypothesis that states that there is a difference between a parameter and a specific value or between two parameters.
Bimodal: The shape of a distribution with two peaks or "humps."
Bivariate: Examining two variables.
Coefficient of Determination: The proportion of the total variability in the response variable that is explained by knowing the explanatory variable and the best-fit model.
Continuous: A quantitative variable that can assume an uncountable number of values.
Convenience: A sample of individuals who are easiest to reach for the researcher.
Dependent: See response variable.
Discrete: A quantitative variable that can assume a countable number of values.
Factor(s): In an experiment, the variable(s) that is (are) deliberately manipulated to determine the effect on the response variable.
Independent: See explanatory variable.
Individual: One of the items examined by the researcher.
Inference: The process of forming conclusions about the unknown parameters of a population by computing statistics from the individuals in a sample.
Inter-Quartile Range (IQR): The difference between the third (Q3) and first (Q1) quartiles.
Intercept: The value of the response variable when the explanatory variable is equal to zero.
Left-Skewed: The left-tail of a distribution is longer or more drawn out than the right-tail.
Levels: In an experiment, the number of categories or groupings of the factor.
Mean: The center of gravity or balance point of the data, i.e., the sum of the data divided by the number of individuals.
Median: The midpoint of the data, i.e., the value of the individual in the position that splits the ordered list of individuals into two equal-sized halves.
Mode: The value or class of values that occurs most often in a data set.
Multivariate: Examining more than two variables.
Natural Variability: The fact that no two individuals are exactly alike.
Nominal: A categorical variable for which a natural order DOES NOT exist among the categories.
Null Hypothesis: A statistical hypothesis that states that there is no difference between a parameter and a specific value or between two parameters.
Ordinal: A categorical variable for which a natural order exists among the categories.
Outlier: An individual whose value is widely separated from the main cluster of values in the sample.
p-value: The probability of the observed statistic, or a value of the statistic more extreme, assuming the null hypothesis is true.
Parameter: A summary of all individuals in a population.
Population: ALL individuals of interest.
Precision: The tendency to have values clustered closely together. Precision is inversely related to the standard error – the smaller the standard error, the greater the precision.
Quartiles: The values that divide the ordered data into quarters.
Range: The difference between the maximum and minimum value in a data set.
Replicates: In an experiment, the number of individuals in each treatment group.
Research Hypothesis: A general statement about the question or phenomenon being tested.
Residual: The difference between the observed and predicted values of the response variable for an individual (observed minus predicted). In regression, this is the vertical difference between the observed point and the best-fit line.
Response: The variable to be predicted or explained.
Right-Skewed: The right-tail of a distribution is longer or more drawn out than the left-tail.
Sample: A subset of the population examined by a researcher.
Sampling Distribution: The distribution of the values of a particular statistic computed from all possible samples of the same size from the same population.
Sampling Variability: The fact that the results (i.e., statistics) from different samples (of the same population) are different.
Simple Random: A probability-based sample where each individual of the population has the same chance of being selected for the sample. Usually abbreviated as SRS.
Slope: The change in value of the response variable for a unit change in value of the explanatory variable.
Standard Deviation: "Essentially" the average deviation or difference of individuals from the mean.
Standard Error: The numerical measure of dispersion used for sampling distributions – i.e., it measures the dispersion among statistics from all possible samples.
Statistic: A summary of all individuals in a sample.
Statistics: As a field of study, the science of collecting, organizing, and interpreting numerical information or data.
Symmetric: The left and right tails of a distribution are nearly the same in length and height.
Treatments: In an experiment, the number of combinations of all factors in the experiment.
Unbiased: For statistics, a statistic in which the center of its sampling distribution equals the parameter it is intended to estimate. For samples, a sample that does not systematically over- or under-represent portions of the population.
Univariate: Examining one variable.
Variable: The characteristic of interest recorded about each individual.
Voluntary Response: A sample of individuals that choose themselves for the sample by responding to a general appeal.
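
Several of the definitions above can be made concrete with a few lines of R. The following is a minimal sketch, not a prescribed workflow: the data values and the names x, d, hours, and score are invented for illustration. It computes the mean, median, quartiles, IQR, range, and standard deviation of a small sample, then fits a best-fit line to obtain the intercept, slope, residuals, and coefficient of determination.

```r
# Hypothetical sample of 8 individuals (made-up values, for illustration only)
x <- c(2, 4, 4, 5, 7, 8, 9, 25)

x_mean   <- mean(x)         # sum of the data divided by the number of individuals
x_median <- median(x)       # midpoint of the ordered data
x_q      <- quantile(x, probs = c(0.25, 0.75))  # first (Q1) and third (Q3) quartiles
x_iqr    <- IQR(x)          # Q3 minus Q1
x_range  <- diff(range(x))  # maximum minus minimum
x_sd     <- sd(x)           # standard deviation

# Hypothetical bivariate data: explanatory variable `hours`, response variable `score`
d <- data.frame(hours = c(1, 2, 3, 4, 5),
                score = c(52, 58, 61, 70, 74))

fit <- lm(score ~ hours, data = d)   # best-fit line

coef(fit)                            # intercept and slope
res <- residuals(fit)                # observed minus predicted, one per individual
r2  <- summary(fit)$r.squared        # coefficient of determination
```

In this made-up sample the single large value (25) pulls the mean (8) above the median (6), which is typical of a right-skewed distribution. Note that quantile() and IQR() use R's default quartile method, so other software or hand calculations may give slightly different quartiles.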

R Definitions

The following definitions are related to R.

Word: Definition

Argument: A "directive" provided within the parentheses of a function.
data.frame: A two-dimensional organization of variables (as columns, possibly of different data types) recorded on multiple individuals (as rows).
Factor(s): A special type of variable that identifies the group to which an individual belongs.
Function: In R, a program that performs a particular task.
Vector: A one-dimensional list of items of the same data type.
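
The R terms above can be seen together in a short example. This is only a sketch under assumed names: len, grp, df, and avg, and the values they hold, are hypothetical and chosen for illustration.

```r
# Vector: a one-dimensional set of items of the same data type (here, numeric)
len <- c(10.2, 11.5, 9.8, 12.1)

# Factor: identifies the group to which each individual belongs
grp <- factor(c("a", "b", "a", "b"))

# data.frame: variables as columns (of different types), individuals as rows
df <- data.frame(len = len, grp = grp)

# Function: mean() performs a particular task (computing a mean).
# Arguments: df$len and na.rm = TRUE are the "directives" inside the parentheses.
avg <- mean(df$len, na.rm = TRUE)

str(df)       # shows the structure: 4 rows (individuals), 2 columns (variables)
levels(grp)   # the groups identified by the factor
```

Here df has one numeric column and one factor column, which shows how a data.frame can hold variables of different data types while each individual column remains a single type.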