Handle Factors in R

How to convert seemingly quantitative variables to categorical variables and how to reorder the levels of categorical variables.

Two common “issues” you will run into in this course are the need to turn a variable that R sees as quantitative into a categorical (or factor) variable and the need to rearrange the order of levels within a categorical (or factor) variable. Both cases are described below.

Example data used here is from Mirex.csv [data, meta], which is loaded below.

Mirex <- read.csv("https://raw.githubusercontent.com/droglenc/NCData/master/Mirex.csv")

 

Convert Quantitative to Factor Variables

Quick View

Change a variable to a factor with factor(). For example,

Mirex$year <- factor(Mirex$year)

Explanation

At times a variable that represents groups – i.e., a categorical variable – will be entered with numeric values. For example, Mirex contains year, which records the year the fish was captured. In this example, there are relatively few years, the years are not contiguous, and hypothesis tests will be used to determine if the mean response differs among years. Thus, year contains “groups” and should be treated as a categorical variable.

R, however, treats year as an integer (i.e., quantitative) variable because it simply “sees” numbers.

str(Mirex)
#R>   'data.frame': 122 obs. of  4 variables:
#R>    $ year   : int  1977 1977 1977 1977 1977 1977 1977 1977 1977 1977 ...
#R>    $ weight : num  0.41 0.45 1.04 1.09 1.24 1.25 1.3 1.34 1.37 1.49 ...
#R>    $ mirex  : num  0.16 0.19 0.19 0.1 0.13 0.19 0.28 0.16 0.17 0.2 ...
#R>    $ species: chr  "chinook" "chinook" "chinook" "coho" ...

year is forced to be a factor that defines groupings with factor(). When simply converting to a factor, factor() only requires the variable (in dataframe$var format) as an argument. The result should be saved to a variable in the data frame. For example, year in Mirex is replaced with a factored version below.

Mirex$year <- factor(Mirex$year)

year is now a factor variable.

str(Mirex)
#R>   'data.frame': 122 obs. of  4 variables:
#R>    $ year   : Factor w/ 6 levels "1977","1982",..: 1 1 1 1 1 1 1 1 1 1 ...
#R>    $ weight : num  0.41 0.45 1.04 1.09 1.24 1.25 1.3 1.34 1.37 1.49 ...
#R>    $ mirex  : num  0.16 0.19 0.19 0.1 0.13 0.19 0.28 0.16 0.17 0.2 ...
#R>    $ species: chr  "chinook" "chinook" "chinook" "coho" ...
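
The same conversion can be tried on a small made-up data frame without downloading anything. The sketch below is only an illustration: toy and its values are hypothetical and are not taken from Mirex.

```r
# Hypothetical toy data (NOT the Mirex data): years stored as integers
toy <- data.frame(year=c(1977L,1977L,1982L,1986L,1982L),
                  mirex=c(0.16,0.19,0.23,0.11,0.30))

class(toy$year)               # "integer" before conversion
toy$year <- factor(toy$year)  # replace year with a factored version
class(toy$year)               # "factor" after conversion
levels(toy$year)              # one level per distinct year, in increasing order
nlevels(toy$year)             # number of groups
```

levels() and nlevels() are a quick way to confirm that the conversion produced the groups you expected.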

As a side note, it is seen in the Mirex structure that species is a character (chr) variable rather than a factor. In analyses in this course, character variables will ultimately be treated as factor variables so there is no need to explicitly convert them to factors.

 

Change Order of Levels

Quick View

Change order of levels by setting level order in levels= of factor(). For example,

Mirex$species <- factor(Mirex$species,levels=c("coho","chinook"))

Explanation

R orders the levels of categorical variables alphabetically unless a specific order is set. For example, the groups in species are

unique(Mirex$species)
#R>   [1] "chinook" "coho"

Here R will treat chinook as the “first level” because it alphabetically precedes coho. This default order is evident in tables and figures; e.g.,

xtabs(~species,data=Mirex)
#R>   species
#R>   chinook    coho 
#R>        67      55
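
Note that unique() lists values in the order they first appear, which only happens to match alphabetical order for species in Mirex. A minimal sketch with a hypothetical vector (sp is made up for illustration) shows the difference between order of appearance and the default level order.

```r
# Hypothetical toy vector in which "coho" appears before "chinook"
sp <- c("coho","chinook","coho","chinook","chinook")

unique(sp)          # order of first appearance: "coho" then "chinook"
levels(factor(sp))  # default level order is alphabetical: "chinook" then "coho"
table(sp)           # the table also lists chinook first
```

levels(factor()) is therefore the safer check when you want to see the order R will actually use.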

In some analyses, groups will need to be in a different order. The order of levels is controlled by setting the specific order with levels= in factor(). For example, the order of the levels in species is changed below.

Mirex$species <- factor(Mirex$species,levels=c("coho","chinook"))

This change in order will then be evident in subsequent tables and figures.

xtabs(~species,data=Mirex)
#R>   species
#R>      coho chinook 
#R>        55      67
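
Reordering the levels changes only the order in which groups are listed and not the data values themselves. The sketch below illustrates this with a hypothetical vector (sp and sp_f are made-up names, not part of Mirex).

```r
# Hypothetical toy vector of species names
sp <- c("chinook","coho","coho","chinook","chinook")

# Same values, but with coho set as the first level
sp_f <- factor(sp,levels=c("coho","chinook"))

levels(sp_f)                       # "coho" then "chinook"
identical(as.character(sp_f),sp)   # TRUE: the recorded values are unchanged
```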

If a variable has more groups, then the list of groups in levels= will simply be longer.
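
As a minimal sketch of the longer case, the example below uses a hypothetical three-group variable (size is made up and does not come from Mirex). It also shows one thing to watch for: a value that is not spelled exactly as in levels= is converted to NA.

```r
# Hypothetical toy vector with three groups
size <- c("medium","small","large","small","medium")

levels(factor(size))   # default alphabetical order: "large" "medium" "small"

# Set a more natural order by listing all three groups in levels=
size2 <- factor(size,levels=c("small","medium","large"))
levels(size2)          # "small" "medium" "large"
table(size2)           # counts appear in the new order

# A group misspelled in levels= (here "LARGE") is converted to NA
bad <- factor(size,levels=c("small","medium","LARGE"))
sum(is.na(bad))        # 1 value became NA
```

Checking for unexpected NA values after setting levels= is a simple way to catch misspelled group names.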