Module 7 Wrangle Rows

The previous module demonstrated methods for wrangling columns (i.e., variables) in a data frame. This module introduces methods for wrangling rows, which are observations when the data are tidy. The primary method of interest here is filtering, or selecting a smaller subset of rows for further analysis.

Manipulating rows means you are manipulating observations.

Again, the descriptive examples below use the bears data frame from Section 3.3.1.

bears <- read_csv(file.path("data","Bears.csv"))
bears
#R>  # A tibble: 8 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      139        110 Bayfield
#R>  2      120.        60 Bayfield
#R>  3      149         85 Bayfield
#R>  4      141        100 Ashland 
#R>  5      141         95 Ashland 
#R>  6      150         85 Douglas 
#R>  7      130.       105 Douglas 
#R>  8      150        110 Douglas

The code here will also use the pipe operator, %>%, to again help you become more comfortable with its use.

 

7.1 Selecting Specific Rows

Specific rows may be selected or omitted from a data frame using slice(). Below are four simple examples.

bears %>% slice(1)                # First row
#R>  # A tibble: 1 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1       139       110 Bayfield
bears %>% slice(c(1,3,5))         # First, third, and fifth rows
#R>  # A tibble: 3 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1       139       110 Bayfield
#R>  2       149        85 Bayfield
#R>  3       141        95 Ashland
bears %>% slice(-1)               # All but the first row
#R>  # A tibble: 7 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      120.        60 Bayfield
#R>  2      149         85 Bayfield
#R>  3      141        100 Ashland 
#R>  4      141         95 Ashland 
#R>  5      150         85 Douglas 
#R>  6      130.       105 Douglas 
#R>  7      150        110 Douglas
bears %>% slice(-c(1,3,5))        # All but the first, third, and fifth rows
#R>  # A tibble: 5 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      120.        60 Bayfield
#R>  2      141        100 Ashland 
#R>  3      150         85 Douglas 
#R>  4      130.       105 Douglas 
#R>  5      150        110 Douglas

Rows from the beginning (i.e., the “head”) or end (i.e., the “tail”) of the data frame may also be selected with slice_head() or slice_tail(), respectively. You may select a certain number of rows with n= or an approximate proportion of rows with prop=. Below are four examples.

bears %>% slice_head(n=3)         # First three rows
#R>  # A tibble: 3 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      139        110 Bayfield
#R>  2      120.        60 Bayfield
#R>  3      149         85 Bayfield
bears %>% slice_head(prop=0.33)   # Approx. first 33% of rows
#R>  # A tibble: 2 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      139        110 Bayfield
#R>  2      120.        60 Bayfield
bears %>% slice_tail(n=3)         # Last three rows
#R>  # A tibble: 3 x 3
#R>    length.cm weight.kg loc    
#R>        <dbl>     <dbl> <chr>  
#R>  1      150         85 Douglas
#R>  2      130.       105 Douglas
#R>  3      150        110 Douglas
bears %>% slice_tail(prop=0.33)   # Approx. last 33% of rows
#R>  # A tibble: 2 x 3
#R>    length.cm weight.kg loc    
#R>        <dbl>     <dbl> <chr>  
#R>  1      130.       105 Douglas
#R>  2      150        110 Douglas

Finally, a random sample of rows from the data frame may be selected with slice_sample(), again using either n= or prop=.

bears %>% slice_sample(n=3)       # 3 random rows
#R>  # A tibble: 3 x 3
#R>    length.cm weight.kg loc    
#R>        <dbl>     <dbl> <chr>  
#R>  1      141        100 Ashland
#R>  2      141         95 Ashland
#R>  3      130.       105 Douglas
bears %>% slice_sample(prop=0.33) # Random approx. 33% rows. 
#R>  # A tibble: 2 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1       150       110 Douglas 
#R>  2       139       110 Bayfield

Use the slice() family of functions to select specific (or random) rows from a data frame.
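
Two details of the slice() family are easy to check with a small sketch. The data frame toy below is made up for illustration (it is not one of the data frames used in this book) so that the results can be verified by hand. First, prop= is converted to a whole number of rows by rounding down, which is why 33% of the eight bears above gave two rows. Second, setting the random seed with set.seed() just before slice_sample() is one common way to get the same "random" rows each time the code is run; the seed value used here is arbitrary.

```r
library(dplyr)

# Hypothetical data frame for illustration only
toy <- tibble(id=1:6,
              grp=c("a","a","b","b","c","c"),
              val=c(10,25,15,30,5,20))

# prop= is rounded down: 0.4*6 = 2.4, so two rows are returned
first40 <- toy %>% slice_head(prop=0.4)
first40$id

# Same seed, same random sample
set.seed(101)                        # arbitrary seed value
samp1 <- toy %>% slice_sample(n=3)
set.seed(101)
samp2 <- toy %>% slice_sample(n=3)
identical(samp1,samp2)
```

Without the calls to set.seed(), samp1 and samp2 would generally contain different rows.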

7.2 Filtering Rows

Observations (i.e., rows) can be selected from a data frame with filter(). The arguments to filter() are conditional expressions that describe which observations from the data frame to keep. Common operators used in these conditional expressions are in Table 7.1. The filter() function works by evaluating the condition to either TRUE or FALSE for each row and then returning all rows that evaluated to TRUE.
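
The evaluation step described above can be seen directly by running a condition on its own. The sketch below uses a made-up data frame called toy (not one from this book) that includes a missing value. It shows that the condition is simply a logical vector with one element per row, and that rows where the condition is NA are dropped along with those that are FALSE.

```r
library(dplyr)

# Hypothetical data frame for illustration only; note the missing value
toy <- tibble(id=1:5,val=c(10,25,NA,30,5))

# The condition is just a logical vector, one element per row
toy$val>15
#R>  [1] FALSE  TRUE    NA  TRUE FALSE

# filter() keeps only rows where the condition is TRUE (row 3 is dropped)
big <- toy %>% filter(val>15)
big$id
#R>  [1] 2 4
```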

 

Table 7.1: Comparison operators used in filter() and their results. Note that var generically represents a variable in the original data frame and value is a generic value or level. Both var and value would be replaced with specific items (see examples in main text).

Comparison Operator          Rows Returned from Original Data Frame
var==value                   All rows where var IS equal to value
var!=value                   All rows where var is NOT equal to value
var %in% c(value1,value2)    All rows where var IS IN (i.e., is one of) the vector of values [27]
var>value                    All rows where var is greater than value [28]
var>=value                   All rows where var is greater than or equal to value [29]
var<value                    All rows where var is less than value [30]
var<=value                   All rows where var is less than or equal to value [31]
condition1,condition2        All rows where BOTH conditions are true
condition1 | condition2      All rows where ONE or BOTH conditions are true [32]

 

The following are examples of new data frames created from bears. The name of the new data frame (i.e., object left of the assignment operator) is tmp (for temporary) in each example below because there is no plan to use these data frames further.

  • Only observations from Bayfield county.
tmp <- bears %>% filter(loc=="Bayfield")
tmp
#R>  # A tibble: 3 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      139        110 Bayfield
#R>  2      120.        60 Bayfield
#R>  3      149         85 Bayfield
  • Observations from either Bayfield or Ashland county.
tmp <- bears %>% filter(loc %in% c("Bayfield","Ashland"))
tmp
#R>  # A tibble: 5 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      139        110 Bayfield
#R>  2      120.        60 Bayfield
#R>  3      149         85 Bayfield
#R>  4      141        100 Ashland 
#R>  5      141         95 Ashland
  • Observations NOT from Bayfield county.
tmp <- bears %>% filter(loc != "Bayfield")
tmp
#R>  # A tibble: 5 x 3
#R>    length.cm weight.kg loc    
#R>        <dbl>     <dbl> <chr>  
#R>  1      141        100 Ashland
#R>  2      141         95 Ashland
#R>  3      150         85 Douglas
#R>  4      130.       105 Douglas
#R>  5      150        110 Douglas
  • Observations with a weight greater than 100 kg.
tmp <- bears %>% filter(weight.kg>100)
tmp
#R>  # A tibble: 3 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      139        110 Bayfield
#R>  2      130.       105 Douglas 
#R>  3      150        110 Douglas
  • Observations from Douglas County that weighed at least 110 kg.
tmp <- bears %>% filter(loc=="Douglas",weight.kg>=110)
tmp
#R>  # A tibble: 1 x 3
#R>    length.cm weight.kg loc    
#R>        <dbl>     <dbl> <chr>  
#R>  1       150       110 Douglas

The last example above illustrates that multiple conditional expressions separated by commas in filter() are combined with an “and” operator such that both conditions must be true for a row to be returned.
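
Table 7.1 also lists an “or” operator, which is not used in the examples above. The sketch below contrasts “and” with “or” using a made-up data frame called toy (not one from this book). It also shows that separating conditions with a comma gives the same rows as joining them with &.

```r
library(dplyr)

# Hypothetical data frame for illustration only
toy <- tibble(id=1:6,
              grp=c("a","a","b","b","c","c"),
              val=c(10,25,15,30,5,20))

# "and": a comma and & give the same rows (only id 4 here)
and1 <- toy %>% filter(grp=="b",val>20)
and2 <- toy %>% filter(grp=="b" & val>20)
and1$id
#R>  [1] 4

# "or": rows in group "c" OR with val greater than 20 (or both)
or1 <- toy %>% filter(grp=="c" | val>20)
or1$id
#R>  [1] 2 4 5 6
```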

Use filter() to select rows from a data frame that match a logical condition.

It is good practice to examine a data frame after filtering to be sure that the new data frame contains the observations that you want. The data frames above are so small that you can simply and easily examine the entire data frame. However, this will not be the case with more realistic larger data frames. Thus, I suggest the following methods for “checking your filtering.”

  • Simply display the data frame or the structure of the data frame to identify any obvious issues. For example, the code below should return all bears from Douglas County with a weight of at least 150 kg. Displaying either the data frame or its structure shows that this data frame contains no data.
tmp <- bears %>% filter(loc=="Douglas",weight.kg>=150)
tmp
#R>  # A tibble: 0 x 3
#R>  # ... with 3 variables: length.cm <dbl>, weight.kg <dbl>, loc <chr>
str(tmp,give.attr=FALSE)
#R>  spec_tbl_df [0 x 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#R>   $ length.cm: num(0) 
#R>   $ weight.kg: num(0) 
#R>   $ loc      : chr(0)
  • If you filter with respect to a categorical variable then use unique() with that categorical variable to examine its levels. For example, the filter below is expected to return observations for just Ashland and Bayfield counties. The use of unique() supports that this is what was returned.
tmp <- bears %>% filter(loc %in% c("Bayfield","Ashland"))
unique(tmp$loc)
#R>  [1] "Bayfield" "Ashland"
  • If you filter with respect to a quantitative variable then use summary() with that quantitative variable to examine its summary statistics. For example, the filter below is expected to return observations for lengths between 130 and 145 cm. The minimum and maximum values in the summary() results support that this is what was returned.
tmp <- bears %>% filter(length.cm>130,length.cm<145)
summary(tmp$length.cm)
#R>     Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#R>    139.0   140.0   141.0   140.3   141.0   141.0

Examine the new data frame after filtering to ensure that it contains the observations you intended.
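
The checks described above can also be written so that R makes the comparison for you, which may be useful with larger data frames. The following is only a sketch under assumptions: toy is a made-up data frame (not one from this book), count() is the dplyr function that tallies rows for each level, and stopifnot() is the base R function that stops with an error if any of its conditions is not TRUE.

```r
library(dplyr)

# Hypothetical data frame for illustration only
toy <- tibble(grp=c("a","a","b","b","c","c"),
              val=c(10,25,15,30,5,20))

tmp <- toy %>% filter(grp %in% c("a","b"),val>=15)

nrow(tmp)              # how many rows were kept?
tmp %>% count(grp)     # which groups remain, and how many rows in each?
range(tmp$val)         # smallest and largest values kept

# Stop with an error if the result is not what was intended
stopifnot(all(tmp$grp %in% c("a","b")),
          min(tmp$val)>=15)
```

Checking nrow() first also catches the situation shown above where a filter returns no rows at all.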

7.3 Arranging Rows

The arrange() function is used to sort rows based on values in one or more variables. [33] The default is ascending order. To sort in descending order, wrap the variable name in desc(). If more than one variable is given then the rows are first sorted based on the first variable and then ties in the first variable are sorted based on the second variable. Examples of sorting are shown below.

  • Alphabetically sort bears by location name.
bears <- bears %>% arrange(loc)
bears
#R>  # A tibble: 8 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      141        100 Ashland 
#R>  2      141         95 Ashland 
#R>  3      139        110 Bayfield
#R>  4      120.        60 Bayfield
#R>  5      149         85 Bayfield
#R>  6      150         85 Douglas 
#R>  7      130.       105 Douglas 
#R>  8      150        110 Douglas
  • Sort bears from heaviest to lightest.
bears <- bears %>% arrange(desc(weight.kg))
bears
#R>  # A tibble: 8 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      139        110 Bayfield
#R>  2      150        110 Douglas 
#R>  3      130.       105 Douglas 
#R>  4      141        100 Ashland 
#R>  5      141         95 Ashland 
#R>  6      149         85 Bayfield
#R>  7      150         85 Douglas 
#R>  8      120.        60 Bayfield
  • Sort bears from heaviest to lightest within each location.
bears <- bears %>% arrange(loc,desc(weight.kg))
bears
#R>  # A tibble: 8 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      141        100 Ashland 
#R>  2      141         95 Ashland 
#R>  3      139        110 Bayfield
#R>  4      149         85 Bayfield
#R>  5      120.        60 Bayfield
#R>  6      150        110 Douglas 
#R>  7      130.       105 Douglas 
#R>  8      150         85 Douglas
  • Sort bears by size, first by length and then by weight.
bears <- bears %>% arrange(length.cm,weight.kg)
bears
#R>  # A tibble: 8 x 3
#R>    length.cm weight.kg loc     
#R>        <dbl>     <dbl> <chr>   
#R>  1      120.        60 Bayfield
#R>  2      130.       105 Douglas 
#R>  3      139        110 Bayfield
#R>  4      141         95 Ashland 
#R>  5      141        100 Ashland 
#R>  6      149         85 Bayfield
#R>  7      150         85 Douglas 
#R>  8      150        110 Douglas

Use arrange() to sort rows in a data frame by the value(s) of variable(s).
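
The tie-breaking rule described above is easiest to see with a data frame that has obvious ties. The sketch below uses a made-up data frame called toy (not one from this book) in which two rows share the same val. It also assigns the sorted result to a new name, which leaves the original data frame in its original order.

```r
library(dplyr)

# Hypothetical data frame for illustration only
toy <- tibble(id=1:6,
              grp=c("b","a","b","a","c","c"),
              val=c(10,25,10,30,5,20))

# Sort by val; ties in val (ids 1 and 3) are broken by id in descending order
srt <- toy %>% arrange(val,desc(id))
srt$id
#R>  [1] 5 3 1 6 2 4

# toy itself is unchanged because the result was assigned to srt
toy$id
#R>  [1] 1 2 3 4 5 6
```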

7.4 Appending Rows

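Two data frames can be combined with bind_rows(), which places the rows of the second data frame below the rows of the first. This works most cleanly if the data frames have the same column names and classes because columns are matched by name and a column that is missing from one data frame is filled with NA. For example, suppose that two other data frames exist – bears2 has more information about bears and bobcats has similar information about bobcats.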

bears2
#R>    length.cm weight.kg  loc
#R>  1       135       100 Iron
#R>  2       142       115 Iron
#R>  3       143       110 Iron
bobcats
#R>    length.cm weight.kg      loc
#R>  1        75       6.2  Douglas
#R>  2        82       8.1  Douglas
#R>  3        71       7.4 Bayfield
#R>  4        79       7.6  Douglas

The code below appends the bears2 data frame to the bottom of the bears data frame and then, for demonstration purposes, orders the bears by size.

newbears <- bind_rows(bears,bears2) %>%
  arrange(length.cm,weight.kg)
newbears
#R>  # A tibble: 11 x 3
#R>     length.cm weight.kg loc     
#R>         <dbl>     <dbl> <chr>   
#R>   1      120.        60 Bayfield
#R>   2      130.       105 Douglas 
#R>   3      135        100 Iron    
#R>   4      139        110 Bayfield
#R>   5      141         95 Ashland 
#R>   6      141        100 Ashland 
#R>   7      142        115 Iron    
#R>   8      143        110 Iron    
#R>   9      149         85 Bayfield
#R>  10      150         85 Douglas 
#R>  11      150        110 Douglas

The same could be done with the bears and bobcats data frames, but there would then be no way to tell which observations are for bears and which are for bobcats. This deficiency can be overcome by giving names to the data frames within bind_rows() and giving a variable name to .id= for the new variable that will identify the groups. For example,

animals <- bind_rows("bear"=bears,"bobcat"=bobcats,.id="animal")
animals
#R>  # A tibble: 12 x 4
#R>     animal length.cm weight.kg loc     
#R>     <chr>      <dbl>     <dbl> <chr>   
#R>   1 bear        120.      60   Bayfield
#R>   2 bear        130.     105   Douglas 
#R>   3 bear        139      110   Bayfield
#R>   4 bear        141       95   Ashland 
#R>   5 bear        141      100   Ashland 
#R>   6 bear        149       85   Bayfield
#R>   7 bear        150       85   Douglas 
#R>   8 bear        150      110   Douglas 
#R>   9 bobcat       75        6.2 Douglas 
#R>  10 bobcat       82        8.1 Douglas 
#R>  11 bobcat       71        7.4 Bayfield
#R>  12 bobcat       79        7.6 Douglas

Note that more than two data frames can be combined with bind_rows().

Use bind_rows() to combine two (or more) data frames that have the same variables (i.e., columns).
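
It can be helpful to know what bind_rows() does when the data frames do not have exactly the same variables. The sketch below uses two made-up data frames, df1 and df2 (neither is from this book), that share only one column. Columns are matched by name, and a column that is absent from one data frame is filled with NA for the rows that came from that data frame.

```r
library(dplyr)

# Hypothetical data frames for illustration only
df1 <- tibble(length.cm=c(100,110),weight.kg=c(50,60))
df2 <- tibble(length.cm=c(70,80),loc=c("X","Y"))

# Names given to the data frames fill the new "source" variable
both <- bind_rows("first"=df1,"second"=df2,.id="source")
both

is.na(both$weight.kg)   # weight.kg is missing for rows that came from df2
#R>  [1] FALSE FALSE  TRUE  TRUE
```

If unexpected NA values appear after appending rows, a mismatch in column names between the data frames is a likely cause.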

7.5 Examples in Context

7.5.1 NBA Players

In Section 6.7.1 the players2 data frame was created that showed the starting year, ending year, total years played, and whether the player was from the “modern” era or not for all NBA players.

players2
#R>  # A tibble: 4,393 x 5
#R>     name                 start   end years_played modern
#R>     <chr>                <dbl> <dbl>        <dbl> <chr> 
#R>   1 Willis, Kevin         1984  2006           22 yes   
#R>   2 Jones, Mark           1983  2004           21 yes   
#R>   3 Carter, Vince         1998  2018           20 yes   
#R>   4 Garnett, Kevin        1995  2015           20 yes   
#R>   5 Nowitzki, Dirk        1998  2018           20 yes   
#R>   6 Parish, Robert        1976  1996           20 yes   
#R>   7 Abdul-Jabbar, Kareem  1969  1988           19 yes   
#R>   8 Bryant, Kobe          1996  2015           19 yes   
#R>   9 Cousy, Bob            1950  1969           19 no    
#R>  10 Crawford, Jamal       2000  2018           18 yes   
#R>  # ... with 4,383 more rows

The graph shown in that same section was for all players with more than 18 years in the NBA. The data frame for that graph is constructed below.

nba_gt18 <- players2 %>% filter(years_played>18)
nba_gt18
#R>  # A tibble: 9 x 5
#R>    name                 start   end years_played modern
#R>    <chr>                <dbl> <dbl>        <dbl> <chr> 
#R>  1 Willis, Kevin         1984  2006           22 yes   
#R>  2 Jones, Mark           1983  2004           21 yes   
#R>  3 Carter, Vince         1998  2018           20 yes   
#R>  4 Garnett, Kevin        1995  2015           20 yes   
#R>  5 Nowitzki, Dirk        1998  2018           20 yes   
#R>  6 Parish, Robert        1976  1996           20 yes   
#R>  7 Abdul-Jabbar, Kareem  1969  1988           19 yes   
#R>  8 Bryant, Kobe          1996  2015           19 yes   
#R>  9 Cousy, Bob            1950  1969           19 no

It might be interesting to see who started in the NBA in the year of your college graduation (using mine below).

nba_grad1 <- players2 %>% filter(start==1989)
nba_grad1
#R>  # A tibble: 81 x 5
#R>     name               start   end years_played modern
#R>     <chr>              <dbl> <dbl>        <dbl> <chr> 
#R>   1 Robinson, Clifford  1989  2006           17 yes   
#R>   2 Divac, Vlade        1989  2004           15 yes   
#R>   3 Barros, Dana        1989  2003           14 yes   
#R>   4 Rice, Glen          1989  2003           14 yes   
#R>   5 Anderson, Nick      1989  2002           13 yes   
#R>   6 Hardaway, Tim       1989  2002           13 yes   
#R>   7 Kemp, Shawn         1989  2002           13 yes   
#R>   8 Mason, Anthony      1989  2002           13 yes   
#R>   9 McCloud, George     1989  2002           13 yes   
#R>  10 Robinson, David     1989  2002           13 yes   
#R>  # ... with 71 more rows

Perhaps you are interested only in those who started in the year of your graduation and played for more than a decade.

nba_grad2 <- players2 %>% filter(start==1989,years_played>10)
nba_grad2
#R>  # A tibble: 21 x 5
#R>     name               start   end years_played modern
#R>     <chr>              <dbl> <dbl>        <dbl> <chr> 
#R>   1 Robinson, Clifford  1989  2006           17 yes   
#R>   2 Divac, Vlade        1989  2004           15 yes   
#R>   3 Barros, Dana        1989  2003           14 yes   
#R>   4 Rice, Glen          1989  2003           14 yes   
#R>   5 Anderson, Nick      1989  2002           13 yes   
#R>   6 Hardaway, Tim       1989  2002           13 yes   
#R>   7 Kemp, Shawn         1989  2002           13 yes   
#R>   8 Mason, Anthony      1989  2002           13 yes   
#R>   9 McCloud, George     1989  2002           13 yes   
#R>  10 Robinson, David     1989  2002           13 yes   
#R>  # ... with 11 more rows

Perhaps we want to find those that were playing during the year of your graduation.

nba_grad3 <- players2 %>% filter(start<=1989,end>=1989)
nba_grad3
#R>  # A tibble: 409 x 5
#R>     name            start   end years_played modern
#R>     <chr>           <dbl> <dbl>        <dbl> <chr> 
#R>   1 Willis, Kevin    1984  2006           22 yes   
#R>   2 Jones, Mark      1983  2004           21 yes   
#R>   3 Parish, Robert   1976  1996           20 yes   
#R>   4 Edwards, James   1977  1995           18 yes   
#R>   5 Jordan, Michael  1984  2002           18 yes   
#R>   6 Long, John       1978  1996           18 yes   
#R>   7 Mahorn, Rick     1980  1998           18 yes   
#R>   8 Malone, Karl     1985  2003           18 yes   
#R>   9 Malone, Moses    1976  1994           18 yes   
#R>  10 Oakley, Charles  1985  2003           18 yes   
#R>  # ... with 399 more rows
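
The logic of the last filter is that a player was active in a given year if his career started in or before that year and ended in or after that year. The sketch below illustrates this with a made-up careers data frame (the names and years are hypothetical and are not from players2) and a made-up object yr that holds the year of interest.

```r
library(dplyr)

# Hypothetical careers for illustration only (not from players2)
careers <- tibble(name=c("A","B","C","D"),
                  start=c(1980,1989,1990,1985),
                  end=c(1988,1995,2000,1989))

yr <- 1989
# A ended too early and C started too late, so only B and D remain
active <- careers %>% filter(start<=yr,end>=yr)
active$name
#R>  [1] "B" "D"
```

Storing the year in an object like yr means that only one line needs to change to repeat the filter for a different year.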

Perhaps we want to find those whose name was “Jordan.” This gets a bit tricky because name is formatted as Lastname, Firstname. In this case we want to find all instances where “Jordan” is somewhere in name. This requires a function that won’t be formally introduced until Module 11. This function is called grepl() and it takes a string to find as the first argument and the name of the character vector in which to look for that string as the second argument. It returns a logical vector with TRUE for each element of the vector in which the string is found and FALSE for each element in which it is not. Below is a quick, simple example.

test <- c("Ogle, Derek","Kim, Young","Jordan, Michael","Farmar, Jordan")
grepl("Jordan",test)
#R>  [1] FALSE FALSE  TRUE  TRUE

Here TRUE was returned only for the last two elements because they were the only two elements that contained “Jordan.”

Because filter() works by returning the rows that evaluate to TRUE, the grepl() code can be put in the place of the condition. For example, the code below returns all rows of players2 where name contains “Jordan.”

nba_jordans <- players2 %>% filter(grepl("Jordan",name))
nba_jordans
#R>  # A tibble: 20 x 5
#R>     name             start   end years_played modern
#R>     <chr>            <dbl> <dbl>        <dbl> <chr> 
#R>   1 Jordan, Michael   1984  2002           18 yes   
#R>   2 Farmar, Jordan    2006  2016           10 yes   
#R>   3 Jordan, DeAndre   2008  2018           10 yes   
#R>   4 Crawford, Jordan  2010  2017            7 yes   
#R>   5 Hill, Jordan      2009  2016            7 yes   
#R>   6 Jordan, Eddie     1977  1983            6 yes   
#R>   7 Jordan, Reggie    1993  1999            6 yes   
#R>   8 Jordan, Adonis    1993  1998            5 yes   
#R>   9 Clarkson, Jordan  2014  2018            4 yes   
#R>  10 Hamilton, Jordan  2011  2015            4 yes   
#R>  11 Jordan, Jerome    2011  2014            3 yes   
#R>  12 McRae, Jordan     2015  2018            3 yes   
#R>  13 Mickey, Jordan    2015  2017            2 yes   
#R>  14 Adams, Jordan     2014  2015            1 yes   
#R>  15 Bell, Jordan      2017  2018            1 yes   
#R>  16 Jordan, Thomas    1992  1992            0 yes   
#R>  17 Jordan, Walter    1980  1980            0 yes   
#R>  18 Loyd, Jordan      2018  2018            0 yes   
#R>  19 Sibert, Jordan    2018  2018            0 yes   
#R>  20 Williams, Jordan  2011  2011            0 yes

However, given the format of the data in name we need to be a little tricky to get all last names that are “Jordan” (note the extra comma on “Jordan,” below).

nba_jordans2 <- players2 %>% filter(grepl("Jordan,",name))
nba_jordans2
#R>  # A tibble: 8 x 5
#R>    name            start   end years_played modern
#R>    <chr>           <dbl> <dbl>        <dbl> <chr> 
#R>  1 Jordan, Michael  1984  2002           18 yes   
#R>  2 Jordan, DeAndre  2008  2018           10 yes   
#R>  3 Jordan, Eddie    1977  1983            6 yes   
#R>  4 Jordan, Reggie   1993  1999            6 yes   
#R>  5 Jordan, Adonis   1993  1998            5 yes   
#R>  6 Jordan, Jerome   2011  2014            3 yes   
#R>  7 Jordan, Thomas   1992  1992            0 yes   
#R>  8 Jordan, Walter   1980  1980            0 yes

We also need to be a little tricky to get all first names that are “Jordan.”

nba_jordans3 <- players2 %>% filter(grepl(", Jordan",name))
nba_jordans3
#R>  # A tibble: 12 x 5
#R>     name             start   end years_played modern
#R>     <chr>            <dbl> <dbl>        <dbl> <chr> 
#R>   1 Farmar, Jordan    2006  2016           10 yes   
#R>   2 Crawford, Jordan  2010  2017            7 yes   
#R>   3 Hill, Jordan      2009  2016            7 yes   
#R>   4 Clarkson, Jordan  2014  2018            4 yes   
#R>   5 Hamilton, Jordan  2011  2015            4 yes   
#R>   6 McRae, Jordan     2015  2018            3 yes   
#R>   7 Mickey, Jordan    2015  2017            2 yes   
#R>   8 Adams, Jordan     2014  2015            1 yes   
#R>   9 Bell, Jordan      2017  2018            1 yes   
#R>  10 Loyd, Jordan      2018  2018            0 yes   
#R>  11 Sibert, Jordan    2018  2018            0 yes   
#R>  12 Williams, Jordan  2011  2011            0 yes

It is interesting that there were no players in the NBA with the first name “Jordan” before Michael Jordan (the greatest player of all time) retired.
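
The first argument to grepl() is treated as a regular expression by default, so the “tricks” with the comma above can be made a bit stricter by anchoring the pattern to the start (^) or end ($) of the string. The following is only a sketch of that idea using the small test vector from above; it is not how the results above were produced. The ignore.case= argument shown last is a grepl() argument that makes the match ignore upper and lower case.

```r
test <- c("Ogle, Derek","Kim, Young","Jordan, Michael","Farmar, Jordan")

grepl("^Jordan,",test)    # last name: "Jordan," at the start of the string
#R>  [1] FALSE FALSE  TRUE FALSE
grepl(", Jordan$",test)   # first name: ", Jordan" at the end of the string
#R>  [1] FALSE FALSE FALSE  TRUE
grepl("jordan",test,ignore.case=TRUE)   # ignore upper/lower case
#R>  [1] FALSE FALSE  TRUE  TRUE
```

Any of these calls could be placed inside filter() in the same way as the examples above.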

 

7.5.2 Wolves and Moose of Isle Royale

In Section 6.7.2 the irmw2 data frame was created for use in the graphing course.

irmw2
#R>  # A tibble: 61 x 6
#R>      year era   wolves moose winter_temp ice_bridges
#R>     <dbl> <chr>  <dbl> <dbl>       <dbl> <chr>      
#R>   1  1959 early     20  538.        1.4  no         
#R>   2  1960 early     22  564.        8.45 no         
#R>   3  1961 early     22  572.        9.75 yes        
#R>   4  1962 early     23  579.        2.15 yes        
#R>   5  1963 early     20  596.       -0.35 yes        
#R>   6  1964 early     26  620.       12.4  no         
#R>   7  1965 early     28  634.        1.25 yes        
#R>   8  1966 early     26  661.        1.7  yes        
#R>   9  1967 early     22  766.        2.75 yes        
#R>  10  1968 early     22  848.        5.85 yes        
#R>  # ... with 51 more rows

One of the things we did in that class was focus on the “early” years of the wolf-moose time series. Such a data frame is created below.

irmw_early <- irmw2 %>% filter(era=="early")

Other things that we could do with these data are …

  • Find years where the moose population was more than 1500 animals.
tmp <- irmw2 %>% filter(moose>1500)
tmp
#R>  # A tibble: 7 x 6
#R>     year era    wolves moose winter_temp ice_bridges
#R>    <dbl> <chr>   <dbl> <dbl>       <dbl> <chr>      
#R>  1  1992 middle     12 1697.        15.2 no         
#R>  2  1993 middle     13 1784.         9.1 no         
#R>  3  1994 middle     17 2017.        -0.4 yes        
#R>  4  1995 middle     16 2117.         9.5 no         
#R>  5  1996 middle     22 2398.         2.8 yes        
#R>  6  2017 recent      2 1600         15.4 no         
#R>  7  2019 recent     15 2060          3.5 yes
  • Find years where the wolf population was less than 10 animals.
tmp <- irmw2 %>% filter(wolves<10)
tmp
#R>  # A tibble: 7 x 6
#R>     year era    wolves moose winter_temp ice_bridges
#R>    <dbl> <chr>   <dbl> <dbl>       <dbl> <chr>      
#R>  1  2012 recent      9   750       17.2  no         
#R>  2  2013 recent      8   975        8.95 no         
#R>  3  2014 recent      9  1050       -1.05 yes        
#R>  4  2015 recent      3  1250        3.85 yes        
#R>  5  2016 recent      2  1300       12.0  no         
#R>  6  2017 recent      2  1600       15.4  no         
#R>  7  2018 recent      2  1475        6.05 yes
  • Find years where the wolf population was less than 10 animals and the moose population was greater than 1500 animals.
tmp <- irmw2 %>% filter(wolves<10,moose>1500)
tmp
#R>  # A tibble: 1 x 6
#R>     year era    wolves moose winter_temp ice_bridges
#R>    <dbl> <chr>   <dbl> <dbl>       <dbl> <chr>      
#R>  1  2017 recent      2  1600        15.4 no
  • Find years where ice bridges formed.
tmp <- irmw2 %>% filter(ice_bridges=="yes")
tmp
#R>  # A tibble: 26 x 6
#R>      year era   wolves moose winter_temp ice_bridges
#R>     <dbl> <chr>  <dbl> <dbl>       <dbl> <chr>      
#R>   1  1961 early     22  572.        9.75 yes        
#R>   2  1962 early     23  579.        2.15 yes        
#R>   3  1963 early     20  596.       -0.35 yes        
#R>   4  1965 early     28  634.        1.25 yes        
#R>   5  1966 early     26  661.        1.7  yes        
#R>   6  1967 early     22  766.        2.75 yes        
#R>   7  1968 early     22  848.        5.85 yes        
#R>   8  1969 early     17 1041.        7.8  yes        
#R>   9  1970 early     18 1045.        3.35 yes        
#R>  10  1971 early     20 1183.        3.2  yes        
#R>  # ... with 16 more rows
  • Find years in the 1970s decade and order by the descending number of wolves.
tmp <- irmw2 %>%
  filter(year>=1970,year<1980) %>%
  arrange(desc(wolves))
tmp
#R>  # A tibble: 10 x 6
#R>      year era    wolves moose winter_temp ice_bridges
#R>     <dbl> <chr>   <dbl> <dbl>       <dbl> <chr>      
#R>   1  1976 middle     44 1070.       10.4  no         
#R>   2  1979 middle     43  857.       -1.15 yes        
#R>   3  1975 middle     41 1139.        9.05 no         
#R>   4  1978 middle     40  845.        3.25 no         
#R>   5  1977 middle     34  949.        4.7  yes        
#R>   6  1974 early      31 1203.        5.65 yes        
#R>   7  1973 early      24 1215.       10.8  no         
#R>   8  1972 early      23 1243.       -0.05 yes        
#R>   9  1971 early      20 1183.        3.2  yes        
#R>  10  1970 early      18 1045.        3.35 yes
  • Find years in the 1980s decade, order by the descending number of wolves, and show only the top three years (i.e., most wolves).
tmp <- irmw2 %>%
  filter(year>=1980,year<1990) %>%
  arrange(desc(wolves)) %>%
  slice_head(n=3)
tmp
#R>  # A tibble: 3 x 6
#R>     year era    wolves moose winter_temp ice_bridges
#R>    <dbl> <chr>   <dbl> <dbl>       <dbl> <chr>      
#R>  1  1980 middle     50  788.        7.15 no         
#R>  2  1981 middle     30  767.       11.7  no         
#R>  3  1984 middle     24  927.       12.5  no
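
The last two examples can be written in other ways with two dplyr helpers that were not used above: between(), which includes both endpoints of a range, and slice_max(), which returns the rows with the largest values of a variable. The sketch below uses a made-up data frame called toy (it is not the irmw2 data) to show both. Note that slice_max() keeps tied rows by default, so it does not always return the same rows as sorting with arrange() and then using slice_head().

```r
library(dplyr)

# Hypothetical data frame for illustration only (not the irmw2 data)
toy <- tibble(year=c(2001,2002,2003,2004,2005,2006),
              count=c(12,18,25,9,25,14))

# between() includes both endpoints, so three years are returned
tmp1 <- toy %>% filter(between(year,2002,2004))
tmp1$year
#R>  [1] 2002 2003 2004

# Largest count; both tied rows (2003 and 2005) are kept by default
tmp2 <- toy %>% slice_max(count,n=1)
nrow(tmp2)
#R>  [1] 2

# Sorting and then taking the first row returns exactly one row
tmp3 <- toy %>% arrange(desc(count)) %>% slice_head(n=1)
nrow(tmp3)
#R>  [1] 1
```

Which behavior is preferable depends on whether tied observations should all be reported.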

  27. value should be a character, factor, or integer.

  28. value must be numeric.

  29. value must be numeric.

  30. value must be numeric.

  31. value must be numeric.

  32. Note that this “or” operator is a “vertical line,” which is typed with the shift-backslash key.

  33. Some examples of arrange() are in Section 6.7.