Module 7 Wrangle Rows
The previous module demonstrated methods for wrangling columns (i.e., variables) in a data frame. This module introduces methods for wrangling rows (i.e., observations, when the data are tidy) in a data frame. The primary method of interest here is selecting a smaller subset of rows (i.e., filtering) for further analysis.
Manipulating rows means you are manipulating observations.
Again, the descriptive examples below use the bears data frame from Section 3.3.1.
library(tidyverse)   # provides read_csv(), %>%, and the dplyr verbs used below
bears <- read_csv(file.path("data","Bears.csv"))
bears
#R> # A tibble: 8 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
#R> 2 120. 60 Bayfield
#R> 3 149 85 Bayfield
#R> 4 141 100 Ashland
#R> 5 141 95 Ashland
#R> 6 150 85 Douglas
#R> 7 130. 105 Douglas
#R> 8 150 110 Douglas
The code here also uses the pipe operator, %>%, to help you become more comfortable with its use.
7.1 Selecting Specific Rows
Specific rows may be selected or omitted from a data frame using slice(). Below are four simple examples.
bears %>% slice(1)              # First row
#R> # A tibble: 1 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
bears %>% slice(c(1,3,5))       # First, third, and fifth rows
#R> # A tibble: 3 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
#R> 2 149 85 Bayfield
#R> 3 141 95 Ashland
bears %>% slice(-1)             # All but the first row
#R> # A tibble: 7 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 120. 60 Bayfield
#R> 2 149 85 Bayfield
#R> 3 141 100 Ashland
#R> 4 141 95 Ashland
#R> 5 150 85 Douglas
#R> 6 130. 105 Douglas
#R> 7 150 110 Douglas
bears %>% slice(-c(1,3,5))      # All but the first, third, and fifth rows
#R> # A tibble: 5 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 120. 60 Bayfield
#R> 2 141 100 Ashland
#R> 3 150 85 Douglas
#R> 4 130. 105 Douglas
#R> 5 150 110 Douglas
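If it helps to connect slice() to base R, the sketch below compares it with bracket indexing on the same row positions. It assumes a stand-in for bears typed in from the printout above. The lengths printed as "120." and "130." are entered as 120 and 130, which is an assumption, but only weight.kg and loc are compared here.

```r
library(dplyr)

# Stand-in for bears, typed in from the printout above
# (assumption: lengths printed as "120." and "130." are entered as 120 and 130)
bears <- data.frame(
  length.cm = c(139, 120, 149, 141, 141, 150, 130, 150),
  weight.kg = c(110, 60, 85, 100, 95, 85, 105, 110),
  loc = c("Bayfield", "Bayfield", "Bayfield", "Ashland", "Ashland",
          "Douglas", "Douglas", "Douglas")
)

# slice() picks rows by position, much like base R bracket indexing
via_slice <- bears %>% slice(c(1,3,5))
via_base  <- bears[c(1,3,5), ]

# Negative positions drop rows in both approaches
drop_slice <- bears %>% slice(-c(1,3,5))
drop_base  <- bears[-c(1,3,5), ]

via_slice$weight.kg    # 110 85 95
nrow(drop_slice)       # 5
```

One difference worth noting: base bracket indexing keeps the original row names (e.g., 1, 3, 5), so only the column values are compared here.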
Rows from the beginning (i.e., the “head”) or end (i.e., the “tail”) of the data frame may also be selected with slice_head() or slice_tail(), respectively. You may select a certain number of rows with n= or an approximate proportion of rows with prop=. With prop=, the number of rows is rounded down to a whole number, so prop=0.33 of the eight rows in bears returns two rows. Below are four examples.
bears %>% slice_head(n=3)       # First three rows
#R> # A tibble: 3 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
#R> 2 120. 60 Bayfield
#R> 3 149 85 Bayfield
bears %>% slice_head(prop=0.33) # Approx. first 33% of rows
#R> # A tibble: 2 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
#R> 2 120. 60 Bayfield
bears %>% slice_tail(n=3)       # Last three rows
#R> # A tibble: 3 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 150 85 Douglas
#R> 2 130. 105 Douglas
#R> 3 150 110 Douglas
bears %>% slice_tail(prop=0.33) # Approx. last 33% of rows
#R> # A tibble: 2 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 130. 105 Douglas
#R> 2 150 110 Douglas
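The rounding behavior of prop= can be checked directly. This sketch uses a stand-in for bears with eight rows, like the original. It also shows that asking for more rows than exist simply returns every row rather than raising an error.

```r
library(dplyr)

# Stand-in for bears (8 rows), typed in from the printout above
bears <- data.frame(
  length.cm = c(139, 120, 149, 141, 141, 150, 130, 150),
  weight.kg = c(110, 60, 85, 100, 95, 85, 105, 110),
  loc = c("Bayfield", "Bayfield", "Bayfield", "Ashland", "Ashland",
          "Douglas", "Douglas", "Douglas")
)

# 8 * 0.33 = 2.64, which is rounded down to 2 rows
n_head <- nrow(bears %>% slice_head(prop=0.33))
n_tail <- nrow(bears %>% slice_tail(prop=0.33))

# Requesting more rows than exist returns all 8 rows
n_all <- nrow(bears %>% slice_head(n=20))

c(n_head, n_tail, n_all)   # 2 2 8
```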
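Finally, a random sample of rows from the data frame may be selected with slice_sample(), again using either n= or prop=. Because the rows are chosen randomly, your results will differ from those shown below unless you first set the random number seed with set.seed().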
bears %>% slice_sample(n=3)       # 3 random rows
#R> # A tibble: 3 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 141 100 Ashland
#R> 2 141 95 Ashland
#R> 3 130. 105 Douglas
bears %>% slice_sample(prop=0.33) # Random approx. 33% of rows
#R> # A tibble: 2 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 150 110 Douglas
#R> 2 139 110 Bayfield
Use the slice() family of functions to select specific (or random) rows from a data frame.
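To make a random selection reproducible (e.g., so a classmate gets the same rows), set the seed immediately before sampling. In this sketch the seed value 2023 is arbitrary, and bears is a stand-in typed in from the printout above.

```r
library(dplyr)

# Stand-in for bears, typed in from the printout above
bears <- data.frame(
  length.cm = c(139, 120, 149, 141, 141, 150, 130, 150),
  weight.kg = c(110, 60, 85, 100, 95, 85, 105, 110),
  loc = c("Bayfield", "Bayfield", "Bayfield", "Ashland", "Ashland",
          "Douglas", "Douglas", "Douglas")
)

set.seed(2023)                          # arbitrary seed value
first  <- bears %>% slice_sample(n=3)

set.seed(2023)                          # same seed again ...
second <- bears %>% slice_sample(n=3)   # ... so the same three rows are drawn

identical(first, second)                # TRUE
```

By default slice_sample() samples without replacement. Its replace=TRUE argument allows the same row to be drawn more than once (e.g., for bootstrapping).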
7.2 Filtering Rows
Observations or rows can be selected from a data frame with filter(). The directive arguments to filter() are conditional expressions describing which observations from the data frame to keep. Common operators used in these conditional expressions are in Table 7.1. The filter() function works by evaluating the condition to either TRUE or FALSE for each row and then returning all rows that evaluated to TRUE. Rows where the condition evaluates to NA (e.g., because of a missing value) are also dropped.
Comparison Operator | Rows Returned from Original Data Frame |
---|---|
var==value | All rows where var IS equal to value |
var!=value | All rows where var is NOT equal to value |
var %in% c(value1,value2) | All rows where var IS IN (i.e., is one of) the vector of values[27] |
var>value | All rows where var is greater than value[28] |
var>=value | All rows where var is greater than or equal to value[29] |
var<value | All rows where var is less than value[30] |
var<=value | All rows where var is less than or equal to value[31] |
condition1,condition2 | All rows where BOTH conditions are true |
condition1 \| condition2 | All rows where ONE or BOTH conditions are true[32] |
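The “evaluate to TRUE or FALSE, keep the TRUE rows” mechanism can be made visible by computing the logical vector yourself. This sketch uses a stand-in for bears typed in from the printout above, plus one hypothetical bear with a missing weight. The hypothetical row shows that filter() drops rows whose condition is NA. Base R bracket indexing instead returns an all-NA row.

```r
library(dplyr)

# Stand-in for bears, typed in from the printout above
bears <- data.frame(
  length.cm = c(139, 120, 149, 141, 141, 150, 130, 150),
  weight.kg = c(110, 60, 85, 100, 95, 85, 105, 110),
  loc = c("Bayfield", "Bayfield", "Bayfield", "Ashland", "Ashland",
          "Douglas", "Douglas", "Douglas")
)

# The condition is evaluated once per row, giving a logical vector
keep <- bears$weight.kg > 100
keep   # TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE

# filter() returns the rows where that vector is TRUE
heavy <- bears %>% filter(weight.kg > 100)

# Hypothetical extra bear with a missing weight (NA > 100 is NA)
bears_na <- bind_rows(bears,
                      data.frame(length.cm=135, weight.kg=NA_real_, loc="Iron"))
heavy_na <- bears_na %>% filter(weight.kg > 100)       # NA row dropped: 3 rows
base_na  <- bears_na[bears_na$weight.kg > 100, ]        # NA row kept as all NA: 4 rows
```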
The following are examples of new data frames created from bears. The name of the new data frame (i.e., the object to the left of the assignment operator) is tmp (for temporary) in each example below because there is no plan to use these data frames further.
- Only observations from Bayfield County.
tmp <- bears %>% filter(loc=="Bayfield")
tmp
#R> # A tibble: 3 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
#R> 2 120. 60 Bayfield
#R> 3 149 85 Bayfield
- Observations from both Bayfield and Ashland counties.
tmp <- bears %>% filter(loc %in% c("Bayfield","Ashland"))
tmp
#R> # A tibble: 5 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
#R> 2 120. 60 Bayfield
#R> 3 149 85 Bayfield
#R> 4 141 100 Ashland
#R> 5 141 95 Ashland
- Observations NOT from Bayfield County.
tmp <- bears %>% filter(loc != "Bayfield")
tmp
#R> # A tibble: 5 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 141 100 Ashland
#R> 2 141 95 Ashland
#R> 3 150 85 Douglas
#R> 4 130. 105 Douglas
#R> 5 150 110 Douglas
- Observations with a weight greater than 100 kg.
tmp <- bears %>% filter(weight.kg>100)
tmp
#R> # A tibble: 3 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
#R> 2 130. 105 Douglas
#R> 3 150 110 Douglas
- Observations from Douglas County that weighed at least 110 kg.
tmp <- bears %>% filter(loc=="Douglas",weight.kg>=110)
tmp
#R> # A tibble: 1 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 150 110 Douglas
The last example above illustrates that multiple conditional expressions in filter() are combined as an “and” operator such that both conditions must be true.
Use filter() to select rows from a data frame that match a logical condition.
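The sketch below checks that separating conditions with a comma is the same as joining them with &. It also shows how | keeps rows where either condition holds and that %in% is shorthand for several == comparisons joined with |. It uses a stand-in for bears typed in from the printout above.

```r
library(dplyr)

# Stand-in for bears, typed in from the printout above
bears <- data.frame(
  length.cm = c(139, 120, 149, 141, 141, 150, 130, 150),
  weight.kg = c(110, 60, 85, 100, 95, 85, 105, 110),
  loc = c("Bayfield", "Bayfield", "Bayfield", "Ashland", "Ashland",
          "Douglas", "Douglas", "Douglas")
)

# A comma between conditions ...
with_comma <- bears %>% filter(loc=="Douglas", weight.kg>=110)
# ... gives the same rows as joining them with &
with_and   <- bears %>% filter(loc=="Douglas" & weight.kg>=110)

# | keeps rows where either condition (or both) is TRUE:
# the three Douglas bears plus the 110 kg Bayfield bear
with_or    <- bears %>% filter(loc=="Douglas" | weight.kg>=110)

# %in% matches any value in the vector, like == joined by |
via_in <- bears %>% filter(loc %in% c("Bayfield","Ashland"))
via_or <- bears %>% filter(loc=="Bayfield" | loc=="Ashland")
```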
It is good practice to examine a data frame after filtering to be sure that the new data frame contains the observations that you want. The data frames above are so small that you can simply and easily examine the entire data frame. However, this will not be the case with larger, more realistic data frames. Thus, I suggest the following methods for “checking your filtering.”
- Simply display the data frame or its structure to identify any obvious issues. For example, the code below was meant to return all bears from Douglas County that weighed at least 150 kg. Displaying either the data frame or its structure shows that this data frame contains no rows (no bear in these data weighs that much).
tmp <- bears %>% filter(loc=="Douglas",weight.kg>=150)
tmp
#R> # A tibble: 0 x 3
#R> # ... with 3 variables: length.cm <dbl>, weight.kg <dbl>, loc <chr>
str(tmp,give.attr=FALSE)
#R> spec_tbl_df [0 x 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#R> $ length.cm: num(0)
#R> $ weight.kg: num(0)
#R> $ loc : chr(0)
- If you filter with respect to a categorical variable, then use unique() with that categorical variable to examine its levels. For example, the filter below is expected to return observations for just Ashland and Bayfield counties. The use of unique() supports that this is what was returned.
tmp <- bears %>% filter(loc %in% c("Bayfield","Ashland"))
unique(tmp$loc)
#R> [1] "Bayfield" "Ashland"
- If you filter with respect to a quantitative variable, then use summary() with that quantitative variable to examine its summary statistics. For example, the filter below is expected to return observations with lengths greater than 130 cm and less than 145 cm. The minimum and maximum values in the summary() results support that this is what was returned.
tmp <- bears %>% filter(length.cm>130,length.cm<145)
summary(tmp$length.cm)
#R> Min. 1st Qu. Median Mean 3rd Qu. Max.
#R> 139.0 140.0 141.0 140.3 141.0 141.0
Examine the new data frame after filtering to ensure that it contains the observations you intended.
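In scripts that are rerun often, the same checks can be written so that R stops with an error when a filter does not do what was intended. The sketch below adapts the unique() and summary() ideas above into stopifnot() checks and adds a frequency table with count(). It uses a stand-in for bears typed in from the printout above. The combined filter is an illustration, not one of the examples above.

```r
library(dplyr)

# Stand-in for bears, typed in from the printout above
bears <- data.frame(
  length.cm = c(139, 120, 149, 141, 141, 150, 130, 150),
  weight.kg = c(110, 60, 85, 100, 95, 85, 105, 110),
  loc = c("Bayfield", "Bayfield", "Bayfield", "Ashland", "Ashland",
          "Douglas", "Douglas", "Douglas")
)

tmp <- bears %>%
  filter(loc %in% c("Bayfield","Ashland"), length.cm>130, length.cm<145)

# Each check stops with an error if it is not TRUE
stopifnot(
  nrow(tmp) > 0,                              # something was returned
  all(tmp$loc %in% c("Bayfield","Ashland")),  # only the intended locations
  min(tmp$length.cm) > 130,                   # lengths inside the range
  max(tmp$length.cm) < 145
)

# A frequency table is another quick look at the levels that remain
counts <- tmp %>% count(loc)
counts   # Ashland 2, Bayfield 1
```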
7.3 Arranging Rows
The arrange() function is used to sort rows based on values in one or more variables.[33] The default is ascending order. To sort in descending order, wrap the variable name in desc(). If more than one variable is given, then the rows are first sorted based on the first variable, and then ties in the first variable are sorted based on the second variable. Examples of sorting are shown below.
- Alphabetically sort bears by location name.
bears <- bears %>% arrange(loc)
bears
#R> # A tibble: 8 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 141 100 Ashland
#R> 2 141 95 Ashland
#R> 3 139 110 Bayfield
#R> 4 120. 60 Bayfield
#R> 5 149 85 Bayfield
#R> 6 150 85 Douglas
#R> 7 130. 105 Douglas
#R> 8 150 110 Douglas
- Sort bears from heaviest to lightest.
bears <- bears %>% arrange(desc(weight.kg))
bears
#R> # A tibble: 8 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 139 110 Bayfield
#R> 2 150 110 Douglas
#R> 3 130. 105 Douglas
#R> 4 141 100 Ashland
#R> 5 141 95 Ashland
#R> 6 149 85 Bayfield
#R> 7 150 85 Douglas
#R> 8 120. 60 Bayfield
- Sort bears from heaviest to lightest within each location.
bears <- bears %>% arrange(loc,desc(weight.kg))
bears
#R> # A tibble: 8 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 141 100 Ashland
#R> 2 141 95 Ashland
#R> 3 139 110 Bayfield
#R> 4 149 85 Bayfield
#R> 5 120. 60 Bayfield
#R> 6 150 110 Douglas
#R> 7 130. 105 Douglas
#R> 8 150 85 Douglas
- Sort bears by size, first by length and then by weight.
bears <- bears %>% arrange(length.cm,weight.kg)
bears
#R> # A tibble: 8 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 120. 60 Bayfield
#R> 2 130. 105 Douglas
#R> 3 139 110 Bayfield
#R> 4 141 95 Ashland
#R> 5 141 100 Ashland
#R> 6 149 85 Bayfield
#R> 7 150 85 Douglas
#R> 8 150 110 Douglas
Use arrange() to sort rows in a data frame by the value(s) of variable(s).
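For comparison, base R's order() also breaks ties with its later arguments, and negating a numeric variable sorts it in descending order. This sketch sorts a stand-in for bears (typed in from the printout above) by location and then from heaviest to lightest, both ways.

```r
library(dplyr)

# Stand-in for bears, typed in from the printout above
bears <- data.frame(
  length.cm = c(139, 120, 149, 141, 141, 150, 130, 150),
  weight.kg = c(110, 60, 85, 100, 95, 85, 105, 110),
  loc = c("Bayfield", "Bayfield", "Bayfield", "Ashland", "Ashland",
          "Douglas", "Douglas", "Douglas")
)

# dplyr: by location, then heaviest first within each location
via_arrange <- bears %>% arrange(loc, desc(weight.kg))

# base R: -weight.kg plays the role of desc(weight.kg)
via_order <- bears[order(bears$loc, -bears$weight.kg), ]

via_arrange$weight.kg   # 100 95 110 85 60 110 105 85
```

Recent versions of dplyr (1.1.0 and later) sort character columns in the "C" locale by default, whereas order() uses the session's locale. Strings that differ only in case may therefore sort differently between the two approaches. The capitalized county names here are unaffected.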
7.4 Appending Rows
Two data frames can be combined with bind_rows() IF they have the same column names and classes. For example, suppose that two other data frames exist: bears2 has more information about bears, and bobcats has similar information about bobcats.
bears2
#R> length.cm weight.kg loc
#R> 1 135 100 Iron
#R> 2 142 115 Iron
#R> 3 143 110 Iron
bobcats
#R> length.cm weight.kg loc
#R> 1 75 6.2 Douglas
#R> 2 82 8.1 Douglas
#R> 3 71 7.4 Bayfield
#R> 4 79 7.6 Douglas
The code below appends the bears2 data frame to the bottom of the bears data frame and then, for demonstration purposes, orders the bears by size.
newbears <- bind_rows(bears,bears2) %>%
  arrange(length.cm,weight.kg)
newbears
#R> # A tibble: 11 x 3
#R> length.cm weight.kg loc
#R> <dbl> <dbl> <chr>
#R> 1 120. 60 Bayfield
#R> 2 130. 105 Douglas
#R> 3 135 100 Iron
#R> 4 139 110 Bayfield
#R> 5 141 95 Ashland
#R> 6 141 100 Ashland
#R> 7 142 115 Iron
#R> 8 143 110 Iron
#R> 9 149 85 Bayfield
#R> 10 150 85 Douglas
#R> 11 150 110 Douglas
The same could be done with the bears and bobcats data frames, but then there would be no way to tell which observations are for bears and which are for bobcats. This deficiency can be overcome by giving names to the data frames within bind_rows() and giving a variable name to .id= for the new variable that will identify the groups. For example,
animals <- bind_rows("bear"=bears,"bobcat"=bobcats,.id="animal")
animals
#R> # A tibble: 12 x 4
#R> animal length.cm weight.kg loc
#R> <chr> <dbl> <dbl> <chr>
#R> 1 bear 120. 60 Bayfield
#R> 2 bear 130. 105 Douglas
#R> 3 bear 139 110 Bayfield
#R> 4 bear 141 95 Ashland
#R> 5 bear 141 100 Ashland
#R> 6 bear 149 85 Bayfield
#R> 7 bear 150 85 Douglas
#R> 8 bear 150 110 Douglas
#R> 9 bobcat 75 6.2 Douglas
#R> 10 bobcat 82 8.1 Douglas
#R> 11 bobcat 71 7.4 Bayfield
#R> 12 bobcat 79 7.6 Douglas
Note that more than two data frames can be combined with bind_rows().
Use bind_rows() to combine two (or more) data frames that have the same variables (i.e., columns).
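bind_rows() matches columns by name rather than by position. As a sketch of what happens when the columns do not line up, the code below binds the bears2 values shown above to a hypothetical data frame (extra) that has no loc column. The missing column is filled with NA. Columns with incompatible classes (e.g., numeric in one data frame and character in the other) cause an error instead.

```r
library(dplyr)

# bears2, typed in from the printout above
bears2 <- data.frame(length.cm=c(135,142,143), weight.kg=c(100,115,110), loc="Iron")

# Hypothetical extra rows that lack the loc column
extra <- data.frame(length.cm=c(128,137), weight.kg=c(90,102))

combined <- bind_rows("bear2"=bears2, "extra"=extra, .id="source")
combined
# source is first; loc is NA for the two rows that came from extra
```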
7.5 Examples in Context
7.5.1 NBA Players
In Section 6.7.1, the players2 data frame was created to show the starting year, ending year, total years played, and whether or not the player was from the “modern” era for all NBA players.
players2
#R> # A tibble: 4,393 x 5
#R> name start end years_played modern
#R> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 Willis, Kevin 1984 2006 22 yes
#R> 2 Jones, Mark 1983 2004 21 yes
#R> 3 Carter, Vince 1998 2018 20 yes
#R> 4 Garnett, Kevin 1995 2015 20 yes
#R> 5 Nowitzki, Dirk 1998 2018 20 yes
#R> 6 Parish, Robert 1976 1996 20 yes
#R> 7 Abdul-Jabbar, Kareem 1969 1988 19 yes
#R> 8 Bryant, Kobe 1996 2015 19 yes
#R> 9 Cousy, Bob 1950 1969 19 no
#R> 10 Crawford, Jamal 2000 2018 18 yes
#R> # ... with 4,383 more rows
The graph shown in that same section was for all players with more than 18 years in the NBA. The data frame for that graph is constructed below.
nba_gt18 <- players2 %>% filter(years_played>18)
nba_gt18
#R> # A tibble: 9 x 5
#R> name start end years_played modern
#R> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 Willis, Kevin 1984 2006 22 yes
#R> 2 Jones, Mark 1983 2004 21 yes
#R> 3 Carter, Vince 1998 2018 20 yes
#R> 4 Garnett, Kevin 1995 2015 20 yes
#R> 5 Nowitzki, Dirk 1998 2018 20 yes
#R> 6 Parish, Robert 1976 1996 20 yes
#R> 7 Abdul-Jabbar, Kareem 1969 1988 19 yes
#R> 8 Bryant, Kobe 1996 2015 19 yes
#R> 9 Cousy, Bob 1950 1969 19 no
It might be interesting to see who started in the NBA in the year of your college graduation (using mine below).
nba_grad1 <- players2 %>% filter(start==1989)
nba_grad1
#R> # A tibble: 81 x 5
#R> name start end years_played modern
#R> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 Robinson, Clifford 1989 2006 17 yes
#R> 2 Divac, Vlade 1989 2004 15 yes
#R> 3 Barros, Dana 1989 2003 14 yes
#R> 4 Rice, Glen 1989 2003 14 yes
#R> 5 Anderson, Nick 1989 2002 13 yes
#R> 6 Hardaway, Tim 1989 2002 13 yes
#R> 7 Kemp, Shawn 1989 2002 13 yes
#R> 8 Mason, Anthony 1989 2002 13 yes
#R> 9 McCloud, George 1989 2002 13 yes
#R> 10 Robinson, David 1989 2002 13 yes
#R> # ... with 71 more rows
Perhaps we want those that started in the year of your graduation and played for more than a decade.
nba_grad2 <- players2 %>% filter(start==1989,years_played>10)
nba_grad2
#R> # A tibble: 21 x 5
#R> name start end years_played modern
#R> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 Robinson, Clifford 1989 2006 17 yes
#R> 2 Divac, Vlade 1989 2004 15 yes
#R> 3 Barros, Dana 1989 2003 14 yes
#R> 4 Rice, Glen 1989 2003 14 yes
#R> 5 Anderson, Nick 1989 2002 13 yes
#R> 6 Hardaway, Tim 1989 2002 13 yes
#R> 7 Kemp, Shawn 1989 2002 13 yes
#R> 8 Mason, Anthony 1989 2002 13 yes
#R> 9 McCloud, George 1989 2002 13 yes
#R> 10 Robinson, David 1989 2002 13 yes
#R> # ... with 11 more rows
Perhaps we want to find those that were playing during the year of your graduation.
nba_grad3 <- players2 %>% filter(start<=1989,end>=1989)
nba_grad3
#R> # A tibble: 409 x 5
#R> name start end years_played modern
#R> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 Willis, Kevin 1984 2006 22 yes
#R> 2 Jones, Mark 1983 2004 21 yes
#R> 3 Parish, Robert 1976 1996 20 yes
#R> 4 Edwards, James 1977 1995 18 yes
#R> 5 Jordan, Michael 1984 2002 18 yes
#R> 6 Long, John 1978 1996 18 yes
#R> 7 Mahorn, Rick 1980 1998 18 yes
#R> 8 Malone, Karl 1985 2003 18 yes
#R> 9 Malone, Moses 1976 1994 18 yes
#R> 10 Oakley, Charles 1985 2003 18 yes
#R> # ... with 399 more rows
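Storing the year in a variable makes it easy to rerun these filters for your own graduation year. This sketch uses four rows copied from the players2 printouts above rather than the full data frame. The name grad_year is just an illustrative variable name.

```r
library(dplyr)

# A few rows copied from the players2 printouts above
players <- data.frame(
  name  = c("Willis, Kevin", "Cousy, Bob", "Carter, Vince", "Robinson, Clifford"),
  start = c(1984, 1950, 1998, 1989),
  end   = c(2006, 1969, 2018, 2006)
)

grad_year <- 1989   # change to your own graduation year

# Playing during grad_year: started on or before it AND ended on or after it
playing <- players %>% filter(start<=grad_year, end>=grad_year)
playing$name   # "Willis, Kevin" "Robinson, Clifford"
```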
Perhaps we want to find those whose name was “Jordan.” This gets a bit tricky because name is formatted as Lastname, Firstname. In this case we want to find all instances where “Jordan” is somewhere in name. This requires a function that won’t be formally introduced until Module 11. This new function is called grepl(); it takes a string to find as its first argument and the character vector in which to look for that string as its second argument. It returns a logical vector with TRUE for each element of the vector that contains the string and FALSE for each element that does not. Below is a quick, simple example.
test <- c("Ogle, Derek","Kim, Young","Jordan, Michael","Farmar, Jordan")
grepl("Jordan",test)
#R> [1] FALSE FALSE TRUE TRUE
Here TRUE was returned only for the last two elements because they were the only two elements that contained “Jordan.”
Because filter() works by returning the rows that evaluate to TRUE, the grepl() code can be put in place of the condition. For example, the code below returns all rows of players2 where name contains “Jordan.”
nba_jordans <- players2 %>% filter(grepl("Jordan",name))
nba_jordans
#R> # A tibble: 20 x 5
#R> name start end years_played modern
#R> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 Jordan, Michael 1984 2002 18 yes
#R> 2 Farmar, Jordan 2006 2016 10 yes
#R> 3 Jordan, DeAndre 2008 2018 10 yes
#R> 4 Crawford, Jordan 2010 2017 7 yes
#R> 5 Hill, Jordan 2009 2016 7 yes
#R> 6 Jordan, Eddie 1977 1983 6 yes
#R> 7 Jordan, Reggie 1993 1999 6 yes
#R> 8 Jordan, Adonis 1993 1998 5 yes
#R> 9 Clarkson, Jordan 2014 2018 4 yes
#R> 10 Hamilton, Jordan 2011 2015 4 yes
#R> 11 Jordan, Jerome 2011 2014 3 yes
#R> 12 McRae, Jordan 2015 2018 3 yes
#R> 13 Mickey, Jordan 2015 2017 2 yes
#R> 14 Adams, Jordan 2014 2015 1 yes
#R> 15 Bell, Jordan 2017 2018 1 yes
#R> 16 Jordan, Thomas 1992 1992 0 yes
#R> 17 Jordan, Walter 1980 1980 0 yes
#R> 18 Loyd, Jordan 2018 2018 0 yes
#R> 19 Sibert, Jordan 2018 2018 0 yes
#R> 20 Williams, Jordan 2011 2011 0 yes
However, given the format of the data in name, we need to be a little tricky to get all last names that are “Jordan” (note the comma after “Jordan,” below).
nba_jordans2 <- players2 %>% filter(grepl("Jordan,",name))
nba_jordans2
#R> # A tibble: 8 x 5
#R> name start end years_played modern
#R> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 Jordan, Michael 1984 2002 18 yes
#R> 2 Jordan, DeAndre 2008 2018 10 yes
#R> 3 Jordan, Eddie 1977 1983 6 yes
#R> 4 Jordan, Reggie 1993 1999 6 yes
#R> 5 Jordan, Adonis 1993 1998 5 yes
#R> 6 Jordan, Jerome 2011 2014 3 yes
#R> 7 Jordan, Thomas 1992 1992 0 yes
#R> 8 Jordan, Walter 1980 1980 0 yes
We also need to be a little tricky to get all first names that are “Jordan.”
nba_jordans3 <- players2 %>% filter(grepl(", Jordan",name))
nba_jordans3
#R> # A tibble: 12 x 5
#R> name start end years_played modern
#R> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 Farmar, Jordan 2006 2016 10 yes
#R> 2 Crawford, Jordan 2010 2017 7 yes
#R> 3 Hill, Jordan 2009 2016 7 yes
#R> 4 Clarkson, Jordan 2014 2018 4 yes
#R> 5 Hamilton, Jordan 2011 2015 4 yes
#R> 6 McRae, Jordan 2015 2018 3 yes
#R> 7 Mickey, Jordan 2015 2017 2 yes
#R> 8 Adams, Jordan 2014 2015 1 yes
#R> 9 Bell, Jordan 2017 2018 1 yes
#R> 10 Loyd, Jordan 2018 2018 0 yes
#R> 11 Sibert, Jordan 2018 2018 0 yes
#R> 12 Williams, Jordan 2011 2011 0 yes
It is interesting that there were no players in the NBA with the first name “Jordan” before Michael Jordan (the greatest player of all time) retired.
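The comma trick works for these data, but it could over-match in other data. A hypothetical last name such as “McJordan” contains “Jordan,” and a hypothetical first name such as “Jordana” contains “, Jordan.” Because the first argument to grepl() is treated as a regular expression (a topic for Module 11), ^ can tie the match to the start of the string and $ to the end. The two names marked hypothetical below are invented for illustration only.

```r
# Two real names from above plus two hypothetical ones
x <- c("Jordan, Michael", "Farmar, Jordan",
       "McJordan, Ed", "Smith, Jordana")   # last two are hypothetical

last_any    <- grepl("Jordan,", x)    # TRUE FALSE TRUE  FALSE (matches McJordan too)
last_exact  <- grepl("^Jordan,", x)   # TRUE FALSE FALSE FALSE (must start the string)
first_any   <- grepl(", Jordan", x)   # FALSE TRUE FALSE TRUE  (matches Jordana too)
first_exact <- grepl(", Jordan$", x)  # FALSE TRUE FALSE FALSE (must end the string)
```

The same patterns can be used inside filter(), e.g., filter(grepl("^Jordan,",name)).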
7.5.2 Wolves and Moose of Isle Royale
In Section 6.7.2, the irmw2 data frame was created for use in the graphing course.
irmw2
#R> # A tibble: 61 x 6
#R> year era wolves moose winter_temp ice_bridges
#R> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 1959 early 20 538. 1.4 no
#R> 2 1960 early 22 564. 8.45 no
#R> 3 1961 early 22 572. 9.75 yes
#R> 4 1962 early 23 579. 2.15 yes
#R> 5 1963 early 20 596. -0.35 yes
#R> 6 1964 early 26 620. 12.4 no
#R> 7 1965 early 28 634. 1.25 yes
#R> 8 1966 early 26 661. 1.7 yes
#R> 9 1967 early 22 766. 2.75 yes
#R> 10 1968 early 22 848. 5.85 yes
#R> # ... with 51 more rows
One of the things we did in that class was focus on the “early” years of the wolf-moose time series. Such a data frame is created below.
irmw_early <- irmw2 %>% filter(era=="early")
Other things that we could do with these data include the following.
- Find years where the moose population was more than 1500 animals.
tmp <- irmw2 %>% filter(moose>1500)
tmp
#R> # A tibble: 7 x 6
#R> year era wolves moose winter_temp ice_bridges
#R> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 1992 middle 12 1697. 15.2 no
#R> 2 1993 middle 13 1784. 9.1 no
#R> 3 1994 middle 17 2017. -0.4 yes
#R> 4 1995 middle 16 2117. 9.5 no
#R> 5 1996 middle 22 2398. 2.8 yes
#R> 6 2017 recent 2 1600 15.4 no
#R> 7 2019 recent 15 2060 3.5 yes
- Find years where the wolf population was less than 10 animals.
tmp <- irmw2 %>% filter(wolves<10)
tmp
#R> # A tibble: 7 x 6
#R> year era wolves moose winter_temp ice_bridges
#R> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 2012 recent 9 750 17.2 no
#R> 2 2013 recent 8 975 8.95 no
#R> 3 2014 recent 9 1050 -1.05 yes
#R> 4 2015 recent 3 1250 3.85 yes
#R> 5 2016 recent 2 1300 12.0 no
#R> 6 2017 recent 2 1600 15.4 no
#R> 7 2018 recent 2 1475 6.05 yes
- Find years where the wolf population was less than 10 animals and the moose population was greater than 1500 animals.
tmp <- irmw2 %>% filter(wolves<10,moose>1500)
tmp
#R> # A tibble: 1 x 6
#R> year era wolves moose winter_temp ice_bridges
#R> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 2017 recent 2 1600 15.4 no
- Find years where ice bridges formed.
tmp <- irmw2 %>% filter(ice_bridges=="yes")
tmp
#R> # A tibble: 26 x 6
#R> year era wolves moose winter_temp ice_bridges
#R> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 1961 early 22 572. 9.75 yes
#R> 2 1962 early 23 579. 2.15 yes
#R> 3 1963 early 20 596. -0.35 yes
#R> 4 1965 early 28 634. 1.25 yes
#R> 5 1966 early 26 661. 1.7 yes
#R> 6 1967 early 22 766. 2.75 yes
#R> 7 1968 early 22 848. 5.85 yes
#R> 8 1969 early 17 1041. 7.8 yes
#R> 9 1970 early 18 1045. 3.35 yes
#R> 10 1971 early 20 1183. 3.2 yes
#R> # ... with 16 more rows
- Find years in the 1970s decade and order by the descending number of wolves.
tmp <- irmw2 %>%
  filter(year>=1970,year<1980) %>%
  arrange(desc(wolves))
tmp
#R> # A tibble: 10 x 6
#R> year era wolves moose winter_temp ice_bridges
#R> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 1976 middle 44 1070. 10.4 no
#R> 2 1979 middle 43 857. -1.15 yes
#R> 3 1975 middle 41 1139. 9.05 no
#R> 4 1978 middle 40 845. 3.25 no
#R> 5 1977 middle 34 949. 4.7 yes
#R> 6 1974 early 31 1203. 5.65 yes
#R> 7 1973 early 24 1215. 10.8 no
#R> 8 1972 early 23 1243. -0.05 yes
#R> 9 1971 early 20 1183. 3.2 yes
#R> 10 1970 early 18 1045. 3.35 yes
- Find years in the 1980s decade, order by the descending number of wolves, and show only the top three years (i.e., most wolves).
tmp <- irmw2 %>%
  filter(year>=1980,year<1990) %>%
  arrange(desc(wolves)) %>%
  slice_head(n=3)
tmp
#R> # A tibble: 3 x 6
#R> year era wolves moose winter_temp ice_bridges
#R> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#R> 1 1980 middle 50 788. 7.15 no
#R> 2 1981 middle 30 767. 11.7 no
#R> 3 1984 middle 24 927. 12.5 no
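Two dplyr helpers can shorten the last two patterns. between(x, left, right) is a shortcut for x>=left & x<=right, and slice_max() combines “sort descending, then take the first n rows.” The sketch below checks both against the longer forms. It uses only the 1959 through 1979 wolf counts, typed in from the printouts above, not the full irmw2 data frame.

```r
library(dplyr)

# Wolf counts for 1959-1979, typed in from the printouts above
irmw <- data.frame(
  year   = 1959:1979,
  wolves = c(20,22,22,23,20,26,28,26,22,22,17,18,20,23,24,31,41,44,34,40,43)
)

# Two conditions versus between() (both ends inclusive)
seventies_a <- irmw %>% filter(year>=1970, year<1980)
seventies_b <- irmw %>% filter(between(year, 1970, 1979))

# Sort-then-take-three versus slice_max()
top3_a <- seventies_a %>% arrange(desc(wolves)) %>% slice_head(n=3)
top3_b <- seventies_a %>% slice_max(wolves, n=3)
top3_b$year   # 1976 1979 1975
```

One design difference to keep in mind is that slice_max() keeps tied rows by default (with_ties=TRUE), so it can return more than n rows when values tie. The arrange() plus slice_head() approach always returns exactly n rows.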