A large number of small data sets are available in the FSA and FSAdata packages. These data sets may be useful for demonstrating typical fisheries science analyses in an undergraduate or early graduate fisheries science and management course or if one is self-teaching how to perform these analyses. Indeed, several of these data sets are used in the forthcoming Introductory Fisheries Analyses for R book.
There are at least two problems with delivering these data sets within an R package that limit their pedagogical utility. I describe these problems below and explain my solutions so that these data sets will be available to instructors in a more useful format, while still being maintained in R packages.
First Problem
Finding the Right Data Set
Finding the “right” data set in a package, especially packages like FSAdata that contain many data sets, can be difficult. This problem is ameliorated somewhat by the ability to search all data sets in a package with help.search() using package= and keyword="datasets" (note that the result will appear in the Help pane if using RStudio or a browser if using R).
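As a minimal sketch of such a search (assuming both FSA and FSAdata are installed; the object name ds is only illustrative), the call might look like this:

```r
# Assumes the FSA and FSAdata packages are installed.
# When keyword= is supplied, help.search() ignores pattern= and fields=.
ds <- help.search(keyword = "datasets", package = c("FSA", "FSAdata"))

# Printing the result displays the list of matching help pages.
ds
```

Saving the result to an object is optional; calling help.search() on its own displays the matches immediately.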
The data sets in the FSA and FSAdata packages have been augmented with specific topics in the “concepts” field of the help documentation that allow for a more focused search. However, one needs to know which specific topics have been used (and their spelling). Fortunately, these can be found with FSAdataTopics.
Thus, for example, one can find all data sets in FSA and FSAdata that are tagged with the Age Comparison concept using help.search() with package= and fields="concept".
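A sketch of this more focused search, again assuming both packages are installed (the object name ac is only illustrative):

```r
# Assumes the FSA and FSAdata packages are installed.
# fields = "concept" restricts matching to the concept entries of the help pages.
ac <- help.search("Age Comparison", fields = "concept",
                  package = c("FSA", "FSAdata"))

# Printing the result displays the matching data sets.
ac
```

Note that help.search() ignores case by default, but the topic must otherwise be spelled as it appears in the documentation.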
This is an improvement, but still a bit of a nuisance.
A Solution
I have developed a Data page on the fishR website that lists all data sets in these two packages in these three different ways:
The second of these lists is most useful because it allows one to easily see the analytical topics and each data set that can be used for that type of analysis.
Additionally, for each data set shown in these lists, there are icons that link to the data displayed in a spreadsheet-like format, to the data as a raw comma-separated values (CSV) text file, and to meta-data documentation.
Second Problem
Not the Real World
Data sets distributed with packages are loaded into the R workspace by including the name of the data set within data(). For example, the AlewifeLH data set distributed with the FSAdata package can be loaded in this way and its structure then examined.
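A minimal sketch of this package-based workflow, assuming FSAdata is installed:

```r
# Assumes the FSAdata package is installed.
library(FSAdata)

# Load the packaged data set into the workspace by name.
data(AlewifeLH)

# Examine the structure (variable names, types, first few values).
str(AlewifeLH)
```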
The problem here is that the only time a student will ever use data() is if the data set exists within an R package. The student will not use data() when they analyze their own data. Thus, the student gets no experience with the critical step of loading one’s own data into R.
A Solution
The raw CSV files that are linked to in the lists described in the first solution are particularly useful here because a student can download this file to their computer and then use read.csv() (or any of the other functions that can be used to load CSV files) to load the data into their R workspace. This more closely resembles a workflow that the student is likely to use with their own data.
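A sketch of that workflow follows. The file name AlewifeLH.csv and its location in the current working directory are assumptions for illustration; adjust them to wherever the file was actually saved.

```r
# Hypothetical file name: change this to the path of the downloaded CSV file.
csv_file <- "AlewifeLH.csv"

if (file.exists(csv_file)) {
  # read.csv() returns a data.frame, just as data() provides for packaged data.
  alewife <- read.csv(csv_file)
  str(alewife)
} else {
  message("Download the CSV file first; '", csv_file, "' was not found.")
}
```

The file.exists() check is only there so that the sketch fails gracefully; a student would typically call read.csv() directly.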
It is also possible to read the CSV file directly from the webpage.
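In sketch form, read.csv() simply receives the web address in place of a file name. The address used here is a placeholder and not the real location of any file; the actual address should be copied from the download icon.

```r
# Placeholder address (will not resolve): replace with the address
# copied from the download icon on the Data page.
csv_url <- "https://example.invalid/AlewifeLH.csv"

# read.csv() accepts a URL as well as a local file name.
# tryCatch() keeps the sketch from stopping if the address cannot be reached.
alewife <- tryCatch(
  read.csv(csv_url),
  error = function(e) {
    message("Could not read from the address: ", conditionMessage(e))
    NULL
  }
)
```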
This may not be particularly useful because the address is so long (it can, however, be copied from the download icon in the lists) and the student is unlikely to have stored their own data at an internet site.
The Future
My hope is that you and others will submit small data sets to me that I can include in the FSAdata package and on the fishR Data page so that others may use these data in their classes. With this, perhaps a compendium of pedagogically useful fisheries-related data sets can be constructed.