The Great Lakes Indian Fish and Wildlife Commission (GLIFWC) has long been involved in surveying, analyzing, understanding, and managing the numerous fish species that tribal members harvest annually. These studies range from inland species, such as walleye, to Great Lakes species, such as siscowet, and their results are compiled and brought to the larger Great Lakes Fishery Commission, where data are presented and management plans for future years are created and reviewed. Much appreciation to Bill Mattes for providing data for graphical and analytical purposes.
Most of the analyses here focus on siscowet lake trout in Lake Superior for the years 1999, 2003, 2008, 2012, and 2017, due to limited data and sample sizes.
setwd("C:/Users/khous/Desktop/College Work/MTH/MTH 250 - Graphing/Excel Data")
## Load plyr before the tidyverse so that dplyr's verbs are not masked by plyr's
library(plyr)
library(tidyverse)  # attaches dplyr and ggplot2
library(NCStats)
ST5yrs <- read.csv("GLIFWC.csv") %>%
filterD(SPECIES=="ST") %>%
filterD(Year %in% c(1999,2003,2008,2012,2017)) %>%
mutate(Year=factor(Year,levels=c("1999","2003","2008","2012","2017"))) %>%
select(-ID,-SAMNUM,-GRID,-starts_with("WOUNDS"))
## Explicitly remove individuals for which an age was not recorded
ageclass <- ST5yrs %>%
filterD(!is.na(AGE))
## Explicitly remove individuals for which either a weight or a length was not recorded
WL <- ST5yrs %>%
filterD(!is.na(WEIGHT),!is.na(LENGTH))%>%
mutate(DEPTH_BIN=factor(DEPTH_BIN,levels=c("1","2","3","4","5"))) %>%
mutate(DEPTH_BIN2=FSA::mapvalues(DEPTH_BIN,from=c("1"),to=c("2")))
WL$log.WEIGHT <- log(WL$WEIGHT)
WL$log.LENGTH <- log(WL$LENGTH)
theme_KM <- theme_bw()+
theme(
axis.title = element_text(face="bold",size=rel(1.15)),
axis.text = element_text(size=rel(1.05)),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border=element_rect(size=0.5),
legend.position="none",
plot.caption = element_text(hjust=0,size=rel(1)))
In this first analysis, the interest was to determine how the age distribution of siscowet lake trout (Salvelinus namaycush siscowet) has shifted throughout the years from 1999 to 2017, and to identify possible shifts that may occur in the future based upon past years.
c1 <- ggplot(data=ageclass, mapping=aes(x=AGE))+
geom_histogram(color="black", fill="grey", binwidth=1)+
facet_wrap(~Year,ncol=1, scales="free_y")+
scale_y_continuous(name = "Frequency of Age", expand=expansion(c(0,0.15)))+
scale_x_continuous(name = "Age", breaks=seq(0,34,4), expand=expansion(c(0,0.01)))+
theme_KM+
labs(caption = "Figure 1: Age distribution of siscowet
lake trout (Salvelinus namaycush
siscowet) sampled at 4-5 year
intervals from 1999 to 2017.")
c1
A few noticeable observations, among others, can be made from the plot:
As the years progress from 1999 to 2017, there is a gradual shift from older siscowet to a younger generation of fish
The distribution for 2003 was bimodal
In 2017, the age distribution shifted right once again
In 1999, there were higher numbers of older siscowet, which decreased as the years went on. Approaching 2017, the numbers of siscowet in the younger generation increased compared to previous years
The analysis here is aimed at determining how length and weight of siscowet vary among different depths in Lake Superior for the five selected years. The categorical variable DEPTH_BIN is the depth range in which the fish was captured (e.g., DEPTH_BIN “1” = 0 m - 99 m)
Note that, for this analysis, depth bin 1 has been combined with depth bin 2 because depth bin 1 had a small sample size (n=7).
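The merge of depth bin 1 into depth bin 2 can also be sketched in base R. This is only an illustration on a small hypothetical vector (`depth` is made up here and is not the GLIFWC data); it is not the method used in the analysis code that follows.

```r
# Hypothetical depth bins for seven fish (illustration only, not the GLIFWC data)
depth <- factor(c("1","2","2","3","4","5","1"), levels=c("1","2","3","4","5"))

# Copy the factor, then rename level "1" to "2"; duplicated level names are merged
depth2 <- depth
levels(depth2)[levels(depth2) == "1"] <- "2"

# Bin "2" now also holds the fish that were originally in bin "1"
table(depth2)
```

Keeping the original factor alongside the merged one makes it possible to check which fish were moved.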
#Using IVR to find slope and intercept
(lm1 <- lm(log.WEIGHT~log.LENGTH*DEPTH_BIN2, data=WL))
coef(lm1)
#Filtering data to find slope and intercept
WLD2 <- WL %>%
filterD(DEPTH_BIN2==2)
(lm(log.WEIGHT~log.LENGTH, data=WLD2))
WLD3 <- WL %>%
filterD(DEPTH_BIN2==3)
(lm(log.WEIGHT~log.LENGTH, data=WLD3))
WLD4 <- WL %>%
filterD(DEPTH_BIN2==4)
(lm(log.WEIGHT~log.LENGTH, data=WLD4))
WLD5 <- WL %>%
filterD(DEPTH_BIN2==5)
(lm(log.WEIGHT~log.LENGTH, data=WLD5))
#Dataframe for annotating slope equation
anno <- data.frame(x1 = c(2.5,2.55,2.22,2.65), y1 = c(8.75,8.5,8.65,8.6),
lab = c("y = 3.421x - 3.297", "y = 3.264x - 2.793", "y = 3.349x - 3.021", "y = 3.335x - 2.991"),
DEPTH_BIN2 = c("2","3","4","5"))
anno
#Dataframe for annotating outliers
anno2 <- data.frame(x1 = c(log(18),log(19),log(33.3),log(7),log(16.5),log(17.5), log(18.5), log(19.2)),
y1 = c(log(1800),log(1750),log(7750),log(39),log(1700), log(1700), log(1800), log(2750)),
DEPTH_BIN2 = c("2","2","2","4", "4","5","5","5"))
anno2
#Dataframe for annotating the total number of individuals in each plot
WL.cor <- ddply(.data=WL,
.(DEPTH_BIN2),
summarize,
n=paste("n =", length(log.LENGTH)))
#
c2 <- ggplot(data=WL, mapping=aes(x=log.LENGTH, y=log.WEIGHT))+
geom_point(color="black", alpha=0.5)+
geom_smooth(method="lm", color="red", se=FALSE)+
scale_x_continuous(name= "log Total Length (in)", expand=expansion(c(0.02,0.019)))+
scale_y_continuous(name= "log Weight (g)", expand=expansion(c(0.025,0.05)))+
facet_wrap(~DEPTH_BIN2, nrow=2, scales = "free",labeller=labeller(DEPTH_BIN2= c("2"="0-299 m", "3"="300-399 m", "4"="400-499 m", "5"="500+ m")))+
geom_text(data=WL.cor, aes(x=c(3.3,3.25,3.25,3.35), y=c(4.75,5,4,5.35), label=n),
colour="black", size=5, inherit.aes=FALSE, parse=FALSE)+
geom_text(data=anno, aes(x=x1, y=y1, label=lab),
colour="black", size=5, inherit.aes=FALSE, parse=FALSE)+
## color is set outside aes() so the points are drawn in true red rather than mapped to a scale
geom_point(data=anno2, aes(x=x1, y=y1), color="red", size=2, inherit.aes=FALSE)+
theme_KM+
labs(caption = "Figure 2: Scatterplot of log weight vs. log total length for siscowet lake trout in four depth bins, with the fitted regression line for each bin.
Red dots are potential outliers."
)
c2
Comparing the slopes across depth bins, no pronounced trends are apparent in this analysis
Outliers may have a substantial effect on the estimated slope and intercept for each depth bin (removing them may reveal a clearer relationship between log total length and log weight)
Fitting each depth bin separately is not the best way to compare the relationships to one another (a more suitable method may be an indicator variable regression)
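As a rough sketch of how an indicator variable regression could compare the depth bins, the example below fits one model with a separate slope per bin and one with a common slope, then tests the difference. The data are simulated (`sim` and every value in it are assumptions made for illustration), so the output says nothing about the actual siscowet data.

```r
# Simulated data with the same slope in every depth bin (illustration only)
set.seed(1)
n <- 200
sim <- data.frame(
  log.LENGTH = runif(n, 2.5, 3.5),
  DEPTH_BIN2 = factor(sample(c("2","3","4","5"), n, replace=TRUE))
)
sim$log.WEIGHT <- -3 + 3.3*sim$log.LENGTH + rnorm(n, sd=0.1)

# Full model: separate intercept and slope for each depth bin
full <- lm(log.WEIGHT ~ log.LENGTH*DEPTH_BIN2, data=sim)
# Reduced model: separate intercepts but one common slope
reduced <- lm(log.WEIGHT ~ log.LENGTH + DEPTH_BIN2, data=sim)

# F-test of whether the slopes differ among depth bins
slope_test <- anova(reduced, full)
slope_test

# Per-bin slopes: reference slope plus each interaction coefficient
cf <- coef(full)
slopes <- cf["log.LENGTH"] + c(0, cf[grep("^log.LENGTH:", names(cf))])
names(slopes) <- levels(sim$DEPTH_BIN2)
slopes
```

A large p-value in `slope_test` would suggest that a common slope is adequate. With the real data, the same comparison could be made by replacing `sim` with `WL`.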
Wanted to look at the age distribution of siscowet lake trout; a histogram was the simplest function in mind to create it
Faceting was simpler than creating and managing different data frames and adding them through other functions, which would have been excessive and more time consuming
Wanted to show each individual point with a best-fit line, which led to the use of geom_point and geom_smooth
Wanted to show the relationship and the sample size to provide graphical information and possibly an interpretation if major trends were noticeable
Similar to the histogram, faceting was simpler than creating different data frames
Worked around methods and scripts that could provide useful data without a lot of work and excessive coding
Considered the orientation of the facets to provide a better illustration and use of space for the audience (including shifting from columns to rows), along with trying to add annotations to the plots
To show data in a way that is somewhat aesthetically pleasing and quick to read, needed to limit the amount of open space by managing the argument expand=expansion(c(#,#)) and by setting breaks where needed to adjust scales
Maintained a simple graphic that contained only black and white to reduce ink usage while still illustrating different values, such as using alpha to show the concentration of points in the second analysis
Added a caption to assist the reader with what they were looking at
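The effect of the expand argument and of setting breaks can be seen in a small sketch. It uses the built-in mtcars data rather than the siscowet data, and the axis names and break values are arbitrary choices made for illustration.

```r
library(ggplot2)

# Bars sit directly on the x-axis (no space below) with 15% headroom above
p <- ggplot(data=mtcars, mapping=aes(x=cyl))+
  geom_bar(color="black", fill="grey")+
  scale_y_continuous(name="Frequency", expand=expansion(mult=c(0,0.15)))+
  scale_x_continuous(name="Cylinders", breaks=seq(4,8,2))+
  theme_bw()
p
```

The first value in `expansion(mult=c(0,0.15))` controls the space added below the data and the second controls the space added above, both as a fraction of the axis range.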