library(dplyr)
library(ggplot2)

# Read the NOAA daily summaries and keep the variables of interest
gb <- read.csv("GreenBay.csv", stringsAsFactors = FALSE) %>%
  select(STATION, DATE,
         DailyAverageDewPointTemperature, DailyAverageDryBulbTemperature,
         DailyAverageRelativeHumidity, DailyAverageSeaLevelPressure,
         DailyAverageWetBulbTemperature, DailyAverageWindSpeed,
         DailyDepartureFromNormalAverageTemperature,
         DailyMaximumDryBulbTemperature, DailyMinimumDryBulbTemperature,
         DailyPeakWindDirection, DailyPeakWindSpeed,
         DailyPrecipitation, DailySnowfall) %>%
  # Shorter column names
  rename(DewPoint = DailyAverageDewPointTemperature,
         AvgTemp = DailyAverageDryBulbTemperature,
         RH = DailyAverageRelativeHumidity,
         Pressure = DailyAverageSeaLevelPressure,
         WBTemp = DailyAverageWetBulbTemperature,
         WindSpeed = DailyAverageWindSpeed,
         TempAnom = DailyDepartureFromNormalAverageTemperature,
         MaxTemp = DailyMaximumDryBulbTemperature,
         MinTemp = DailyMinimumDryBulbTemperature,
         WindDirection = DailyPeakWindDirection,
         PeakWind = DailyPeakWindSpeed,
         Precip = DailyPrecipitation,
         Snowfall = DailySnowfall) %>%
  # Drop rows that carry no daily summary value
  filter(!is.na(DewPoint)) %>%
  mutate(DATE = stringr::str_remove(DATE, "T.*$"),  # drop the time-of-day suffix
         DATE = as.Date(DATE, format = "%Y-%m-%d"),
         year = lubridate::year(DATE),
         mon = lubridate::month(DATE, label = TRUE),  # ordered factor Jan..Dec (English locale)
         mon2 = forcats::fct_rev(mon),                # Dec..Jan, so January plots at the top
         season = case_when(
           mon %in% c("Dec", "Jan", "Feb") ~ "Winter",
           mon %in% c("Mar", "Apr", "May") ~ "Spring",
           mon %in% c("Jun", "Jul", "Aug") ~ "Summer",
           mon %in% c("Sep", "Oct", "Nov") ~ "Fall"),
         season = factor(season, levels = c("Spring", "Summer", "Fall", "Winter")))

# Median daily average temperature per month (used as the boxplot fill)
gb2 <- gb %>%
  group_by(mon2) %>%
  mutate(med.avg.temp = median(AvgTemp, na.rm = TRUE)) %>%
  ungroup()

str(gb)
head(gb)
The following graph is based on weather data for Green Bay, Wisconsin, covering 2010 through the end of 2019 (a ten-year data set), obtained from the National Oceanic and Atmospheric Administration (NOAA). The data include daily recordings of variables such as temperature, precipitation, wind speed, and snowfall. The goal of this graph is to illustrate not only the expected monthly temperature trends but also the spread of the data within each month.
# Horizontal boxplots of daily highs by month, filled by each month's
# median daily average temperature
p <- ggplot(data = gb2, mapping = aes(x = MaxTemp, y = mon2)) +
  geom_boxplot(aes(fill = med.avg.temp), alpha = .8) +
  # Yellow dot marks each month's mean daily high
  stat_summary(fun = mean, geom = "point", color = "#F0E442",
               fill = "#F0E442", size = 1, shape = 21) +
  scale_x_continuous(breaks = seq(-10, 120, 10), name = "Temperature (°F)") +
  # midpoint defaults to 0 °F
  scale_fill_gradient2(low = "#0072B2", mid = "#0072B2", high = "red") +
  labs(title = "Daily High Temperatures for Green Bay, Wisconsin",
       subtitle = "2010-2019, grouped by month",
       caption = "Source: https://www.ncdc.noaa.gov/",
       y = NULL) +  # NULL removes the axis title
  theme_bw() +
  theme(legend.position = "none",
        panel.grid.minor = element_blank(),
        panel.grid.major = element_line(linetype = "dashed"),
        plot.title = element_text(face = "bold", size = 14),
        axis.title = element_text(face = "bold", size = 12),
        axis.text = element_text(size = 11))
p
The graph above shows the relationship between month and daily high temperature for Green Bay, Wisconsin, from 2010 through 2019. July has the highest daily highs, while January has the lowest. March shows a large spread, most likely because spring weather in Wisconsin varies considerably from year to year. The spread narrows during the summer and widens again in the fall, for reasons similar to those in spring. March also has the most outliers, again probably because of its year-to-year variability. December has a few outliers on its low end, which could reflect winter cold snaps when temperatures fall far below average. Note that these are dry-bulb temperatures, so they do not account for heat index or wind chill.
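The statements about spread and outliers can also be checked numerically. The base-R sketch below computes each group's interquartile range (IQR) and counts the points a boxplot would flag as outliers, assuming ggplot2's default rule: points more than 1.5 times the IQR below the first quartile or above the third, with quartiles from `stats::quantile()`'s default method. The function name `box_spread()` and the toy values are hypothetical, made up for illustration rather than taken from the Green Bay data.

```r
# Hypothetical helper: per-group IQR and outlier count, using the rule
# assumed to match geom_boxplot(): quartiles from stats::quantile()
# (default type 7), outliers beyond 1.5 * IQR from Q1 or Q3.
box_spread <- function(values, groups) {
  groups <- as.factor(groups)
  per_group <- lapply(split(values, groups), function(v) {
    v <- v[!is.na(v)]
    q <- stats::quantile(v, c(0.25, 0.75), names = FALSE)
    iqr <- q[2] - q[1]
    c(iqr = iqr,
      n_outliers = sum(v < q[1] - 1.5 * iqr | v > q[2] + 1.5 * iqr))
  })
  out <- as.data.frame(do.call(rbind, per_group))
  out <- cbind(group = names(per_group), out)
  rownames(out) <- NULL
  out
}

# Made-up toy values, NOT the Green Bay data: "Mar" is widely spread
# with one extreme day, "Jul" is tightly clustered.
toy <- data.frame(
  mon = rep(c("Mar", "Jul"), each = 8),
  MaxTemp = c(30, 35, 38, 40, 42, 45, 48, 85,
              80, 81, 82, 82, 83, 84, 84, 85)
)
res <- box_spread(toy$MaxTemp, toy$mon)
print(res)  # Jul: IQR 2.25, 0 outliers; Mar: IQR 8.5, 1 outlier (the 85)
```

With the real data, the call would be `box_spread(gb2$MaxTemp, gb2$mon2)`, giving one row per month.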
The graph is built from boxplots. I felt boxplots were a good choice because they are easy to read and compare side by side, and here there are 12 months to view at once. I prefer horizontal boxplots with the months running from top to bottom (January at the top) rather than along the x-axis. I also filled each boxplot according to that month's median daily average temperature, from blue for colder months to red for warmer ones. This makes it easy to see at a glance that July had the warmest temperatures while January had the coldest.
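One detail of the fill scale is worth spelling out. `scale_fill_gradient2()` has a default `midpoint` of 0 and rescales values with `scales::rescale_mid()`, so 0 °F lands on the `mid` color and the data value farthest from 0 lands on `low` or `high`. The base-R sketch below re-implements that rescaling under this assumption; `rescale_mid_sketch()` and the three temperatures are hypothetical. It shows that, as long as every monthly median is above 0 °F, all boxes fall in the upper half of the gradient (between `mid` and `high`), so the `low` color is never used.

```r
# Hypothetical re-implementation of midpoint rescaling, assumed to mirror
# scales::rescale_mid(): position 0 = `low`, 0.5 = `mid`, 1 = `high`.
rescale_mid_sketch <- function(x, from = range(x, na.rm = TRUE), mid = 0) {
  extent <- 2 * max(abs(from - mid))
  (x - mid) / extent + 0.5
}

# Made-up monthly medians in °F, for illustration only
monthly_medians <- c(17, 45, 72)
pos <- rescale_mid_sketch(monthly_medians)
print(pos)  # roughly 0.62, 0.81, 1.00: all in the mid-to-high half
```

With the real data, `rescale_mid_sketch(unique(gb2$med.avg.temp))` would show where each month lands. If the coldest month should be the deepest blue, `scale_fill_gradient(low = "#0072B2", high = "red")` maps the observed range of medians onto the full blue-to-red ramp instead.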
The following graph uses the same NOAA data set for Green Bay, Wisconsin, covering 2010 through 2019. When constructing this graphic, I wanted to know whether it tends to be windier in winter than in summer. Meteorologically, winter should be windier than summer, because the stronger temperature contrasts of the cold season drive stronger storm systems and pressure gradients.
library(dplyr)

# Mean average daily wind speed per season (drawn as dashed lines).
# dplyr is used instead of plyr::ddply(), since attaching plyr after
# dplyr masks dplyr's summarise() and mutate().
cdat <- gb2 %>%
  group_by(season) %>%
  summarise(mean.wind = mean(WindSpeed, na.rm = TRUE))

# Okabe-Ito colorblind-friendly colors: Spring, Summer, Fall, Winter
clrs <- c("#009E73", "#F0E442", "#D55E00", "#0072B2")

p2 <- ggplot(data = gb2, mapping = aes(x = WindSpeed)) +
  geom_histogram(mapping = aes(fill = season), binwidth = 1,
                 color = "black", alpha = .7) +
  # linewidth replaces the deprecated size for lines (ggplot2 >= 3.4.0)
  geom_vline(data = cdat, aes(xintercept = mean.wind),
             linetype = "dashed", linewidth = 1, colour = "black") +
  scale_x_continuous(name = "Wind Speed (mph)", limits = c(0, 25),
                     breaks = seq(0, 25, 5),
                     expand = expansion(mult = c(0, .05))) +
  scale_y_continuous(name = "Number of Days",
                     expand = expansion(mult = c(0, .05)),
                     breaks = seq(0, 120, 30)) +
  scale_fill_manual(values = clrs) +
  facet_grid(rows = vars(season)) +
  labs(title = "Average Daily Wind Speed for Green Bay, Wisconsin",
       subtitle = "2010-2019",
       caption = "*Seasons follow the meteorological definition by month. Source: https://www.ncdc.noaa.gov/") +
  theme_bw() +
  theme(legend.position = "none",
        strip.background = element_rect(fill = "gray70"),
        panel.spacing = unit(0, "mm"),
        panel.grid.major = element_line(linetype = "dashed"),
        panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold", size = 14),
        axis.title = element_text(face = "bold", size = 12),
        axis.text.x = element_text(size = 11))
p2
The graph above shows the distribution of average daily wind speed in each season. Because it is a histogram, the y-axis shows frequency: here, the number of days with a given average wind speed. To make comparison easier, I added a dashed line marking the mean for each season. Using these mean lines, one can see that winter tends to be windier than summer, although winter's mean is very close to spring's. Summer also shows the least variability in wind speed, while winter shows the most.
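The comparison of seasonal means and variability can also be made with numbers rather than by eye. The base-R sketch below computes the same per-season mean as `cdat`, plus a standard deviation as one simple measure of spread. The function name `summarise_by_group()` and the toy wind speeds are hypothetical, not values from the data set.

```r
# Hypothetical helper: mean and standard deviation of a numeric
# variable for each level of a grouping variable, ignoring NAs.
summarise_by_group <- function(values, groups) {
  groups <- as.factor(groups)
  data.frame(
    group = levels(groups),
    mean = as.numeric(tapply(values, groups, mean, na.rm = TRUE)),
    sd = as.numeric(tapply(values, groups, sd, na.rm = TRUE)),
    row.names = NULL
  )
}

# Made-up wind speeds (mph) for illustration only; the NA is skipped
toy <- data.frame(
  season = c("Summer", "Summer", "Summer",
             "Winter", "Winter", "Winter", "Winter"),
  WindSpeed = c(6, 7, 8, 5, 10, 15, NA)
)
res <- summarise_by_group(toy$WindSpeed, toy$season)
print(res)  # Summer: mean 7, sd 1; Winter: mean 10, sd 5
```

With the real data, the call would be `summarise_by_group(gb2$WindSpeed, gb2$season)`; because `season` is already a factor with levels Spring, Summer, Fall, Winter, the rows come back in that order.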
I chose a histogram for two reasons. First, it is a good way to display the data when the main question is which season has the windiest days. Second, it keeps year from becoming a factor: meteorological winter (December, January, and February) spans two calendar years, so any breakdown by year would split each winter in two. A histogram pools all days in a season regardless of year, which avoids this issue. I colored each panel by season using colors from the Okabe-Ito colorblind-friendly palette. I used a bin width of 1 mph, which gave a clear, representative distribution compared with the other bin widths I tried. Finally, I added the mean lines because they make quick comparison easy.
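The histogram sidesteps the year problem by pooling days, but if winters ever needed to be compared year by year, one common approach is to give each date a "season year" that assigns December to the following year's winter. The base-R sketch below does this; the function name `season_of()` and the `season_year` column are hypothetical and not part of the analysis above.

```r
# Hypothetical helper: meteorological season plus a "season year" that
# assigns December to the next year's winter, so one winter
# (Dec, Jan, Feb) is not split across two calendar years.
season_of <- function(dates) {
  m <- as.integer(format(dates, "%m"))
  season <- c("Winter", "Winter", "Spring", "Spring", "Spring",
              "Summer", "Summer", "Summer", "Fall", "Fall", "Fall",
              "Winter")[m]
  year <- as.integer(format(dates, "%Y"))
  data.frame(
    date = dates,
    season = factor(season, levels = c("Spring", "Summer", "Fall", "Winter")),
    season_year = ifelse(m == 12L, year + 1L, year)
  )
}

dates <- as.Date(c("2010-12-15", "2011-01-10", "2011-02-28",
                   "2011-03-01", "2011-12-01"))
s <- season_of(dates)
print(s)  # Dec 2010, Jan 2011 and Feb 2011 all share season_year 2011
```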