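# Packages used throughout: dplyr for %>%, select() and mutate(), ggplot2 for the plots
library(dplyr)
library(ggplot2)

# Read the forest fire data and inspect its structure
fires <- read.csv("forestfires.csv")
str(fires)
head(fires)
# Store month as a factor in calendar order so the bars plot chronologically
fires <- fires %>%
  mutate(month = factor(month, levels = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")))
fires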
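# Read the traffic arrest data and inspect its structure
jail <- read.csv("traffic.csv")
str(jail)
head(jail)
# Keep the three columns used in the plot and spell out the SEX and RACE codes
jail <- jail %>%
  select(RACE, SEX, ARREST.YEAR) %>%
  mutate(SEX = plyr::mapvalues(SEX,
                               from = c("F", "M"),
                               to = c("Female", "Male")),
         SEX = factor(SEX)) %>%
  mutate(RACE = plyr::mapvalues(RACE,
                                from = c("B", "W"),
                                to = c("Black", "White")),
         RACE = factor(RACE))
jail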
# Custom dark theme built on theme_bw(); note that this object shares its name
# with ggplot2's built-in theme_dark() function
theme_dark <- theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_line(color = "gray30"),
        panel.grid.major.y = element_line(color = "gray30"),
        legend.background = element_rect(fill = "gray10"),
        legend.text = element_text(color = "white"),
        legend.title = element_text(color = "white"),
        plot.background = element_rect(fill = "gray10"),
        plot.title = element_text(color = "white", face = "bold", size = 16),
        plot.caption = element_text(color = "white"),
        plot.subtitle = element_text(color = "white"),
        plot.margin = margin(l = 1, t = 1, b = 1, r = 1.5, unit = "cm"),
        axis.text = element_text(color = "white", size = 10),
        axis.title = element_text(color = "white", face = "bold", size = 14),
        panel.background = element_rect(fill = "gray10"),
        strip.background = element_rect(fill = "black", color = "gray50"),
        strip.text = element_text(color = "white", size = 13),
        panel.spacing = unit(5, units = "mm"))
theme_dark
The data that I used for the graph below record the forest fires that occurred from 2000 to 2003 in Montesinho Natural Park, in the northeast region of Portugal. The data also include the month, temperature, humidity, rainfall, and components of the Fire Weather Index (FWI) system.
The main question that I wanted to explore was: which month had the most forest fires over the 2000-2003 period? The finished graph shows that August and September had by far the most forest fires, which makes sense because those months would likely have had the hottest and driest weather of the year.
I chose to create a bar plot because I wanted to count the total number of forest fires for each month. I chose the dark background because I liked the way it looked in our last exercises and I think it makes the color of the bars pop a little more. I removed the vertical grid lines because the plot looked less cluttered and was easier to read without them.
ggplot(data = fires, mapping = aes(x = month)) +
  geom_bar(fill = "paleturquoise3") +
  scale_x_discrete(name = "Month") +
  scale_y_continuous(name = "Number of Forest Fires",
                     expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Montesinho Natural Park Forest Fires",
       subtitle = "data recorded from 2000-2003",
       caption = "Source: http://www3.dsi.uminho.pt/pcortez/forestfires/") +
  theme_dark
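The month-by-month comparison read off the bar plot can also be checked numerically by tabulating the same counts. The chunk below is only a minimal sketch of that idea: `toy_fires` is a small made-up stand-in for `fires` (one row per fire), and `n_fires` is a column name chosen here for illustration.

```r
library(dplyr)

# Made-up stand-in for `fires`: one row per fire, with month stored as a
# factor in calendar order ("jan", "feb", ..., "dec").
toy_fires <- data.frame(
  month = factor(c("aug", "aug", "sep", "mar", "aug", "sep"),
                 levels = tolower(month.abb))
)

# Count the rows (fires) in each month and sort from most to fewest.
month_counts <- toy_fires %>%
  count(month, sort = TRUE, name = "n_fires")

month_counts
```

Running the same `count()` call on `fires` instead of `toy_fires` should list the months in the same order as the bar heights in the plot.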
The data that I used for the graph below record information about the traffic arrests made at the Jefferson County (TX) Jail. The main variables recorded were the charges, race, sex, year, and days served.
I was wondering which sex would be arrested more frequently for traffic violations, and I hypothesized that Black men and women would be arrested for traffic violations more often than White men and women. The finished graph shows that most of the people arrested for traffic violations at the Jefferson County Jail were men. It is also consistent with my hypothesis: Black men and women had considerably higher numbers of traffic arrests than their White counterparts.
I chose to create a faceted bar plot because I wanted to count the total number of traffic arrests and compare them across multiple groups. Again, I chose the dark background because I liked the way it looked in our last exercises and I think it makes the color of the bars pop a little more. I removed the vertical grid lines and added more panel spacing because the plot looked less cluttered and was easier to read that way. I colored the bars by sex so that the difference between men and women is easier to see.
ggplot(data = jail, mapping = aes(x = ARREST.YEAR, fill = SEX)) +
  geom_bar() +
  scale_x_continuous(name = "Year", breaks = seq(2005, 2015, 1)) +
  # mult takes a length-2 vector: no padding below the bars, 10% above
  scale_y_continuous(name = "Number of Traffic Arrests", limits = c(0, 425),
                     expand = expansion(mult = c(0, 0.1))) +
  scale_fill_manual(name = "Sex", values = c("deeppink3", "cornflowerblue")) +
  facet_grid(rows = vars(RACE), cols = vars(SEX)) +
  labs(title = "Jefferson County (TX) Jail Traffic Arrests, 2005-2015",
       subtitle = "traffic violation severity not accounted for",
       caption = "Source: https://github.com/BuzzFeedNews/2016-01-port-arthur-arrests") +
  theme_dark +
  theme(legend.position = "none",
        axis.text.x = element_text(size = 8, angle = 45, vjust = 0.75))
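The group comparison behind the hypothesis can be backed up with a small table of arrests per race-by-sex group, which are the same groups the facets show. This is only a sketch under stated assumptions: `toy_jail` is a made-up stand-in for the cleaned `jail` data frame (one row per arrest), and `n_arrests` is a column name chosen here for illustration.

```r
library(dplyr)

# Made-up stand-in for the cleaned `jail` data: one row per arrest.
toy_jail <- data.frame(
  RACE = factor(c("Black", "Black", "White", "Black", "White", "Black")),
  SEX = factor(c("Male", "Female", "Male", "Male", "Male", "Male")),
  ARREST.YEAR = c(2005, 2005, 2006, 2006, 2007, 2007)
)

# Count arrests in each race-by-sex group across all years.
# Combinations with no rows are dropped by count()'s default settings.
group_counts <- toy_jail %>%
  count(RACE, SEX, name = "n_arrests")

group_counts
```

Applying the same `count(RACE, SEX)` call to `jail` would give the totals for each panel of the faceted plot, summed over 2005-2015.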