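library(dplyr)    # %>%, mutate(), group_by(), summarize()
library(ggplot2)  # ggplot() and the scale/geom functions used below
mlbls <- c("Unknown","Cast Iron","Copper","Ductile Iron",
"High Density Polyethylene","Lead","Polyvinyl Chloride")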
dlbls <- c("Unknown","1880s","1890s","1900s","1910s","1920s","1930s","1940s",
"1950s","1960s","1970s","1980s","1990s","2000s","2010s")
wm <- read.csv("WaterBreaksUpdated.csv") %>%
mutate(Material=plyr::mapvalues(Material,
from=c("","CI","CU","DI","HDPE","PB","PVC"),
to=mlbls),
Material=factor(Material,levels=mlbls),
Decade=plyr::mapvalues(Decade,from="",to="Unknown"),
Decade=factor(Decade,levels=dlbls))
str(wm)
head(wm)
clrs <- c("Unknown"="#E69F00","Cast Iron"="#56B4E9","Copper"="#009E73",
"Ductile Iron"="#F0E442","High Density Polyethylene"="#0072B2",
"Lead"="#CC79A7","Polyvinyl Chloride"="#D55E00")
This graph is made from work-order data collected by the Ashland, WI Public Works Department from 2004 to 2019. This past winter, I worked as an intern for the department, plotting water main breaks in GIS, and I exported those data from GIS to a .csv file to create my graphs. The data include pipe material, pipe diameter, installation date, break date, and other attributes. The American Water Works Association has set a standard of 25 breaks per year per 100 miles of pipe (0.25 breaks per mile per year) to help public works departments manage their assets. For Ashland's 59 miles of pipe, that works out to 14.75 breaks per year (59 × 0.25). Anything above this number indicates that pipe replacement is necessary, so knowing which pipe material produces the most breaks lets the department focus on the problem areas. In essence, the question I am trying to answer is: Which pipe materials are above the maximum recommended number of breaks per mile?
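The threshold arithmetic above can be checked in a couple of lines. This is a minimal sketch that uses only the figures quoted in the text (25 breaks per year per 100 miles, and 59 miles of pipe); the variable names are my own and do not appear elsewhere in the analysis.

```r
# Benchmark quoted above: 25 breaks per year per 100 miles of pipe
benchmark_per_mile <- 25 / 100   # breaks per mile per year

# Miles of pipe in Ashland, as stated in the text
ashland_miles <- 59

# Maximum recommended number of breaks per year for the whole system
max_breaks_per_year <- benchmark_per_mile * ashland_miles
max_breaks_per_year
```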
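# Count breaks per material, then divide by 56 to get KUB,
# the value plotted below as breaks per mile
wm_sum1 <- wm %>%
group_by(Material) %>%
summarize(freq=n()) %>%
mutate(KUB=freq/56) %>%
ungroup()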
wm_sum1
#R> # A tibble: 7 x 3
#R> Material freq KUB
#R> <fct> <int> <dbl>
#R> 1 Unknown 16 0.286
#R> 2 Cast Iron 122 2.18
#R> 3 Copper 1 0.0179
#R> 4 Ductile Iron 26 0.464
#R> 5 High Density Polyethylene 1 0.0179
#R> 6 Lead 1 0.0179
#R> 7 Polyvinyl Chloride 2 0.0357
b <- ggplot(data=wm_sum1,mapping=aes(x=Material,y=KUB))+
geom_bar(stat="identity",color="#56B4E9",fill="#56B4E9",alpha=0.75)+
scale_x_discrete(name="Pipe Material",labels=stringr::str_wrap(mlbls,width=15))+
scale_y_continuous(name="Number of Breaks per Mile",expand=expansion(mult=c(0,0.05)))+
labs(title="Break Rates per Material",subtitle="Broken Water Mains 2004-2019",caption="Source: Ashland, WI Public Works Department")+
geom_hline(yintercept=0.25, linetype='dotted', col = '#009E73',size=1)+
annotate(geom="label",x=5,y=0.45,hjust="left",label="Recommended # of Breaks")+
theme_bw()
b
The data show that cast iron pipes break far more often than any other pipe material (2.18 breaks per mile). Cast iron and ductile iron (0.464) both exceed the recommended number of breaks per mile, as do pipes of unknown material (0.286), though only slightly. Therefore, replacement efforts should focus on cast iron and ductile iron pipes first. I chose a bar plot because the values on the x-axis are discrete. The colors I chose are colorblind friendly.
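The comparison against the reference line can also be done without reading values off the plot. The sketch below rebuilds the summary table from the counts printed earlier and keeps only the materials above 0.25. It uses base R so it runs on its own; the names `breaks`, `threshold`, and `above` are illustrative, and the divisor of 56 is simply the one used in the summary code.

```r
# Break counts per material, copied from the summary table printed above
breaks <- data.frame(
  Material = c("Unknown", "Cast Iron", "Copper", "Ductile Iron",
               "High Density Polyethylene", "Lead", "Polyvinyl Chloride"),
  freq = c(16, 122, 1, 26, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Same divisor as the summary code (freq / 56)
breaks$KUB <- breaks$freq / 56

# Height of the reference line drawn on the bar plot
threshold <- 0.25

# Keep the materials whose rate is above the line, highest rate first
above <- breaks[breaks$KUB > threshold, ]
above <- above[order(-above$KUB), ]
above
```

Only rows with a rate strictly above the threshold are kept, so a material sitting exactly on the line would not be flagged.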
This graph separates the number of broken pipes per year by the decade in which the pipes were installed. This allows public works managers to identify the ages of pipe that are most in need of replacement.
b<-ggplot(data=wm,mapping=aes(x=YearBreak,fill=Material))+
geom_bar(width=1,color="black")+
scale_x_continuous(name="Year of Break",breaks=seq(2004,2019,3))+
scale_y_continuous(name="Number of Breaks",expand=expansion(mult=c(0,0.05)))+
labs(title="Breaks by Decade of Installation",subtitle="Broken Water Mains 2004-2019",caption="Source: Ashland, WI Public Works Department")+
scale_fill_manual(values=clrs)+
facet_wrap(vars(Decade),nrow=3,drop=FALSE)+
theme_bw()+
theme(axis.text.x = element_text(angle=-90, vjust=0.5))
b
The data show that the pipes that broke most often between 2004 and 2019 were installed in the 1880s and 1890s. This makes sense, as these are the oldest pipes, and old cast iron pipes are the most abundant type of pipe in Ashland. I chose to facet by decade to highlight the age of the pipes. It is important to note that the data for some years are incomplete, so the number of breaks in those years is much lower than in others. That makes the graphs look a bit wonky, but real-life data are not always pretty.
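One way to spot years that may be incomplete is to count breaks for every year in the study window, including years with no records at all. The sketch below shows the idea on a small made-up vector of break years; these values are hypothetical and are not the Ashland data. With the real data, the `YearBreak` column of `wm` would presumably take the place of `year_break`.

```r
# Hypothetical break years, for illustration only (NOT the Ashland data)
year_break <- c(2004, 2004, 2006, 2010, 2010, 2010, 2019)

# Count breaks for every year from 2004 to 2019, keeping empty years as 0
all_years <- 2004:2019
per_year <- table(factor(year_break, levels = all_years))

# Years with no recorded breaks are candidates for missing records
empty_years <- names(per_year)[per_year == 0]
empty_years
```

A year with zero breaks is not necessarily missing data, since it could simply be a quiet year, so this only narrows down which years are worth checking against the original work orders.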