class: center, middle, inverse, title-slide

# Introduction
### Derek Ogle, April 2020

---
class: inverse, center, middle

# What is a Graphic?

---
# What is a Graphic?

- Most simply ... a "data visualization."

--

- Attempting to convey the "story" found in the data.
    - Patterns, signals, variability.

--

- Simple, clear, effective, and, possibly, elegant.

--

- LOTS of opinions about graphing principles.

---
# Tufte's Broad Guidelines

.pull-left[
- Graphical excellence.
- Visual integrity.
- Maximize data-ink ratio.
- Aesthetic elegance.
]

.pull-right[
![Edward Tufte](https://derekogle.com/NCGraphing/modules/zimgs/Tufte.jpg)
]

---
# Graphical Excellence

- A graphic should provide the user with “*the greatest number of ideas, in the shortest time, using the least amount of ink, in the smallest space.*”

--

- Can be simple or complex, but either way it should fit the data and the narrative.

---
# Graphical Excellence

.center[
<img src="https://derekogle.com/NCGraphing/modules/zimgs/polarization-feat3.jpg" width="90%" />

<font size="1"><a href="http://thehigherlearning.com/2015/05/03/this-visualization-shows-how-ridiculously-divided-our-congress-has-become/">The Higher Learning</a></font>
]

---
# Visual Integrity

- Numerical scales should be properly proportionate, not chosen to exaggerate (the usual problem) or obscure the pattern.

--

- Variation in the graphic should reflect the data rather than an artistic interpretation of the data.

---
# Visual Integrity

.center[
<img src="https://derekogle.com/NCGraphing/modules/zimgs/Bad_Graph_1.JPG" width="90%" />

<font size="1"><a href="https://flowingdata.com/category/statistics/mistaken-data/">FLOWINGDATA</a></font>
]

---
# Maximize data-ink ratio

- Ratio of the ink *required* to present the data to the ink *used* to present the data.

--

- The closer this ratio is to 1, the less distracted the user will be and the easier it will be to visualize the data.

--

- Borders, backgrounds, 3-D effects, etc. tend to decrease this ratio.
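
--

- As a rough formula (a paraphrase of the definition above, with made-up numbers for illustration only):

$$\text{data-ink ratio} = \frac{\text{ink required to present the data}}{\text{total ink used in the graphic}}$$

- For example, if a hypothetical graphic uses 100 units of ink and only 40 of them present data, the ratio is 40/100 = 0.4. Removing 50 units of borders and background raises it to 40/50 = 0.8.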
---
# Maximize data-ink ratio

.center[
<img src="https://derekogle.com/NCGraphing/modules/zimgs/Tufte_figure2.png" width="90%" />

<font size="1">From Tufte (1983)</font>
]

---
# Aesthetic elegance

- Not in the sense of "physical beauty."

--

- More in the sense of simplicity used to evoke the complexity of the data.

---
# Aesthetic elegance

.center[
<img src="https://derekogle.com/NCGraphing/modules/zimgs/Savings_By_Income.JPG" width="80%" />

<font size="1"><a href="https://flowingdata.com/category/statistics/mistaken-data/">FLOWINGDATA</a></font>
]

---
# Further Guidelines
<font size="2">From <a href="https://mschermann.github.io/data_viz_reader/fundamentals.html#best-practices">A Reader on Data Visualization</a></font>

- **Five Second Rule**
    - The average modern attention span for viewing anything online is less than 5 s.
    - If you cannot grab attention within 5 s, you have likely lost your audience.
    - Include clear titles and instructions; tell people what the visualization shows and how to interact with it.

--

- **Design and Layout Matter**
    - Design and layout should make your message easier to understand.
    - Incorporate principles of graphic design to present a compelling story.

---
# Further Guidelines

- **Keep it Simple**
    - Keep graphs simple and easy to interpret.
    - Keep only necessary elements in the graph.
    - Help your audience understand quickly what is going on.

--

- **Pretty Does Not Mean Effective**
    - Aesthetically pleasing visualizations are not necessarily more effective.
    - Pretty and eye-catching may be nice, but communicating the data properly is paramount.

--

- **Use Color Purposefully**
    - Use of color may be attractive, but it can also be distracting.
    - Color should be used only if it assists in conveying your message.
    - Be consistent with the color scheme.
---
class: inverse, center, middle

<font size="7">Our focus will be on the mechanics of making graphs in R with enough flexibility that you can follow any principles you desire.</font>

---
class: inverse, center, middle

# What is ggplot2?

---
# What is ggplot2?

- A package for making graphics in R.
    - R is now one of the most used programs for data analysis.
    - `ggplot2` is one of the most used packages for data visualization in R.

--

- Originally based on Leland Wilkinson's *The Grammar of Graphics*.

--

- Extended and brought to R by Hadley Wickham.

--

- `ggplot2` describes itself as

> "a system for declaratively creating graphics. You provide the data, tell `ggplot2` how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details."

---
# ggplot2 in the Wild 1

.center[
<img src="https://derekogle.com/NCGraphing/modules/zimgs/Ex_ggplot2_1.JPG" width="600px" />

<font size="1">From <a href="https://www.int-res.com/articles/esr2020/41/n041p319.pdf">Roos NC, Taylor BM, Carvalho AR, Longo GO. 2020. Demography of the largest and most endangered Brazilian Parrotfish, Scarus trispinosus, reveals overfishing. Endangered Species Research 41:319-327.</a></font>
]

---
# ggplot2 in the Wild 2

.center[
<img src="https://derekogle.com/NCGraphing/modules/zimgs/Ex_ggplot2_2.JPG" width="700px" />

<font size="1">From <a href="https://agupubs-onlinelibrary-wiley-com.ezproxy.uwsp.edu/doi/full/10.1002/2015WR017519">Lessels JS et al. 2016. Water sources and mixing in riparian wetlands revealed by tracers and geospatial analysis. Water Resources Research.
52:456-470</a></font>
]

---
# ggplot2 in the Wild 3

.center[
<img src="https://derekogle.com/NCGraphing/modules/zimgs/Ex_ggplot2_3.JPG" width="500px" />

<font size="1">From <a href="https://fivethirtyeight.com/features/the-midwest-is-getting-drenched-and-its-causing-big-problems/">538.com</a></font>
]

---
# ggplot2 in the Wild 4

.center[
<img src="https://derekogle.com/NCGraphing/modules/zimgs/Ex_ggplot2_4.JPG" width="800px" />

<font size="1">From <a href="https://www.ft.com/coronavirus-latest">Financial Times</a></font>
]

---
class: inverse, center, middle

# Very Basics of ggplot2

---
# `tidyverse` Package

- A package that loads many other packages.
    - `ggplot2`: What we need to use the "Grammar of Graphics" in R.
    - `dplyr`: For manipulating data.
    - `tidyr`: Also for manipulating data.
    - `lubridate`: For manipulating dates.

--

- Load packages with `library()`.
    - Must be done every time you start a new R session.

```r
library(tidyverse)
```

---
# Data Format

- All `ggplot2` plots need data, usually in a "tidy format."
    - **Rows**: Each contains one **individual**/subject/unit.
    - **Columns**: Each contains one **variable** (characteristic recorded about an individual).

--

- Example from data on Isle Royale Wolves and Moose

```
   year wolves moose winter_temp ice_bridges
1  1959     20   538        1.40          no
2  1960     22   564        8.45          no
3  1961     22   572        9.75         yes
4  1962     23   579        2.15         yes
5  1963     20   596       -0.35         yes
6  1964     26   620       12.40          no
7  1965     28   634        1.25         yes
8  1966     26   661        1.70         yes
9  1967     22   766        2.75         yes
10 1968     22   848        5.85         yes
```

---
# Simple Data

- For purposes of this introduction, we will use the simple data entered below.

```r
dfobj <- data.frame(var1=c(3,1,5),
                    var2=c(2,4,6),
                    lbls=c("a","b","c"))
dfobj
```

```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

---
# ggplot Base Plot

```r
p <- ggplot(data=dfobj, mapping=aes(x=var1,y=var2))
```

- All ggplots begin with `ggplot()`.

--

- *Usually* has `data=` that declares a *global* data.frame for use.
--

- *Usually* has `mapping=` that declares how variables are mapped to 'aesthetics'.

--

- Aesthetics are wrapped in `aes()` and *usually* require variables mapped to `x` and `y`.

--

- Result is assigned to an object (here `p`).

--

- Items ('layers') will be added to this base plot.

--

- Plot is not shown until the object is "evaluated" (i.e., type `p`).

---
# ggplot Base Plot

```r
p
```

<img src="Lecture_Intro_ggplot2_files/figure-html/unnamed-chunk-16-1.png" width="60%" />

---
# Geometric Objects

- Called 'geoms' for short.

--

- Declare the geometric object or shape that should be plotted.

--

- A 'layer' that is 'added' to the base plot with `+`.

--

- Functions are like `geom_XXXX()`.

--

- Some common 'geoms' are:
    - `geom_point()`: Shows data as points (think *scatterplot*).
    - `geom_line()`: Shows data as a connected line (think *line plot*).
    - `geom_boxplot()`: Shows data as a boxplot.
    - `geom_bar()`: Shows data as vertical bars (think *bar chart*).
    - `geom_histogram()`: Shows data as a histogram.
    - `geom_rug()`: Shows values as 'ticks' along the axis.
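
---
# Geoms Are Layers: A Quick Check

A minimal sketch of what "adding a layer with `+`" means, assuming the `dfobj` and `p` objects defined earlier. The names `p_pts` and `p_both` are made up for illustration, and peeking at `$layers` leans on ggplot2 internals that may differ between versions.

```r
library(ggplot2)

# Same toy data and base plot used throughout this introduction
dfobj <- data.frame(var1=c(3,1,5),
                    var2=c(2,4,6),
                    lbls=c("a","b","c"))
p <- ggplot(data=dfobj, mapping=aes(x=var1,y=var2))

# Adding a geom returns a NEW plot object; p itself is unchanged
p_pts <- p + geom_point()
p_both <- p_pts + geom_line()

length(p$layers)       # 0 layers in the base plot
length(p_pts$layers)   # 1 layer after adding points
length(p_both$layers)  # 2 layers after also adding a line

# Check that the first layer carries the point geom
inherits(p_pts$layers[[1]]$geom, "GeomPoint")
```

Because `+` does not modify `p`, the same base plot can be reused with different geoms, which is what the next slides do.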
---
class: inverse, center, middle

# Adding geoms to Base Plot

---
class: split-50
count: false

.column[.content[
```r
*dfobj
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
*p
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/scatter_user_2_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
p +
* geom_point()
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/scatter_user_3_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
*dfobj
*p
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/line_user_1_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
p +
* geom_line()
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/line_user_2_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
p +
  geom_path()
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/path_1_1_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
p +
  geom_area()
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/area_1_1_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
p +
  geom_polygon()
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/polygon_1_1_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
p +
  geom_tile()
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/tile_1_1_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
p +
  geom_bar(stat="identity")
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/bar_1_1_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
dfobj
p +
  geom_text(aes(label=lbls))
```
]]
.column[.content[
```
  var1 var2 lbls
1    3    2    a
2    1    4    b
3    5    6    c
```

<img src="Lecture_Intro_ggplot2_files/figure-html/text_1_1_output-1.png" width="100%" />
]]

---
class: inverse, center, middle

# Adding geoms to geoms

---
class: split-50
count: false

.column[.content[
```r
*p
```
]]
.column[.content[
<img src="Lecture_Intro_ggplot2_files/figure-html/forFun_auto_1_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
p +
* geom_bar(stat="identity")
```
]]
.column[.content[
<img src="Lecture_Intro_ggplot2_files/figure-html/forFun_auto_2_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
p +
  geom_bar(stat="identity") +
* geom_line()
```
]]
.column[.content[
<img src="Lecture_Intro_ggplot2_files/figure-html/forFun_auto_3_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
p +
  geom_bar(stat="identity") +
  geom_line() +
* geom_point()
```
]]
.column[.content[
<img src="Lecture_Intro_ggplot2_files/figure-html/forFun_auto_4_output-1.png" width="100%" />
]]

---
class: split-50
count: false

.column[.content[
```r
p +
  geom_bar(stat="identity") +
  geom_line() +
  geom_point() +
* geom_text(aes(label=lbls),vjust=2)
```
]]
.column[.content[
<img src="Lecture_Intro_ggplot2_files/figure-html/forFun_auto_5_output-1.png" width="100%" />
]]

---
class: inverse, center, middle

# Next Time

<font size="7">We will start adding more control over layers in the graph.</font>