--- title: "ggplot2 examples and exercises" subtitle: "slides 1-49" output: html_document author: Thomas Lin Pederson (slightly modified by Elizabeth Stanny) --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE) ``` This document contains code for (slides 1-49). The document is an RMarkdown document which means that it can be compiled, along with the code chunks thus executing and capturing the output of the code within the document. To read more about RMarkdown see the website for the package, as well as the [Get Started](https://rmarkdown.rstudio.com/lesson-1.html) guide. ### Exercises While it is encouraged to follow along in this document as the workshop progresses and execute the code to see the result, an important part is also to experiment and play, thus learning how the different settings affect the output. The document will contain code chunks with the code examples discussed during the talk, but it will also contain chunks intended for completing small exercises. These will use the examples as a starting point and ask you to modify the code to achieve a given output. Completing these are optional, but highly recommended, either during or after the workshop. ```{r load-packages, echo=FALSE} library(pacman) p_load(tidyverse, RColorBrewer) theme_set(theme_minimal()) # change plot theme for all plots ``` ### Datasets We will use an assortment of datasets throughout the document. The purpose is mostly to showcase different plots, and less on getting some divine insight into the world. While not necessary we will call `data()` before using a new dataset to indicate the introduction of a new dataset. ## Introduction We will look at the basic ggplot2 use using the faithful dataset, giving information on the eruption pattern of the Old Faithful geyser in Yellowstone National Park. ```{r intro-slide-27-33} data("faithful") # see variables in faithful faithful %>% glimpse() # Basic scatterplot ggplot(data = faithful, mapping = aes(x = eruptions, y = waiting)) + geom_point() # Data and mapping can be given both as global (in ggplot()) or per layer ggplot() + geom_point(mapping = aes(x = eruptions, y = waiting), data = faithful) ``` If an aesthetic is linked to data it is put into `aes()` ```{r intro-slide-34} ggplot(faithful) + geom_point(aes(x = eruptions, y = waiting, colour = eruptions < 3)) ``` If you simple want to set it to a value, put it outside of `aes()` ```{r intro-slide-35} ggplot(faithful) + geom_point(aes(x = eruptions, y = waiting), colour = 'steelblue') ``` Some geoms only need a single mapping and will calculate the rest for you ```{r intro-slide-36} ggplot(faithful) + geom_histogram(aes(x = eruptions)) ``` geoms are drawn in the order they are added. The point layer is thus drawn on top of the density contours in the example below. ```{r intro-slide-37} ggplot(faithful, aes(x = eruptions, y = waiting)) + geom_density_2d() + geom_point() ``` #### Intro/geom exercises Modify the code below to make the points larger squares and slightly transparent. See `?geom_point` for more information on the point layer. ```{r geom-ex-1} ggplot(faithful) + geom_point(aes(x = eruptions, y = waiting)) ``` Hint 1: transparency is controlled with `alpha`, and shape with `shape` Hint 2: remember the difference between mapping and setting aesthetics * * * Colour the two distributions in the histogram with different colours ```{r geom-ex-2} ggplot(faithful) + geom_histogram(aes(x = eruptions)) ``` Hint 1: For polygons you can map two different colour-like aesthetics: `colour` (the colour of the stroke) and `fill` (the fill colour) * * * Colour the distributions in the histogram by whether `waiting` is above or below `60`. What happens? ```{r geom-ex-3} ggplot(faithful) + geom_histogram(aes(x = eruptions)) ``` Change the plot above by setting `position = 'dodge'` in `geom_histogram()` (while keeping the colouring by `waiting`). What do `position` control? * * * Add a line that separates the two point distributions. See `?geom_abline` for how to draw straight lines from a slope and intercept. ```{r geom-ex-4} ggplot(faithful) + geom_point(aes(x = eruptions, y = waiting)) ``` ### Stat We will use the `mpg` dataset giving information about fuel economy on different car models. Every geom has a stat. This is why new data (`count`) can appear when using `geom_bar()`. ```{r stat-slide-40} data("mpg") # variable definitions # ?mpg # mpg %>% glimpse() ggplot(mpg) + geom_bar(aes(x = class)) ``` The stat can be overwritten. If we have precomputed count we don't want any additional computations to perform and we use the `identity` stat to leave the data alone ```{r stat-slide-41} mpg_counted <- mpg %>% count(class, name = 'count') ggplot(mpg_counted) + geom_bar(aes(x = class, y = count), stat = 'identity') ``` Most obvious geom+stat combinations have a dedicated geom constructor. The one above is available directly as `geom_col()` ```{r stat-slide-42} ggplot(mpg_counted) + geom_col(aes(x = class, y = count)) ``` Values calculated by the stat is available with the `after_stat()` function inside `aes()`. You can do all sorts of computations inside that. ```{r stat-slide-43} ggplot(mpg) + geom_bar(aes(x = class, y = after_stat(100 * count / sum(count)))) ``` Many stats provide multiple variations of the same calculation, and provides a default (here, `density`) ```{r stat-slide-44} ggplot(mpg) + geom_density(aes(x = hwy)) ``` While the others must be used with the `after_stat()` function ```{r stat-slide-45} ggplot(mpg) + geom_density(aes(x = hwy, y = after_stat(scaled))) ``` #### Stat exercises While most people use `geom_*()` when adding layers, it is just as valid to add a `stat_*()` with an attached geom. Look at `geom_bar()` and figure out which stat it uses as default. Then modify the code to use the stat directly instead (i.e. adding `stat_*()` instead of `geom_bar()`) ```{r stat-ex-1} ggplot(mpg) + geom_bar(aes(x = class)) ``` * * * Use `stat_summary()` to add a red dot at the mean `hwy` for each group ```{r stat-ex-2} ggplot(mpg) + geom_jitter(aes(x = class, y = hwy), width = 0.2) ``` Hint: You will need to change the default geom of `stat_summary()` ### Scales Scales define how the mapping you specify inside `aes()` should happen. All mappings have an associated scale even if not specified. ```{r scales-slide-47} ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class)) ``` take control by adding one explicitly. All scales follow the same naming conventions. ```{r scales-slide-48} ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class)) + scale_colour_brewer(type = 'qual') ``` Positional mappings (x and y) also have associated scales. ```{r scales-slide-49} ggplot(mpg) + geom_point(aes(x = displ, y = hwy)) + scale_x_continuous(breaks = c(3, 5, 6)) + scale_y_continuous(trans = 'log10') ``` #### Scales exercises Use `RColorBrewer::display.brewer.all()` to see all the different palettes from Color Brewer and pick your favourite. Modify the code below to use it ```{r scales-ex-1} ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class)) + scale_colour_brewer(type = 'qual') ``` * * * Modify the code below to create a bubble chart (scatterplot with size mapped to a continuous variable) showing `cyl` with size. Make sure that only the present amount of cylinders (4, 5, 6, and 8) are present in the legend. ```{r scales-ex-2} ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class)) + scale_colour_brewer(type = 'qual') ``` Hint: The `breaks` argument in the scale is used to control which values are present in the legend. Explore the different types of size scales available in ggplot2. Is the default the most appropriate here? * * * Modify the code below so that colour is no longer mapped to the discrete `class` variable, but to the continuous `cty` variable. What happens to the guide? ```{r scales-ex-3} ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class, size = cty)) ``` * * * The type of guide can be controlled with the `guide` argument in the scale, or with the `guides()` function. Continuous colours have a gradient colour bar by default, but setting it to `legend` will turn it back to the standard look. What happens when multiple aesthetics are mapped to the same variable and uses the guide type? ```{r scales-ex-4} ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = cty, size = cty)) ```