A ggplot2 Tutorial for Beautiful Plotting in R

Posted by Cédric on Monday, August 5, 2019

Last update: 2019-08-16

Introductory Words

At the beginning of 2016, I had to prepare my PhD introductory talk, and I started using ggplot2 to visualize my data since I never liked the syntax and style of base plots in R. Because I was short on time, I created these figures by trial and error and with the help of lots of googling. The resource I always came back to was a blog entry called Beautiful plotting in R: A ggplot2 cheatsheet by Zev Ross, posted on August 4, 2014 and last updated in January 2016. After giving the talk, which contained some quite beautiful plots thanks to the blog post, I decided to go through this tutorial step by step. I learned so much from it and immediately started modifying the code, and over time I added some additional code snippets, chart types and resources.

Since the blog entry by Zev Ross had not been updated for some years, I hosted the updated version (last update Jan 2017) on my GitHub. Now it finds its proper place on this homepage! (Plus I added some updates, for example the fantastic patchwork and ggforce packages. And pie charts because everyone looooves pie charts!)

Major changes I’ve made:

  • to follow the R style guide (e.g. by Hadley Wickham, Google or the Coding Club style guides),
  • to change style and aesthetics of plots (e.g. axis titles, legends and nice colors for all plots not only some),
  • to have an updated version which keeps track of changes in ggplot2,
  • to modify data import (GitHub source),
  • to have an executable R script for exercises and workshops,
  • to include additional tips on, e.g.:
    • other plot types (e.g. contour plot, rug representation, ridge plot)
    • how and why to use the viridis color palettes
    • creating minimal plots using the Tufte plotting style
    • how to adjust the plot’s title, subtitle and caption
    • how to add different types of lines to a plot
    • how to change the order in a legend and legend key names
    • how to add labels to your data (and how to do it in a beautiful way)

Preparation

  • You can download the data we are using in this post here.
  • You can find the R Markdown script with the code executed in this blog post here.
  • You need to install the following packages to execute this tutorial:
    • ggplot2
    • ggthemes
    • tidyverse
    • extrafont
    • patchwork
    • cowplot
    • grid (ships with base R, so it does not need to be installed)
    • gridExtra
    • ggrepel
    • reshape2
    • ggforce
    • ggridges
    • shiny
# grid is part of base R and cannot be installed from CRAN
install.packages(c("ggplot2", "ggthemes", "tidyverse", "extrafont", 
                   "cowplot", "gridExtra", "ggrepel", 
                   "reshape2", "ggforce", "ggridges", "shiny"))

devtools::install_github("thomasp85/patchwork")

(For teaching purposes, and in case people jump directly to a specific plot, I load the packages needed in addition to ggplot2 in the respective chunk.)

The Dataset

We are using data from the National Morbidity and Mortality Air Pollution Study (NMMAPS). To make the plots manageable we are limiting the data to Chicago and 1997-2000. For more detail on this dataset, consult Roger Peng’s book Statistical Methods in Environmental Epidemiology with R.

chic <- readr::read_csv("https://raw.githubusercontent.com/Z3tt/R-Tutorials/master/ggplot2/chicago-nmmaps.csv")
tibble::glimpse(chic)
## Observations: 1,461
## Variables: 10
## $ city     <chr> "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic"...
## $ date     <date> 1997-01-01, 1997-01-02, 1997-01-03, 1997-01-04, 1997-01-05, 1997-01-06, 1997-01-07, 1997-01-08, 1997-01-09, 1997-01-10, 1997-01-11, 1997-01-12, 1997-01-13, 1997-01-14, 1997-01-15, 1997-01-16, 1997-01-17, 1997-01-18, 1997-01-19, 1997-01-20, 1997-01-21, 1997-01-22, 1997-01-23, 1997-01-24, 1997-01-25, 1997-01-26, 1997-01-27, 1997-01-28, 1997-01-29, 1997-01-30, 1997-01-31, 1997-02-01, 1997-02-02, 1997-02-03, 1997-02-04, 1997-02-05, 1997-02-06, 1997-02-07, 1997-02-08, 1997-02-09, 1997-02-10, 1997-02-11, 1997-02-12, 1997-02-13, 1997-02-14, 1997-02-15, 1997-02-16, 1997-02-17, 1997-02-18, 1997-02-19, 1997-02-20, 1997-02-21, 1997-02-22, 1997-02-23, 1997-02-24, 1997-02-25, 1997-...
## $ death    <dbl> 137, 123, 127, 146, 102, 127, 116, 118, 148, 121, 110, 127, 129, 151, 128, 132, 116, 142, 124, 124, 127, 121, 134, 120, 109, 109, 115, 105, 114, 120, 117, 126, 97, 96, 119, 125, 116, 118, 121, 114, 111, 107, 127, 98, 104, 122, 124, 120, 106, 103, 139, 133, 109, 121, 111, 105, 107, 123, 124, 125, 108, 114, 104, 120, 134, 101, 102, 125, 119, 115, 121, 112, 127, 99, 125, 115, 113, 105, 113, 120, 105, 119, 147, 123, 108, 117, 110, 106, 96, 119, 119, 99, 120, 130, 97, 105, 102, 104, 137, 111, 108, 96, 100, 105, 128, 120, 98, 118, 94, 117, 121, 110, 110, 108, 121, 114, 116, 109, 123, 115, 101, 118, 100, 126, 126, 121, 114, 112, 111, 111, 107, 124, 104, 107, 109, 133, 108, 109...
## $ temp     <dbl> 36.0, 45.0, 40.0, 51.5, 27.0, 17.0, 16.0, 19.0, 26.0, 16.0, 1.5, 1.0, 3.0, 10.0, 19.0, 9.5, -3.0, 0.0, 14.0, 31.0, 35.0, 36.5, 26.0, 32.0, 14.5, 11.0, 17.0, 2.0, 8.0, 16.5, 31.5, 35.0, 36.5, 30.0, 34.5, 30.0, 26.0, 25.5, 25.5, 26.0, 27.0, 23.5, 21.0, 20.5, 25.5, 20.0, 18.5, 30.0, 48.5, 37.5, 35.5, 36.0, 26.0, 28.0, 21.5, 25.5, 36.5, 34.5, 37.5, 45.5, 35.0, 33.5, 38.0, 33.0, 26.5, 35.5, 39.0, 37.0, 44.0, 37.0, 33.5, 37.5, 26.5, 19.0, 24.5, 45.0, 33.5, 35.5, 46.0, 53.5, 37.5, 32.5, 33.0, 40.5, 44.0, 60.5, 55.5, 43.5, 37.5, 38.5, 44.5, 53.0, 59.5, 62.5, 60.5, 45.0, 34.0, 28.5, 30.0, 30.5, 33.5, 33.5, 38.5, 41.5, 49.0, 43.0, 40.5, 40.0, 45.5, 49.0, 45.0, 43.0, 48.5, 47.5, 4...
## $ dewpoint <dbl> 37.50000, 47.25000, 38.00000, 45.50000, 11.25000, 5.75000, 7.00000, 17.75000, 24.00000, 5.37500, -6.62500, -8.87500, 1.50000, 11.50000, 23.25000, -9.75000, -10.37500, -4.12500, 22.62500, 27.25000, 41.62500, 20.75000, 18.75000, 29.50000, -1.37500, 17.12500, 8.37500, -6.37500, 11.00000, 16.37500, 33.75000, 29.66667, 29.62500, 28.00000, 32.00000, 24.25000, 21.87500, 23.37500, 22.50000, 21.00000, 21.75000, 19.50000, 11.60000, 16.37500, 23.00000, 15.25000, 8.12500, 32.62500, 41.37500, 27.50000, 44.12500, 29.62500, 24.25000, 14.62500, 10.87500, 27.12500, 35.00000, 30.25000, 36.00000, 44.00000, 27.37500, 29.37500, 28.87500, 28.62500, 13.37500, 35.25000, 28.25000, 32.62500, 33....
## $ pm10     <dbl> 13.052268, 41.948600, 27.041751, 25.072573, 15.343121, 9.364655, 20.228428, 33.134819, 12.118381, 24.761534, 18.126151, 16.013770, 34.991079, 64.945403, 26.941955, 27.022906, 18.837025, 31.859740, 30.923168, 19.894566, 27.882017, 18.508762, 11.845698, 26.687346, 16.612825, 21.641455, 22.672498, 28.101180, 51.776607, 48.741462, 24.686329, 23.784943, 27.762150, 21.600928, 17.050900, 10.157749, 15.943086, 33.010704, 14.955909, 30.410449, 23.914813, 22.972347, 12.712336, 22.719836, 35.676001, 28.373076, 15.662430, 38.744847, 27.597166, 17.612211, 29.768805, 7.340321, 7.856717, 7.908915, 17.834350, 41.124012, 34.052583, 19.749350, 26.126759, 28.129506, 9.940940, 15.980970, 2...
## $ o3       <dbl> 5.659256, 5.525417, 6.288548, 7.537758, 20.760798, 14.940874, 11.920985, 8.678477, 13.355892, 10.448264, 15.866094, 15.115290, 9.381068, 8.029508, 7.066111, 20.113023, 15.363898, 12.713223, 9.616133, 16.840369, 12.758676, 21.024213, 18.665072, 7.131938, 17.167861, 9.960118, 9.167350, 13.613967, 7.945009, 7.660619, 11.882608, 16.676182, 12.032368, 21.849559, 10.887549, 14.894031, 15.957824, 14.391243, 19.749645, 12.397635, 14.193562, 20.492388, 23.091993, 20.171005, 15.453240, 19.526661, 20.019234, 17.297562, 27.013275, 19.055436, 6.890252, 16.313610, 23.015853, 24.990318, 18.939318, 12.526243, 7.962753, 13.194153, 15.178614, 13.860717, 30.992349, 29.260852, 15.413875, 1...
## $ time     <dbl> 3654, 3655, 3656, 3657, 3658, 3659, 3660, 3661, 3662, 3663, 3664, 3665, 3666, 3667, 3668, 3669, 3670, 3671, 3672, 3673, 3674, 3675, 3676, 3677, 3678, 3679, 3680, 3681, 3682, 3683, 3684, 3685, 3686, 3687, 3688, 3689, 3690, 3691, 3692, 3693, 3694, 3695, 3696, 3697, 3698, 3699, 3700, 3701, 3702, 3703, 3704, 3705, 3706, 3707, 3708, 3709, 3710, 3711, 3712, 3713, 3714, 3715, 3716, 3717, 3718, 3719, 3720, 3721, 3722, 3723, 3724, 3725, 3726, 3727, 3728, 3729, 3730, 3731, 3732, 3733, 3734, 3735, 3736, 3737, 3738, 3739, 3740, 3741, 3742, 3743, 3744, 3745, 3746, 3747, 3748, 3749, 3750, 3751, 3752, 3753, 3754, 3755, 3756, 3757, 3758, 3759, 3760, 3761, 3762, 3763, 3764, 3765, 3766, ...
## $ season   <chr> "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter"...
## $ year     <dbl> 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, ...
head(chic, 10)
## # A tibble: 10 x 10
##    city  date       death  temp dewpoint  pm10    o3  time season  year
##    <chr> <date>     <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <chr>  <dbl>
##  1 chic  1997-01-01   137  36      37.5  13.1   5.66  3654 Winter  1997
##  2 chic  1997-01-02   123  45      47.2  41.9   5.53  3655 Winter  1997
##  3 chic  1997-01-03   127  40      38    27.0   6.29  3656 Winter  1997
##  4 chic  1997-01-04   146  51.5    45.5  25.1   7.54  3657 Winter  1997
##  5 chic  1997-01-05   102  27      11.2  15.3  20.8   3658 Winter  1997
##  6 chic  1997-01-06   127  17       5.75  9.36 14.9   3659 Winter  1997
##  7 chic  1997-01-07   116  16       7    20.2  11.9   3660 Winter  1997
##  8 chic  1997-01-08   118  19      17.8  33.1   8.68  3661 Winter  1997
##  9 chic  1997-01-09   148  26      24    12.1  13.4   3662 Winter  1997
## 10 chic  1997-01-10   121  16       5.38 24.8  10.4   3663 Winter  1997

The ggplot2 Package

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Consequently, a ggplot is built up from a few basic elements:

  1. Data:
    The raw data that you want to plot.
  2. Geometries geom_:
    The geometric shapes that will represent the data.
  3. Aesthetics aes():
    Aesthetics of the geometric and statistical objects, such as color, size, shape, transparency and position.
  4. Scales scale_:
    Maps between the data and the aesthetic dimensions, such as data range to plot width or factor values to colors.
  5. Statistical transformations stat_:
    Statistical summaries of the data, such as quantiles, fitted curves and sums.
  6. Coordinate system coord_:
    The transformation used for mapping data coordinates into the plane of the data rectangle.
  7. Facets facet_:
    The arrangement of the data into a grid of plots.
  8. Visual themes theme():
    The overall visual defaults of a plot, such as background, grids, axes, default typeface, sizes and colors.
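To make these eight elements a bit more concrete, here is a minimal sketch that touches each of them once. The data frame toy and its columns are made up for illustration and are not part of the tutorial’s dataset:

```r
library(ggplot2)

# Hypothetical toy data (not the NMMAPS data used in this tutorial)
toy <- data.frame(
  day    = rep(1:10, times = 2),
  temp   = c(30, 32, 35, 33, 38, 40, 42, 41, 45, 47,
             60, 62, 61, 65, 67, 70, 69, 72, 74, 75),
  season = rep(c("Winter", "Summer"), each = 10)
)

p <- ggplot(toy, aes(x = day, y = temp, color = season)) +   # data + aesthetics
  geom_point() +                                              # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # statistical transformation
  scale_color_manual(values = c(Winter = "steelblue",
                                Summer = "firebrick")) +      # scale
  coord_cartesian(ylim = c(0, 100)) +                         # coordinate system
  facet_wrap(~ season) +                                      # facets
  theme_minimal()                                             # theme
p
```

Each line adds exactly one of the building blocks, which is why ggplot2 code is usually written as a chain of + calls.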

A Default ggplot

First, we are going to load the ggplot2 package (which we can also load via the tidyverse package collection):

library(ggplot2)
#library(tidyverse)

ggplot2 syntax is different from base R. In line with the basic elements listed above, we always start a plot by calling ggplot(data = df, aes(x = variable1, y = variable2)), which just tells ggplot2 which data we are going to work with and how the variables map to the axes. Running this alone creates only an empty panel, because ggplot2 does not yet know how we want to plot that data.

(g <- ggplot(chic, aes(x = date, y = temp)))

Tip: By wrapping the assignment in parentheses, the object is printed immediately (instead of writing g <- ggplot(...) and then g).

Let’s tell ggplot which geometry we want to use:

g + geom_point()

No worries, we are going to learn several plot types at a later point.

Change Color of Points

Within the geom call, you can already set properties such as the color of your points:

g + geom_point(color = "firebrick")

Note that this does not modify g itself. To make all following plots based on g show red points, you would have to store the result, e.g. g <- g + geom_point(color = "firebrick").

And let’s get rid of the greyish default ggplot look by setting a different built-in theme, e.g. theme_bw:

theme_set(theme_bw())

g + geom_point(color = "firebrick")
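If you would rather restyle a single plot than change the default for the whole session, a theme can also be added like any other layer. A small sketch with made-up data (toy is not part of the tutorial):

```r
library(ggplot2)

toy <- data.frame(x = 1:10, y = (1:10)^2)  # hypothetical example data

# theme_minimal() is applied to this plot only; the session default is untouched
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "firebrick") +
  theme_minimal()
p
```

theme_set() is handy when all plots should share one look, while adding the theme per plot keeps the change local.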

(You can find more on how to use built-in themes and how to customize themes in the section “Working with Themes”.)


Working with Axes

Add Axis Labels

Let’s add some well-written labels to the axes:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = expression(paste("Temperature (", degree ~ F, ")")))

Move Labels Away from the Plot & Change Color

theme() is an essential command to modify all kinds of theme elements (texts and titles, boxes, symbols, backgrounds, …). We will use it a lot; to see what is possible, have a look here.

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(axis.title.x = element_text(color = "sienna", 
                                    size = 15, vjust = -0.35),
        axis.title.y = element_text(color = "orangered", 
                                    size = 15, vjust = 0.35))
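As an alternative to vjust, the distance between an axis title and the axis can be set with the margin argument of element_text(). This is only a sketch with made-up data; the 10 pt spacing is an arbitrary choice:

```r
library(ggplot2)

toy <- data.frame(x = 1:10, y = (1:10)^2)  # hypothetical example data

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(
    # t = top margin: pushes the x axis title away from the tick labels
    axis.title.x = element_text(size = 15, margin = margin(t = 10)),
    # r = right margin: pushes the y axis title away from the tick labels
    axis.title.y = element_text(size = 15, margin = margin(r = 10))
  )
p
```

Naming the sides (t, r, b, l) makes it explicit which side of the text gets the extra space.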

Change Size & Angle of Tick Text

Using angle you can rotate the tick text, and with vjust (and hjust) you can adjust its position (for hjust: 0 = left-aligned, 0.5 = centered, 1 = right-aligned; for vjust: 0 = bottom, 0.5 = middle, 1 = top):

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.text.x = element_text(angle = 50, size = 16, 
                                   vjust = 0.5))

Remove Axis Ticks & Tick Text

There is rarely a reason to do so, but this is how it works:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.ticks.y = element_blank(), 
        axis.text.y = element_blank())

If you want to get rid of a theme element, the value to assign to it is always element_blank().

Limit Axis Range

Sometimes you want to zoom into your data. You can do this without subsetting your data:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)") +
  ylim(c(0, 50))

Alternatively you can use g + scale_y_continuous(limits = c(0, 50)) or g + coord_cartesian(ylim = c(0, 50)). The former removes all data points outside the range, while the latter only adjusts the visible area.
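The difference between limiting the scale and zooming with the coordinate system matters as soon as something is computed from the data (a smoother, a boxplot, …). Here is a small sketch with made-up data that counts how many y values survive in each case; layer_data() returns the data ggplot2 actually uses for a layer:

```r
library(ggplot2)

# Hypothetical data: five y values inside [0, 50], five outside
toy <- data.frame(x = 1:10, y = seq(5, 95, by = 10))
base <- ggplot(toy, aes(x = x, y = y)) + geom_point()

p_scale <- base + scale_y_continuous(limits = c(0, 50))  # out-of-range values are dropped
p_zoom  <- base + coord_cartesian(ylim = c(0, 50))       # data stay complete, only the view changes

n_kept_scale <- sum(!is.na(layer_data(p_scale)$y))
n_kept_zoom  <- sum(!is.na(layer_data(p_zoom)$y))
c(scale = n_kept_scale, zoom = n_kept_zoom)
```

With scale limits (and the ylim() shortcut) the points outside the range are no longer available to later computations, whereas coord_cartesian() keeps all of them.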

Force Plot to Start at Origin

Related to that, you can force R to plot the graph starting at the origin:

library(tidyverse)

chic %>% 
  dplyr::filter(temp > 25, o3 > 20) %>% 
  ggplot(aes(x = temp, y = o3)) + 
    geom_point() + 
    labs(x = expression(paste("Temperature higher than 25 ", degree ~ F, "")), 
         y = "Ozone higher than 20 ppb") + 
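    expand_limits(x = 0, y = 0)

Using coord_cartesian(xlim = c(0, max(chic_red$temp)), ylim = c(0, max(chic_red$o3))) will lead to the same result, where chic_red stands for the filtered data, e.g. chic_red <- dplyr::filter(chic, temp > 25, o3 > 20).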

But we can also force it to literally start at the origin!

chic %>% 
  dplyr::filter(temp > 25, o3 > 20) %>% 
  ggplot(aes(x = temp, y = o3)) + 
    geom_point() + 
    labs(x = expression(paste("Temperature higher than 25 ", degree ~ F, "")), 
         y = "Ozone higher than 20 ppb") + 
    expand_limits(x = 0, y = 0) + 
    scale_x_continuous(expand = c(0, 0)) + 
    scale_y_continuous(expand = c(0, 0)) +
    coord_cartesian(clip = "off")

Axes with Same Scaling

For demonstration purposes, let’s plot temperature against temperature with some random noise:

ggplot(chic, aes(x = temp, y = temp + rnorm(nrow(chic), sd = 20))) +
   geom_point() +
   labs(x = "Temperature (°F)") +
   xlim(c(0, 100)) + ylim(c(0, 150)) +
   coord_equal()

Use a Function to Alter Labels

Sometimes it is handy to alter your labels a little, perhaps adding units or percent signs without adding them to your data. You can use a function in this case. Here is an example:

ggplot(chic, aes(x = date, y = temp)) +
   geom_point(color = "firebrick") +
   labs(x = "Year", y = "Temperature (°F)") +
   scale_y_continuous(labels = function(x) paste(x, "Degrees Fahrenheit"))
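If you need this more than once, the labelling function can be created by a small helper. with_unit() below is a hypothetical helper written for this sketch (it is not part of ggplot2), and toy is made-up data:

```r
library(ggplot2)

# Hypothetical helper: returns a labelling function that appends a unit
with_unit <- function(unit) {
  function(x) paste(x, unit)
}

toy <- data.frame(x = 1:10, y = seq(10, 100, by = 10))  # hypothetical example data

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "firebrick") +
  scale_y_continuous(labels = with_unit("°F"))  # labels accepts any function of the breaks
p
```

The function passed to labels receives the vector of axis breaks and has to return one label per break.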


Working with Titles

Add a Title

We can add a title via the ggtitle() function:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)") + 
  ggtitle("Temperatures in Chicago")

Alternatively, you can use g + labs(title = "Temperatures in Chicago"). Here you can add several arguments, e.g. additionally a subtitle and a caption:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)",
       title = "Temperatures in Chicago", 
       subtitle = "Seasonal pattern of daily temperatures from 1997 to 2001", 
       caption = "Data: NMMAPS")

Make Title Bold & Add a Space at the Baseline

The face argument can be used to make the font bold or italic. The margin argument uses the margin function and you provide the top, right, bottom and left margins (the default unit is points).

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)", 
       title = "Temperatures in Chicago") + 
  theme(plot.title = element_text(size = 15, face = "bold", 
                                  margin = margin(10, 0, 10, 0)))

(A nice way to remember the order of the margin arguments is the word "trouble": its consonants t-r-b-l are the first letters of the four sides top, right, bottom and left.)

Adjust Position of Titles

Alignment is controlled by hjust (which stands for horizontal adjustment):

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +  
  labs(x = "Year", y = "Temperature (°F)", 
       title = "Temperatures in Chicago") + 
  theme(plot.title = element_text(size = 15, face = 4, hjust = 1))

Of course, it is also possible to adjust the vertical alignment, controlled by vjust.
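A side note on the face argument used above: besides names, it also accepts the numeric font codes known from base R graphics (1 = plain, 2 = bold, 3 = italic, 4 = bold italic), so face = 4 is the same as face = "bold.italic". A sketch with made-up data that uses the more readable name:

```r
library(ggplot2)

toy <- data.frame(x = 1:10, y = (1:10)^2)  # hypothetical example data

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "firebrick") +
  labs(title = "Temperatures in Chicago") +
  # "bold.italic" is equivalent to face = 4; hjust = 1 right-aligns the title
  theme(plot.title = element_text(size = 15, face = "bold.italic", hjust = 1))
p
```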

Use a Non-Traditional Font in Your Title

Note that you can also use different fonts. To use fonts which are installed on your machine (and which you may be using in your office program) we get help from a package called extrafont. After loading the package, you need to import and load the fonts installed on your device:

library(extrafont)
extrafont::font_import()
## Importing fonts may take a few minutes, depending on the number of fonts and the speed of the system.
## Continue? [y/n]
extrafont::loadfonts(device = "win")  # "win" is for Windows; see ?loadfonts for other devices

You can have a look at your imported font library by typing fonts() or fonttable().

Now, we can use one of those font families:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)", title = "Temperatures in Chicago") + 
  theme(plot.title = element_text(size = 18, family = "Merriweather"))
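Whether a family such as Merriweather is available depends on your machine. As a hedged sketch, you could check the imported fonts first and fall back to a generic family otherwise. pick_family() is a hypothetical helper written for this example, and "sans" as fallback is an assumption:

```r
# Hypothetical helper: use the preferred family if it was imported, else a fallback
pick_family <- function(preferred, available, fallback = "sans") {
  if (preferred %in% available) preferred else fallback
}

# extrafont::fonts() lists the imported font families (empty if nothing was imported)
available <- if (requireNamespace("extrafont", quietly = TRUE)) {
  extrafont::fonts()
} else {
  character(0)
}

title_family <- pick_family("Merriweather", available)
title_family
```

The result can then be passed on as element_text(family = title_family).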

(You can also set a non-default font for all text elements of your plots; for more details see the section “Working with Themes”. I am going to use Roboto Condensed as the new default font for the following plots.)

theme_set(theme_bw(base_size = 12, base_family = "Roboto Condensed"))

Change Spacing in Multi-Line Text

You can use the lineheight argument to change the spacing between lines. In this example, I have squished the lines together a bit (lineheight < 1).

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") + 
  labs(x = "Year", y = "Temperature (°F)") + 
  ggtitle("Temperatures in Chicago\nfrom 1997 to 2001") + 
  theme(plot.title = element_text(size = 16, face = "bold", 
                                  vjust = 1, lineheight = 0.75))


Working with Legends

We will color code the plot based on season. You can see that by default the legend title is what we specified in the color argument:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)")

Turn Off the Legend

One of the first questions is always: “How can I get rid of the legend?”.

It is quite easy and always works with legend.position = "none":

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(legend.position = "none")

You can also use guides(color = "none") or scale_color_discrete(guide = "none"), depending on the specific case (replace color by fill, size etc. for other aesthetics).
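While legend.position = "none" removes all legends at once, guides() lets you switch off the legend of a single aesthetic. A small sketch with made-up data in which two aesthetics are mapped but only the color legend is kept:

```r
library(ggplot2)

# Hypothetical data with a grouping variable and a numeric weight
toy <- data.frame(
  x      = 1:8,
  y      = c(3, 5, 2, 8, 7, 4, 6, 1),
  group  = rep(c("a", "b"), times = 4),
  weight = 1:8
)

p <- ggplot(toy, aes(x = x, y = y, color = group, size = weight)) +
  geom_point() +
  guides(size = "none")  # drop the size legend, keep the color legend
p
```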

Turn Off Legend Titles

As we already learned, use element_blank() to draw nothing:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(legend.title = element_blank())

Change Legend Position

If you do not want to place the legend on the right, use legend.position as an argument in theme():

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(legend.position = "bottom")

Possible positions are “top”, “right”, “bottom”, and “left”.

Change Style of Legend Titles

You can change the appearance of the legend title by adjusting the theme element legend.title:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(legend.title = element_text(color = "chocolate", 
                                    size = 14, face = "bold"))

Change Legend Title

The easiest way to change the title of the legend is to use the labs() function as well:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)", color = "Seasons\nindicated\nby colors:") + 
  theme(legend.title = element_text(color = "chocolate", 
                                    size = 14, face = "bold"))

The legend details can be changed via scale_color_discrete() or scale_color_continuous(), depending on the type of the variable displayed.

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(legend.title = element_text(color = "chocolate", 
                                    size = 14, face = "bold")) +
  scale_color_discrete(name = "Seasons\nindicated\nby colors:")

Change Order of Legend Keys

We can achieve this by changing the levels of season:

chic$season <- factor(chic$season, levels = c("Spring", "Summer", 
                                              "Autumn", "Winter"))

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)")
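Why does this work? ggplot2 orders the legend keys by the factor levels, and without explicit levels R sorts them alphabetically. A tiny base R sketch with a made-up vector shows the difference:

```r
season <- c("Winter", "Spring", "Summer", "Autumn", "Winter")  # hypothetical values

# Default: levels are sorted alphabetically
default_levels <- levels(factor(season))

# Explicit levels define the order that ends up in the legend
season_ordered <- factor(season, levels = c("Spring", "Summer", "Autumn", "Winter"))
ordered_levels <- levels(season_ordered)

default_levels
ordered_levels
```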

Change Legend Labels

We are going to replace the seasons by the months which they are covering:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(legend.title = element_text(color = "chocolate", 
                                    size = 14, face = 2)) +
  scale_color_discrete("Seasons:", labels = c("Mar - May", "Jun - Aug", 
                                              "Sep - Nov", "Dec - Feb"))
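Note that the labels above are matched purely by position, so they rely on the level order we defined before. A slightly more defensive sketch passes breaks and labels together so that each label is tied to its season explicitly. season_labels is a name made up for this example and toy is hypothetical data:

```r
library(ggplot2)

# Lookup table: season name -> label shown in the legend
season_labels <- c(Spring = "Mar - May", Summer = "Jun - Aug",
                   Autumn = "Sep - Nov", Winter = "Dec - Feb")

toy <- data.frame(x = 1:4, y = c(50, 75, 55, 30),
                  season = names(season_labels))  # hypothetical example data

p <- ggplot(toy, aes(x = x, y = y, color = season)) +
  geom_point(size = 3) +
  scale_color_discrete("Seasons:",
                       breaks = names(season_labels),   # which value ...
                       labels = unname(season_labels))  # ... gets which label
p
```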

Change Background Boxes in the Legend

To change the background color (fill) of the legend keys, we adjust the setting for the theme element legend.key:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(legend.key = element_rect(fill = "darkgoldenrod1"),
        legend.title = element_text(color = "chocolate", 
                                    size = 14, face = 2)) +
  scale_color_discrete("Seasons:")

If you want to get rid of them entirely, use fill = NA.

Change Size of Legend Symbols

Points in the legend get a little lost, especially without the boxes. To override the default try:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(legend.key = element_rect(fill = NA),
        legend.title = element_text(color = "chocolate", 
                                    size = 14, face = 2)) +
  scale_color_discrete("Seasons:") +
  guides(color = guide_legend(override.aes = list(size = 6)))

Leave a Layer Off the Legend

Let’s say you have a point layer and you add a rug plot of the same data. By default, both the points and the “line” end up in the legend like this:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  geom_rug() +
  theme(legend.title = element_text(color = "chocolate", 
                                    size = 14, face = 2)) +
  scale_color_discrete("Seasons:")

You can use show.legend = FALSE to turn off a layer in the legend:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  geom_rug(show.legend = FALSE) +
  theme(legend.title = element_text(color = "chocolate", size = 14, face = 2)) +
  scale_color_discrete("Seasons:")

Manually Adding Legend Items

ggplot2 will not add a legend automatically unless you map aesthetics (color, size etc.) to a variable. There are times, though, when you want to have a legend so that it is clear what you are plotting.

Here is the default:

ggplot(chic, aes(x = date, y = o3)) +
  geom_line(color = "gray") +
  geom_point(color = "darkorange2") +
  labs(x = "Year", y = "Ozone")

We can force a legend by mapping the color inside aes(). We do this for both the lines and the points, and we are mapping not to a variable in our dataset but to a single string (so that we get just one color for each).

ggplot(chic, aes(x = date, y = o3)) +
  geom_line(aes(color = "line")) +
  geom_point(aes(color = "points")) +
  labs(x = "Year", y = "Ozone") +
  scale_color_discrete("Type:")

We are getting close but this is not what we want. We want gray and red! To change the color, we use scale_color_manual(). Additionally, we override the legend aesthetics using the guides() function.

Voila! Now, we have a plot with gray lines and red points as well as a single gray line and a single red point as legend symbols:

ggplot(chic, aes(x = date, y = o3)) + 
  geom_line(aes(color = "line")) +  
  geom_point(aes(color = "points")) +
  labs(x = "Year", y = "Ozone") +
  scale_color_manual("", guide = "legend", 
                     values = c("points" = "darkorange2", 
                                "line" = "gray")) +
  guides(color = guide_legend(override.aes = list(linetype = c(1, 0), 
                                                  shape = c(NA, 16))))


Working with Backgrounds & Grid Lines

There are ways to change the entire look of your plot with one function (see below) but if you want to simply change the colors of some elements, you can also do that.

Change the Panel Color

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(panel.background = element_rect(fill = "moccasin"))

Change Grid Lines

There are two types of grid lines: major grid lines indicating the ticks and minor grid lines between the major ones.

# Note: since ggplot2 3.4.0, the width of lines is set via linewidth; size is deprecated there
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(panel.background = element_rect(fill = "grey90"),
        panel.grid.major = element_line(color = "gray10", size = 0.5),
        panel.grid.minor = element_line(color = "gray70", size = 0.25))

Furthermore, you can also define the positions of both the major and the minor grid lines:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") + 
  scale_y_continuous(breaks = seq(0, 100, 10),
                     minor_breaks = seq(0, 100, 2.5))
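The same idea works for a date axis, where the spacing can be given in calendar units. This is a sketch with made-up monthly data; the half-year spacing and the label format are arbitrary choices:

```r
library(ggplot2)

# Hypothetical monthly series over four years
toy <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-01"), by = "month"))
toy$temp <- 50 + 25 * sin(2 * pi * (seq_along(toy$date) - 4) / 12)

p <- ggplot(toy, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  scale_x_date(date_breaks = "6 months",       # major grid lines every half year
               date_minor_breaks = "1 month",  # minor grid lines every month
               date_labels = "%b %Y")          # e.g. "Jan 1997"
p
```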

Change the Plot Background Color

To change the background color (fill) of the plot area, one needs to adjust the theme element plot.background:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(plot.background = element_rect(fill = "gray60"))


Working with Margins

Sometimes it is useful to add a little space to the plot margin. Similar to the previous examples we can use an argument to the theme() function. In this case the argument is plot.margin. In the previous example we already made the default margin visible by changing the background color using plot.background.

Now let us add extra space to both the left and right. The argument, plot.margin, can handle a variety of different units (cm, inches, etc.) but it requires the use of the function unit from the package grid to specify the units. Here I am using a 5 cm margin on the right and left.

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") + 
  theme(plot.background = element_rect(fill = "gray60"),
        plot.margin = unit(c(1, 5, 1, 5), "cm"))

The order of the margin sides is top, right, bottom, left. A nice way to remember this order is the word "trouble": its consonants t-r-b-l are the first letters of the four sides.
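If you find it hard to remember the order, ggplot2’s margin() function accepts named sides and a unit, which may be easier to read than a bare vector. A sketch with made-up data that only widens the left and right margins:

```r
library(ggplot2)

toy <- data.frame(x = 1:10, y = (1:10)^2)  # hypothetical example data

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "firebrick") +
  theme(plot.background = element_rect(fill = "gray60"),
        # named sides: top, right, bottom, left, all given in cm
        plot.margin = margin(t = 1, r = 5, b = 1, l = 5, unit = "cm"))
p
```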


Working with Multi-Panel Plots

The ggplot2 package has two nice functions for creating multi-panel plots. They are related but a little different: facet_wrap essentially creates a ribbon of plots based on a single variable, while facet_grid can take two variables.

Create a Single Row of Plots Based on One Variable

facet_wrap creates a facet of a single variable, written with a tilde in front: facet_wrap(~ variable). The appearance of these subplots is controlled by the arguments ncol and nrow:

g <- ggplot(chic, aes(x = date, y = temp)) +
       geom_point(color = "chartreuse4") +
       labs(x = "Year", y = "Temperature (°F)")

g + facet_wrap(~ year, nrow = 1) +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

Create a Matrix of Plots Based on One Variable

g + facet_wrap(~ year, nrow = 2)

Allow Scales to Roam Free

The default for multi-panel plots in ggplot2 is to use equivalent scales in each panel. But sometimes you want to allow a panel’s own data to determine the scale. This is often not a good idea since it may give your reader the wrong impression about the data, but you can do so by setting scales = "free" like this:

g + facet_wrap(~ year, nrow = 2, scales = "free")

Note that both the x and y axes differ in their range!
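If only one axis should vary between panels, scales also accepts "free_x" and "free_y". A minimal sketch with made-up data in which the y axis is free while the x axis stays shared:

```r
library(ggplot2)

# Hypothetical data: two years with very different temperature ranges
toy <- data.frame(
  day  = rep(1:5, times = 2),
  temp = c(30, 32, 35, 33, 38, 70, 72, 75, 71, 78),
  year = rep(c(1997, 1998), each = 5)
)

p <- ggplot(toy, aes(x = day, y = temp)) +
  geom_point(color = "chartreuse4") +
  facet_wrap(~ year, scales = "free_y")  # y range per panel, common x range
p
```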

Create a Grid of Plots Based on Two Variables

In case of two variables, facet_grid does the job. Here, the order of the variables determines the number of rows and columns:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "orangered") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(x = "Year", y = "Temperature (°F)") +
  facet_grid(year ~ season)

To change from row to column arrangement you can change facet_grid(year ~ season) to facet_grid(season ~ year).
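The same layout can be spelled out with the rows and cols arguments together with vars(), which some find easier to read than the formula. This is only a sketch: it assumes ggplot2 3.0.0 or later, and the data frame toy with its columns is made up for illustration:

```r
library(ggplot2)

# Made-up data: two years, two seasons, ten days each
toy <- expand.grid(year = 1997:1998,
                   season = c("Winter", "Summer"),
                   day = 1:10)
toy$temp <- ifelse(toy$season == "Winter", 30, 75) + toy$day

# Equivalent to facet_grid(year ~ season): years in rows, seasons in columns
p_grid <- ggplot(toy, aes(x = day, y = temp)) +
  geom_point(color = "orangered") +
  facet_grid(rows = vars(year), cols = vars(season))

p_grid
```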

Put Two (Different) Plots Side by Side

There are several ways to combine plots. The easiest approach in my opinion is the patchwork package by Thomas Lin Pedersen:

p1 <- ggplot(chic, aes(x = date, y = temp, 
                       color = factor(season))) + 
        geom_point() + 
        geom_rug() +
        labs(x = "Year", y = "Temperature (°F)")

p2 <- ggplot(chic, aes(x = date, y = o3)) + 
        geom_line(color = "gray") + 
        geom_point(color = "darkorange2") + 
        labs(x = "Year", y = "Ozone")

library(patchwork)
p1 + p2

We can change the arrangement by “dividing” both plots (and note the alignment even though one has a legend and one doesn’t!):

p1 / p2

And also nested plots are possible!

(g + p2) / p1

(Note the alignment of the plots even though only one row contains a legend.)

Alternatively, the cowplot package by Claus Wilke provides the same functionality (and lots of other good utilities):

library(cowplot)
plot_grid(p1, p2)

… and so does the gridExtra package:

library(gridExtra)
grid.arrange(p1, p2, ncol = 2)

Jump back to Table of Content.

Working with Themes

Change the Overall Plotting Style

You can change the entire look of the plots by using themes. As an example, Jeffrey Arnold has put together the library ggthemes with several custom themes. For a list you can visit the ggthemes site. Without any coding you can just adopt several styles, some of them well known for their style and aesthetics.

Here is an example copying the plotting style of The Economist magazine:

library(ggthemes)

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") + 
  ggtitle("Ups and Downs of Chicago's Daily Temperatures") +
  theme_economist() + 
  scale_color_economist(name = "Seasons:") +
  theme(legend.title = element_text(size = 12, face = "bold"))

Another example is the plotting style of Tufte, a minimal ink theme based on Edward Tufte’s book The Visual Display of Quantitative Information. This is the book that popularized Minard’s chart depicting Napoleon’s march on Russia as one of the best statistical drawings ever created. Tufte’s plots became famous due to the purism in their style. But see for yourself:

set.seed(2019)
chic.red <- chic[sample(nrow(chic), 50), ]

ggplot(chic.red, aes(x = temp, y = o3)) +
  geom_point() +
  labs(x = "Temperature (°F)", y = "Ozone") + 
  ggtitle("Temperature and Ozone Levels in Chicago") +
  theme_tufte() +
  stat_smooth(method = "lm", col = "black", size = 0.7, 
              fill = "gray60", alpha = 0.2)

Since Tufte’s style is about minimalism, we first reduced the number of data points shown to (at least) try to follow his rules. (Don’t worry about the stat_smooth() command, I will explain it later. I just added it to make the plot more interesting.)

ggplot(chic.red, aes(x = temp, y = o3)) +
  geom_point() +
  labs(x = "Temperature (°F)", y = "Ozone") + 
  ggtitle("Temperature and Ozone Levels in Chicago") +
  theme_tufte() +
  stat_smooth(method = "lm", col = "black", size = 0.7, 
              fill = "gray60", alpha = 0.2) + 
  geom_rangeframe()

If you like this way of plotting, have a look at this blog entry recreating several Tufte plots in R.

Change the Size of All Plot Text Elements

It is incredibly easy to change the size of all the text elements at once. If you have a closer look at the default theme (see chapter “Create and Use Your Custom Theme” below) you will notice that the sizes of all the elements are relative (rel()) to the base_size. As a result, you can simply change the base_size if you want to increase readability of your plots:

theme_set(theme_gray(base_size = 30, base_family = "Roboto Condensed"))

ggplot(chic, aes(x = date, y = temp, color = factor(season))) + 
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") + 
  guides(color = "none")
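To see what a relative size means in points, multiply the factor by base_size. The factors below are rel() values that appear in the theme_gray source shown in the next section; the calculation itself is just an illustration in base R:

```r
base_size <- 30

# rel() factors taken from theme_gray
rel_factors <- c(axis.text = 0.8, legend.text = 0.8, plot.title = 1.2)

# Resulting font sizes in points
base_size * rel_factors
```

With a base_size of 30, axis and legend text end up at 24 pt and the plot title at 36 pt.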

Create and Use Your Custom Theme

If you want to change the theme for an entire session you can use theme_set as in theme_set(theme_bw()). The default is called theme_gray. If you want to create your own custom theme, you can extract the code directly from the gray theme and modify it. Note that the rel() function changes the sizes relative to the base_size.

theme_gray
## function (base_size = 11, base_family = "", base_line_size = base_size/22, 
##     base_rect_size = base_size/22) 
## {
##     half_line <- base_size/2
##     theme(line = element_line(colour = "black", size = base_line_size, 
##         linetype = 1, lineend = "butt"), rect = element_rect(fill = "white", 
##         colour = "black", size = base_rect_size, linetype = 1), 
##         text = element_text(family = base_family, face = "plain", 
##             colour = "black", size = base_size, lineheight = 0.9, 
##             hjust = 0.5, vjust = 0.5, angle = 0, margin = margin(), 
##             debug = FALSE), axis.line = element_blank(), axis.line.x = NULL, 
##         axis.line.y = NULL, axis.text = element_text(size = rel(0.8), 
##             colour = "grey30"), axis.text.x = element_text(margin = margin(t = 0.8 * 
##             half_line/2), vjust = 1), axis.text.x.top = element_text(margin = margin(b = 0.8 * 
##             half_line/2), vjust = 0), axis.text.y = element_text(margin = margin(r = 0.8 * 
##             half_line/2), hjust = 1), axis.text.y.right = element_text(margin = margin(l = 0.8 * 
##             half_line/2), hjust = 0), axis.ticks = element_line(colour = "grey20"), 
##         axis.ticks.length = unit(half_line/2, "pt"), axis.ticks.length.x = NULL, 
##         axis.ticks.length.x.top = NULL, axis.ticks.length.x.bottom = NULL, 
##         axis.ticks.length.y = NULL, axis.ticks.length.y.left = NULL, 
##         axis.ticks.length.y.right = NULL, axis.title.x = element_text(margin = margin(t = half_line/2), 
##             vjust = 1), axis.title.x.top = element_text(margin = margin(b = half_line/2), 
##             vjust = 0), axis.title.y = element_text(angle = 90, 
##             margin = margin(r = half_line/2), vjust = 1), axis.title.y.right = element_text(angle = -90, 
##             margin = margin(l = half_line/2), vjust = 0), legend.background = element_rect(colour = NA), 
##         legend.spacing = unit(2 * half_line, "pt"), legend.spacing.x = NULL, 
##         legend.spacing.y = NULL, legend.margin = margin(half_line, 
##             half_line, half_line, half_line), legend.key = element_rect(fill = "grey95", 
##             colour = "white"), legend.key.size = unit(1.2, "lines"), 
##         legend.key.height = NULL, legend.key.width = NULL, legend.text = element_text(size = rel(0.8)), 
##         legend.text.align = NULL, legend.title = element_text(hjust = 0), 
##         legend.title.align = NULL, legend.position = "right", 
##         legend.direction = NULL, legend.justification = "center", 
##         legend.box = NULL, legend.box.margin = margin(0, 0, 0, 
##             0, "cm"), legend.box.background = element_blank(), 
##         legend.box.spacing = unit(2 * half_line, "pt"), panel.background = element_rect(fill = "grey92", 
##             colour = NA), panel.border = element_blank(), panel.grid = element_line(colour = "white"), 
##         panel.grid.minor = element_line(size = rel(0.5)), panel.spacing = unit(half_line, 
##             "pt"), panel.spacing.x = NULL, panel.spacing.y = NULL, 
##         panel.ontop = FALSE, strip.background = element_rect(fill = "grey85", 
##             colour = NA), strip.text = element_text(colour = "grey10", 
##             size = rel(0.8), margin = margin(0.8 * half_line, 
##                 0.8 * half_line, 0.8 * half_line, 0.8 * half_line)), 
##         strip.text.x = NULL, strip.text.y = element_text(angle = -90), 
##         strip.placement = "inside", strip.placement.x = NULL, 
##         strip.placement.y = NULL, strip.switch.pad.grid = unit(half_line/2, 
##             "pt"), strip.switch.pad.wrap = unit(half_line/2, 
##             "pt"), plot.background = element_rect(colour = "white"), 
##         plot.title = element_text(size = rel(1.2), hjust = 0, 
##             vjust = 1, margin = margin(b = half_line)), plot.subtitle = element_text(hjust = 0, 
##             vjust = 1, margin = margin(b = half_line)), plot.caption = element_text(size = rel(0.8), 
##             hjust = 1, vjust = 1, margin = margin(t = half_line)), 
##         plot.tag = element_text(size = rel(1.2), hjust = 0.5, 
##             vjust = 0.5), plot.tag.position = "topleft", plot.margin = margin(half_line, 
##             half_line, half_line, half_line), complete = TRUE)
## }
## <bytecode: 0x000000000be9fe28>
## <environment: namespace:ggplot2>

Now, let us modify the default theme function and have a look at the result:

theme_custom <- function (base_size = 12, base_family = "Roboto Condensed") {
  half_line <- base_size/2
  theme(line = element_line(color = "black", size = 0.5, linetype = 1, lineend = "butt"), 
        rect = element_rect(fill = "white", color = "black", size = 0.5, linetype = 1), 
        text = element_text(family = base_family, face = "plain", color = "black", 
                            size = base_size, lineheight = 0.9, hjust = 0.5, vjust = 0.5, 
                            angle = 0, margin = margin(), debug = F), 
        axis.line = element_blank(), 
        axis.line.x = NULL, 
        axis.line.y = NULL, 
        axis.text = element_text(size = base_size * 1.1, color = "gray30"), 
        axis.text.x = element_text(margin = margin(t = 0.8 * half_line/2), vjust = 1), 
        axis.text.x.top = element_text(margin = margin(b = 0.8 * half_line/2), vjust = 0), 
        axis.text.y = element_text(margin = margin(r = 0.8 * half_line/2), hjust = 1), 
        axis.text.y.right = element_text(margin = margin(l = 0.8 * half_line/2), hjust = 0), 
        axis.ticks = element_line(color = "gray30", size = 0.7), 
        axis.ticks.length = unit(half_line / 1.5, "pt"), 
        axis.title.x = element_text(margin = margin(t = half_line), vjust = 1, 
                                    size = base_size * 1.3, face = "bold"), 
        axis.title.x.top = element_text(margin = margin(b = half_line), vjust = 0), 
        axis.title.y = element_text(angle = 90, margin = margin(r = half_line), 
                                    vjust = 1, size = base_size * 1.3, face = "bold"), 
        axis.title.y.right = element_text(angle = -90, vjust = 0, 
                                          margin = margin(l = half_line)), 
        legend.background = element_rect(color = NA), 
        legend.spacing = unit(0.4, "cm"), 
        legend.spacing.x = NULL, 
        legend.spacing.y = NULL, 
        legend.margin = margin(0.2, 0.2, 0.2, 0.2, "cm"), 
        legend.key = element_rect(fill = "gray95", color = "white"), 
        legend.key.size = unit(1.2, "lines"), 
        legend.key.height = NULL, 
        legend.key.width = NULL, 
        legend.text = element_text(size = rel(0.8)), 
        legend.text.align = NULL, 
        legend.title = element_text(hjust = 0), 
        legend.title.align = NULL, 
        legend.position = "right", 
        legend.direction = NULL, 
        legend.justification = "center", 
        legend.box = NULL, 
        legend.box.margin = margin(0, 0, 0, 0, "cm"), 
        legend.box.background = element_blank(), 
        legend.box.spacing = unit(0.4, "cm"), 
        panel.background = element_rect(fill = "white", color = NA),
        panel.border = element_rect(color = "gray30", 
                                    fill = NA, size = 0.7),
        panel.grid.major = element_line(color = "gray90", size = 1),
        panel.grid.minor = element_line(color = "gray90", size = 0.5, 
                                        linetype = "dashed"),
        panel.spacing = unit(base_size, "pt"), 
        panel.spacing.x = NULL, 
        panel.spacing.y = NULL, 
        panel.ontop = F, 
        strip.background = element_rect(fill = "white", color = "gray30"), 
        strip.text = element_text(color = "black", size = base_size), 
        strip.text.x = element_text(margin = margin(t = half_line, 
                                                    b = half_line)), 
        strip.text.y = element_text(angle = -90, margin = margin(l = half_line, 
                                                                 r = half_line)), 
        strip.placement = "inside", 
        strip.placement.x = NULL, 
        strip.placement.y = NULL, 
        strip.switch.pad.grid = unit(0.1, "cm"), 
        strip.switch.pad.wrap = unit(0.1, "cm"), 
        plot.background = element_rect(color = NA), 
        plot.title = element_text(size = base_size * 1.8, hjust = 0.5, 
                                  vjust = 1, face = "bold", 
                                  margin = margin(b = half_line * 1.2)), 
        plot.subtitle = element_text(size = base_size * 1.3, hjust = 0.5, vjust = 1, 
                                     margin = margin(b = half_line * 0.9)), 
        plot.caption = element_text(size = rel(0.9), hjust = 1, vjust = 1, 
                                    margin = margin(t = half_line * 0.9)),
        plot.tag = element_text(size = rel(1.2), hjust = 0.5, vjust = 0.5), 
        plot.tag.position = "topleft", 
        plot.margin = margin(base_size, base_size, base_size, base_size), complete = T)
}

Have a look at the modified aesthetics with the new look of the panel and grid lines as well as axis ticks, texts and titles:

theme_set(theme_custom())

ggplot(chic, aes(x = date, y = temp, color = factor(season))) + 
  geom_point() + labs(x = "Year", y = "Temperature (°F)") + guides(color = "none")

This way of changing the plot design is highly recommended! It allows you to quickly change any element of your plots by changing it once. Within a few seconds you can plot all your results in a consistent style and adapt it to other needs (e.g. a presentation with bigger font size or journal requirements).

You can also set quick changes using theme_update():

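## theme_update() changes the active theme directly; its return value is the
## previous theme, so we do not assign it to theme_custom
theme_update(panel.background = element_rect(fill = "gray60"))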

ggplot(chic, aes(x = date, y = temp, color = factor(season))) + 
  geom_point() + labs(x = "Year", y = "Temperature (°F)") + guides(color = "none")

For further exercises, we are going to use our own theme with a white panel background and without the minor grid lines:

theme_update(panel.background = element_rect(fill = "white"),
             panel.grid.major = element_line(size = 0.5),
             panel.grid.minor = element_blank())
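Both theme_set() and theme_update() invisibly return the theme that was active before the call, which makes it easy to try something out and restore the previous state afterwards. A minimal sketch (old_theme is just a name chosen here):

```r
library(ggplot2)

# Switch themes and keep the previously active one
old_theme <- theme_set(theme_bw())

# ... create plots in the new style ...

# Restore whatever was active before
theme_set(old_theme)
```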

Jump back to Table of Content.

Working with Colors

For simple applications, working with colors is straightforward in ggplot2, but when you have more advanced needs it can be a challenge. For a more advanced treatment of the topic you should probably get your hands on Hadley’s book, which has nice coverage. There are a few other good sources including the R Cookbook and the ggplot2 online docs. Tian Zheng at Columbia has created a useful PDF of R colors.

In order to use color with your data, most importantly you need to know if you are dealing with a categorical or continuous variable.

Categorical Variables: Manually Select Colors

(g <- ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
        geom_point() + 
        labs(x = "Year", y = "Temperature (°F)") +
        theme(legend.title = element_blank()) +
        scale_color_manual(values = c("dodgerblue4", "darkolivegreen4", 
                                      "darkorchid3", "goldenrod1")))

Categorical Variables: Use Built-In Palettes

One can use the ColorBrewer palettes by calling the scale_*_brewer functions, which are built into the ggplot2 package:

g + scale_color_brewer(palette = "Set1")

You can ignore the message in the console; replacing the existing scale is what we want.

Categorical Variables: Use Tableau Colors

Tableau is a famous visualization software with a well-known color palette. For R users it is available via the ggthemes function scale_color_tableau():

library(ggthemes)
g + scale_color_tableau()

Continuous Variables: Default Color Schemes

In our example we will change the variable we want to color to ozone, a continuous variable that is strongly related to temperature (higher temperature = higher ozone). The function scale_color_gradient() is a sequential gradient while scale_color_gradient2() is diverging.

Here is the default ggplot2 continuous color scheme (sequential color scheme):

ggplot(chic, aes(x = date, y = temp, color = o3)) + 
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") +
  scale_color_continuous("Ozone:")

This code produces the same plot:

ggplot(chic, aes(x = date, y = temp, color = o3)) +  
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") +
  scale_color_gradient()

And here is the diverging default color scheme:

ggplot(chic, aes(x = date, y = temp, color = o3)) +  
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") +
  scale_color_gradient2()

Continuous Variables: Manually Set a Sequential Color Scheme

Gradually changing color palettes that are used for continuous variables can be manually set via scale_*_gradient:

ggplot(chic, aes(x = date, y = temp, color = o3)) + 
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") +
  scale_color_gradient(name = "Ozone:", low = "darkkhaki", high = "darkgreen")

How about a diverging color scheme (rather than a sequential one) to set apart values below and above a midpoint? For diverging colors you can use the scale_color_gradient2 function, which takes the midpoint as an argument:

mid <- max(chic$o3) / 2  # or mid <- mean(chic$o3)

ggplot(chic, aes(x = date, y = temp, color = o3)) + 
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") + 
  scale_color_gradient2(name = "Ozone:", midpoint = mid, low = "blue4", 
                        mid = "white", high = "red4")

Continuous Variables: The Beautiful Viridis Color Palette

The viridis color palettes not only make your plots look pretty and easy to perceive, they are also easier to read for those with colorblindness and print well in gray scale. You can test how your plots might appear under various forms of colorblindness using the dichromat package.

The following multi-panel plot illustrates three of the available viridis palettes:

g <- ggplot(chic, aes(x = date, y = temp, color = o3)) + 
       geom_point() + 
       labs(x = "Year", y = "Temperature (°F)")

library(viridis)
p1 <- g + scale_color_viridis("Ozone:") + ggtitle("'viridis' (default)")
p2 <- g + scale_color_viridis(option = "inferno", "Ozone:") + ggtitle("'inferno'")
p3 <- g + scale_color_viridis(option = "cividis", "Ozone:") + ggtitle("'cividis'")

library(patchwork)
(p1 + p2 + p3) * theme(legend.position = "bottom")

It is also possible to use the viridis color palettes for discrete variables:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") +
  theme(legend.title = element_blank()) +
  scale_color_viridis(discrete = TRUE, end = 1)

Jump back to Table of Content.

Working with Lines

Add Horizontal or Vertical Lines to a Plot

You might want to highlight a given range or threshold, which can be done by plotting a line at these defined coordinates using geom_hline() (for “horizontal lines”) or geom_vline() (for “vertical lines”):

ggplot(chic, aes(x = date, y = temp, color = o3)) + 
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") + 
  geom_hline(yintercept = c(0, 73))

ggplot(chic, aes(x = temp, y = o3)) +
  geom_point(alpha = 0.5) +
  labs(x = "Temperature (°F)", y = "Ozone") + 
  geom_vline(aes(xintercept = median(temp)), size = 1.2, 
             color = "firebrick", linetype = "dashed") + 
  geom_hline(aes(yintercept = median(o3)), size = 1.2, 
             color = "firebrick", linetype = "dashed")

If you want to add a line that is neither horizontal nor vertical, you need to use geom_abline(). This is for example the case if you add a regression line:

reg <- lm(o3 ~ temp, data = chic)
ggplot(chic, aes(x = temp, y = o3)) +
  geom_point(alpha = 0.5) +
  labs(caption = paste0("y = ", round(coefficients(reg)[2], 2), 
                        " * x + ", round(coefficients(reg)[1], 2)), 
       x = "Temperature (°F)", y = "Ozone") + 
  geom_abline(intercept = coefficients(reg)[1], slope = coefficients(reg)[2], 
              color = "darkorange2", size = 1.5)
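The positions [1] and [2] work because coefficients() returns the intercept first and the slope second for a model with a single predictor. A base R sketch with made-up data that lie exactly on the line y = 2x + 1 (toy and fit are placeholder names):

```r
# Made-up data on a known line
toy <- data.frame(x = 1:10)
toy$y <- 2 * toy$x + 1

fit <- lm(y ~ x, data = toy)

intercept <- unname(coefficients(fit)[1])  # "(Intercept)"
slope     <- unname(coefficients(fit)[2])  # coefficient of x

round(c(intercept = intercept, slope = slope), 2)
```

The fit recovers an intercept of 1 and a slope of 2, which are the values you would hand to geom_abline().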

Later, we will learn how to add a linear fit with one command using stat_smooth(method = "lm"). However, there might be other reasons to add a line with a given slope.

Jump back to Table of Content.

Working with Text

Add Labels to Your Data

Sometimes, we want to label our data points. To avoid overlapping and overcrowding of text labels, we use a 1% sample of the original data, equally representing the four seasons.

set.seed(1)

library(tidyverse)
sample <- chic %>% 
  dplyr::group_by(season) %>% 
  dplyr::sample_frac(0.01)

## code without pipes: 
## sample <- sample_frac(group_by(chic, season), 0.01)

chic %>% 
  group_by(season) %>% 
  sample_frac(0.01) %>% 
  ggplot(aes(x = date, y = temp, label = season)) +
    geom_point() + 
    geom_text(aes(color = factor(temp)), hjust = 0.5, vjust = -0.5) +
    labs(x = "Year", y = "Temperature (°F)") +
    xlim(as.Date(c('1997-01-01', '2000-12-31'))) + 
    ylim(c(0, 90)) +
    theme(legend.position = "none")

Okay, avoiding overlays of labels did not work out. But don’t worry, we are going to fix it in a minute!

You can also use geom_label for boxes:

ggplot(sample, aes(x = date, y = temp, label = season)) +
  geom_point() + 
  geom_label(aes(fill = factor(temp)), color = "white", 
             fontface = "bold", hjust = 0.5, vjust = -0.25) +
  labs(x = "Year", y = "Temperature (°F)") +
  xlim(as.Date(c('1997-01-01', '2000-12-31'))) +
  ylim(c(0, 90)) +
  theme(legend.position = "none")

A cool thing is the ggrepel package which provides geoms for ggplot2 to repel overlapping text as in our examples above. Here, we show both the original data and our sample data, which gets labeled:

library(ggrepel)
ggplot(chic, aes(x = date, y = temp, label = season)) +
  geom_point(alpha = 0.5) +
  geom_point(data = sample, aes(color = factor(temp)), size = 2.5) +
  geom_label_repel(data = sample, aes(fill = factor(temp)), 
                   color = "white", fontface = "bold") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(legend.position = "none")

This also works for the pure text labels by using geom_text_repel. Have a look at all the usage examples.

Add Text Annotation in the Top-Right, Top-Left etc.

With ggplot2 you can set annotation coordinates to Inf but this is only moderately useful. Here is an example (based on code from this Google group) using the library grid that allows you to specify the location based on scaled coordinates where 0 is low and 1 is high.

The grobTree function from the grid package creates a grid graphical object and textGrob creates the text graphical object. The annotation_custom() function comes from ggplot2 and is designed to use a grob as input.

library(grid)
my_grob <- grobTree(textGrob("This text stays in place!", 
                             x = 0.1, y = 0.9, hjust = 0, 
                             gp = gpar(col = "black", 
                                       fontsize = 15, 
                                       fontface = "bold")))

ggplot(chic, aes(x = temp, y = o3)) +
  geom_point(color = "tan", alpha = 0.5) + 
  labs(x = "Temperature (°F)", y ="Ozone") +
  annotation_custom(my_grob)

The value of this is particularly evident when you have multiple plots with different scales. In the plot below you see that the axis scales vary, yet the same code as above can be used to put the annotation in the same place on each facet.

ggplot(chic, aes(x = temp, y = o3)) +
  geom_point(color = "tan") + 
  labs(x = "Temperature (°F)", y ="Ozone") +
  facet_wrap(~ season, scales = "free") +
  annotation_custom(my_grob)
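For comparison, here is the Inf approach mentioned at the start of this section. It pins a label to a panel corner without the grid package; hjust and vjust pull the text back inside the panel. A sketch using the built-in mtcars data, with the offsets chosen by eye:

```r
library(ggplot2)

p_corner <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "tan") +
  # x = -Inf is the left edge and y = Inf the top edge of the panel
  annotate("text", x = -Inf, y = Inf, label = "Top-left corner",
           hjust = -0.1, vjust = 1.5, fontface = "bold")

p_corner
```

This only reaches the panel edges, whereas the grob approach above can place text at any relative position.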

Jump back to Table of Content.

Working with Coordinates

Flip a Plot

It is incredibly easy to flip a plot on its side. Here I have added coord_flip(), which is all you need to flip the plot (by the way, we are trying a new plot type by using geom_boxplot()).

ggplot(chic, aes(x = season, y = o3)) +
  geom_boxplot(fill = "indianred") + 
  labs(x = "Season", y = "Ozone") +
  coord_flip()

Reverse an Axis

You can also easily reverse an axis using scale_x_reverse() or scale_y_reverse(), respectively:

ggplot(chic, aes(x = date, y = temp, color = o3)) + 
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") + 
  scale_y_reverse()

Transform an Axis

… or transform the default linear mapping by using scale_y_log10() or scale_y_sqrt(). As an example, here is a log10-transformed axis (which introduces NA’s in this case so be careful):

ggplot(chic, aes(x = date, y = temp, color = o3)) + 
  geom_point() + 
  labs(x = "Year", y = "Temperature (°F)") + 
  scale_y_log10(limits = c(0.1, 100))
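The NAs arise because the logarithm is only finite for positive values, and anything outside the limits is treated as missing as well. A quick base R check with a few made-up temperatures:

```r
temps <- c(-5, 0, 0.05, 10, 100)

# Negative values give NaN, zero gives -Inf
log_temps <- suppressWarnings(log10(temps))
log_temps

# Values that have a finite logarithm and lie inside the limits c(0.1, 100)
keep <- is.finite(log_temps) & temps >= 0.1 & temps <= 100
temps[keep]
```

Of these five values only 10 and 100 would be drawn.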

Circularize a Plot

It is also possible to circularize (polarize?) the coordinate system by calling coord_polar().

library(tidyverse)

chic %>% 
  dplyr::group_by(season) %>% 
  dplyr::summarize(o3 = median(o3)) %>% 
  ggplot(aes(x = season, y = o3)) +
    geom_col(aes(fill = factor(season))) + 
    labs(x = "", y = "Median Ozone Level") +
    coord_polar() +
    guides(fill = "none")

This coordinate system allows you to draw pie charts as well:

chic %>% 
  dplyr::mutate(o3_avg = median(o3)) %>% 
  dplyr::filter(o3 > o3_avg) %>% 
  dplyr::mutate(n_all = n()) %>% 
  dplyr::group_by(season) %>% 
  dplyr::summarize(rel = n() / unique(n_all)) %>% 
  ggplot(aes(x = "", y = rel)) +
    geom_col(aes(fill = factor(season)), width = 1) + 
    labs(x = "", 
         y = "Proportion of Days Exceeding\nthe Median Ozone Level") +
    coord_polar("y") +
    scale_fill_brewer(palette = "Set1", name = "Season:") +
    theme(axis.ticks = element_blank())
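The data preparation in this pipeline boils down to two steps: keep the days above the overall median, then compute each season's share of those days. Here is the same logic in base R with a small made-up data frame (toy is a placeholder for chic):

```r
toy <- data.frame(
  season = c("Winter", "Winter", "Spring", "Spring", "Summer", "Summer"),
  o3     = c(5, 10, 15, 20, 25, 30)
)

# Days above the overall median ozone level
above <- toy[toy$o3 > median(toy$o3), ]

# Share of each season among those days; the shares sum to 1
rel <- prop.table(table(above$season))
rel
```

Because the shares sum to 1, they map directly onto the full circle of the pie.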

Jump back to Table of Content.

Working with Chart Types

Alternatives to a Box Plot

Box plots are great, but they can be so incredibly boring. There are alternatives, but first we are plotting a common box plot:

g <- ggplot(chic, aes(x = season, y = o3)) + 
       labs(x = "Season", y = "Ozone")

g + geom_boxplot(fill = "indianred")

Effective? Yes.
Interesting? No.

1. Alternative: Plot of Points

Let’s just plot each data point of the raw data:

g + geom_point(color = "firebrick")

Not only boring but uninformative. To improve the plot, one could add transparency to deal with overplotting:

g + geom_point(color = "firebrick", alpha = 0.1)

However, setting transparency is difficult here since either the overlap is still too high or the extreme values are not visible. Bad, so let’s try something else.

2. Alternative: Jitter the Points

Try adding a little jitter to the data. I like this for in-house visualization but be careful using jittering because you are purposely adding noise to your data and this can result in misinterpretation of your data.

g + geom_jitter(aes(color = season), alpha = 0.25, 
                position = position_jitter(width = 0.3)) +
    theme(legend.position = "none")
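Two arguments help to keep jittering under control: height = 0 restricts the noise to the horizontal direction so that the measured values stay untouched, and seed makes the random offsets reproducible. A sketch with made-up data; the seed argument assumes ggplot2 3.0.0 or later:

```r
library(ggplot2)

# Made-up ozone values for two seasons
set.seed(1)
toy <- data.frame(
  season = rep(c("Winter", "Summer"), each = 50),
  o3     = c(rnorm(50, mean = 15, sd = 4), rnorm(50, mean = 25, sd = 6))
)

# Jitter horizontally only, with a fixed seed for reproducibility
p_jitter <- ggplot(toy, aes(x = season, y = o3)) +
  geom_jitter(aes(color = season), alpha = 0.5,
              position = position_jitter(width = 0.3, height = 0, seed = 2019)) +
  theme(legend.position = "none")

p_jitter
```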

3. Alternative: Violin Plots

Violin plots, similar to box plots except you are using a kernel density to show where you have the most data, are a useful visualization.

g + geom_violin(color = "sienna", fill = "red", alpha = 0.4)

4. Alternative: Combining Violin Plots with Jitter

We can of course combine both the estimated densities and the raw data points:

g + geom_violin(aes(color = season), fill = "gray80", alpha = 0.5) +
    geom_jitter(aes(color = season), alpha = 0.25, 
                position = position_jitter(width = 0.3)) +
    theme(legend.position = "none") +
    coord_flip()

The ggforce package provides so-called sina functions where the width of the jitter is controlled by the density distribution of the data - that makes the jittering a bit more visually appealing:

library(ggforce)

g + geom_violin(aes(color = season), fill = "gray80", alpha = 0.5) +
    geom_sina(aes(color = season), alpha = 0.25) +
    theme(legend.position = "none") +
    coord_flip()

5. Alternative: Combining Violin Plots with Box Plots

To allow for easy estimation of quantiles, we can also add the box of the box plot inside the violins to indicate the first quartile, the median and the third quartile:

g + geom_violin(aes(fill = season), color = "transparent", alpha = 0.5) +
    geom_boxplot(outlier.alpha = 0, coef = 0, 
                 color = "gray40", width = 0.1) +
    theme(legend.position = "none") +
    coord_flip()

Create a Rug Representation to a Plot

A rug represents the data of a single quantitative variable, displayed as marks along an axis. In most cases, it is used in addition to scatterplots or heatmaps to visualize the overall distribution of one or both of the variables:

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  geom_rug() +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(legend.position = "none")

ggplot(chic, aes(x = date, y = temp, color = factor(season))) +
  geom_point() +
  geom_rug(sides = "r", alpha = 0.3) +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(legend.position = "none")

Create a Tiled Correlation Plot

The first step is to create the correlation matrix. We are using Pearson because all the variables are fairly normally distributed (but you may consider Spearman if your variables follow a different pattern). Note that since a correlation matrix has redundant information, we are setting half of it to NA.

corm <- round(cor(chic[ , sort(c("death", "temp", "dewpoint", "pm10", "o3"))], 
                  method = "pearson", use = "pairwise.complete.obs"), 2)
corm[lower.tri(corm)] <- NA
corm
##          death dewpoint    o3 pm10  temp
## death        1    -0.47 -0.24 0.00 -0.49
## dewpoint    NA     1.00  0.45 0.33  0.96
## o3          NA       NA  1.00 0.21  0.53
## pm10        NA       NA    NA 1.00  0.37
## temp        NA       NA    NA   NA  1.00

Now we put the resulting matrix in long format using the melt function from the reshape2 package and drop the records with NA values:

library(reshape2)
corm <- melt(corm)
corm$Var1 <- as.character(corm$Var1)
corm$Var2 <- as.character(corm$Var2)
corm <- na.omit(corm)
head(corm, 10)
##        Var1     Var2 value
## 1     death    death  1.00
## 6     death dewpoint -0.47
## 7  dewpoint dewpoint  1.00
## 11    death       o3 -0.24
## 12 dewpoint       o3  0.45
## 13       o3       o3  1.00
## 16    death     pm10  0.00
## 17 dewpoint     pm10  0.33
## 18       o3     pm10  0.21
## 19     pm10     pm10  1.00
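If you prefer to avoid the extra dependency, base R can produce the same long format: as.data.frame(as.table()) turns a matrix into one row per cell. A sketch on the built-in mtcars data (note that the value column is called Freq here instead of value):

```r
# Small correlation matrix with the redundant lower half set to NA
m <- round(cor(mtcars[, c("mpg", "wt", "hp")]), 2)
m[lower.tri(m)] <- NA

# One row per matrix cell, then drop the NA half
long <- na.omit(as.data.frame(as.table(m), stringsAsFactors = FALSE))
long
```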

For the plot we will use geom_tile but if you have a lot of data you might consider geom_raster which can be much faster.

ggplot(corm, aes(x = Var2, y = Var1)) +
   geom_tile(data = corm, aes(fill = value), color = "white") +
   labs(x = "Variable 2", y = "Variable 1") +
   scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                        midpoint = 0, limits = c(-1, 1), 
                        name = "Correlation\n(Pearson)") +
   theme(axis.text.x = element_text(angle = 45, size = 11, 
                                    vjust = 1, hjust = 1)) +
   coord_equal()

Create a Contour Plot

Contour plots are a nice way to display three-dimensional data by indicating the thresholds of values. Here, we are going to plot the dew point (i.e. the temperature at which airborne water vapor will condense to form liquid dew) related to temperature and ozone levels:

## interpolate data
library(akima)
fld <- with(chic, interp(x = temp, y = o3, z = dewpoint))

## prepare data in long format
library(reshape2)
df <- melt(fld$z, na.rm = TRUE)
names(df) <- c("x", "y", "Dewpoint")
df$Temperature <- fld$x[df$x]
df$Ozone <- fld$y[df$y]

g <- ggplot(data = df, aes(x = Temperature, y = Ozone, z = Dewpoint)) +
         theme(panel.background = element_rect(fill = "white"),
               panel.border = element_rect(color = "black", fill = NA),
               legend.title = element_text(size = 15),
               axis.text = element_text(size = 12),
               axis.title.x = element_text(size = 15, vjust = -0.5),
               axis.title.y = element_text(size = 15, vjust = 0.2),
               legend.text = element_text(size = 12))
         
g + stat_contour(aes(color = ..level..))

Surprise! As it is defined, the dew point is in most cases equal to the measured temperature.

The lines indicate different dew point levels, but the plot is neither pretty nor easy to read because the borders are missing. Let’s try a tile plot using the viridis color palette to encode the dew point of each combination of ozone level and temperature:

g + geom_tile(aes(fill = Dewpoint)) +
    scale_fill_viridis(option = "inferno")

How does it look if we combine a contour plot and a tile plot to fill the area under the contour lines?

g + geom_tile(aes(fill = Dewpoint)) + 
    stat_contour(color = "white", size = 0.7, bins = 5) + 
    scale_fill_viridis()

Create a Ridge Plot

Ridge(line) plots are a fairly new type of plot that is very popular at the moment.

While you can create those plots with basic ggplot commands, their popularity led to a package that makes it easier to create them: ggridges. We are going to use this package here.

library(ggridges)
ggplot(chic, aes(x = temp, y = factor(year))) + 
   geom_density_ridges(fill = "gray90") +
   labs(x = "Temperature (°F)", y = "Year")

You can easily specify the overlap and the trailing tails by using the arguments scale and rel_min_height, respectively. The package also comes with its own theme (but I would prefer to build my own, see chapter “Create and Use Your Custom Theme”). Additionally, we change the colors based on year to make it more appealing.

ggplot(chic, aes(x = temp, y = factor(year), fill = year)) + 
  geom_density_ridges(alpha = 0.8, color = "white", 
                      scale = 2.5, rel_min_height = 0.01) + 
  labs(x = "Temperature (°F)", y = "Year") + 
  guides(fill = F) + 
  theme_ridges()

You can also get rid of the overlap using values below 1 for the scaling argument (but this somehow contradicts the idea of ridge plots…). Here is an example additionally using the viridis color gradient:

ggplot(chic, aes(x = temp, y = season, fill = ..x..)) + 
  geom_density_ridges_gradient(scale = 0.9, gradient_lwd = 0.5, 
                               color = "black") + 
  scale_fill_viridis(option = "plasma", name = "") + 
  labs(x = "Temperature (°F)", y = "Season:") +
  theme_ridges(font_family = "Roboto Condensed", grid = F)

We can also compare several groups per ridgeline and color them according to their group. This follows the idea of Marc Belzunces.

library(tidyverse)

## only plot the extreme seasons, filtered with dplyr from the tidyverse
ggplot(data = filter(chic, season %in% c("Summer", "Winter")), 
         aes(x = temp, y = year, fill = paste(year, season))) +
  geom_density_ridges(alpha = 0.7, rel_min_height = 0.01, 
                      color = "white", from = -5, to = 95) +
  scale_fill_cyclical(breaks = c("1997 Summer", "1997 Winter"),
                      labels = c(`1997 Summer` = "Summer", 
                                 `1997 Winter` = "Winter"),
                      values = c("tomato", "dodgerblue"),
                      name = "Season:", guide = "legend") +
  theme_ridges(font_family = "Roboto Condensed") + 
  labs(x = "Temperature (°F)", y = "Year")

The ggridges package is also helpful for creating histograms for different groups by using stat = "binline" in the geom_density_ridges command:

ggplot(chic, aes(x = temp, y = factor(year), fill = year)) + 
  geom_density_ridges(stat = "binline", bins = 25, scale = 0.9, 
                      draw_baseline = F, show.legend = F) + 
  theme_ridges(font_family = "Roboto Condensed") +
  labs(x = "Temperature (°F)", y = "Year")

Working with Ribbons (AUC, CI, etc.)

This is not a perfect dataset for demonstrating this, but ribbons can be useful. In this example we will create a 30-day running average of the ozone values using the filter() function from the stats package (not the one from dplyr) so that our ribbon is not too noisy.

chic$o3run <- as.numeric(stats::filter(chic$o3, rep(1/30, 30), sides = 2))

ggplot(chic, aes(x = date, y = o3run)) +
   geom_line(color = "chocolate", lwd = 0.8) +
   labs(x = "Year", y = "Ozone (30-day running average)")
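
To see what the stats::filter() call is doing, here is a minimal sketch on a toy vector. The vector x and the window of 3 values are made up for illustration; the chic data uses a window of 30 days instead.

```r
# toy series (hypothetical values, not from the chic data)
x <- c(2, 4, 6, 8, 10)

# equal weights of 1/3 over 3 values give a moving average;
# sides = 2 centers the window on each observation
ma <- as.numeric(stats::filter(x, rep(1/3, 3), sides = 2))
ma

# the first and last positions have no complete window, so they are NA
is.na(ma)
```

This is also why the running average of the ozone values contains missing values at both ends of the series, and why functions such as sd() need na.rm = TRUE when applied to it.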

How does it look if we fill in the area below the curve using the geom_ribbon() function?

ggplot(chic, aes(x = date, y = o3run)) +
   geom_ribbon(aes(ymin = 0, ymax = o3run), fill = "orange", 
               color = "orange", alpha = 0.4) +
   geom_line(color = "chocolate", lwd = 0.8) +
   labs(x = "Year", y = "Ozone (30-day running average)")

This is a nice way to indicate the area under the curve (AUC), but it is not the conventional way to use geom_ribbon(). More commonly, a ribbon shows a range around the data; here we draw one standard deviation above and below the running average:

chic$mino3 <- chic$o3run - sd(chic$o3run, na.rm = T)
chic$maxo3 <- chic$o3run + sd(chic$o3run, na.rm = T)

ggplot(chic, aes(x = date, y = o3run)) +
   geom_ribbon(aes(ymin = mino3, ymax = maxo3), alpha = 0.5, 
               fill = "darkseagreen3", color = "transparent") +
   geom_line(color = "aquamarine4", lwd = 0.7) +
   labs(x = "Year", y = "Ozone (30-day running average)")


Working with Smoothings

It is amazingly easy to add a smoothing to your data using ggplot2.

Default: Adding a LOESS or GAM Smoothing

You can simply use stat_smooth() – not even a formula is required. This adds a LOESS (locally weighted scatterplot smoothing, method = "loess") if you have fewer than 1000 points or a GAM (generalized additive model, method = "gam") otherwise. Since we have more than 1000 points, the smoothing is based on a GAM.

ggplot(chic, aes(x = date, y = temp)) + 
  geom_point(color = "gray40", alpha = 0.5) +
  labs(x = "Year", y = "Temperature (°F)") +
  stat_smooth()
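
If you prefer not to rely on the automatic choice, you can spell out the method and formula yourself. The sketch below uses synthetic data as a stand-in for chic (the data frame d and the object names are made up), and it assumes that the automatic GAM corresponds to formula = y ~ s(x, bs = "cs"), as described in the ggplot2 documentation. It needs the mgcv package, which is installed with R by default.

```r
library(ggplot2)

set.seed(1)
# synthetic data with more than 1000 rows (stand-in for chic)
d <- data.frame(x = seq(0, 10, length.out = 1500))
d$y <- sin(d$x) + rnorm(nrow(d), sd = 0.4)

# explicit GAM: what the automatic choice picks for 1000 or more points
p_gam <- ggplot(d, aes(x = x, y = y)) +
  geom_point(color = "gray40", alpha = 0.3) +
  stat_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

# explicit LOESS on a subset with fewer than 1000 points
p_loess <- ggplot(d[1:500, ], aes(x = x, y = y)) +
  geom_point(color = "gray40", alpha = 0.3) +
  stat_smooth(method = "loess", formula = y ~ x)

p_gam
p_loess
```

Stating the method explicitly keeps the plot from silently switching smoothers when the number of rows in your data changes.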

Specifying the Formula for Smoothing

ggplot2 allows you to specify the model you want it to use. Let’s say you want to increase the GAM dimension (add some additional wiggles to the smooth):

ggplot(chic, aes(x = date, y = temp)) + 
   geom_point(color = "gray40", alpha = 0.3) +
   labs(x = "Year", y = "Temperature (°F)") +
   stat_smooth(method = "gam", formula = y ~ s(x, k = 1000), 
               se = F, size = 1.3, aes(col = "1000")) +
   stat_smooth(method = "gam", formula = y ~ s(x, k = 100), 
               se = F, size = 1, aes(col = "100")) +
   stat_smooth(method = "gam", formula = y ~ s(x, k = 10), 
               se = F, size = 0.8, aes(col = "10")) +
   scale_color_manual(name = "k", values = c("darkorange2", 
                                             "firebrick", 
                                             "dodgerblue3"))

Adding a Linear Fit

Though the default is a LOESS or GAM smoothing, it is also easy to add a standard linear fit:

ggplot(chic, aes(x = temp, y = death)) +
   geom_point(color = "gray40", alpha = 0.5) +
   labs(x = "Temperature (°F)", y = "Deaths") +
   stat_smooth(method = "lm", col = "firebrick", se = F, size = 1.3)
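
The line drawn by method = "lm" is an ordinary least squares fit, so you can get the underlying numbers by fitting the same model with lm(). This is a small sketch on the built-in mtcars data as a stand-in for chic; the variables mpg and wt are only placeholders for death and temp.

```r
library(ggplot2)

# the same model that stat_smooth(method = "lm") fits behind the scenes
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)     # intercept and slope of the fitted line
confint(fit)  # 95% confidence intervals for both coefficients

# the corresponding plot, this time with the confidence band shown
p_lm <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "gray40", alpha = 0.5) +
  stat_smooth(method = "lm", formula = y ~ x, col = "firebrick", se = TRUE)

p_lm
```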


Working with Interactive Plots

Shiny

Shiny is a package from RStudio that makes it incredibly easy to build interactive web applications with R. For an introduction and live examples, visit the Shiny homepage.

To look at the potential use, you can check out the Hello Shiny examples. This is the first one:

library(shiny)
runExample("01_hello")
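
To give an idea of how little code an app needs, here is a minimal sketch that wraps a ggplot2 histogram in a Shiny app. It is not one of the official examples; the input id bins, the output id hist and the use of the built-in faithful data are all choices made for illustration.

```r
library(shiny)
library(ggplot2)

# user interface: one slider and one plot
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
  plotOutput("hist")
)

# server logic: redraw the histogram whenever the slider changes
server <- function(input, output) {
  output$hist <- renderPlot({
    ggplot(faithful, aes(x = waiting)) +
      geom_histogram(bins = input$bins, fill = "gray60", color = "white") +
      labs(x = "Waiting time (min)", y = "Count")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app in your browser
```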

Plot.ly

Plot.ly is a great tool for easily creating online, interactive graphics directly from your ggplot2 plots. The process is surprisingly easy and can be done from within R.
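
A minimal sketch of that process, assuming the plotly package is installed: build a regular ggplot2 plot and hand it to ggplotly(). The mtcars data and the object names p and ip are only stand-ins for illustration.

```r
library(ggplot2)
library(plotly)

# a regular static ggplot2 plot
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2) +
  labs(x = "Weight", y = "Miles per gallon", color = "Cylinders")

# convert it into an interactive plotly widget
ip <- ggplotly(p)
ip
```

The resulting widget supports hovering, zooming and panning, and can be embedded in R Markdown documents or Shiny apps.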


Remarks, Tips & Tricks

Using ggplot2 in Loops and Functions

The grid-based graphics functions in lattice and ggplot2 create a graph object. When you use these functions interactively at the command line, the result is automatically printed, but in source() or inside your own functions you will need an explicit print() statement, i.e. print(g) in most of our examples. See also the Q&A page of R.
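
As a small sketch of what this means in practice, the following hypothetical helper (plot_histograms() is made up for illustration, as is the use of mtcars) draws one histogram per variable inside a for loop. Without the print() call, nothing would be drawn.

```r
library(ggplot2)

# hypothetical helper: one histogram per column name in vars
plot_histograms <- function(data, vars) {
  plots <- list()
  for (v in vars) {
    p <- ggplot(data, aes(x = .data[[v]])) +
      geom_histogram(bins = 15, fill = "gray60", color = "white") +
      labs(x = v, y = "Count")
    print(p)         # explicit print(): plots are not auto-printed here
    plots[[v]] <- p  # keep the plot object for later use
  }
  invisible(plots)
}

plots <- plot_histograms(mtcars, c("mpg", "wt", "hp"))
```

Returning the plot objects as a list in addition to printing them makes it possible to save or combine them later.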

Additional Sources
