We already saw some of R’s built-in plotting facilities with the function plot. A more recent and much more powerful plotting library is ggplot2. ggplot2 is another mini-language within R, a language for creating plots. It implements ideas from a book called [“The Grammar of Graphics” [url https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448]]. The syntax can be a little strange, but there are plenty of examples in the online documentation.
If ggplot2 isn’t already installed, we need to install it first.
# install.packages("ggplot2")
# or
# install.packages("tidyverse")
ggplot2 is part of the Tidyverse, so loading the tidyverse package will load ggplot2.
library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.3.4 ✔ dplyr 0.7.4
## ✔ tidyr 0.7.2 ✔ stringr 1.2.0
## ✔ readr 1.1.1 ✔ forcats 0.2.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
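To produce a plot with ggplot2, we must give three things: a data frame containing our data, a mapping from columns of the data frame to graphical attributes (the “aesthetics”), and the graphical elements to draw (the “geoms”).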
We continue using the Gapminder data, which was loaded with:
gap <- read_csv("gapminder.csv")
## Parsed with column specification:
## cols(
## country = col_character(),
## continent = col_character(),
## year = col_integer(),
## lifeExp = col_double(),
## pop = col_integer(),
## gdpPercap = col_double()
## )
Let’s make our first ggplot.
ggplot(gap, aes(x=year, y=lifeExp)) +
    geom_point()
The calls to ggplot and aes set up the basics of how we are going to represent the various columns of the data frame. aes defines the “aesthetics”, which is how columns of the data frame map to graphical attributes such as x and y position, color, size, etc. We then literally add layers of graphics to this, using +.
aes is another example of magic “non-standard evaluation”: arguments to aes may refer to columns of the data frame directly.
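Because aes captures its arguments as expressions instead of evaluating them immediately, whole expressions involving column names can be used, not just bare names. The following is a minimal sketch of this, assuming a small made-up data frame called toy (hypothetical, not part of the Gapminder data):

```r
library(ggplot2)

# Hypothetical toy data, only for illustration.
toy <- data.frame(
    year = c(1990, 2000, 2010),
    income = c(100, 1000, 10000))

# aes() stores the expressions unevaluated; no columns are looked up yet.
mapping <- aes(x=year, y=log10(income))
print(mapping)

# The expressions are evaluated using the data frame when the plot is built.
p <- ggplot(toy, mapping) + geom_point()
p
```

Here year and income do not exist as variables in the R session. They are only found because ggplot2 looks them up in toy when it builds the plot.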
Further aesthetics can be used. Most aesthetics can be mapped to either a numeric or a categorical column, and an appropriate scale will be chosen automatically.
ggplot(gap, aes(x=year, y=lifeExp, color=continent, size=pop)) +
    geom_point()
This R code will get the data from the year 2007:
gap2007 <- filter(gap, year == 2007)
Create a ggplot of this with gdpPercap on the x-axis and lifeExp on the y-axis.
To draw lines, we need to use a “group” aesthetic.
ggplot(gap, aes(x=year, y=lifeExp, group=country, color=continent)) +
    geom_line()
A wide variety of geoms are available. Here we show Tukey box-plots. Note again the use of the “group” aesthetic. Without it, ggplot will just show one big box-plot.
ggplot(gap, aes(x=year, y=lifeExp, group=year)) +
    geom_boxplot()
geom_smooth can be used to show trends.
ggplot(gap, aes(x=year, y=lifeExp)) +
    geom_point() +
    geom_smooth()
## `geom_smooth()` using method = 'gam'
Aesthetics can be specified globally in ggplot, or as the first argument to individual geoms. Here, the “group” is applied only to draw the lines, and “color” is used to produce multiple trend lines:
ggplot(gap, aes(x=year, y=lifeExp)) +
    geom_line(aes(group=country)) +
    geom_smooth(aes(color=continent))
## `geom_smooth()` using method = 'loess'
Geoms can be added that use a different data frame, using the data= argument.
gap_australia <- filter(gap, country == "Australia")
ggplot(gap, aes(x=year, y=lifeExp, group=country)) +
    geom_line() +
    geom_line(data=gap_australia, color="red", size=2)
Notice also that the second geom_line has some further arguments controlling its appearance. These are not aesthetics: they do not map data to appearance, but specify the appearance directly. There isn’t an associated scale, as there was when color was an aesthetic.
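The difference between mapping an aesthetic and setting an appearance directly can be seen side by side. This is a minimal sketch, assuming a small made-up data frame called toy with a categorical column kind (both hypothetical):

```r
library(ggplot2)

# Hypothetical toy data, only for illustration.
toy <- data.frame(
    x = c(1, 2, 3, 4),
    y = c(2, 4, 3, 5),
    kind = c("a", "a", "b", "b"))

# Mapped: color depends on the "kind" column, so a scale and legend are created.
p_mapped <- ggplot(toy, aes(x=x, y=y)) +
    geom_point(aes(color=kind))

# Set: every point is drawn red. No scale or legend is involved.
p_set <- ggplot(toy, aes(x=x, y=y)) +
    geom_point(color="red")

p_mapped
p_set
```

A common slip is to write aes(color="red"). This treats "red" as a data value with a single category, so the color is picked by the default color scale and a legend appears.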
Adding labs to a ggplot adjusts the labels given to the axes and legends. A plot title can also be specified.
ggplot(gap, aes(x=year, y=lifeExp)) +
    geom_point() +
    labs(x="Year", y="Life expectancy", title="Gapminder")
Type scale_ and press the tab key. You will see functions giving fine-grained control over various scales (x, y, color, etc.). Limits on the scale can be set, as well as transformations (e.g. log10), and breaks (labelled values).
Suppose we want our y-axis to start at zero.
ggplot(gap, aes(x=year, y=lifeExp)) +
    geom_point() +
    scale_y_continuous(limits=c(0,100))
The lims function can also be used to set limits.
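As a minimal sketch of lims, here is the same y-axis limit applied to a small made-up data frame called toy (hypothetical, standing in for the Gapminder data):

```r
library(ggplot2)

# Hypothetical toy data, only for illustration.
toy <- data.frame(
    year = c(1990, 2000, 2010),
    lifeExp = c(60, 65, 70))

# lims() takes the name of an aesthetic and a vector giving its limits.
p <- ggplot(toy, aes(x=year, y=lifeExp)) +
    geom_point() +
    lims(y=c(0, 100))
p
```

Limits set this way behave like limits given to the scale: data falling outside them is dropped with a warning, which can matter for geoms that summarize the data, such as box-plots and smoothers.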
Very fine-grained control is possible over the appearance of ggplots. See the ggplot2 documentation for details and further examples.
Continuing with your scatter-plot of the 2007 data, add axis labels to your plot.
Advanced: Give your x-axis a log scale (see the documentation on scale_x_continuous, specifically the trans argument).
Faceting lets us quickly produce a collection of small plots. The plots all have the same scales and the eye can easily compare them.
ggplot(gap, aes(x=year, y=lifeExp, group=country)) +
    geom_line() +
    facet_wrap(~ continent)
Note the use of ~, which we’ve not seen before. ~ syntax is used in R to specify dependence on some set of variables, for example when specifying a linear model. Here the information in each plot is dependent on the continent.
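An expression written with ~ is called a formula. It is an ordinary R object that records variable names without evaluating them. The sketch below pokes at a couple of formulas and then uses one in a linear model. The data frame toy is made up for illustration and is not part of the Gapminder data.

```r
# A two-sided formula: lifeExp depends on year.
f <- lifeExp ~ year
class(f)
all.vars(f)

# A one-sided formula, like the one passed to facet_wrap.
g <- ~ continent
all.vars(g)

# Hypothetical toy data, to show a formula specifying a linear model.
toy <- data.frame(
    year = c(1990, 2000, 2010, 2020),
    lifeExp = c(60, 64, 69, 72))
fit <- lm(lifeExp ~ year, data=toy)
coef(fit)
```

As with aes, the names in the formula are only looked up later, in the data frame given to the function that uses the formula.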
Let’s return again to your scatter-plot of the 2007 data.
Adjust your plot to now show data from all years, with each year shown in a separate facet, using facet_wrap(~ year).
Advanced: Highlight Australia in your plot.
The act of plotting a ggplot is actually triggered when it is printed. In an interactive session each value we calculate is printed automatically, but inside a for loop or other R programming construct, you may need to explicitly print() the plot.
Ggplots can be saved using ggsave.
# Plot created but not shown.
p <- ggplot(gap, aes(x=year, y=lifeExp)) + geom_point()
# Only when we try to look at the value p is it shown
p
# Alternatively, we can explicitly print it
print(p)
# To save to a file
ggsave("test.png", p)
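The need for an explicit print is easiest to see in a loop. The following is a minimal sketch, assuming a small made-up data frame called toy with a group column (both hypothetical), that draws one plot per group:

```r
library(ggplot2)

# Hypothetical toy data, only for illustration.
toy <- data.frame(
    group = c("a", "a", "b", "b"),
    x = c(1, 2, 1, 2),
    y = c(3, 4, 5, 7))

plots <- list()
for(name in unique(toy$group)) {
    p <- ggplot(toy[toy$group == name, ], aes(x=x, y=y)) +
        geom_point() +
        labs(title=name)

    # Inside a loop, values are not printed automatically,
    # so without this line no plot would be shown.
    print(p)

    # Keep the plot object as well, in case we want to save it later.
    plots[[name]] <- p
}
```

Keeping the plot objects in a list is just one option. Each could equally be passed to ggsave inside the loop.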