ggplot2 series
In Part 1 of this series, we will:
Explore the grammar of graphics
Map data to aesthetics
Understand layer components
Interpret ggplot2 documentation
Create a layered plot
Introduce function and syntax of visual elements
“The fundamental principles or rules of an art or science” - Oxford English Dictionary
A grammar of graphics can:
reveal the composition of complicated graphics
provide a strong foundation for understanding a range of graphics
guide us toward well-formed or correct graphics
Note
See “The Grammar of Graphics” by Leland Wilkinson (2005) and “A Layered Grammar of Graphics” by Hadley Wickham (2010)
ggplot2 builds complex plots iteratively, one layer at a time.
What are the necessary components of a plot?
What are the necessary components of a layer?
A plot contains:
Data and aesthetic mapping
Layer(s) containing geometric object(s) and statistical transformation(s)
Scales
Coordinate system
(Optional) facets or themes
A layer contains:
Data with aesthetic mapping
A statistical transformation, or stat
A geometric object, or geom
A position adjustment
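As a minimal sketch (assuming ggplot2 is installed), the plot below writes out one component from each list explicitly. `penguins_sample` is a hypothetical name for a data frame holding the six rows of the sample penguin table in this section. Apart from the facets and the theme, every component spelled out here is what ggplot2 would supply by default anyway.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

p <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm)) + # data + aesthetic mapping
  geom_point(stat = "identity", position = "identity") +                   # layer: geom, stat, position
  scale_x_continuous() +                                                   # scales for x and y
  scale_y_continuous() +
  coord_cartesian() +                                                      # coordinate system
  facet_wrap(~species) +                                                   # optional: facets
  theme_bw()                                                               # optional: theme

print(p)
```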
Data can be added to either the entire ggplot object or a particular layer.
Input data must be a data frame in ‘tidy’ format:
every column is a variable
every row is an observation
every cell is a single value
Note
See “Tidy Data” by Wickham (2014) and the associated vignette
species | bill_length_mm | bill_depth_mm | body_mass_g |
---|---|---|---|
Adelie | 39.1 | 18.7 | 3750 |
Adelie | 39.5 | 17.4 | 3800 |
Gentoo | 46.7 | 15.3 | 5200 |
Gentoo | 43.3 | 13.4 | 4400 |
Chinstrap | 46.1 | 18.2 | 3250 |
Chinstrap | 51.3 | 18.2 | 3750 |
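The sample table above is already tidy, so it can be typed straight into a data frame. This sketch uses base R only; `penguins_sample` is a hypothetical name, and the values are just the six rows shown.

```r
# Hypothetical name for the sample table:
# each column is one variable, each row one penguin, each cell one value
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

# Inspect the column types and the first few values
str(penguins_sample)
```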
Data and aesthetic mappings can be supplied to the initial ggplot() call, to individual layers, or to a combination of both
Data and aesthetics set in ggplot() are inherited by every layer, but can be overridden within a layer
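A minimal sketch of inheritance and overriding, using the hypothetical `penguins_sample` data frame built from the sample table. The first layer inherits both the data and the x/y mapping from ggplot(); the second replaces the data with a hypothetical Gentoo-only subset, `gentoo_only`, and adds a colour mapping for that layer alone.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

# Hypothetical subset used to override the plot-level data in one layer
gentoo_only <- subset(penguins_sample, species == "Gentoo")

p <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm)) +
  # Inherits the data and the x/y mapping from ggplot()
  geom_point(colour = "grey60") +
  # Overrides the data, still inherits x/y, and adds a colour mapping for this layer only
  geom_point(data = gentoo_only, aes(colour = species), size = 3)

print(p)
```

If a layer should ignore the plot-level mapping entirely, geom and stat functions also accept `inherit.aes = FALSE`.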
Specifying a constant inside aes() with quotes, e.g., aes(color = "raw data"), creates a legend on the fly
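A short sketch of the "legend on the fly" trick, assuming the hypothetical `penguins_sample` data frame. Each quoted constant inside aes() is treated like a one-value variable, so it is passed through the colour scale and gets its own colour and legend entry. Outside aes(), the same string would be read as a literal colour value rather than a label.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

p <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm)) +
  # Quoted constants inside aes() are mapped through the colour scale,
  # so each one gets a distinct colour and an entry in the legend
  geom_point(aes(colour = "measured")) +
  geom_smooth(aes(colour = "linear fit"), method = "lm", formula = y ~ x, se = FALSE)

print(p)
```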
layer()
A layer contains:
Data with aesthetic mapping
A statistical transformation, or stat
A geometric object, or geom
A position adjustment
Note
All geom_*() or stat_*() calls are customized shortcuts for the layer() function.
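As a sketch of that shortcut relationship, assuming the hypothetical `penguins_sample` data frame, the two plots below build the same layer: once with geom_point() and once with layer(), naming the geom, stat and position explicitly. Comparing layer_data() for each plot checks that the computed layer data match.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

# The usual shortcut
p_geom <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

# The same layer spelled out: geom, stat and position given explicitly
p_layer <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm)) +
  layer(
    geom = "point",
    stat = "identity",
    position = "identity",
    params = list(na.rm = FALSE)
  )

# Compare the data ggplot2 computes for each layer
all.equal(layer_data(p_geom), layer_data(p_layer))
```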
Defining each of the components of a layer or a whole graphic can be tiresome.
ggplot2 has a hierarchy of defaults, so you can make a graph in 2 lines of code!
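For example, assuming the hypothetical `penguins_sample` data frame, the two lines that build the plot below name only the data, one mapping and one geom. ggplot2 fills in the rest: geom_boxplot() brings stat = "boxplot" and position = "dodge2", species gets a discrete x scale, body mass gets a continuous y scale, and the plot uses Cartesian coordinates, no facets and theme_grey().

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

# The two lines: data + mapping, then a geom; everything else is a default
p <- ggplot(penguins_sample, aes(x = species, y = body_mass_g)) +
  geom_boxplot()

print(p)
```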
stat_*() vs. geom_*()
“Every geom has a default statistic, and every statistic has a default geom.” - Wickham (2010)
stat_*() transforms the data
geom_*() controls the type of plot rendered
Tip
When in doubt, check the documentation
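stat_count() and geom_bar() are equivalent
stat_density() and geom_density() are not equivalent: stat_density() draws with geom = "area" by default, giving a filled area, while geom_density() draws an unfilled outline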
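One way to check these claims, sketched with the hypothetical `penguins_sample` data frame, is to compare the data ggplot2 computes for each layer with layer_data(). The count layers should match exactly; the density layers share the same estimate but differ in their default fill, because stat_density() draws with the area geom.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

# Equivalent: stat_count() defaults to geom = "bar"; geom_bar() defaults to stat = "count"
p_stat_count <- ggplot(penguins_sample, aes(x = species)) + stat_count()
p_geom_bar   <- ggplot(penguins_sample, aes(x = species)) + geom_bar()

# Not equivalent: same density estimate, different default geoms
p_stat_density <- ggplot(penguins_sample, aes(x = body_mass_g)) + stat_density()
p_geom_density <- ggplot(penguins_sample, aes(x = body_mass_g)) + geom_density()

# Compare the computed layer data
all.equal(layer_data(p_stat_count), layer_data(p_geom_bar))
layer_data(p_stat_density)$fill[1]  # a fill colour from the area geom
layer_data(p_geom_density)$fill[1]  # NA: geom_density() has no fill by default
```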
In general, use geom_*() unless you are trying to:
Track all geom and stat options
Exercise
For each of the following problems, suggest a useful geom:
For example, boxplots and errorbars can’t be stacked.
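Position adjustments, the last component of a layer, decide how objects that share an x position are arranged. A minimal sketch, assuming the hypothetical `penguins_sample` data frame plus a hypothetical `mass_class` column based on an arbitrary 4000 g cut-off, used only to create two groups of bars:

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

# Hypothetical grouping: an arbitrary 4000 g threshold, for illustration only
penguins_sample$mass_class <- ifelse(penguins_sample$body_mass_g >= 4000,
                                     "4000 g or more", "under 4000 g")

base <- ggplot(penguins_sample, aes(x = mass_class, fill = species))

p_stack <- base + geom_bar(position = "stack")  # bars piled on top of each other (the default)
p_dodge <- base + geom_bar(position = "dodge")  # bars placed side by side
```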
Exercise
What properties must a geom possess to be stackable?
What properties must a geom possess to be dodgeable?
Exercise
What are the two layers in this plot? What data went into each?
Each scale is a function that translates data space (in data units) into aesthetic space (e.g., pixels)
A guide (axis or legend) is the inverse function: it converts visual properties back to data values
Every aesthetic in a plot is associated with exactly one scale.
Scale function names are made of 3 pieces separated by “_”:
scale
the name of the primary aesthetic (color, shape, x)
the name of the scale (discrete, continuous, brewer)
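A short sketch of the naming pattern, assuming the hypothetical `penguins_sample` data frame and that the ColorBrewer palettes are available (RColorBrewer is installed as a dependency of ggplot2's scales package). Each function name reads as scale + aesthetic + scale type.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

p <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  scale_x_continuous(name = "Bill length (mm)") +  # scale + x + continuous
  scale_y_continuous(name = "Bill depth (mm)") +   # scale + y + continuous
  scale_color_brewer(palette = "Dark2")            # scale + color + brewer

print(p)
```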
Coordinate systems have 2 primary roles:
Combine the x and y position aesthetics to produce a 2-dimensional position on the plot
In coordination with faceting (optional), draw axes and panel backgrounds
Linear:
coord_cartesian(): common default
coord_flip(): x and y axes flipped
coord_fixed(): fixed aspect ratio
Non-linear:
coord_map()/coord_quickmap()/coord_sf(): map projections; x and y become longitude and latitude
coord_polar(): polar coordinates; x and y become angle and radius
coord_trans(): apply transformations (e.g., log10) to the x and y coordinates
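A minimal sketch comparing a few coordinate systems on the same layer, assuming the hypothetical `penguins_sample` data frame. Only the coord changes; the layer data are identical, and the difference appears when the panel is drawn.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

base <- ggplot(penguins_sample, aes(x = species)) + geom_bar()

p_cartesian <- base + coord_cartesian()  # the default
p_flip      <- base + coord_flip()       # horizontal bars
p_polar     <- base + coord_polar()      # species become angles, counts become radii

# Fixed aspect ratio: one unit on x is drawn as long as one unit on y
p_fixed <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  coord_fixed(ratio = 1)

print(p_polar)
```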
Faceting creates small multiples, each showing a different subset of the data:
facet_null(): default (a single panel)
facet_wrap(): “wraps” a 1d ribbon of panels into 2d
facet_grid(): 2d grid of panels defined by row and column variables
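A minimal sketch of the two faceting functions, assuming the hypothetical `penguins_sample` data frame. Both split the same layer into one panel per species; they differ only in how the panels are laid out.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

base <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

p_wrap <- base + facet_wrap(vars(species), ncol = 2)  # 3 panels wrapped into 2 columns
p_grid <- base + facet_grid(rows = vars(species))     # one row of panels per species

print(p_wrap)
```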
Exercise
Recreate the figure below. How would you get the gray points to show up on all facets?
Themes control the non-data elements of a plot (e.g., to match a style guide).
Theme elements specify the non-data elements you can control, such as plot.title and legend.position
Most elements have an element function that describes their visual properties, such as element_text() or element_blank()
The theme() function overrides parts of the default theme, e.g., theme(legend.title = element_blank())
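A short sketch of combining a complete theme with individual overrides, assuming the hypothetical `penguins_sample` data frame. theme_minimal() replaces the default theme_grey(), and theme() then adjusts single elements on top of it.

```r
library(ggplot2)

# Hypothetical toy data: the six rows of the sample penguin table
penguins_sample <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 39.5, 46.7, 43.3, 46.1, 51.3),
  bill_depth_mm  = c(18.7, 17.4, 15.3, 13.4, 18.2, 18.2),
  body_mass_g    = c(3750, 3800, 5200, 4400, 3250, 3750)
)

p <- ggplot(penguins_sample, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  labs(title = "Bill dimensions by species") +
  theme_minimal() +                                # a complete theme replaces theme_grey()
  theme(
    plot.title      = element_text(face = "bold"), # element_text() styles a text element
    legend.title    = element_blank(),             # element_blank() removes the element
    legend.position = "bottom"                     # some settings take plain values
  )

print(p)
```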
Penguin artwork by @allison_horst
Hadley Wickham’s “A layered grammar of graphics” (2010)
Hadley Wickham’s “ggplot2: Elegant Graphics for Data Analysis, 3rd edition”, now available online
“R for Data Science”, by Hadley Wickham, Mine Çetinkaya-Rundel, & Garrett Grolemund, especially chapters 2, 10, and 12
See us at drop-in hours