3 Visualizations
3.1 Introduction
Whoever said a picture is worth 1,000 words severely understated how many words a picture is actually worth. When working with data, there is a strong argument to make that nothing is more important than visuals.
If there is one piece of advice to take from this textbook, it is this:
After running summary statistics, always visualize your data!
You may be thinking “how powerful can a visualization even be?” That is a great question, and one that Anscombe’s quartet will help answer.
Quick history lesson. In 1973 (before the invention of R), a statistician named Francis Anscombe created four unique datasets that all share nearly identical summary statistics.
library(tidyverse)
library(quartets)
library(knitr)
anscombe_quartet %>%
group_by(dataset) %>%
summarise(mean_x = mean(x),
variance_x = var(x),
mean_y = mean(y),
variance_y = var(y),
correlation = cor(x, y)) %>%
kable(digits = 2, caption = "A breakdown of summary statistics from the four individual datasets Anscombe created.")
| dataset | mean_x | variance_x | mean_y | variance_y | correlation |
|---|---|---|---|---|---|
| (1) Linear | 9 | 11 | 7.5 | 4.13 | 0.82 |
| (2) Nonlinear | 9 | 11 | 7.5 | 4.13 | 0.82 |
| (3) Outlier | 9 | 11 | 7.5 | 4.12 | 0.82 |
| (4) Leverage | 9 | 11 | 7.5 | 4.12 | 0.82 |
As the table above shows, all four of the different datasets show the same means, variances, and correlations (more on that in chapter 6). With just these summary statistics, you’d likely think “eh, all the data is the same, these datasets are basically identical.”
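The same check can be sketched without the quartets package, since base R ships Anscombe's data as the `anscombe` data frame. This is only an illustrative alternative to the grouped summary above: it relies on the wide layout of that data frame (columns `x1` to `x4` and `y1` to `y4`), and the object name `stats` is made up for this example.

```r
# Base R's anscombe data frame stores the quartet in wide form:
# columns x1..x4 hold the x values and y1..y4 hold the y values.
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x),
    variance_x = var(x),
    mean_y = mean(y),
    variance_y = var(y),
    correlation = cor(x, y))
})

# One column per dataset, one row per summary statistic
colnames(stats) <- paste("dataset", 1:4)
round(stats, 2)
```

Reading across any row of the result shows how little the four datasets differ on that statistic.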
Wrong.
When we graph these four datasets, we see something totally different than what the table shows.
ggplot(anscombe_quartet, aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = "lm") +
facet_wrap(~dataset)
#> `geom_smooth()` using formula = 'y ~ x'
Figure 3.1: While the four datasets looked identical in the table, the visualized Anscombe datasets show an entirely different picture.
With our visualization, we are introduced to an entirely different way of seeing our data. The table showed that the summary statistics were identical, but here we can see:
- Dataset One has a linear relationship between x and y.
- Dataset Two has a nonlinear relationship between x and y.
- Dataset Three, while still linear, has an outlier.
- Dataset Four shows something totally different from the rest!
Visualizations provide insights into data that sometimes numbers can’t show.
This chapter is meant less as a lesson and more as a reference page to return to whenever you need to make a graph. You do not need to remember every plotting option shown here.
3.2 Learning Objectives
By the end of this chapter, you will be able to:
- Explain why visualizing data is a critical step alongside summary statistics
- Describe the core grammar of graphics used by ggplot2 (data, aesthetics, geometry)
- Create common plot types using ggplot2, including scatterplots, bar charts, column charts, histograms, density plots, boxplots, and line graphs
- Map and customize aesthetics such as color, shape, size, fill, and transparency
- Enhance visual clarity using labels, themes, facets, coordinate transformations, and reference lines
- Add contextual information to plots using trend lines, error bars, and text labels
- Interpret visual patterns to identify relationships, distributions, outliers, and trends in data
- Create visualizations that are interpretable and reproducible when viewed independently of accompanying text
With that being said, let’s get right to it.
3.3 Base R
One of R's greatest strengths is its ability to create visualizations, powered by ggplot2. Base R can create plots as well. ggplot2 is recommended, but you may encounter base R plots in older scripts or documentation, so an example is included for familiarity.
values <- c(100, 17, 45, 55, 44)
barplot(values, xlab = "X-axis", ylab = "Y-axis", main = "Base R Bar Chart")
Figure 3.2: An example of a bar chart created using base R.
3.4 ggplot2
Before we go into visualizing our data, we should probably see what data we will be working with! Similar to how R comes preinstalled with datasets, ggplot2 also comes with prepacked data that can be utilized.
kable(head(mpg), caption = "A ggplot2 dataset: Fuel economy data from 1999 to 2008 for 38 popular models of cars.")
| manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|
| audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
| audi | a4 | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | compact |
kable(head(economics), caption = "A ggplot2 dataset: US Economic Time Series.")
| date | pce | pop | psavert | uempmed | unemploy |
|---|---|---|---|---|---|
| 1967-07-01 | 506.7 | 198712 | 12.6 | 4.5 | 2944 |
| 1967-08-01 | 509.8 | 198911 | 12.6 | 4.7 | 2945 |
| 1967-09-01 | 515.6 | 199113 | 11.9 | 4.6 | 2958 |
| 1967-10-01 | 512.2 | 199311 | 12.9 | 4.9 | 3143 |
| 1967-11-01 | 517.4 | 199498 | 12.8 | 4.7 | 3066 |
| 1967-12-01 | 525.1 | 199657 | 11.8 | 4.8 | 3018 |
kable(head(diamonds), caption = "A ggplot2 dataset: Prices of over 50,000 round cut diamonds.")
| carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|
| 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
| 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
| 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
kable(head(mtcars), caption = "A base R dataset: Motor Trend Car Road Tests.")
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Additionally, ggplot2 is part of the tidyverse package. So, you can either load ggplot2 or tidyverse if you want to visualize using ggplot2. Because ggplot2 is part of the tidyverse, everything you learned in Chapter 2 about pipelines and data manipulation carries directly into visualizations.
3.4.1 Basics
When using ggplot2, there are unlimited possibilities on what you can manipulate/influence. This may be daunting, but always remember that all plots work on the same framework.
plot = data + aesthetics + geometry
No matter what ggplot you are making, no matter how many characteristics you influence, all ggplot2 needs are three things:
- The data: the data being used to make the plot
- The aesthetics: x/y/color/shape/etc. In ggplot2, aesthetics is shortened to aes.
- The geometry: plot type (e.g., scatterplot, boxplot, etc.)
With only those three things, you can make any type of visualization you want or need. From there, you can build as far as your mind can see. When you do want to add more levels to your plot, you do so by using the + sign.
Here is an example of a basic graph made with ggplot:
ggplot(data = mpg, aes(x = cty, y = hwy)) +
geom_point()
Figure 3.3: An example of a visualization made using ggplot2.
This is our first ggplot (a scatterplot) we have created, so let’s break this down:
- The data: we are using the mpg dataset.
- The aesthetics: the x variable is cty and the y is hwy.
- The geometry: geom_point(), which creates points on the graph.
  - Importantly, to add geometry, you do need to add the + sign.
With only two lines of code, a scatterplot was created! Since the framework has been established, it is time to build some visualizations!
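To make the layering idea concrete, the sketch below builds the same scatterplot one piece at a time and stores each stage in an object. The object names (`base_plot`, `scatter`, `scatter_with_trend`) are made up for illustration; the point is only that each + adds one more layer on top of the same data and aesthetics.

```r
library(ggplot2)

# Data + aesthetics only: a valid plot object, but no geometry to draw yet
base_plot <- ggplot(data = mpg, aes(x = cty, y = hwy))

# Adding a geometry with + gives ggplot2 something to draw
scatter <- base_plot + geom_point()

# Each additional + stacks another layer on the same foundation
scatter_with_trend <- scatter + geom_smooth(method = "lm", formula = y ~ x)

# Printing the object renders the plot
scatter_with_trend
```

Printing `base_plot` on its own shows labeled axes with no points, which is a quick way to see what the data and aesthetics contribute before any geometry is added.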
Note: In this chapter, there will be basic examples of each visualization type, and advanced examples of each visualization type. This is done to display the scope of what can be done with this package (ggplot2). It is encouraged to experiment with the code, change numbers, remove things, and compare and contrast the differences between the experimented visuals.
3.4.2 Scatterplot - geom_point()
When data has two continuous (for example, numeric) variables, and you want to visualize their relationship, a scatterplot is a fantastic choice. This is the basis for relationship analysis that topics such as linear regression (more about this in Chapter 8) rely on.
The geometry that needs to be specified is geom_point().
ggplot(data = mpg, aes(x = cty, y = hwy)) +
geom_point()
Figure 3.4: A basic example of a scatterplot using ggplot2.
Typically it is best practice to have a “line of best fit” when creating scatterplots, similar to Anscombe’s visualization. To do that, you can utilize the geom_smooth() command.
# Line of Best Fit
ggplot(mpg, aes(x = cty, y = hwy)) +
geom_point() +
geom_smooth(method = "lm") # lm stands for "linear model"
#> `geom_smooth()` using formula = 'y ~ x'
Figure 3.5: An example of a scatterplot with a line of best fit.
Notice that to add geom_smooth, which is another layer of our visualization, we needed to add another + sign. It is used quite literally in ggplot2.
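To see what method = "lm" is drawing, the sketch below fits the same straight line directly with base R's lm() and then writes the formula out explicitly in geom_smooth(). The object name `fit` is illustrative, and this is a minimal sketch rather than a full treatment of linear models.

```r
library(ggplot2)  # provides the mpg dataset

# Fit highway MPG as a linear function of city MPG,
# the same y ~ x formula that geom_smooth() reports using
fit <- lm(hwy ~ cty, data = mpg)

# Intercept and slope of the fitted line
coef(fit)

# The same layer with the formula stated explicitly,
# which also silences the "using formula" message
ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
```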
Let’s add some more information to our scatterplot.
ggplot(data = mpg, aes(x = cty, y = hwy, color = class, shape = drv)) +
geom_point(alpha = 0.8, size = 2.5) + # opacity and size
labs(
title = "City vs Highway MPG", # adds a title to the plot
x = "City MPG", # adds a x-axis label to the plot
y = "Highway MPG", # adds a y-axis label to the plot
color = "Vehicle Class", # adds a color label to the plot
shape = "Drivetrain" # adds a shape label to the plot
) +
facet_wrap(~ class) + # breaks the graph into individual graphs
geom_smooth(method = "lm", se = TRUE) # adds a line of best fit
#> `geom_smooth()` using formula = 'y ~ x'
Figure 3.6: A scatterplot incorporating multiple aesthetics and faceting.
3.4.3 Bar Chart (counts) and Column Chart (values)
Bar charts visualize counts of a discrete variable, while column charts visualize pre-summarized numeric values.
3.4.3.1 Bar Chart - geom_bar()
For a bar chart, the geometry used is geom_bar().
# BASIC (counts by class) - geom_bar() counts rows automatically
ggplot(mpg, aes(x = class)) +
geom_bar()
Figure 3.7: An example of a basic bar chart.
Depending on how long the category labels are, it may be best to swap the x and y axes. The variables keep their roles (x is still x, and y is still y); only their positions on the coordinate plane flip, so the bars run horizontally. To do this, you can utilize the coord_flip() command.
ggplot(mpg, aes(x = manufacturer)) +
geom_bar(fill = "steelblue", color = "white") +
coord_flip() +
labs(title = "Counts by Manufacturer", x = "", y = "Count")
Figure 3.8: An example of a coordinate flipped bar chart.
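As an alternative sketch, recent versions of ggplot2 (this assumes 3.3.0 or later) can draw horizontal bars when the categorical variable is mapped to y directly, without coord_flip(). The object name `p_horizontal` is made up for this example. Note that the axis labels in labs() swap compared with the flipped version, because manufacturer now really is the y variable.

```r
library(ggplot2)

# Map the categorical variable to y; geom_bar() then counts rows
# and draws the bars horizontally (assumes ggplot2 >= 3.3.0)
p_horizontal <- ggplot(mpg, aes(y = manufacturer)) +
  geom_bar(fill = "steelblue", color = "white") +
  labs(title = "Counts by Manufacturer", x = "Count", y = "")

p_horizontal
```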
Depending on the audience, a stacked bar chart may be the best way to visualize the data. To do that, you can add position = "stack" and ggplot does the rest.
# What if we want a stacked bar chart (default with fill)?
ggplot(mpg, aes(x = manufacturer, fill = drv)) +
geom_bar(position = "stack", color = "white") +
coord_flip() +
labs(
title = "Counts by Manufacturer and Drivetrain",
x = "",
y = "Count",
fill = "Drivetrain"
) +
theme_minimal()
Figure 3.9: An example of a stacked bar chart.
If you need to create a grouped bar chart instead, you can add position = "dodge" to create your visual.
# What if we want a grouped bar chart?
ggplot(mpg, aes(x = manufacturer, fill = drv)) +
geom_bar(position = "dodge", color = "white") +
coord_flip() +
labs(
title = "Counts by Manufacturer and Drivetrain",
x = "",
y = "Count",
fill = "Drivetrain"
) +
theme_minimal()
Figure 3.10: An example of a grouped bar chart.
3.4.3.2 Column Chart - geom_col()
Column charts work best when you have pre-summarized values, and not raw values.
# USING PRE-SUMMARIZED VALUES - geom_col() requires explicit values
class_counts <- mpg %>%
count(class) # counts rows by class
kable(class_counts, caption = "Pre-summarized values")
| class | n |
|---|---|
| 2seater | 5 |
| compact | 47 |
| midsize | 41 |
| minivan | 11 |
| pickup | 33 |
| subcompact | 35 |
| suv | 62 |
Once the values have been summarized explicitly, you use geom_col() to create a column chart.
ggplot(class_counts, aes(x = class, y = n)) +
geom_col()
Figure 3.11: An example of a basic column chart.
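The difference between the two geometries can be checked directly. The sketch below builds the bar chart from raw rows and the column chart from pre-counted values, then uses ggplot2's layer_data() helper to pull out the bar heights each plot would draw. The object names are illustrative.

```r
library(ggplot2)
library(dplyr)

# Pre-summarize: one row per class with its count in column n
class_counts <- mpg %>%
  count(class)

# geom_bar() counts the raw rows itself
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# geom_col() draws the heights it is given
p_col <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_col()

# Extract the bar heights computed for each plot
bar_heights <- as.numeric(layer_data(p_bar)$y)
col_heights <- as.numeric(layer_data(p_col)$y)

# Compare the two sets of heights
all.equal(bar_heights, col_heights)
```

If the comparison returns TRUE, the two plots are drawing the same bars, and the choice between them comes down to whether the data is already summarized.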
There are a few things you can do to a column chart to add some flavor. For example:
- Use the reorder command to reorder the columns into ascending or descending order based on their n values.
- Inside of geom_col, change the width of the columns.
- Depending on the bars, you can change the legend.position to a particular spot (or remove it entirely) from the visualization.
# PLUS AESTHETICS (polished)
ggplot(class_counts, aes(x = reorder(class, n), y = n, fill = class)) +
geom_col(width = 0.7, color = "white") + # width = bar thickness, color = border
coord_flip() + # flip for readability
labs(title = "Counts by Vehicle Class", x = "", y = "Count") +
theme(legend.position = "none")
Figure 3.12: An example of a column chart with polished aesthetics.
3.4.4 Histograms and Density Plots (distribution)
Histograms are perfect for when you are looking to display distribution.
3.4.4.1 Histograms - geom_histogram()
For the geometry of a histogram, you use geom_histogram().
# BASIC
ggplot(mpg, aes(x = hwy)) +
geom_histogram(binwidth = 3)
Figure 3.13: An example of a basic histogram.
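ggplot2 automatically assigns bins when creating a histogram unless otherwise instructed (by default it uses 30 bins and prints a message suggesting you pick a better value). There are several things that can be influenced:
- bin width: use binwidth to change the width of the bins
- boundary: use boundary to set where the bin edges fall (for example, boundary = 0 with binwidth = 5 puts the edges at multiples of 5)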
# STYLED (bin edges + colors)
ggplot(mpg, aes(hwy)) +
geom_histogram(binwidth = 5, boundary = 0,
fill = "red", color = "orange") +
labs(title = "Highway MPG Distribution", x = "Highway MPG", y = "Frequency")
Figure 3.14: An example of a styled histogram.
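To see exactly which bins binwidth and boundary produce, the sketch below pulls the computed bins out of the plot with ggplot2's layer_data() helper. The object names `p_hist` and `bins` are made up for this example.

```r
library(ggplot2)

# Same binning as the styled histogram: 5-unit bins with an edge at 0
p_hist <- ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 5, boundary = 0)

# Each row is one bin: its left edge, right edge, and number of cars
bins <- layer_data(p_hist)[, c("xmin", "xmax", "count")]
bins
```

Changing boundary to another value (for example 2.5) and rerunning shifts every edge by the same amount, which can noticeably change the shape of the histogram.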
If you need a stacked histogram, the code below creates one. The secret here is mapping a variable to the fill aesthetic.
# MAPPED FILL (stacked by class)
ggplot(mpg, aes(hwy, fill = class)) +
geom_histogram(binwidth = 10, color = "white") +
labs(title = "Highway MPG Distribution by Vehicle Class",
x = "Highway MPG", y = "Count", fill = "Vehicle Class") +
theme(legend.position = "bottom")
Figure 3.15: An example of a filled histogram.
3.4.4.2 Density - geom_density()
Density plots still show the distribution of data, but instead of grouping values into bins like a histogram, they draw a smooth curve. The geometry for a density plot is geom_density().
# BASIC
ggplot(diamonds, aes(x = price)) +
geom_density()
Figure 3.16: An example of a basic density plot.
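A density curve is a kernel density estimate, and base R's density() function computes one directly. The sketch below is an illustration under default settings rather than a description of ggplot2's internals: it computes the estimate for diamond prices, checks that the area under the curve is close to 1 (which is what makes the y-axis a density rather than a count), and draws the result with geom_line(). The names `kde`, `step`, and `area` are made up for this example.

```r
library(ggplot2)  # provides the diamonds dataset

# Kernel density estimate of price on an evenly spaced grid
kde <- density(diamonds$price)

# Approximate the area under the curve: height times grid spacing, summed
step <- diff(kde$x)[1]
area <- sum(kde$y) * step
area

# Draw the estimate as a line
ggplot(data.frame(price = kde$x, density = kde$y), aes(x = price, y = density)) +
  geom_line()
```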
In the example below, the data is filtered inside the ggplot() call to keep only the cuts “Good”, “Ideal”, and “Premium.” Because cut is mapped to color inside the aesthetics, this creates a grouped density plot with one line for each of the three cuts.
Note: Without the filter, this code would still draw a separate line for each cut (five in total).
# GROUPED
ggplot(diamonds %>% filter(cut %in% c("Good", "Ideal", "Premium")),
aes(price, color = cut)) +
geom_density() +
labs(title = "Price Density by Cut", x = "Price", y = "Density") +
theme(legend.position = "bottom")
Figure 3.17: An example of a grouped density plot.
3.4.5 Boxplot - geom_boxplot()
Commonly known as a box and whisker plot, boxplots are fantastic for comparing a numeric variable across the groups of a categorical variable. There are a few different pieces of a boxplot:
- Whiskers: there are two whiskers on each boxplot
  - Lower Whisker: covers roughly the lowest 25% of the data. In ggplot2, it extends down to the smallest value within 1.5 × IQR of the bottom of the box, which is the lowest value in the data when there are no low outliers
  - Upper Whisker: covers roughly the highest 25% of the data. In ggplot2, it extends up to the largest value within 1.5 × IQR of the top of the box, which is the highest value in the data when there are no high outliers
- Box: the box itself shows the middle 50% of the data. This includes:
  - Interquartile Range (IQR): the bottom edge of the box is the 25th percentile and the top edge is the 75th percentile; the IQR is the distance between them
  - Median: the darker line inside of the box
- Outliers: any data point that falls beyond the end of a whisker, drawn as an individual point.
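The pieces listed above can be computed by hand for a single group. The sketch below does this for the highway MPG of SUVs, assuming ggplot2's default whisker rule of 1.5 × IQR; the variable names are made up for illustration.

```r
library(ggplot2)  # provides the mpg dataset

# Highway MPG for one vehicle class
x <- mpg$hwy[mpg$class == "suv"]

# Box: 25th percentile, median, and 75th percentile
q <- quantile(x, c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Fences: the furthest a whisker is allowed to reach
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
whisker_low <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])

# Anything beyond the fences is drawn as an individual outlier point
outliers <- x[x < lower_fence | x > upper_fence]

c(whisker_low = whisker_low, q1 = q[[1]], median = q[[2]],
  q3 = q[[3]], whisker_high = whisker_high)
outliers
```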
The geometry for a boxplot is geom_boxplot().
# BASIC
ggplot(mpg, aes(x = class, y = hwy)) +
geom_boxplot() +
labs(title = "Highway MPG by Vehicle Class", x = "Vehicle Class", y = "Highway MPG")
Figure 3.18: An example of a basic boxplot.
Sometimes the points on a boxplot (or any plot) can be indistinguishable due to them being so close together. In that case, utilize the geom_jitter() command.
# WITH JITTERED POINTS OVERLAID
ggplot(mpg, aes(class, hwy)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.15, alpha = 0.4, size = 1.5) +
coord_flip() +
labs(title = "Highway MPG by Vehicle Class (with Points)",
x = "", y = "Highway MPG")
Figure 3.19: An example of a boxplot with jittered points.
3.4.6 Lines (time series) - geom_line()
Time and time again, when working with time-series data, a line graph is created. The geometry for a line graph is geom_line().
# BASIC: unemployment over time
ggplot(economics, aes(x = date, y = unemploy)) +
geom_line() +
labs(title = "US Unemployment Over Time",
x = "Date", y = "Number Unemployed (thousands)")
Figure 3.20: An example of a basic line graph.
Like in the scatterplot, you can add a line of best fit.
# PLUS: LM vs LOESS contrast
ggplot(economics, aes(date, unemploy)) +
geom_line(linewidth = 1.6) +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "US Unemployment with Linear Trend",
x = "Date", y = "Number Unemployed (thousands)")
#> `geom_smooth()` using formula = 'y ~ x'
Figure 3.21: An example of line graph with a LM line of best fit.
Instead of using “lm” for our method, let’s try “loess” and see our results.
ggplot(economics, aes(date, unemploy)) +
geom_line(linewidth = 0.6) +
geom_smooth(method = "loess", se = TRUE) + # loess = flexible smoothing; se = confidence band
labs(title = "US Unemployment with LOESS Smooth",
x = "Date", y = "Number Unemployed (thousands)")
#> `geom_smooth()` using formula = 'y ~ x'
Figure 3.22: An example of a line graph with a LOESS smooth.
3.4.7 Put text on the plot - geom_text()
No matter the type of plot, it may help the viewer understand your plot better if you add labels to some of your data points, for example, the most extreme. To do this, utilize the geom_text() command.
# BASIC: label extreme points
mpg_extremes <- mpg %>% slice_max(order_by = hwy, n = 5)
mpg_extremes
#> # A tibble: 6 × 11
#> manufacturer model displ year cyl trans drv cty
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int>
#> 1 volkswagen jetta 1.9 1999 4 manu… f 33
#> 2 volkswagen new beet… 1.9 1999 4 manu… f 35
#> 3 volkswagen new beet… 1.9 1999 4 auto… f 29
#> 4 toyota corolla 1.8 2008 4 manu… f 28
#> 5 honda civic 1.8 2008 4 auto… f 25
#> 6 honda civic 1.8 2008 4 auto… f 24
#> # ℹ 3 more variables: hwy <int>, fl <chr>, class <chr>
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(data = mpg_extremes, aes(label = model), nudge_y = 1, size = 3) +
labs(title = "Top Highway MPG Models Labeled",
x = "Engine Displacement (L)", y = "Highway MPG")
Figure 3.23: An example of adding text inside a plot.
The code above first uses slice_max() to pull the rows with the five highest hwy values (six rows here, because ties are kept), then passes them to geom_text() so that only those extreme points are labeled. Now, viewers can read from the plot which car models have the best highway miles per gallon.
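The printed tibble has six rows even though n = 5 was requested, because slice_max() keeps ties by default. A minimal sketch of the difference, using the with_ties argument of slice_max() (the object names are illustrative):

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Default behavior: rows tied at the cutoff are all kept
top_with_ties <- mpg %>%
  slice_max(order_by = hwy, n = 5)

# with_ties = FALSE returns exactly n rows, dropping tied rows past the cutoff
top_exact <- mpg %>%
  slice_max(order_by = hwy, n = 5, with_ties = FALSE)

nrow(top_with_ties)
nrow(top_exact)
```

Which version is preferable depends on the plot: keeping ties avoids arbitrarily leaving a label off, while with_ties = FALSE guarantees a fixed number of labels.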
3.4.8 Error bars (requires summary stats) - geom_errorbar()
In times where you want error bars to be displayed, you first need to compute the mean and standard error for your values.
# Compute mean & standard error for hwy by class
summ_hwy <- mpg %>%
group_by(class) %>%
summarize(
mean_hwy = mean(hwy, na.rm = TRUE),
se_hwy = sd(hwy, na.rm = TRUE) / sqrt(n()))
Once completed, you can utilize the geom_errorbar() command to add error bars to your plot.
# Points + error bars (plot error bars first so points sit on top)
ggplot(summ_hwy, aes(class, mean_hwy)) +
geom_errorbar(aes(ymin = mean_hwy - se_hwy, ymax = mean_hwy + se_hwy), width = 0.2) +
geom_point(size = 2) +
coord_flip() +
labs(title = "Mean Highway MPG (± SE) by Class", x = "", y = "Mean Highway MPG")
Figure 3.24: An example of a plot with error bars.
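To make the arithmetic behind a single error bar explicit, the sketch below works through one class by hand. The standard error is the standard deviation divided by the square root of the number of observations, and the bar runs from the mean minus one standard error to the mean plus one standard error. The variable names are made up for illustration.

```r
library(ggplot2)  # provides the mpg dataset

# Highway MPG for the compact class only
x <- mpg$hwy[mpg$class == "compact"]

# Standard error = standard deviation / sqrt(sample size)
n_obs <- length(x)
se_manual <- sd(x) / sqrt(n_obs)

# The ends of the error bar for this class
c(mean = mean(x),
  ymin = mean(x) - se_manual,
  ymax = mean(x) + se_manual)
```

Classes with few observations or widely spread values produce longer bars, since either one makes the standard error larger.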
3.4.9 Reference lines
Let’s say there is a scenario where you are looking for data above and below a particular threshold. In this case, reference lines can become an essential tool for yourself and your viewers. Once the threshold is established (mean, median, really any number of significance to you), you can utilize the geom_hline() or the geom_vline() commands to create horizontal or vertical reference lines, respectively.
# Horizontal line at overall mean
overall_mean <- mean(mpg$hwy, na.rm = TRUE)
ggplot(mpg, aes(displ, hwy)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = overall_mean, linetype = "dashed") +
labs(title = "Reference Line at Overall Mean Highway MPG",
x = "Engine Displacement (L)", y = "Highway MPG")
Figure 3.25: An example of plot with a horizontal reference line.
# Vertical line at displ = 3
ggplot(mpg, aes(displ, hwy)) +
geom_point(alpha = 0.6) +
geom_vline(xintercept = 3, linetype = "dotted") +
labs(title = "Reference Line at Engine Displacement = 3L",
x = "Engine Displacement (L)", y = "Highway MPG")
Figure 3.26: An example of plot with a vertical reference line.
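A reference line pairs naturally with a flag that records which side of the threshold each observation falls on. The sketch below is one possible way to do this: it adds a hypothetical `above_mean` column, counts the cars on each side, and colors the points accordingly.

```r
library(ggplot2)
library(dplyr)

overall_mean <- mean(mpg$hwy, na.rm = TRUE)

# Flag each car as above (TRUE) or not above (FALSE) the overall mean
mpg_flagged <- mpg %>%
  mutate(above_mean = hwy > overall_mean)

# How many cars fall on each side of the line
count(mpg_flagged, above_mean)

# Color the points by the flag and draw the threshold itself
ggplot(mpg_flagged, aes(displ, hwy, color = above_mean)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = overall_mean, linetype = "dashed") +
  labs(title = "Highway MPG Relative to the Overall Mean",
       x = "Engine Displacement (L)", y = "Highway MPG",
       color = "Above mean?")
```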
3.5 Key Takeaways
- Always visualize your data! Summary statistics can hide patterns (and problems) in your data.
- ggplot2 follows a consistent grammar: data + aesthetics + geometry.
- geom_bar() counts rows automatically; geom_col() plots pre-summarized values.
- Use labs(), theme(), and coord_flip() to improve clarity and readability.
- facet_wrap() helps compare groups by creating small multiples.
- Trend lines (geom_smooth()), error bars (geom_errorbar()), labels (geom_text()), and reference lines (geom_hline(), geom_vline()) help communicate the story in your data.
3.6 Checklist
When creating a visualization, have you:
3.7 ggplot2 Visualization Reference
Unlike other chapters, visualization relies on combining multiple components rather than calling single functions. This section serves as a reference for common geometries, aesthetics, and commands used throughout the book.
3.7.1 Summary of ggplot Geometries
Below is a list of plot types, their purpose, and the geom command used:
- Scatterplot - Relationships - geom_point()
- Bar Chart - Counts - geom_bar()
- Column Chart - Pre-summarized values - geom_col()
- Histogram - Distribution - geom_histogram()
- Density Plot - Distribution - geom_density()
- Boxplot - Group comparison - geom_boxplot()
- Line Graph - Time - geom_line()
3.7.2 Summary of other ggplot commands
Below is a list of other commands used to alter plots:
- Aesthetics:
  - color: Changes the color of the points.
  - shape: Changes the shape of the points.
  - alpha: Changes the opacity of the points.
  - size: Changes the size of the points.
  - fill: Controls the interior color of shapes.
- labs(): Creates labels, including title, x-axis, and y-axis.
- facet_wrap(): Creates individual plots and puts them into one graphic.
- coord_flip(): Flips the axes without changing the underlying variables.
- theme(): Controls the overall appearance of the plot.
- theme_minimal(): Makes the most basic looking plot.
- theme(legend.position = "..."): Dictates where (if at all) the legend appears on the plot.
- geom_text(): Adds text to the data points within the plot.
- geom_hline(): Adds a horizontal reference line.
- geom_vline(): Adds a vertical reference line.
- geom_smooth(): Adds trend lines.
- reorder(): Reorders categorical variables based on the values of another variable.
3.8 💡 Reproducibility Tip:
With visualizations (especially in R) there are nearly limitless possibilities. To support reproducibility, aim to create figures that clearly communicate their purpose even when viewed on their own.
When creating a visualization, ask yourself:
- What question is this visualization answering?
- What do I want my audience to understand from it?
- What would someone understand if they saw this figure without any surrounding text?
To help close the gap between these questions, use informative labels and captions that guide viewers on what they’re seeing.
Within ggplot2, functions like labs() allow you to clearly label axes, titles, and legends so the intent of the plot is immediately clear. When working in R Markdown or bookdown (Section 10.4.3.2.2), figure captions (using fig.cap) provide additional context that travels with the figure wherever it appears.
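As a minimal sketch of a figure that explains itself, the example below uses labs() to set a title, subtitle, axis labels, legend title, and a caption that is drawn on the plot itself. The wording of the labels and the object name `p_labeled` are illustrative choices, not a required convention.

```r
library(ggplot2)

# Every label a viewer needs travels with the figure itself
p_labeled <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Larger Engines Tend to Get Lower Highway MPG",
    subtitle = "Fuel economy data from 1999 to 2008 for 38 popular models of cars",
    x = "Engine Displacement (L)",
    y = "Highway MPG",
    color = "Drivetrain",
    caption = "Source: mpg dataset from ggplot2"  # printed beneath the plot
  )

p_labeled
```

A caption set in labs() stays attached to the image even if the figure is saved and shared outside the document, which complements a fig.cap caption in the surrounding text.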
Visualizations that are well-labeled and properly captioned are easier to interpret, reuse, and reproduce—both by others and by your future self.