Code
# Load required packages
library(tidyverse)
library(viridis)
library(scales)

2.1 Bar plots

Bar plots are ideal for plotting frequencies. Although it is easy to create a simple bar plot, there are several things that can be done to improve on them:

  • If there are multiple categories, use a fill to further emphasize that the data represents different categories. Also think about which color to use and the ordering of the colors to best represent the data.

  • Order the categories in terms of their frequency

  • Add frequency labels to make it easier to see the exact frequencies

There are two basic ways to construct a bar plot in ggplot. The first way is to use the raw data and have ggplot calculate the frequencies.

Code
ggplot(mpg, aes(x = fct_rev(fct_infreq(class)), fill = class)) +
  geom_bar(alpha = .85) +
  geom_text(
    stat = "count",
    mapping = aes(label = after_stat(count)),
    vjust = -0.5,
    color = "grey20",
    size = 3
  ) +
  labs(x = "class") +
  guides(fill = "none") +
  scale_fill_viridis(
    discrete = TRUE,
    option = "mako", 
    begin = .05, 
    end = .95
  ) +
  theme_minimal()

If you want to flip the order from high to low, remove the fct_rev() function from the code.

The second way is to first calculate the frequencies yourself and then use the resulting data frame to plot the frequencies.

Code
counts <- count(mpg, class)

ggplot(counts, aes(x = reorder(class, n), y = n, fill = class)) +
  geom_col(alpha = .85) +
  geom_text(
    mapping = aes(label = n),
    vjust = -0.5,
    color = "grey20",
    size = 3
  ) +
  labs(x = "class", y = "count") +
  guides(fill = "none") +
  scale_fill_viridis(discrete = TRUE, option = "mako", end = .95) +
  theme_minimal()

2.1.1 Percentages

If you want to plot percentages instead, you can use the same two techniques as before: 1) have ggplot calculate the percentages or 2) do it yourself first.

If you want to have ggplot calculate a percentage, you can use ..count../sum(..count..).

Code
ggplot(mpg, aes(x = fct_rev(fct_infreq(class)), fill = class)) +
  geom_bar(aes(y = after_stat(count/sum(count))), alpha = .85) +
  geom_text(
    mapping = aes(
      label = percent(after_stat(count/sum(count))),
      y = after_stat(count/sum(count))
    ), 
    stat = "count", 
    vjust = -.5,
    color = "grey20"
  ) +
  labs(x = "class") +
  guides(fill = "none") +
  scale_fill_viridis(
    discrete = TRUE, 
    option = "mako", 
    begin = .05, 
    end = .95
  ) +
  theme_minimal()

Alternatively, you calculate the percentages yourself first.

Code
counts <- mpg %>%
  count(class) %>%
  mutate(pct = n / sum(n))

ggplot(counts, aes(x = reorder(class, n), y = pct, fill = class)) +
  geom_col(alpha = .85) +
  geom_text(
    mapping = aes(label = percent(pct)),
    vjust = -0.5,
    color = "grey20",
    size = 3
  ) +
  labs(x = "class", y = "count") +
  guides(fill = "none") +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_viridis(discrete = TRUE, option = "mako", end = .95) +
  theme_minimal()

The code of this latter approach seems to be a lot simpler, so could be recommended on that basis.

2.1.2 Stacked bar plots

Stacked bar plots can be used to add additional information to a bar plot, but they generally are difficult to interpret. It’s difficult to assess the varying sizes of the stacked bar charts. Plotting Likert responses as a stacked bar char may be an exception, as long as it is accompanied by labeled percentages and rotated horizontally. See the Likert chapter for more information.

Usually the better way to add additional information to a bar plot is to use faceting.

2.2 Scatter plots

Scatter plots are useful to visualize the relationship between two numeric variables. If this is indeed the goal of the graph, it is useful to also plot a line that summarizes the relationship.

Code
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = .5) +
  geom_smooth(
    method = "lm",
    alpha = .25,
    color = "black"
  ) +
  theme_minimal()