2  Bar plots

2.1 Setup

Code
library(tidyverse)
library(scales)

lighter_blue <- "#BFE7F4"
light_blue <- "#7ED2F1"
blue <- "#327291"
yellow <- "#FBB21D"
dark_gray <- "#303A40"
gray <- "#464D53"

theme_set(theme_minimal(base_size = 17))
update_geom_defaults("bar", aes(fill = blue))
update_geom_defaults("text", aes(color = "gray20"))
update_geom_defaults("label", aes(color = "gray20"))

2.2 Frequencies

Bar plots are ideal for plotting frequencies. A simple bar plot is easy to create, but two things usually improve it:

  • Order the categories in terms of their frequency
  • Add frequency labels to make it easier to see the exact frequencies

There are two basic ways to construct a bar plot in ggplot. The first way is to use the raw data and have ggplot calculate the frequencies.

Code
ggplot(mpg, aes(x = fct_rev(fct_infreq(class)))) +
  geom_bar() +
  geom_label(
    stat = "count",
    mapping = aes(label = after_stat(count)),
    vjust = 1,
    color = "white",
    fill = "transparent",
    label.size = NA,
    label.padding = unit(0.5, "lines")
  ) +
  labs(x = "class") +
  guides(fill = "none") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  theme(
    panel.grid.major.x = element_blank()
  )

Bar plot with counts calculated by ggplot2

To order the bars from high to low instead, remove the fct_rev() call from the code.
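
To see why removing fct_rev() flips the order, it can help to inspect the factor levels directly. The snippet below is a small sketch that uses only the forcats functions from the plot above; the names by_freq and by_freq_rev are made up for illustration.

```r
library(ggplot2)  # provides the mpg data set
library(forcats)

# fct_infreq() orders the levels from most to least frequent
by_freq <- fct_infreq(mpg$class)
levels(by_freq)

# fct_rev() reverses that order, so the least frequent class comes first
by_freq_rev <- fct_rev(by_freq)
levels(by_freq_rev)
```

ggplot2 draws the levels of a discrete x variable from left to right in level order, so the reversed factor produces bars that run from low to high.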

The second way is to first calculate the frequencies yourself and then use the resulting data frame to plot the frequencies.

Code
counts <- count(mpg, class)

ggplot(counts, aes(x = reorder(class, n), y = n, fill = class)) +
  geom_col(alpha = .85) +
  geom_text(
    mapping = aes(label = n),
    vjust = -0.5,
    color = "grey20",
    size = 3
  ) +
  labs(x = "class", y = "count") +
  guides(fill = "none") +
  theme_minimal()
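
With pre-computed counts, the bar order is controlled by reorder() rather than fct_infreq(). As a rough sketch, negating n should flip the bars to run from high to low; counts_desc and p_desc are hypothetical names used only for this illustration.

```r
library(dplyr)
library(ggplot2)

counts <- count(mpg, class)

# reorder() sorts the levels of class by n in ascending order;
# negating n sorts them from most to least frequent instead
counts_desc <- mutate(counts, class = reorder(class, -n))
levels(counts_desc$class)

p_desc <- ggplot(counts_desc, aes(x = class, y = n)) +
  geom_col() +
  labs(x = "class", y = "count")
p_desc
```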

2.2.1 Percentages

If you want to plot percentages instead, you can use the same two techniques as before: 1) have ggplot calculate the percentages or 2) do it yourself first.

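To have ggplot calculate the percentages, map y to after_stat(count / sum(count)). Older code writes this as ..count../sum(..count..), a notation that ggplot2 has deprecated in favor of after_stat().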

Code
ggplot(mpg, aes(x = fct_rev(fct_infreq(class)), fill = class)) +
  geom_bar(aes(y = after_stat(count / sum(count))), alpha = .85) +
  geom_text(
    mapping = aes(
      label = percent(after_stat(count / sum(count))),
      y = after_stat(count / sum(count))
    ),
    stat = "count",
    vjust = -.5,
    color = "grey20"
  ) +
  labs(x = "class", y = "percentage") +
  guides(fill = "none") +
  theme_minimal()
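
To check what after_stat(count / sum(count)) evaluates to, one option is to pull the computed statistics out of a plot with layer_data(). This is a sketch for inspection only and not part of the plotting code; p, computed, and shares are illustrative names.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# layer_data() returns the data frame produced by the layer's stat;
# for geom_bar() it includes a count column with one row per class
computed <- layer_data(p)

# The same expression that is used inside after_stat() above
shares <- computed$count / sum(computed$count)
shares
sum(shares)  # the shares add up to 1
```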

Alternatively, you can calculate the percentages yourself first.

Code
counts <- mpg %>%
  count(class) %>%
  mutate(pct = n / sum(n))

ggplot(counts, aes(x = reorder(class, n), y = pct, fill = class)) +
  geom_col(alpha = .85) +
  geom_text(
    mapping = aes(label = percent(pct)),
    vjust = -0.5,
    color = "grey20",
    size = 3
  ) +
  labs(x = "class", y = "percentage") +
  guides(fill = "none") +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  theme_minimal()

The plotting code for this latter approach is noticeably simpler, which is a good reason to prefer it.

2.2.2 Stacked bar plots

Stacked bar plots can be used to add information to a bar plot, but they are generally difficult to interpret: it is hard to compare the sizes of the stacked segments because most of them do not share a common baseline. Plotting Likert responses as a stacked bar chart may be an exception, as long as the chart is rotated horizontally and accompanied by labeled percentages. See the Likert chapter for more information.

Usually, a better way to add information to a bar plot is to use faceting.
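
As a minimal sketch of the faceting approach, the example below splits the class counts by drive type, one panel per type. The choice of drv as the extra variable is an assumption made for illustration, as are the names counts_by_drv and p_facet.

```r
library(dplyr)
library(ggplot2)

# Count cars per class within each drive type (drv)
counts_by_drv <- count(mpg, drv, class)

p_facet <- ggplot(counts_by_drv, aes(x = class, y = n)) +
  geom_col() +
  geom_text(aes(label = n), vjust = -0.5, size = 3) +
  # One panel per drive type instead of stacked segments
  facet_wrap(vars(drv), ncol = 1) +
  # Extra headroom so the labels above the bars are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(x = "class", y = "count")
p_facet
```

Because every panel shares the same baseline and y axis, the bars can be compared both within and across drive types, which is harder to do with stacked segments.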