Creating A Dual-Axis Plot using R and ggplot

A brief tutorial on plotting two y-axes in ggplot

May 20, 2022

Setting the Scene

You have collected data on two variables, both changing with time. The variables are hypothesized to be interconnected, where when one changes, you see a corresponding change in the other. Since this is a PKPD oriented blog, let's imagine our variables are drug concentrations and change in a biomarker (although the techniques in this post can be applied for other types of data).

Goal

We want to generate a plot which shows the change of the two variables over time, highlighting their correlated relationship. Like many other pharmacometricians and data scientists, our plotting weapon-of-choice is ggplot, but unfortunately there aren't any out-of-the-box options to generate a plot with two independent y-axes. Mildly frustrating, but this is by design. Dual y-axes can sometimes be used to generate plots that are confusing (at best) or intentionally misleading (at worst), and the developers of ggplot do not want to encourage undisciplined data visualizations. But... sometimes it makes sense and provides valuable insight into the data (like when we know there is a biologic explanation to the correlation).

Generate the PKPD data

We're going to use mrgsolve to generate some PKPD data from an indirect response model (specifically inhibition of production).

Simulating the data

library(mrgsolve)
library(dplyr)

mod <- mread("irm1", modlib())
dosing <- ev(amt = 25, time = 0, ii = 24, addl = 6)
res <- mod %>%
  data_set(dosing) %>%
  param(CL = 0.5, KOUT = 0.25) %>%
  obsonly() %>%
  mrgsim(end = 336) %>%
  select(time, CP, RESP)

Take a peek at the dataset structure

print(head(res,10))
timeCPRESP
00.000000040.00000
10.738108638.41351
20.936277536.09367
30.953334234.10246
40.916660932.58596
50.869723531.49649
60.826018330.74640
70.788678530.25182
80.757511229.94408
90.731457029.77050

Plot the data

In an initial step to visualize our data, let's just plot the two variables in independent plots and stack them.

library(ggplot2)
library(patchwork)

theme_set(theme_bw())

pk <- ggplot(res, aes(x = time, y = CP)) +
  geom_line() +
  scale_x_continuous(breaks = seq(0, 336, 24)) +
  labs(x = "Time (hr)", y = "Concentration (mg/L)")

pd <- ggplot(res, aes(x = time, y = RESP)) +
  geom_line() +
    scale_x_continuous(breaks = seq(0, 336, 24)) +
  labs(x = "Time (hr)", y = "Biomarker (IU)")

print(pk/pd)

PK/PD Data

As expected, we see that when concentrations go up, the biomarker decreases due to inhibition of production of the biomarker.

Creating the Dual Axis Plot

Next, we'll make a plot with dual y axes, with the left axis showing the concentration scale and the right axis showing the biomarker scale. Although ggplot doesn't allow creating a separate independent y-axis, it does allow creating a second y axis that is a one-to-one transformation of the first. With some clever manipulation, we can use this feature to create our second axis.

Plotting both series on the same figure

We'll start by plotting both variables on the same figure. We'll modify our PK plot slightly to add the biomarker time series layer.

pkpd <- ggplot(res, aes(x = time, y = CP)) +
  geom_line(aes(color = "Drug Concentration (mg/L)")) +
  geom_line(aes(y = RESP, color = "Biomarker (IU/mL)"))+
  scale_x_continuous(breaks = seq(0, 336, 24)) +
  labs(x = "Time (hr)", y = "Concentration (mg/L)", color = "") +
  scale_color_manual(values = c("orange2", "gray50"))

print(pkpd)

PK/PD Data

Doesn't look terrible, but since the values of the two variables are so different, there is a lot of empty space in the plot and also the y-axis is only labelled for concentrations.

Adding a second y-axis

To get rid of all the empty space, we need to get the values of the two variables in the same range. Roughly, the range of our response variable is about 15 times higher than the drug concentrations. So, instead of plotting the biomarker response directly, we'll plot it scaled by a factor of 15 (RESP/15). When adding the second y axis, we need to remember to apply that same factor scaling to the axis to maintain the correct values being shown. To add the second axis and scale it by 15, we can include the following line in the ggplot code

...
scale_y_continuous(sec.axis = sec_axis(~.*15, name="Biomarker (IU/mL)"))
...

To make sure we use the same scaling factor for both the variable transformation and the axis transformation, we'll define a scale variable up front, and use it in both places that we need to.

scale = 15

pkpd <- ggplot(res, aes(x = time, y = CP)) +
  geom_line(aes(color = "Drug Concentration")) +
  geom_line(aes(y = RESP/scale, color = "Biomarker (IU/mL")) +
  scale_x_continuous(breaks = seq(0, 336, 24)) +
  scale_y_continuous(sec.axis = sec_axis(~.*scale, name="Biomarker (IU/mL)")) +
  labs(x = "Time (hr)", y = "Concentration (mg/L)", color = "") +
  scale_color_manual(values = c("orange2", "gray30"))

print(pkpd)

PK/PD Data

And voila... just like that we have our two variables on the same plot, and we can nicely see an increase in PK concentrations drives a decrease in the biomarker, with a return to baseline upon washout of PK.

Automatically defining the scaling function

Sometimes we'll need a more objective way to choose the scaling function to make the plotting routine more general. A common example is when we have multiple subjects in our dataset and we need to loop through and produce an individual plot for each one. Since drug concentrations and response can vary between subjects, we'll need a smarter way to choose the scaling function based on the individual subject's data.

We'll define the plot axis limits upfront, and use some geometry based transformation fucntions to scale our second variable.

max_first  <- max(res$CP)   # Specify max of first y axis
max_second <- max(res$RESP) # Specify max of second y axis
min_first  <- min(res$CP)   # Specify min of first y axis
min_second <- min(res$RESP) # Specify min of second y axis

# scale and shift variables calculated based on desired mins and maxes
scale = (max_second - min_second)/(max_first - min_first)
shift = min_first - min_second

# Function to scale secondary axis
scale_function <- function(x, scale, shift){
  return ((x)*scale - shift)
}

# Function to scale secondary variable values
inv_scale_function <- function(x, scale, shift){
  return ((x + shift)/scale)
}

pkpd <- ggplot(res, aes(x = time, y = CP)) +
  geom_line(aes(color = "Drug Concentration")) +
  geom_line(aes(y = inv_scale_function(RESP, scale, shift), color = "Biomarker (IU/mL")) +
  scale_x_continuous(breaks = seq(0, 336, 24)) +
  scale_y_continuous(limits = c(min_first, max_first), sec.axis = sec_axis(~scale_function(., scale, shift), name="Biomarker (IU/mL)")) +
  labs(x = "Time (hr)", y = "Concentration (mg/L)", color = "") +
  scale_color_manual(values = c("orange2", "gray30"))

print(pkpd)

PK/PD Data

Using this method, we can get perfectly scaled axes every time. If you want to change one of the axis limits (i.e. if you always wanted a lower limit of 0 for both variables) you could just edit the min_first and min_second variables to a value of 0.

Bonus: Styling the Secondary Y axis

We can apply styling to the second Y axis if we want to emphasize which axis corresponds to which variable. Since our drug concentration is colored gray and biomarker is colored orange, we can use those same colors for the axis titles.

scale = 15

pkpd <- ggplot(res, aes(x = time, y = CP)) +
  geom_line(aes(color = "Drug Concentration")) +
  geom_line(aes(y = RESP/scale, color = "Biomarker (IU/mL")) +
  scale_x_continuous(breaks = seq(0, 336, 24)) +
  scale_y_continuous(    sec.axis = sec_axis(~.*scale, name="Biomarker (IU/mL)")) +
  labs(x = "Time (hr)", y = "Concentration (mg/L)", color = "") +
  scale_color_manual(values = c("orange2", "gray30")) +
  theme(
    axis.title.y = element_text(color = "gray30"),
    axis.title.y.right = element_text(color = "orange3")
  )

print(pkpd)

PK/PD Data

Thanks for reading.
Like the post? Give it a thumbs up.
Have some feedback? Leave us a comment.