Automatic Time Binning for Population Pharmacometric Analyses in R

Using the Ckmeans.1d.dp package to perform automatic 1-dimensional data binning.

June 20, 2022

Why the Need for Time Binning in Pharmacometric Analyses?

For in-vitro experiments performed in the laboratory, it is not too difficult to schedule sampling times and take samples at near exactly the scheduled times. However, for data collected in patients, especially studies conducted in an outpatient settings, it is not always possible or practical to collect pharmacokinetic/efficacy samples at exact time points. This can make it difficult to visualize summary level-data across patients since one can no longer simply generate summary statistics within specific timepoints. Fortunately, we can use automatic time binning methods to group measured timepoints across patients into distinct bins using R.

Overview: 1-Dimensional Data Clustering

1-dimensional data clustering can be defined as the assignment of values in a data vector to n clusters so that the values within a cluster are optimally homogenous. The Ckmeans.1d.dp algorithm accomplishes this by binning values into groups as to minimize the sum of squares within the groups. Unlike heuristic algorithms, the CKmeans algorithm uses a dynamic programming approach that is guaranteed to find the optimal solution. You can find more information on the methods of the algorithm here: https://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf

Automatic Time Binning of Pharmacokinetic Data and Generating Summary Statistics

In this post, we'll be simulating pharmacokinetic data. Then, so that we can visualize the mean and standard deviation of concentrations across time, we'll perform automatic time binning to group our timepoints.

Generate the pharmacokinetic data

We're going to use mrgsolve to generate some PK data from a 1-compartment PK model.

Simulating the data

library(mrgsolve)
library(dplyr)

mod <- mread("popex", modlib())

dosing <- ev(ID = 1:40, amt = 50, time = 0, dose = amt)

res <- mod %>%
  data_set(dosing) %>%
  obsonly() %>%
  carry.out(dose) %>%
  mrgsim(end = 168, tgrid = c(0.5, 1, 2, 3, 4, 6,
  8, 10, 12, 14, 16, 18, 21,
  24, 28, 32, 36, 48)) %>%
  select(time, DV, ID, dose)%>%
  rowwise() %>%
  mutate(time = time + rnorm(1, 0, time/50)) ## Adding some variability to time

Take a peek at the dataset structure

print(head(res,10))
timeDVIDdose
0.48428160.5696984150
1.01766700.9896524150
1.99629461.5227234150
2.98890291.8016925150
3.95162641.9393919150
5.92991982.0146338150
8.14208371.9858405150
9.75981111.9266638150
12.45245631.8599063150
14.14314471.7925812150

Plot the data

Let's first visualize our raw data.

library(ggplot2)

theme_set(theme_light())

plot1 <- ggplot(res, aes(x = CP, y = RESP, group = ID)) +
 geom_path(aes(colour = as.factor(dose))) +
 geom_point(aes(colour = as.factor(dose))) +
 scale_color_viridis_d() +
 facet_wrap('dose', nrow = 1) +
 theme(legend.position = "top") +
 labs(x = "Drug Concentration", y = "Response (% of Baseline)", color = "Dose (mg)")


print(plot1)

PK Data

We can see that naturally the data timepoints are clustered around the scheduled sampling times. Our goal will be to use a statistical-based method to identify the clusters.

Using the Ckmeans.1d.dp package to automatically bin our data

library(Ckmeans.1d.dp)

# First argument is data to bin
# Second is the number of bins. Can also be range such as c(1,20)
bin_data <- Ckmeans.1d.dp(res$time, 16)

# The bin_data$cluster object has the stored values
# of the cluster number of each timepoint from the
# results of the Ckmeans function. We'll add these
# cluster numbers as another column in our dataset.
res$cluster <- bin_data$cluster

Visualising the Clustered Data

Next, we'll take a look at how the automatic binning performed.

plot2 <- ggplot(res, aes(x = time, y = DV)) +
  geom_point(aes(color = as.factor(cluster))) +
  scale_x_continuous(breaks = seq(0,52, 4)) +
  scale_color_viridis_d() +
  guides(color = "none") +
  labs(x = "Time (hours)", y = "Concentration (mg/L)")

print(plot2)

PK Data

It seems the data were binned into logical groups.

Calculating Summary Statistics

Next, we'll use group_by and summarize to get the summary statistics, and overlay them onto the plot.

res_sum <- res %>%
  group_by(cluster) %>%
  summarise(meanCP = mean(DV),
            meantime = mean(time),
            stdev = sd(DV))

plot3 <- ggplot(res_sum, aes(x = meantime, y = meanCP)) +
  geom_point(data = res,
             aes(x = time, y = DV, color = as.factor(cluster)),
             alpha = 0.1) +
  geom_line(size = 1) +
  geom_errorbar(aes(min = meanCP - stdev, ymax = meanCP + stdev)) +
  geom_point(aes(color = as.factor(cluster)), size = 2) +
  scale_x_continuous(breaks = seq(0,52, 4)) +
  scale_color_viridis_d() +
  guides(color = "none") +
  labs(x = "Time (hours)", y = "Concentration (mg/L)")

print(plot3)

PK Data

Finally, we'll remove the cluster colors.

plot4 <- ggplot(res_sum, aes(x = meantime, y = meanCP)) +
  geom_point(data = res,
             aes(x = time, y = DV),
             alpha = 0.1) +
  geom_line(size = 1) +
  geom_errorbar(aes(min = meanCP - stdev, ymax = meanCP + stdev)) +
  geom_point( size = 2)  +
  scale_x_continuous(breaks = seq(0,52, 4)) +
  labs(x = "Time (hours)", y = "Concentration (mg/L)")

print(plot4)

PK Data

Visualizing Data Summary Plots in Finch Studio

Wish you had a way to create these mean plots of your NONMEM dataset on the fly? Automatic data binning is built into Finch Studio without the need for any external dependencies. Contact us to learn more about Finch Studio and schedule a demo.

Thanks for reading.
Like the post? Give it a thumbs up.
Have some feedback? Leave us a comment.