NEW BOOK: A Guide to Uncertainty Quantification in Drug Discovery
A Concise and Practical Resource for Decision-Makers and Scientists
I am excited to announce my new book, “A Guide to Uncertainty Quantification in Drug Discovery,” which is slated for release in Q1 2022!
The book is a concise and practical resource for decision-makers and scientists who need better tools to analyze experimental data. Topics include finding reliable data, experimental design and why it matters, assessing measurement uncertainty with statistical estimation, scatter plots vs. correlation coefficients, and fitting models to experimental data with regression analysis for both linear and non-linear relationships. The book also focuses on key equations that are useful in a drug discovery setting. The goal is to give readers a solid foundation in the concepts and practical tools they need to analyze their own experimental datasets.
In this guide, I describe many approaches you can use when conducting experiments, so you can pick the strategy that works best for your specific circumstances. I explore pitfalls in these studies, such as having too little prior information about your system or not knowing how quickly concentrations change over time as a function of ligand concentration. Finally, I provide methods for analyzing experimental data and compare them with standard curve-fitting approaches. This will help you avoid drawing incorrect conclusions, terminating a study prematurely, or overlooking data quality problems.
Drug discovery projects depend on the quality of your data, so it is essential to understand what can go wrong and when your data may be error-prone. Scientists must also design experiments in a statistically sound way so that they have reliable results on which to base decisions.
I hope this guide answers most questions about uncertainty quantification for drug discovery problems. It provides both introductory material and more advanced topics for industry practitioners who want to apply these methods to their own datasets, particularly datasets from lead identification, early high-throughput screening, and chemical characterization studies.
This book is primarily aimed at data scientists in the pharmaceutical industry; however, it should also interest artificial intelligence practitioners working in drug discovery or related areas who want to understand the issues, limitations, and realities of analyzing wet-lab experimental data.
Uncertainty quantification is becoming more important as scientists need new tools to make sense of large datasets. Statistical methods let you determine whether what you measure is statistically significant, so you don't waste time on an incorrect analysis of your experimental data. But how do you know whether your data are "clean"? Why do problems arise during analysis? This book will help answer these questions so that, by the end of the guide, you will have greater insight into experimental biology and drug discovery.
Quantifying uncertainty is both an art and a science because no single method works in all situations. However, many methods for estimating uncertainty are available, and scientists may not know about them or know how to use them appropriately. Uncertainty quantification becomes even more complicated when combined with molecular modeling techniques. Scientists need to be aware of these issues before embarking on their next project so they can minimize mistakes and improve their chances of success.
The pharmaceutical industry relies on experimental data for decision-making. Companies commonly invest significant resources in collecting large amounts of data, but these data are often underutilized because they are not analyzed optimally.
Uncertainty quantification can be applied both to kinetic experiments (i.e., continuous measurements taken over time) and to discrete measurements, where concentrations are measured at specific time points or sampled periodically during an experiment, and the methods described here apply to both types of study. However, this book focuses primarily on kinetic studies and on the analysis of concentration-time profiles obtained from preclinical research aimed at finding lead compounds for development into new drugs.
A key feature of this book is that it shows you how to work with your own experimental data. This matters because you may need to fit a model to your data for hypothesis testing or model selection. Throughout the book, I demonstrate examples using both simulated datasets and real datasets from the literature.
The first part of the book gives an overview of topics relevant to drug discovery workflows. I begin with a brief introduction to kinetic studies and then cover experimental design, assessing goodness-of-fit, non-linear regression methods, and model cross-validation.
After these general statistical ideas, I move on to more practical topics required for uncertainty quantification, such as likelihood functions, probability distributions, and Bayesian inference. The following section focuses on estimating the uncertainty in a model using bootstrapping and other resampling methods. I also cover the use of surrogate models for parameter estimation when experimental data are not available.
This book focuses on nonparametric bootstrap methods that work well when your dataset contains relatively few samples or measurements. These methods can be applied successfully to "small" datasets (i.e., fewer than 100 measurements); if your dataset has many more observations, you may need to switch to other, more sophisticated bootstrap approaches.
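To make the idea of a nonparametric bootstrap concrete, here is a minimal sketch of a percentile bootstrap confidence interval in Python, using only the standard library. It is an illustration, not code from the book: the function name `bootstrap_ci`, the default of 10,000 resamples, and the replicate pIC50 values are all hypothetical, chosen only to show the mechanics of resampling a small dataset with replacement.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic (hypothetical helper).

    Resamples the observed data with replacement, recomputes the statistic
    on each resample, and returns the point estimate together with the
    alpha/2 and 1 - alpha/2 percentiles of the bootstrap distribution.
    """
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(data)
    # Each resample has the same size as the original dataset.
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stat(data), boot_stats[lo_idx], boot_stats[hi_idx]


# Hypothetical replicate pIC50 values for a single compound (illustrative only).
pic50 = [6.8, 7.1, 6.9, 7.4, 6.6, 7.0, 7.2, 6.7]
estimate, lower, upper = bootstrap_ci(pic50)
print(f"mean = {estimate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Because no distribution is assumed, the same helper can be pointed at another statistic (for example `stat=statistics.median`) without changing the resampling logic.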
The second half of the book formally demonstrates the concepts from the first part using real drug discovery examples from the literature in which uncertainty quantification is required. These include a wide range of preclinical and clinical research examples, covering complete concentration-time profiles, partial concentration-time profiles, and pharmacokinetic parameters.
With this book, I wanted to share statistical knowledge with a broader audience: readers who do not have a background in mathematics but are curious about how scientists can use statistics to tackle their problems. For that reason, I have included many code snippets that both simulate data and fit models using these techniques.
This guide is intended for anyone interested in uncertainty quantification in drug discovery studies, whether concentration-response profiles need to be measured over time or drug interactions with proteins or cells need to be modeled. It provides background on relevant issues, including experimental design, statistical models for hypothesis testing and model selection, the bootstrap method, and Bayesian inference.
The methods presented in the book apply regardless of your role in drug discovery research or the type of data you have at hand. The concepts are equally important whether you want to learn how to estimate a pharmacokinetic parameter from simulated data or analyze an entire binding interaction experiment using partial concentration-response profiles. For this reason, I hope all readers find value in these explanations, whatever their position within the drug discovery process.
I've been writing A Guide to Uncertainty Quantification in Drug Discovery for over a year, and I'm excited to say that the first draft is now complete! The book should be available for preorder in Q1 2022. In it, you will find out how to improve your data science skills so you can make better decisions on drug discovery projects.