Winning sports teams have long inspired business leaders, but now their strategies are influencing pharmaceutical researchers. In 2002 the Oakland A's upended baseball recruiting by setting aside conventional wisdom in favor of an objective statistical analysis called sabermetrics, an approach made popular by the film "Moneyball." Inspired by the "Moneyball" approach, researchers reporting in Cell Chemical Biology have likewise looked beyond conventional wisdom in pharmaceutical research to develop PrOCTOR, an objective machine-learning program that predicts drug toxicity in humans.

Scientists typically rely on a handful of rules-based comparisons of a drug's molecular structure to gauge whether an untested drug is likely to be safe or toxic. Yet despite this industry convention, nearly one-third of drugs that fail clinical trials do so because of intolerable side effects.

"People had feelings about certain factors being important in drug toxicity, and there wasn't much science behind these judgement calls," says Olivier Elemento, senior author of the paper and associate professor of physiology and biophysics and of computational genomics in computational biomedicine at the HRH Prince Alwaleed Bin Talal Bin Abdulaziz Al-Saud Institute for Computational Biomedicine at Weill Cornell Medicine.

When Elemento and his co-authors tested these conventional rules against real drug outcomes, they found that the common "Veber Rule" incorrectly predicted that more than 75% of FDA-approved drugs would have been too toxic for clinical trials. Likewise, Lipinski's Rule of Five incorrectly judged 73% of drugs that failed clinical trials because of toxicity as safe enough to pass.
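To make concrete what such structure-based rules check, here is a minimal sketch using the commonly cited thresholds. Lipinski's Rule of Five flags a molecular weight above 500 Da, a logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors, and conventionally tolerates at most one violation. Veber's criteria require no more than 10 rotatable bonds and a polar surface area of at most 140 square angstroms. Both rules were originally proposed as oral-bioavailability ("drug-likeness") screens. The descriptor values are assumed to be precomputed, for example with a cheminformatics toolkit, and the `Descriptors` class and function names are hypothetical. This is an illustration only, not the code used in the study.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float          # molecular weight in daltons
    logp: float                # octanol-water partition coefficient (log scale)
    h_donors: int              # number of hydrogen-bond donors
    h_acceptors: int           # number of hydrogen-bond acceptors
    rotatable_bonds: int       # number of rotatable bonds
    polar_surface_area: float  # polar surface area in square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule-of-Five criteria the compound breaks."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention: a compound passes with at most one violation."""
    return lipinski_violations(d) <= max_violations


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: at most 10 rotatable bonds and PSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.polar_surface_area <= 140


# Made-up descriptor values for illustration (not a real drug).
example = Descriptors(mol_weight=620.0, logp=6.2, h_donors=2, h_acceptors=9,
                      rotatable_bonds=12, polar_surface_area=95.0)
print(lipinski_violations(example), passes_lipinski(example), passes_veber(example))
# prints: 2 False False
```

Filters like these look at only a few structural descriptors of the molecule itself. That narrow view is the gap PrOCTOR targets by also weighing information about the drug's biological targets.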

To create their tool PrOCTOR (Predicting Odds of Clinical Trial Outcomes using Random forest), the researchers used a random-forest model, a machine-learning method that combines many decision trees, and tested whether overlooked data might be as important as, or more important than, the conventional structure-based rules for predicting safety. PrOCTOR draws on 48 different features, including descriptors of a drug's structure such as molecular weight, as well as a host of details about the drug's targets (the molecules in the body to which a drug binds to take effect).

The researchers trained PrOCTOR on a dataset of 784 FDA-approved drugs and 100 drugs that failed clinical trials because of toxicity concerns. They then tested the model on hundreds of drugs approved in Europe and Japan and on an even larger database of 3,236 drugs not included in PrOCTOR's training set.

Overall, PrOCTOR accurately predicted drug toxicity across these test sets and even flagged approved drugs that were later monitored for reports of serious side effects.

"We're trying to speed up the drug discovery process," says Elemento. "Many drugs look promising initially, then once they reach clinical trials they fail because they are toxic. We are trying to give researchers an early warning."

However, the researchers found that a PrOCTOR score should be interpreted in context. Several FDA-approved drugs in the study were flagged as potential failures, but on closer investigation, most of these were life-saving cancer treatments, for which a higher level of toxic side effects is understandably tolerated.

The PrOCTOR model worked best when data about the drug's target was included; however, the authors note that this information is not always available during drug development. Moreover, many drug companies don't release details about why a particular drug failed a clinical trial.

"For us, the more data the better," says first author Kaitlyn Gayvert, a PhD student in the Tri-Institutional Computational Biology and Medicine Program, a partnership of Weill Cornell Medicine, Cornell University, and Memorial Sloan Kettering Cancer Center. "If better clinical trial data is reported in the future, we'll be able to make better predictions."

Because PrOCTOR is a machine-learning tool, it could in principle be extended to predict more than a single toxicity score.

"One of the big questions we would like to tackle is predicting the specific types of toxicity," says Gayvert. "We'd like to see if we can not only predict that the drug will be toxic, but be able to tell you what specific toxicity types to expect."

This work was supported by the National Science Foundation, the National Institutes of Health, the Starr Cancer Foundation, as well as by startup funds from the Institute for Computational Biomedicine. Additional support was provided by the PhRMA Foundation Pre Doctoral Informatics Fellowship and by the Tri-Institutional Training Program in Computational Biology and Medicine.

Article: A Data-Driven Approach to Predicting Successes and Failures of Clinical Trials, Kaitlyn M. Gayvert, Neel S. Madhukar, Olivier Elemento, Cell Chemical Biology, doi: 10.1016/j.chembiol.2016.07.023, published 15 September 2016.