Unreported Drug Side Effects Found In Web Search Data

User behavior on the internet is increasingly being recognized as a valuable source of health information. Now a team from Stanford University School of Medicine and Microsoft Research has shown how mining rich seams of data from users’ search histories yields important information on the unreported side effects of drugs.

They report their findings in the 6 March online issue of the Journal of the American Medical Informatics Association.

Co-author Russ Altman is professor of bioengineering, of genetics and of medicine at Stanford. He says in a statement:

“Seeking health information is a major use of the Internet now. So we thought people are likely typing in drugs they are taking and the side effects they are experiencing and that there must be a way for us to use this data.”

Need to Boost Drug Safety Surveillance

The authors note in their study background that adverse drug side effects cause considerable illness and death, and are often discovered only after a drug has come to market.

So there is an urgent need for fast and accurate ways of discovering whether drugs, either on their own or in combination, have unexpected side effects.

In the US, the Food and Drug Administration (FDA) runs a scheme called the Adverse Event Reporting System (AERS), through which doctors can report side effects. But the scheme is voluntary and does not necessarily capture every instance where patients or doctors notice an unusual side effect.

Mining Internet Search Histories Already Yielding Results in Medicine

Mining search histories of internet users has already been shown to be an accurate way to track flu outbreaks. In 2008, Google launched a tool called Flu Trends that estimates the level of flu in each state of the US in nearly real time by keeping track of certain Google search queries.

A paper published in 2010 showed that the location and frequency of internet searches related to flu and its symptoms tracked the spread of flu in the US as accurately as the hospital-tracking method used by the Centers for Disease Control and Prevention.

And in 2012, two researchers took this a stage further when they revealed a new flu forecasting model using Google’s Flu Trends that predicts regional peaks in flu outbreaks more than 7 weeks ahead.

Inspired by examples like these, Altman and colleagues were interested in discovering if mining internet search data could detect drug interactions.

Altman’s lab had already developed some automated tools to mine FDA reports for drug-drug interactions.

Study Mines Data from 82 Million Internet Searches on Drugs, Symptoms and Conditions

So, with the help of the Microsoft team, they adapted the tools to analyze 12 months of 2010 search history from 6 million internet users, who had consented, via a browser plug-in, to share logs of their web searches anonymously for research.

The total number of drug, symptom and condition searches came to 82 million.

The researchers decided to mine this huge data pool for searches related to a side effect, not yet known in 2010, that occurs when two drugs, paroxetine and pravastatin, are taken together.

Paroxetine is an antidepressant medication, and pravastatin is a cholesterol-lowering drug.

The side effect is that the risk of developing hyperglycemia (high levels of blood glucose) is higher when taking both drugs than when taking either of them on its own.

The team used the enhanced mining tools to identify searches for information on either or both drugs, and to work out the likelihood that the users doing those searches would also search for hyperglycemia, or phrases that internet users might use to describe its symptoms.

Important to Consider Non-Medical Ways of Describing Symptoms

“We really had to take into consideration this difficulty in predicting people’s language,” says Altman. This is how the team came up with nearly 80 terms for symptoms or descriptors of hyperglycemia, for example “high blood sugar”, “dehydration”, “blurry vision” or “frequent urination”.

“We could miss things because, through no fault of their own, the public doesn’t know medical jargon,” Altman explains.
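Expanding one clinical term into many everyday phrasings can be sketched as a simple lexicon lookup. This is a minimal illustration under stated assumptions, not the study's actual tool: the short term list is a hypothetical subset built from the examples Altman gives (the real list had nearly 80 entries), and lowercase substring matching is an assumed, deliberately crude matching rule.

```python
import re

# Hypothetical subset of the lexicon (assumption); the study used nearly 80 terms.
HYPERGLYCEMIA_TERMS = [
    "hyperglycemia",
    "high blood sugar",
    "dehydration",
    "blurry vision",
    "frequent urination",
]


def normalize(query: str) -> str:
    """Lowercase and collapse runs of whitespace so spacing does not affect matching."""
    return re.sub(r"\s+", " ", query.lower()).strip()


def mentions_symptom(query: str, terms=HYPERGLYCEMIA_TERMS) -> bool:
    """Return True if the normalized query contains any lexicon term as a substring."""
    q = normalize(query)
    return any(term in q for term in terms)


if __name__ == "__main__":
    for q in ["Why do I have  BLURRY vision?", "paxil dosage"]:
        print(repr(q), "->", mentions_symptom(q))
```

Substring matching will also fire on unrelated queries such as “dehydration after a marathon”, which is one small way noise creeps into search data.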

The results showed that among internet users who searched for paroxetine or its brand names (eg Paxil) in 2010, around 5% also searched for one of the nearly 80 terms describing hyperglycemia-related symptoms. For pravastatin and its brand names (eg Pravachol, Selektine), this figure was under 4%.

But among internet users who searched for both drugs, the proportion who also searched for hyperglycemia-related symptoms or descriptors was much higher, at 10%.
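The comparison reported in the paragraphs above boils down to conditional proportions over groups of users. The sketch below is a minimal illustration, assuming each user's search history is available as a list of query strings. The drug-name lists reuse names from the article, but the `symptom_rate` helper, the shortened symptom list and the toy logs are hypothetical, and the study's own statistics were more involved than a raw proportion.

```python
from typing import Dict, List

# Generic and brand names mentioned in the article.
PAROXETINE = ["paroxetine", "paxil"]
PRAVASTATIN = ["pravastatin", "pravachol", "selektine"]
# Hypothetical, shortened symptom lexicon (assumption).
SYMPTOM_TERMS = ["hyperglycemia", "high blood sugar", "blurry vision", "frequent urination"]


def searched_any(queries: List[str], terms: List[str]) -> bool:
    """True if any of the user's queries contains any of the given terms."""
    return any(t in q.lower() for q in queries for t in terms)


def symptom_rate(logs: Dict[str, List[str]], drug_lists: List[List[str]]) -> float:
    """Share of users who searched for every drug in drug_lists and also searched
    for at least one symptom term. Returns 0.0 if no user qualifies."""
    cohort = [qs for qs in logs.values()
              if all(searched_any(qs, names) for names in drug_lists)]
    if not cohort:
        return 0.0
    hits = sum(searched_any(qs, SYMPTOM_TERMS) for qs in cohort)
    return hits / len(cohort)


# Invented toy logs: user id -> list of queries (not study data).
logs = {
    "u1": ["paxil side effects", "pravachol price", "high blood sugar symptoms"],
    "u2": ["paroxetine dose"],
    "u3": ["pravastatin and grapefruit"],
    "u4": ["paxil withdrawal", "blurry vision"],
}

if __name__ == "__main__":
    for label, drugs in [("paroxetine", [PAROXETINE]),
                         ("pravastatin", [PRAVASTATIN]),
                         ("both", [PAROXETINE, PRAVASTATIN])]:
        print(f"{label}: {symptom_rate(logs, drugs):.2f}")
```

A pair stands out when the combined-group rate clearly exceeds both single-drug rates, as with the 10% versus roughly 5% and under 4% reported in the study. On real logs the groups are vastly larger, and deciding whether a gap is meaningful would call for a proper statistical test rather than a raw comparison.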

To double-check the accuracy of their mining tools, the researchers ran a second analysis covering 31 drug combinations already known to cause hyperglycemia and 31 known to be safe.

The new analysis found that, as with paroxetine and pravastatin, the drug combinations with known interactions yielded a higher rate of users searching for hyperglycemia-related symptoms.

High Rate of False Positives Could Be a Drawback

But the researchers also found false alarms: in about 12% of cases involving drug combinations known to be safe, there was also an unusually high rate of searches for hyperglycemia-related symptoms. These false positives would have led nowhere had anyone decided to follow them up.
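Checking the method against combinations whose behavior is already known amounts to estimating a true-positive rate and a false-positive rate. A minimal sketch, assuming each drug pair has already been reduced to a single signal score and that a pair is flagged when its score exceeds a chosen threshold. The `flag_rates` helper, the pair labels, the scores and the threshold are all invented for illustration and are not the study's data or method.

```python
from typing import Dict, Tuple


def flag_rates(positive_scores: Dict[str, float],
               negative_scores: Dict[str, float],
               threshold: float) -> Tuple[float, float]:
    """Return (true_positive_rate, false_positive_rate) for a score threshold.

    positive_scores: pairs known to cause the side effect (must be non-empty).
    negative_scores: pairs known to be safe (must be non-empty).
    A pair is flagged when its score is strictly above the threshold.
    """
    tp = sum(s > threshold for s in positive_scores.values())
    fp = sum(s > threshold for s in negative_scores.values())
    return tp / len(positive_scores), fp / len(negative_scores)


# Invented toy scores (hypothetical), e.g. how far the combined-group symptom
# rate exceeds the larger of the two single-drug rates.
known_interacting = {"A+B": 0.06, "C+D": 0.04, "E+F": 0.00, "G+H": 0.05}
known_safe = {"I+J": 0.00, "K+L": 0.03, "M+N": -0.01, "O+P": 0.00}

if __name__ == "__main__":
    tpr, fpr = flag_rates(known_interacting, known_safe, threshold=0.02)
    print(f"true-positive rate {tpr:.2f}, false-positive rate {fpr:.2f}")
```

Raising the threshold lowers the false-positive rate at the cost of missing more genuine interactions, a trade-off that any continuous monitoring system built on this kind of signal would have to tune.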

Despite the false positives, the researchers believe that listening to “signals from the crowd”, or pursuing “pharmacovigilance” on the web, can yield accurate results.

They still need to establish how useful this mining method might be for continuous monitoring of side effects.

Mining Multiple Data Sources May Overcome Drawbacks of Working with “Messy” Data

It may be possible to reduce the rate of false positives by combining the internet search data with data from other sources, such as social media, medical records and patient support forums.

Add to that the FDA’s AERS and data from health professionals working on medical research programs, and there is potential for producing reliable lists of drug-drug interactions to investigate further in clinical trials.

Co-author Nigam Shah, assistant professor of medicine at Stanford, and his team are already looking into how to mine for drug interactions in anonymized electronic health records.

“If we cross-reference multiple data sources, then we can triangulate based on what doctors and patients are both concerned about,” says Shah.

Shah admits that data from internet searches will always be “messy”. People search for many different reasons: users could be searching for symptoms because they are taking the drugs, or because someone else is taking them. And when a particular drug or symptom gets high-profile media coverage, searches on it surge, inflating the results.

But Shah says you can work with messy data if you have enough of it, as is the case when millions of searches are available. The results can then inform directions for further investigation.

Altman believes patients are saying a lot about drugs, and “we need to figure out ways to listen”.

Mining internet searches is “just one way of listening and one application,” he adds.

Funds from the National Institutes of Health helped pay for the study.

Written by Catharine Paddock PhD