A team in Denmark has developed a powerful new search engine dedicated to finding well-sourced online information about rare diseases. In an evaluation study, the engine, called FindZebra, outperformed Google, making the case for specialized search engines for specialized tasks.

Radu Dragusin, of the Technical University of Denmark, and colleagues report the findings of their evaluation study in a February online issue of the International Journal of Medical Informatics.

Medical professionals are increasingly turning to the internet to help with diagnosis. The preferred tools are Google and PubMed, but while they are useful for finding published information on common conditions, they are harder to use to find good sources on rare diseases.

A rare disease is usually defined as one that occurs in fewer than 1 in 2,000 people.

The National Institutes of Health (NIH) in the US gives an alternative definition: “A rare disease is generally considered to be a disease that affects fewer than 200,000 people in the United States at any given time”.

There are more than 6,800 rare diseases. Altogether, rare diseases affect an estimated 25-30 million Americans.

Rare diseases, sometimes called orphan diseases, are by their very nature hard to diagnose. According to the European Organisation for Rare Diseases, 1 in 4 diagnoses is delayed by between 5 and 30 years.

While the exact cause of many rare diseases is unknown, a significant portion can be traced to mutations in a single gene. Such diseases are referred to as rare genetic diseases.

However, environmental factors, such as diet, smoking or exposure to chemicals, can also play a role in rare diseases, either by directly causing disease or by interacting with genes to trigger or worsen it.

The term “zebra” is often used to refer to a rare disease. A professor of medicine in the US in the late 1940s reputedly used it to describe unexpected diagnoses: “when you hear hoofbeats behind you, you don’t expect to see a zebra”.

It is not impossible to find specialized information on Google. But you have to go through a laborious process to locate reliable sources from a wealth of Google search results.

This is because Google ranks pages as important if many other pages link to them, and puts these at the top of the search results.

But if you enter a set of symptoms into Google as your keywords, pages about rare diseases, which by definition attract fewer links, are unlikely to be at the top of the list when the search returns pages of results.

Plus, because Google is not optimized for specialized searching, it will also return a plethora of results from irrelevant sources.
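The link-based ranking idea described above can be illustrated with a toy sketch. This is not Google's actual algorithm, which is far more complex and not described in this article; it is a simplified, PageRank-style calculation in Python. The function name `link_rank`, the damping value and the tiny "web" of pages are all hypothetical, chosen only to show why a page with few incoming links tends to sink in the results.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Toy link-based ranking: pages score higher when more pages link to them.

    `links` maps each page to the list of pages it links to.
    This is an illustrative simplification, not a real search engine's method.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages link to the common condition, only one to the rare disease.
toy_links = {
    "blog_a": ["common_condition"],
    "blog_b": ["common_condition"],
    "forum": ["common_condition", "rare_disease"],
    "common_condition": [],
    "rare_disease": [],
}

scores = link_rank(toy_links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this made-up example the page about the common condition collects the most incoming links and comes out on top, while the rare disease page trails behind it regardless of how well it matches the symptoms.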

So it’s not surprising that the medical world is keen to find a more effective tool for the job.

Dragusin and colleagues first set out to answer the question of how suitable tools like Google and PubMed are in the diagnostic setting.

So they developed a dedicated, easy-to-use search engine called FindZebra and benchmarked the existing web tools against it.

FindZebra uses the open source information retrieval tool Indri to search curated publicly available data. On the FindZebra website, the team says:

“We index over 31,000 medical articles focused on rare and genetic diseases from reputable sources on the internet.”

The sources include databases such as the NORD Rare Disease Database and Organizational Database, the m-Power Rare Disease Database of the Madisons Foundation, and Orphanet, an online rare disease and orphan drug database.

They also include Wikipedia’s Rare Diseases category, the Swedish Information Center for Rare Diseases, and the US National Library of Medicine’s Genetics Home Reference and their National Center for Biotechnology Information.

In their evaluation study, Dragusin and colleagues explain how they used information from 56 difficult real-life cases to benchmark Google against FindZebra.

The findings show that FindZebra is significantly better at returning relevant results.

In one example, they entered the search terms “Boy, normal birth, deformity of both big toes (missing joint), quick development of bone tumor near spine and osteogenesis at biopsy” into FindZebra, which returned the correct diagnosis, “Fibrodysplasia ossificans progressiva”.

When they performed the search in Google, using the same search terms, none of the results mentioned the correct diagnosis.
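A benchmark of this kind can be scored in several ways. The article does not say which measure the researchers used, so the following Python sketch assumes one common choice: the fraction of cases in which the correct diagnosis appears among the first k results. The function name `top_k_hit_rate` and the toy cases are hypothetical; they are not the study's 56 cases or its results.

```python
def top_k_hit_rate(cases, k=10):
    """Fraction of cases whose correct diagnosis appears in the first k results.

    `cases` is a list of (correct_diagnosis, ranked_result_titles) pairs.
    Matching is a simple case-insensitive substring check, an assumption
    made for illustration only.
    """
    if not cases:
        return 0.0
    hits = 0
    for correct, results in cases:
        top_results = [title.lower() for title in results[:k]]
        if any(correct.lower() in title for title in top_results):
            hits += 1
    return hits / len(cases)


# Hypothetical toy data, invented for this sketch.
toy_cases = [
    ("fibrodysplasia ossificans progressiva",
     ["Fibrodysplasia ossificans progressiva: overview", "Osteosarcoma"]),
    ("disease b",
     ["Unrelated page", "Another page", "Disease B fact sheet"]),
    ("disease c",
     ["Unrelated page"]),
]

rate_at_2 = top_k_hit_rate(toy_cases, k=2)
rate_at_3 = top_k_hit_rate(toy_cases, k=3)
print(rate_at_2, rate_at_3)
```

Running the same set of cases through two search tools and comparing their hit rates at the same k gives a simple like-for-like comparison of how often each tool surfaces the right diagnosis.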

The researchers suggest that the way Google ranks pages is not optimal for this kind of search.

As part of the evaluation, they also compared searches in FindZebra against the same searches on Google, with Google restricted to the same limited dataset that FindZebra uses. The Google results were still significantly worse than the FindZebra ones.

They write:

“FindZebra outperforms Google Search in both default set-up and customised to the resources used by FindZebra.”

FindZebra is a research project, but the team has made it publicly available on the FindZebra website.

The website carries the following warning:

“WARNING! This is a research project to be used only by medical professionals.”

There is also an additional note for the lay user:

“Although the articles indexed by the system have been written by medical professionals or reviewed by medical associations, it is strongly recommended that, as a patient, you consult your local health care provider.”

Mobile versions of the tool are also available for smartphones and tablets, and the developers invite feedback from users.

While Google may not be optimal for helping to diagnose rare diseases, in other areas it is proving very useful to medicine and health. For instance, in 2012, researchers reported that by combining data that keeps track of Google searches about flu with the latest techniques used in weather forecasting, they could predict regional peaks in flu outbreaks more than 7 weeks ahead.

Written by Catharine Paddock PhD