Epidemiology 2.0

I have been giving some initial thought to the use of semantic technologies for health-related web services, a topic I find very interesting. I have always been fascinated by projects that mine data with semantic algorithms, such as We Feel Fine (which correlates “I feel…” sentences on blogs with the demographics of their authors, the weather data of their location, etc.) or Active Inspire.

Search queries as an indicator?

A cooperation between the U.S. Centers for Disease Control and Prevention (CDC) and Google gained a lot of attention this winter: Google Flu Trends analyzes Google’s search queries to produce surprisingly accurate predictions of flu epidemics in the US. Based on flu-related Google searches and the geolocation of the users, it proved possible to gain reliable data on flu epidemics:

Google Flu Trends

During the 2007-2008 flu season, an early version of Google Flu Trends was used to share results each week with the Epidemiology and Prevention Branch of the Influenza Division at CDC. Across each of the nine surveillance regions of the United States, we were able to accurately estimate current flu levels one to two weeks faster than published CDC reports.

The results have even been published in Nature. Google’s search engine is of course a unique data source; comparable data is almost impossible to obtain anywhere else.
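The core idea reported for Google Flu Trends is a simple linear model relating the log-odds of the share of flu-related queries to the log-odds of the physician-visit rate for influenza-like illness (ILI). The sketch below illustrates that modelling idea with invented weekly numbers; only the log-odds regression itself reflects the published approach.

```python
# Sketch of the Flu Trends modelling idea: fit logit(ILI rate) as a linear
# function of logit(flu-related query share). All data values are invented.
import math

def logit(p):
    """Log-odds transform applied to both query share and ILI rate."""
    return math.log(p / (1 - p))

# Invented weekly data: share of searches that are flu-related, and the
# ILI visit rate reported by sentinel physicians for the same weeks.
query_share = [0.010, 0.015, 0.022, 0.030, 0.025, 0.018]
ili_rate    = [0.012, 0.018, 0.027, 0.038, 0.030, 0.021]

x = [logit(q) for q in query_share]
y = [logit(r) for r in ili_rate]

# Ordinary least-squares fit of logit(ILI) = beta0 + beta1 * logit(queries)
n = len(x)
mx, my = sum(x) / n, sum(y) / n
beta1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
beta0 = my - beta1 * mx

def predict(q):
    """Estimate the current ILI rate from a fresh query share."""
    z = beta0 + beta1 * logit(q)
    return 1 / (1 + math.exp(-z))  # back-transform from log-odds

print(round(predict(0.020), 4))
```

Because the query data is available almost immediately, a model like this can produce an estimate one to two weeks before the official surveillance reports catch up.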

“Curated data” as official source

I found some services, like the commercial Lysol Cold & Flu tracker, that are obviously based on data the CDC publishes weekly, so official statistics are another important source for this kind of service. They follow the same basic logic: instead of analyzing the number of people googling for “flu”, the data comes from the health system’s own networks.
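A tracker built on weekly official statistics essentially just ingests the latest published figures per region. The sketch below shows what that ingestion might look like; the CSV columns and values are assumptions for illustration, not the actual CDC file format.

```python
# Sketch of ingesting CDC-style weekly surveillance data. Column names and
# values are invented placeholders, not the real published format.
import csv
import io

raw = """region,year,week,ili_percent
Region 1,2009,5,2.1
Region 1,2009,6,2.8
Region 2,2009,5,3.4
Region 2,2009,6,4.0
"""

# Keep only the latest reading per region, so a map or tracker widget
# can colour each region by its current ILI level.
latest = {}
for row in csv.DictReader(io.StringIO(raw)):
    region = row["region"]
    week = (int(row["year"]), int(row["week"]))
    if region not in latest or week > latest[region][0]:
        latest[region] = (week, float(row["ili_percent"]))

for region, (week, pct) in sorted(latest.items()):
    print(region, week, pct)
```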

WHO data is already involved

I found two services that combine curated data with data from web sources: EpiSPIDER (actually not only a web service, but open source software) and HealthMap, a free web service run by a group of academics.

HealthMap screenshot

HealthMap brings together disparate data sources to achieve a unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health. This freely available Web site integrates outbreak data of varying reliability, ranging from news sources (such as Google News) to curated personal accounts (such as ProMED) to validated official alerts (such as World Health Organization). Through an automated text processing system, the data is aggregated by disease and displayed by location for user-friendly access to the original alert. HealthMap provides a jumping-off point for real-time information on emerging infectious diseases and has particular interest for public health officials and international travelers.

These services use, in addition to curated data, semantic methods of crawling the web – in this case news feed aggregators – to create a map with the processed data. The EpiSPIDER blog also has a post on some technology background: Web services for event-based surveillance.
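The “automated text processing system” these services describe can be pictured, in its most minimal form, as keyword matching over aggregated feed items, grouped by disease and location. The sketch below is an invented toy version of that pipeline; real systems use far richer NLP, gazetteers, and reliability scoring.

```python
# Toy sketch of event-based surveillance over news feeds: match headlines
# against disease keyword lists and group the hits by disease and location.
# Feed items and keyword sets are invented for illustration.
from collections import defaultdict

DISEASE_KEYWORDS = {
    "influenza": {"flu", "influenza", "h5n1"},
    "cholera": {"cholera"},
}

# Invented news items in (headline, location) form, as a feed aggregator
# might deliver them after geocoding.
feed = [
    ("Flu cases rise sharply", "United States"),
    ("Cholera outbreak reported", "Zimbabwe"),
    ("H5N1 found in poultry", "Indonesia"),
    ("Stock markets fall", "United States"),  # no disease match: dropped
]

alerts = defaultdict(list)  # (disease, location) -> matching headlines
for headline, location in feed:
    words = set(headline.lower().split())
    for disease, keywords in DISEASE_KEYWORDS.items():
        if words & keywords:
            alerts[(disease, location)].append(headline)

for key in sorted(alerts):
    print(key, alerts[key])
```

The grouped alerts are then what gets plotted on the map, each entry linking back to the original news source.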

Possible “semantic scenarios” for the WHO

When thinking about semantic technologies for the WHO, I can come up with two approaches that might be of interest for our project:

On the one hand, publishing data (on the web) in semantic formats enables other service providers to use it for preventive and informative purposes, while at the same time raising awareness of the work and role of the WHO.

On the other hand, semantic data could be important for the work of the WHO itself, be it for mining relevant data from the internet or for the internal re-use of data created by other parties within the WHO and/or its partners (the latter meaning the use of semantic data “in the field”).
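To make the first scenario concrete: “publishing data in semantic formats” could be as simple as exposing outbreak records as RDF triples that third-party services can consume. The sketch below serializes invented records in N-Triples syntax using only the standard library; the namespace and vocabulary URIs are hypothetical placeholders, not an official WHO ontology.

```python
# Sketch of exposing outbreak records as RDF (N-Triples syntax). The record,
# namespace, and vocabulary URIs are invented for illustration only.
outbreaks = [
    {"id": "outbreak-42", "disease": "influenza", "country": "US", "cases": 1200},
]

EX = "http://example.org/outbreak/"   # hypothetical resource namespace
VOC = "http://example.org/vocab#"     # hypothetical vocabulary namespace

def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return f"<{subject}> <{predicate}> {obj} ."

lines = []
for rec in outbreaks:
    subject = EX + rec["id"]
    lines.append(triple(subject, VOC + "disease", f'"{rec["disease"]}"'))
    lines.append(triple(subject, VOC + "country", f'"{rec["country"]}"'))
    lines.append(triple(subject, VOC + "cases", f'"{rec["cases"]}"'))

ntriples = "\n".join(lines)
print(ntriples)
```

Once data is available in a machine-readable form like this, services such as HealthMap could pick it up directly instead of scraping reports.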
