Seasonal influenza outbreaks and pandemics of new strains affect people all over the world. However, the systems currently used to estimate the spread of such illnesses are too slow, delivering their estimates with a delay of one or two weeks. Tobias Preis and Helen Susannah Moat, the researchers behind the study published today, worked out a way of improving the speed and accuracy of such estimates by using data from search engines like Google.
Large technological systems are a part of our everyday lives. By interacting with these systems, for example by Googling symptoms of illness, we are unwittingly creating gigantic sets of data which can be used to investigate human behaviour on massive scales.
Previous studies in the USA have shown that data on how frequently internet users searched for influenza-related terms correlated with the percentage of doctor visits in which patients presented with influenza-like symptoms. The scientists behind that work built an analytical tool called Google Flu Trends, which monitors such searches and reports their frequency with a delay of just a day.
Usually, estimates are based on simple models that use only historic levels of flu, but the team wanted to see whether adding Google Flu Trends data could improve the accuracy of forecasts of influenza levels in the USA.
The team used data from a national database on the percentage of visits to doctors due to influenza-like illness in the US between 2010 and 2013, together with Google Flu Trends data for the same period, to design a forecasting model which could estimate current levels of influenza.
The challenge for the team was to design a model which, using only previous data values, would be able to make accurate forecasts of current influenza levels. The team calls such a forecast a ‘nowcast’.
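The idea of a nowcast can be illustrated with a short sketch. This is not the team's actual model, whose form is not described here. It assumes, purely for illustration, a linear regression in which this week's level of influenza-like illness is estimated from the most recent doctor-visit figure, which is taken to be `delay` weeks old, and from this week's search signal. The function names, the `delay` parameter and the choice of least squares are all hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def nowcast(ili, search, delay=2):
    """Estimate the current week's influenza-like illness (ILI) level.

    ili    -- reported doctor-visit percentages; the last entry is
              `delay` weeks old (assumed reporting lag)
    search -- search-frequency signal for every week up to and including
              the current one, so len(search) == len(ili) + delay
    """
    n = len(ili)
    if len(search) != n + delay:
        raise ValueError("search must cover `delay` more weeks than ili")
    if n - delay < 3:
        raise ValueError("not enough history to fit three coefficients")
    # Each training row uses only data that would have been available
    # in week t: the ILI figure from `delay` weeks earlier and the
    # same-week search signal.
    rows = [[1.0, ili[t - delay], search[t]] for t in range(delay, n)]
    targets = [ili[t] for t in range(delay, n)]
    weights = fit_least_squares(rows, targets)
    current = [1.0, ili[-1], search[-1]]
    return sum(w * f for w, f in zip(weights, current))
```

A baseline that ignores search data can be obtained by dropping the `search[t]` column from each row. Comparing the two is the kind of test the study describes.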
By adding the up-to-date Google Flu Trends data to information on doctor visits from preceding weeks, the team was able to improve the accuracy of estimates of the current spread of influenza, compared with estimates based only on past doctor visits. Depending on how far back the data used by the team’s model went, it slashed errors in estimates by between 16.0% and 52.7%.
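A percentage reduction in error of this kind is calculated relative to the error of the baseline model. The sketch below shows the arithmetic. It assumes mean absolute error as the error measure, which is an illustrative choice and not necessarily the measure used in the study, and the function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and estimated values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def error_reduction_percent(baseline_error, augmented_error):
    """Relative drop in error, in percent, when moving from the
    baseline model to the model augmented with search data."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - augmented_error) / baseline_error
```

On this definition, a model whose error is half that of the baseline gives a reduction of 50%.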