We should strive to make data fundamentally open and freely accessible to the public wherever possible.

Iterations of a generative adversarial network (GAN) learning to create abstract art

Towards the end of November 2021, two separate teams of scientists in South Africa and Botswana made a vital decision. They had detected an unusually heavily mutated genetic sequence in virus samples taken from patients infected with Covid-19. Before they had even looked at what these mutations might mean, they uploaded the genetic sequences to a database that made them freely available to scientists all over the world. What those southern African researchers had done was effectively give the world an early warning of what would become known as the Omicron variant of Sars-CoV-2. Within a day of the mutated sequences being shared, steps were already being taken around the world to control the variant’s spread and to understand more about how it might alter the course of the pandemic.

But throughout the Covid-19 pandemic, research teams all over the globe have been sharing their data as a matter of routine. This unparalleled global exchange of scientific data has allowed treatments to be developed and quickly disseminated, helped speed up the development of vaccines and allowed health authorities to watch and react in real time as the virus has evolved. It demonstrates just how powerful the open sharing of data can be. But should it really take a crisis on the scale of a global pandemic to prompt it?

One of the central ideas to emerge from the work of the Royal Society's Working Group on the online information environment is that, as a society, we should strive to make data fundamentally open and freely accessible to the public wherever possible. To do this we need to create platforms – much like those Covid-19 variant databases – where data can be reliably shared and accessed.

While I don’t argue that all data should be made open, making as much of it as possible freely accessible brings many benefits. For research purposes, it becomes a powerful resource that allows us to examine important questions about society. Take the early days of the Covid-19 pandemic: when researchers were given access to anonymised mobile phone location data, they were able very quickly to build models of how the virus would spread to different countries based on movement patterns. This needs to flow in the other direction too, with the scientific models, assumptions and code of the algorithms used in research being openly available as well. Not only does that make research more scrutable and claims easier to fact-check, it also gives us a powerful way of catching genuine errors or assumptions that might have been overlooked.

The scientific community should not be afraid of this, but rather see it for the benefits it can bring. There have already been some significant steps in this direction, with academic literature being opened up and data being shared more widely. Even commercial entities such as pharmaceutical companies have warmed to the idea of making their own clinical trial data more openly available and sharing information about adverse effects. But we are still in the early days of this process. Much more can be done to make data more accessible, searchable and interoperable. And there is still so much data being collected whose existence is barely known outside the organisations that accumulate it.

Of course, there will always be people who will wilfully misinterpret data, but this is where another important recommendation from our report on the Online Information Environment comes in – promoting data literacy and digital literacy. This isn’t about providing people with a perfect 20/20 vision of the world, but rather giving them the methods they need to apply a critical mind to data, helping them be clear about the intrinsic quality of different classes and sources of data.

A world inhabited by people who are better able to interrogate and analyse information for themselves can only be a good thing. In many areas of life, people look to scientists, researchers and engineers to offer them answers to the questions they have. But it has become apparent to me that, as scientists and engineers, we are not always brilliant at getting across the fundamental fact that we don’t have all of the answers. To borrow a phrase from my colleague Professor Frank Kelly, “science stands on the edge of error”. This is because the scientific method is about trying to manage and reduce uncertainties through a systematic approach to testing ideas. For this reason, our understanding of the physical, social and biological processes that shape our world can change as we gather more information about them. It is why science cannot deliver facts written on tablets of stone. The key, however, is getting everyone – politicians, journalists, policy-makers and the wider public – comfortable with this idea of uncertainty.

We will need tools to help us navigate a data-rich environment. How, for example, might we know that the data we are looking at is what it says it is? How do we detect sophisticated deepfakes or identify data that might have been poisoned in some way? An important set of technologies will help us be confident about the data we have by digitally encoding certification and chains of provenance. But there is also a major role for trusted institutions as repositories of data that has come through a process of accreditation.
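To make the idea of a digitally encoded chain of provenance a little more concrete, the sketch below shows one very simple way it can work. This is an illustration of the general principle only, not a scheme described in the report, and the record fields are invented for the example: each record stores a cryptographic fingerprint of the data and of the record that came before it, so that any later alteration to the data or its history can be detected.

# Minimal illustrative sketch of a hash-chained provenance record (Python standard library only).
import hashlib
import json

def fingerprint(data: bytes) -> str:
    # A SHA-256 digest uniquely identifies this exact sequence of bytes.
    return hashlib.sha256(data).hexdigest()

def add_record(chain: list, data: bytes, note: str) -> list:
    # Each new record links back to the hash of the previous record.
    previous = chain[-1]["record_hash"] if chain else None
    record = {"data_fingerprint": fingerprint(data), "note": note, "previous": previous}
    record["record_hash"] = fingerprint(json.dumps(record, sort_keys=True).encode())
    return chain + [record]

def verify(chain: list) -> bool:
    # Recompute every hash; tampering with any data or any record breaks the chain.
    previous = None
    for record in chain:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record["previous"] != previous:
            return False
        if record["record_hash"] != fingerprint(json.dumps(body, sort_keys=True).encode()):
            return False
        previous = record["record_hash"]
    return True

# Example: record an original upload and a later anonymisation step.
chain = add_record([], b"raw genomic sequences", "uploaded by the originating laboratory")
chain = add_record(chain, b"anonymised sequences", "identifiers removed before wider sharing")
print(verify(chain))  # True until any record or dataset is altered

Real systems build digital signatures and accredited identities on top of this kind of chain, which is where the trusted institutions mentioned above come in.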

We have seen in the pandemic just how important it can be to get information from organisations that we agree we can trust – from media outlets and universities to government and intergovernmental bodies. That is something we can build upon. And with a more open, questioning and analytical society, I’m optimistic we can have a far healthier online information environment.

Further reading

This blog is one of a series of perspective pieces published to support the Royal Society's Online information environment report, which provides an overview of how the internet has changed, and continues to change, the way society engages with scientific information, and how it may be affecting people’s decision-making behaviour.

Authors

  • Sir Nigel Shadbolt FREng FRS

    Sir Nigel Shadbolt is Professor of Computer Science at the University of Oxford and Principal of Jesus College, Oxford. He undertakes interdisciplinary research in computer and engineering science, including artificial intelligence (AI), computational neuroscience, human-centred computing and the emerging field of web science. He has also researched and promoted the use of open data. He is Chairman and Co-Founder, with Sir Tim Berners-Lee, of the Open Data Institute.