Publishers have an important role in ensuring the research community gains the maximum benefit from published research – as much today as 350 years ago. We sometimes hear criticism that traditional publishing, even online, is not fit for communicating research in the digital age and that the reliability and reproducibility of research is suffering as a result. Problems of irreproducibility – experimental design, training, selective publication – arguably originate in the lab, field or clinic but publishers have a choice. Publishers can compound or help alleviate the problems in a variety of ways, especially in how and what information is reported and shared.
So, what does tackling these problems mean – practically and pragmatically speaking – for the publishing and peer-review systems available today, and tomorrow? Some possible solutions cover several areas:
The amount, format, and type of content published can affect reproducibility. Research should be reported such that a researcher with similar expertise could carry out the same experiment, but poorly reported research methods – whether through word limits in methods sections or poor enforcement of reporting standards, such as the CONSORT guidelines – can leave readers guessing. A description of a ‘telephone-based community support intervention’ for patients with a mental health disorder, for example, tells us little about the frequency of contact with the patient or the content of the discussions. And for computational researchers who use the literature as a resource for research, the availability of articles in the right, machine-readable format with sufficient metadata – ideally as open XML – is important. Indeed, this was one of the original goals of building PubMed Central, the free repository of biomedical articles – a ‘GenBank for the published literature’.
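To make the point about machine readability concrete, here is a minimal sketch of what a computational researcher gains from open XML: structured metadata can be pulled out programmatically rather than scraped from a PDF. The XML fragment and the `extract_metadata` helper are hypothetical illustrations, though the element names (`article-title`, `abstract`) follow the JATS vocabulary used by PubMed Central.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified JATS-style article fragment; real
# PubMed Central XML is far richer, but uses these same element names.
JATS_SNIPPET = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>A telephone-based community support intervention</article-title>
      </title-group>
      <abstract><p>Weekly calls covering medication adherence.</p></abstract>
    </article-meta>
  </front>
</article>
"""

def extract_metadata(xml_text):
    """Pull the title and abstract text out of a JATS-style article."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//article-title")
    abstract = " ".join(p.text for p in root.findall(".//abstract/p"))
    return {"title": title, "abstract": abstract}

meta = extract_metadata(JATS_SNIPPET)
print(meta["title"])
```

Nothing comparable is possible, at scale, with articles locked in page-oriented formats – which is why open, structured formats matter for text and data mining.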
A journal indicating it has a policy on deposition of data in a particular repository as a condition of publication is one thing, but actively enforcing such policies in the editorial and peer-review process is another – and the latter has proven far more effective. Enforcing and policing policy takes time and effort, as well as community support, but better supports reproducibility by demanding positive changes in behaviour. When Nature and Science together began enforcing deposition of microarray data in 2002, the data repositories reportedly struggled initially to cope with the demand. Enforcing editorial measures to increase reproducibility often requires checklists, for authors and editors, such as those introduced at Nature in 2013.
People respond to incentives, and credit for contributions to research in the form of publications and citations remains the main currency in research communication. Publishers can incentivise the sharing and publication of more materials that enable reproducible research, namely data, protocols and source code. Novel article types and journals – such as data journals for describing datasets, software journals, and protocol articles and journals – provide formal publication outlets for more of the digital products of research.
- Legal tools
Certain types of content licence can drive efficiency in the reuse and advancement of research. This means licensing content for reuse as liberally as possible, while maintaining a sustainable business model. Creative Commons licences for papers enable other researchers to rapidly reuse and build on published work when writing their next article, and making these licences machine-readable within the content means more efficient reuse by software. For data, the maximum potential for integration and reuse across hundreds or thousands of data sources without legal barriers can be achieved by placing it in the public domain via Creative Commons CC0. There is no legal requirement to attribute derivative works to the creators of data made available under CC0. However, the long-established cultural norm of citation in scholarship ensures researchers still gain credit for their contributions.
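The practical value of machine-readable licences is that software can decide, without human review, what a given licence permits or requires. The sketch below is a hypothetical illustration of that idea – the `attribution_required` helper and its small lookup table are assumptions for this example, not a real library – but the underlying facts are as stated above: CC0 carries no legal attribution requirement, while CC BY does.

```python
# Hypothetical helper: given a machine-readable licence URI found in an
# article's or dataset's metadata, report whether attribution is a legal
# requirement. Only two Creative Commons licences are modelled here.
CC_LICENCES = {
    # CC0: work placed in the public domain; attribution is a cultural
    # norm (citation) rather than a legal requirement.
    "https://creativecommons.org/publicdomain/zero/1.0/": False,
    # CC BY: reuse is permitted but attribution is legally required.
    "https://creativecommons.org/licenses/by/4.0/": True,
}

def attribution_required(licence_uri):
    """Return True if the licence legally requires attribution.

    Raises KeyError for licences this sketch does not know about, so
    unknown licences fail loudly rather than being silently reused.
    """
    return CC_LICENCES[licence_uri]
```

A text-mining pipeline could use a check like this to filter thousands of data sources down to those it may legally integrate and redistribute.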
Access not just to papers, but accessibility – and connectivity – of all digital content can aid discoverability and reproducibility. Ideally, research protocols, source code, and raw and analysis-ready datasets should be linked from articles. Formal citation of these materials in reference lists, and better integration of data into articles, are also important, lowering the barrier to reuse through connections made possible online. Data citation is increasingly common in journals, facilitated by the assignment of persistent identifiers – such as Digital Object Identifiers (DOIs), traditionally assigned to paper-based scholarly works – to data.
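A formal data citation built on a persistent identifier might be composed as below. This is a hedged sketch: the `format_data_citation` helper, the field layout, and the example DOI are all hypothetical, illustrating the general creator–year–title–publisher–DOI pattern of data citations (conventions vary by journal and repository); the `https://doi.org/` resolver prefix is real.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Compose a data citation string in a common
    creator (year): title. publisher. DOI-link layout."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"

# Hypothetical dataset and DOI, purely for illustration.
citation = format_data_citation(
    creators=["Smith J", "Lee K"],
    year=2015,
    title="Community support intervention survey dataset",
    publisher="Example Data Repository",
    doi="10.1234/example.5678",
)
print(citation)
```

Because the DOI is persistent, the link in such a citation keeps resolving to the dataset even if the repository reorganises its URLs – which is exactly what makes DOIs suitable for reference lists.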
- Reliability and re-use
By this I mean peer review, and ensuring we ask the right questions of the right experts about the right content. Peer review and validation of data is a nascent area but is crucial if we are to achieve the goal of ‘intelligently open data’. Data must be assessable as well as accessible, intelligible and reusable – which means better metadata. This is possible, however, by focusing peer-review processes and integrating the review of data with article review – and it is already happening at journals such as Scientific Data.
There remains much work to do to better connect the process of doing research with the process of communicating and discovering research. But publishers are increasingly collaborating with researchers or developing software for research. To quote novelist William Gibson: the future is already here – it’s just not evenly distributed.
A number of the concepts in this blog are discussed in a chapter co-authored by IH, in the book ‘Implementing Reproducible Research’.
This post was first published on the Royal Society’s In Verba blog on 21 April 2015, and relates to our Future of Scholarly Scientific Communication events (#FSSC). These events are bringing together stakeholders in a series of discussions on evolving and controversial areas in scholarly communication, looking at the impact of technology, the culture of science and how scientists might communicate in the future.