Data sharing and mining

To allow others to verify and build on the work published in Royal Society journals, it is a condition of publication that authors make available the data, code and research materials supporting the results in the article. This policy can be cited by DOI via FAIRsharing.org. It is not permitted to state that data will be available from the authors upon request.

Why do I need to make my data available?

We require supporting data and information, including source code and other digital research materials, to be provided at submission for verification and review, and to be made publicly available on publication of the article. This is in line with our policies to promote greater openness in scientific research.

What are the benefits?

  • Increase citation levels and draw attention to your work
  • Enable verification of results: readers can replicate studies and identify statistical or methodological errors
  • Allow others to build on your work, find new uses for your data and include it in meta-analyses (reducing duplicated effort in data collection)
  • Preserve your full scientific contribution (beyond published articles) in an organised, citable system
  • Take advantage of professional curation services
  • Catch accidental errors or problems with analysis before publication, because data are provided at submission

Learn more in our video.

Where can I submit my data?

There are two options for archiving data, code and other materials: in a publicly accessible repository (preferred) or as supplementary material in the published paper.

Repositories

Our preference is for authors to archive their raw data in an external repository rather than as supplementary material, because the repository will then curate it properly. For example, a curated data repository will check that:

  • All required materials have been received
  • The data have no ethical, legal or rights issues which might prevent sharing
  • The condition and format of the data are suitable for use and long-term preservation
  • Documentation is sufficient to enable researchers to use the data

Authors should deposit research data in a FAIR-aligned repository, with a preference for those that explicitly follow the FAIR Data Principles and demonstrate compliance with international standards for data repositories (e.g. CoreTrustSeal).

Your chosen repository should:

  • be publicly available
  • publish data under an open licence (CC0 or CC-BY), clearly visible on the landing page of your dataset
  • provide files with a DOI
  • make versioning/changes clear
  • have provisions for permanent access
  • offer an English-language version
  • be curated

Google Drive, Dropbox and similar services may not be used for final publication, but may be acceptable during the review process (check with the journal’s Editorial Office).

To encourage best practice in data sharing, several Royal Society journals have Dryad data deposition integrated into the journal submission system. For all its science journals, the Society will cover the cost ($120) of depositing data with Dryad. Other example repositories are listed below. This list is not exhaustive, and authors are encouraged to use the most appropriate repository for their field.

Supplementary material

Data files may alternatively be provided as a supplement to the paper, which we will host alongside the published article. In addition, on publication we deposit all supplementary material in the Figshare repository on the author's behalf. Our preference is that raw data are archived in an external repository (which will curate them properly, as described above), and that supplementary material is used for supporting figures, videos and other small files.

Common repositories

General repositories

Where no appropriate subject-specific repository exists, data should be deposited in a general repository such as Dryad or Zenodo.

Biological Sciences

Nucleotide sequence data

GenBank

European Molecular Biology Laboratory

DNA Data Bank of Japan

Accession numbers must be provided in the data accessibility section of your manuscript.

Phylogenetic data

TreeBASE

Please ensure that alignments as well as phylogenies are deposited.

Microarray data

ArrayExpress

Gene Expression Omnibus (GEO)

Protein sequences

GenBank

European Molecular Biology Laboratory

DNA Data Bank of Japan

Protein Information Resource

Accession numbers must be provided in the data accessibility section of your manuscript.

Proteomics data

We recommend that all proteomics data, including mass spectrometry and protein interaction data, be deposited via the EBI PRIDE website.

Physical Sciences

Chemical data

Chemical structures and bioassays should be deposited in PubChem.

Earth, space and environmental science data

A useful list of repositories can be found on the AGU website.

What are the policies around code?

Please provide access to all code used to generate statistics and figures, together with any (processed) input data it requires and details of the software it depends on (program and version). Analysis code (such as R scripts) must be made available at the point of submission, as must any previously unreported algorithms. Any restrictions on, or reasons for not sharing, important code or algorithms must be discussed with the Editors before submission.

Source code should be made available under an open source licence and deposited in an appropriate repository such as Zenodo or Dryad. Small amounts of source code can be included in the supplementary material.

When do I submit my data? 

Data files and other supporting material (such as details of code) must be provided at the point of submission so that our Editors and reviewers can assess them during peer review, and must then be made publicly available before publication. Files must be provided either by hosting them in an external repository, with an accessible link included in the data accessibility section (you will be prompted for this during submission), or by uploading them as supplementary material via the electronic submission system. For some of our journals, material may be provided via GitHub, Google Drive, Dropbox or similar services for the review stage, but it must be moved to a permanent, publicly accessible repository during revision. This must be completed before the final version of the article is submitted.

What level of data needs to be made available?

It is a condition of publication that authors make the primary data, materials (such as statistical tools, protocols and software) and code publicly available. As a minimum, sufficient information and data are required to allow others to replicate all study findings reported in the article. Data and code should be deposited in a form that allows maximum reuse. Studies that do not rely on data, code or other material to generate their conclusions (e.g. theoretical studies), and so do not require them for replication, may be exempt from our open data policy. This must be clearly and explicitly stated in the cover letter and in the answer to the data access question in our online submission form.

All files, and all data columns within files, should be clearly labelled and readily interpretable. Provide a 'read-me' information file if necessary.
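As an illustration of keeping every data column interpretable, the minimal Python sketch below checks the header row of a CSV data file against a simple data dictionary in a read-me file and reports any column that is not described. The `column_name: description` read-me layout and the function names are hypothetical, chosen only for this example; they are not a format required by the journal.

```python
import csv
import io


def parse_data_dictionary(readme_text: str) -> dict[str, str]:
    """Parse read-me lines of the form 'column_name: description' into a dict.

    The 'name: description' layout is a hypothetical convention for this
    sketch, not a journal requirement. Lines without a colon are ignored.
    """
    entries = {}
    for line in readme_text.splitlines():
        name, sep, description = line.partition(":")
        if sep and name.strip() and description.strip():
            entries[name.strip()] = description.strip()
    return entries


def undocumented_columns(csv_text: str, readme_text: str) -> list[str]:
    """Return header columns in the CSV that have no description in the read-me."""
    # Read only the first row (the header); an empty file yields no columns.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    documented = parse_data_dictionary(readme_text)
    return [col for col in header if col.strip() not in documented]


if __name__ == "__main__":
    data = "site_id,temp_c,unnamed\nA1,12.5,3\n"
    readme = "site_id: sampling site code\ntemp_c: water temperature (degrees Celsius)\n"
    print(undocumented_columns(data, readme))
```

Running the script lists the column `unnamed`, which still needs a description in the read-me before the file could be called readily interpretable.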

Authors do not need to submit the raw data collected during an investigation if the standard in the field is to share data that have been processed (e.g. CSV files recording response to stimuli rather than the electrical signals on which they were based). If processed data are supplied, rather than raw data, this should be stated in the data accessibility section during submission.

Raw image data for digital morphology should be provided along with the processed 3D data; current field standards are to share such data in museum-linked repositories such as morphosource.org.

What licence should apply to datasets?

Please ensure that the licence applied to your dataset is clearly visible on the repository landing page of your data record. All data deposited to Dryad through the integrated submission system will be published under a Creative Commons BY 4.0 licence, as will all supplementary files. Data that do not explicitly carry an open licence are not open data. Wherever possible, authors should ensure that their data record is released under an open data licence, either CC0 or CC-BY.

Exceptions may be made depending on the circumstances (for example, for ethical reasons, or where data obtained from a third party are subject to re-use restrictions), but please raise this with the editorial office before submitting to the journal.

How do I prepare the data accessibility section?

Authors of all papers that report primary data must provide a statement in the manuscript submission form stating where the article's supporting data, materials and code can be accessed.

If these have been deposited in an external repository, this section should list the database, accession number/DOI and any other details needed to identify the dataset(s) clearly. Datasets included here must also be listed in the reference section. Citing datasets and code ensures effective and robust dissemination and appropriate credit to authors.

For example:
DNA sequences: GenBank accessions F234391-F234402 [REF#]
Phylogenetic data, including alignments: TreeBASE accession number S9123 [REF#]
Climate data and MaxEnt input files: Dryad doi:10.5521/dryad.12311 [REF#]

If supporting data, materials or code have been included in the article’s supplementary material, this should be stated here, for example:
The datasets supporting this article have been uploaded as part of the supplementary material.

It is not permissible to state that data will be available upon request to the authors.

How do I reference third party data or code in the data accessibility section?

Please provide details of the previously published article, with a link where possible, e.g. [X] data are available from Smith et al. [2021]: [URL XXX].

Where the third-party material isn't covered by an open licence, please provide evidence that you have permission to use the data.

How do I cite datasets and code in the references?

Citing datasets and code ensures effective and robust dissemination and appropriate credit to authors. We therefore strongly encourage you to include datasets and code in the reference list as well as in the data accessibility section.

Citations in Royal Society journals are in the Vancouver style, for example:

1. Torres-Campos I, Abram PK, Guerra-Grenier E, Boivin G, Brodeur J. 2016 Data from: A scenario for the evolution of selective egg colouration: the roles of enemy-free space, camouflage, thermoregulation, and pigment limitation. Dryad Digital Repository https://doi.org/10.5061/dryad.5qt2k

Source code, and any commercially available software used, should be referenced with an appropriately formatted citation; this article provides guidance.

What do I do if there are restrictions on accessing my data due to ethical and/or legal reasons?

If data are restricted (e.g. for ethical and/or legal reasons), you should make provision for them to be available on request via a Data Access Committee or Ethics Committee. In your data accessibility statement, state the reason for the restriction (e.g. identifiable patient data), the name of the Data Access Committee or Ethics Committee, and contact details for the point of contact.

What are the embargo restrictions?

The general policy is that data, code and materials must be made publicly available at the time of publication. Exceptions to this policy are rare and can only be approved at the journal’s discretion. In some circumstances, embargoes on data sharing of up to one year may be granted.

Can I mine data content?

We support the stance that the right to read is the right to mine. We believe that the ability to use computers to extract information from scholarly material is one of many tools available to researchers, and we support this activity on our journals.

Members of subscribing institutions have our permission to mine journal content for either commercial or non-commercial purposes. We ask that you respect the copyright of the original papers, and where possible cite original works when you reuse them.

UK copyright law includes an exception for text and data mining (TDM) that researchers can benefit from. The exception (s.29A of the Copyright, Designs and Patents Act 1988 (CDPA)) allows a person with lawful access to copyright material to make copies of it for the purpose of computational analysis for non-commercial research.

If a researcher wants to share TDM results that contain a copyright-protected element of the original work (for example, in a publication), this is possible in certain circumstances. For example, the quotation exception (s.30 of the Copyright, Designs and Patents Act 1988 (CDPA)) allows portions of someone else’s work to be copied and used to illustrate a point being made.

Please also bear in mind that our servers have finite capacity, and to help us manage the system load we ask that you let us know when you intend to carry out any mining activity. Our technology provider sets a limit on downloads from our sites, beyond which an automatic lock-out is triggered. By working together, we can help you to complete your project and achieve your research goals without being blocked by technical restrictions.