This Open Letter is the outcome of the September 2016 conference at Janelia Research Campus on Collaborative Development of Data-Driven Models of Neural Systems. The signatories of the letter commit to making important parts of their research output open source and to requiring the same level of openness in work they are asked to review.
This initiative has been described in a recent NeuroView article in Neuron: A Commitment to Open Source in Neuroscience (Gleeson et al., 2017).
Dear colleagues,
Neuroscientists are increasingly relying on custom-built software to help analyse their experimental data and to construct and run models. Packages for extracting, translating, analysing and visualising data, as well as scripts for modelling and simulating the mechanisms underlying the examined phenomena, are an essential part of the work behind many publications in the field today. These scripts can often be complex and involve many processing steps which can’t be fully described in the accompanying publications. Making them publicly available would increase the reproducibility and scientific rigour of the results described. At present, though, releasing them is generally not a prerequisite for publication.
A distributed, open and freely available network of tools, databases, and related resources for data analysis and model development will also facilitate reuse of code, making neuroscience research more flexible, efficient and reliable. Ensuring the public availability of all computational models and analysis tools used in publications is crucial for establishing and maintaining such a global infrastructure.
Towards that end, we pledge to release promptly, completely, and freely all computer code, model scripts, and parameters necessary to reproduce the analyses and simulations from any of our new publications. We will make all software applications (tools, libraries, etc.) we develop for experimental data analysis or model construction open source at the time of publication, whether or not the application is the main subject of the paper. Furthermore, if and when asked to serve as peer reviewers, we will henceforth ask authors about the availability of any code they have developed for data analysis and modelling which is essential to reproducing the results of their paper, and require that this be shared publicly upon acceptance. We invite all like-minded scientists, developers, users, and peers to join us in this pledge.
Sign the Letter
Please note that we are no longer accepting signatures to the letter.
Frequently Asked Questions
What type of scripts will be expected for a typical experimental paper?
It would be good to release the key scripts for transforming raw experimental data into the figures used to present and interpret the results. At least one of the figure panels in the paper should be reproducible from the scripts, ideally more. For analysis of experimental data this will often mean including some of the raw data or partially analysed results, so that the scripts can be run successfully on another machine. Journal requirements for data availability are currently stricter than those for code, and many journals already require data availability statements from authors. Distributing the relevant subset of data alongside the scripts (e.g. in the same GitHub repository) would greatly facilitate their use.
Note that scripts should not need to produce the final (formatted) figures themselves, just the relevant traces/graphs to demonstrate that the analysis pipeline can be locally reproduced.
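For example, a minimal figure-reproduction script might look like the following Python sketch. This is purely illustrative: the file name, column names and panel number are hypothetical, and any plotting library could be used.

```python
"""
Minimal sketch of a figure-reproduction script. Assumes the subset of raw
data needed for one panel is distributed with the code as
data/panel_2a_traces.csv (columns: time_s, voltage_mV); all names here
are hypothetical.
"""
import pandas as pd
import matplotlib.pyplot as plt

# Load the raw data subset shipped alongside the scripts
df = pd.read_csv("data/panel_2a_traces.csv")

# Regenerate the unformatted trace corresponding to the paper's Figure 2A
fig, ax = plt.subplots()
ax.plot(df["time_s"], df["voltage_mV"])
ax.set_xlabel("Time (s)")
ax.set_ylabel("Membrane potential (mV)")
ax.set_title("Figure 2A (unformatted reproduction)")

# Save the output so readers can compare it with the published panel
fig.savefig("panel_2a_reproduction.png", dpi=150)
```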
What about analysis of experimental data which requires proprietary/commercial software to run?
Many commercial applications for acquiring and processing experimental data use proprietary formats, and it can often be difficult or impossible to reproduce such analyses without access to these applications. There is no requirement that code released by labs be executable with freely available software, but ideally the released files would allow someone with access to the same version of the application to reproduce the analysis steps.
A key reason to make these scripts available is the transparency that comes from having access to the algorithms and parameters used in the code, which may not be described in full in the paper. Releasing MATLAB or Igor Pro scripts, for example, allows someone to inspect the processing steps without needing to run them in the original application.
Some proprietary applications allow files in their native formats to be exported to more open (or graphical) formats, and making these exports available (in addition to the proprietary files) is encouraged.
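Where an open source reader exists for the proprietary format, a small conversion script can also be released alongside the original files. As a hedged sketch, the open source neo library can read several electrophysiology formats, including Spike2 (.smr); the file name below is hypothetical, and the exact API may vary between neo versions.

```python
"""
Sketch: export a recording from a proprietary acquisition format to CSV,
using the open source 'neo' library as an example reader for Spike2
(.smr) files. File and channel choices are illustrative only.
"""
import numpy as np
from neo.io import Spike2IO

reader = Spike2IO(filename="recording.smr")  # proprietary input file
block = reader.read_block()                  # load into neo's data model

# Take the first analog signal of the first segment and write it out as
# plain CSV, so the processing steps can be followed (and re-run) without
# the commercial acquisition software
signal = block.segments[0].analogsignals[0]
np.savetxt("recording_channel0.csv",
           np.column_stack([signal.times.magnitude,
                            signal.magnitude[:, 0]]),
           delimiter=",", header="time_s,amplitude", comments="")
```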
What about custom code developed to drive hardware?
Labs regularly need to develop code to run and manage (custom) hardware for acquiring data. This type of code is not covered by this pledge: such software is highly dependent on the hardware involved and does not necessarily advance the scientific interpretation of the results. The pledge applies from the point at which acquired data is saved (“raw” experimental data) and custom scripts are used to process it further. Nevertheless, developers of hardware-control code could well benefit from releasing it (by encouraging reuse, testing, and code/bugfix contributions) if the hardware is widely used in the field.
Will this apply to all previous publications by labs which sign this pledge?
No. This commitment to release analysis scripts only covers papers published after the letter has been signed. However, if labs reuse and further develop code and models across publications and projects, releasing the code openly can improve the reproducibility of their past publications as well.
Will the labs which release code be obliged to maintain the code?
No. There is no implied suggestion that the scripts are useful for any purpose other than reproducing the results of the referenced publication. While someone should be able to get the scripts running on their own machine and produce a figure from the relevant paper, there is no requirement that the scripts be generic, or structured and commented sufficiently for repurposing with other data, unless, of course, the publication is specifically about a general-purpose analysis pipeline. If someone has trouble running the scripts for the stated purpose, they can reasonably contact the authors for clarification, but they should not expect support for modifying the scripts for other purposes.
Note, however, that the lab releasing the scripts can certainly claim that they are general purpose, and may get bug fixes, tips and extensions from others trying out the code. The README accompanying the scripts should make clear how generally useful the code is intended to be and whether such contributions are encouraged.
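As an illustration (the wording and paper reference are entirely hypothetical), a README along these lines makes the scope of the code explicit:

```
Analysis scripts for Smith et al. 2017, "Example paper title"

These scripts reproduce Figure 2A of the paper. They are provided as-is
for reproducibility only: they are not a general purpose analysis
pipeline and have only been tested on the data subset included in data/.

Bug reports and fixes are welcome via the issue tracker, but we cannot
offer support for adapting the scripts to other datasets.
```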
Won’t extra time/funding be required to get the code into a releasable form?
It shouldn’t. If the scripts are well written enough to be confidently used for generating the results presented in the paper, they should be sufficiently well written to be released. The step of assembling a consistent/minimal/stable set of scripts from the various versions used during the paper writing process is an important quality control step, and should not be seen as additional work.
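As a rough illustration of such a consolidated set (all names hypothetical), the released scripts could form a small self-contained repository:

```
figure-scripts/
├── README.md                # what the scripts reproduce and how to run them
├── LICENSE
├── data/
│   └── panel_2a_traces.csv  # subset of raw data needed by the scripts
└── reproduce_panel_2a.py
```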
Does this apply to all software developed in labs which sign the pledge?
No. Labs can be involved in multiple projects, some unpublished, some commercial, and not all code being developed is appropriate for release under an open source licence. This pledge only covers scripts for data analysis and computational models which are related to published works, and which are essential to reproducing the data processing and modelling/simulation stages of those scientific results.
What licences should be used?
There is no stipulation on which type of open source licence should be used for the scripts, but it is highly recommended to include details of a licence when distributing the code. See choosealicense.com, which discusses the various options for open source software.
Note that funding agencies increasingly have detailed requirements on the licensing of publications they fund (e.g. the Wellcome Trust now specifies CC-BY for publications), and specific requirements on licences for sharing data and code will not be far behind. Checking the latest details with your funding agency is advised.
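One lightweight convention (shown here only as an illustration; any recognised open source licence works) is to include a LICENSE file in the repository and declare the licence at the top of each script, for example with an SPDX identifier:

```python
# SPDX-License-Identifier: MIT
# Copyright (c) 2017 Example Lab (hypothetical)
#
# Analysis script for Figure 2A; see the LICENSE file for the full text.
```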
Where should scripts be deposited?
There are many user-friendly, free code-sharing websites available for hosting open source software, including GitHub and Bitbucket. ModelDB is a well established archive for sharing models in computational neuroscience. Open Source Brain is a resource specifically designed to allow collaborative development of neuroscience models, and links to the source code on public repositories such as GitHub. Figshare provides a location to share large volumes of experimental data, and associated scripts can be placed there too if they are unlikely to change much over time.
Aren’t the journals’ own data/code sharing requirements sufficient to ensure these scripts should be made available?
While an increasing number of journals actively encourage authors to make these types of scripts available, few make publication dependent on providing them. Many journals now require a data availability statement, which should state whether these scripts are available. See Nature's discussion on this.
Will the signatories be required to reject manuscripts/grants they review which do not make any code available?
This should be decided on a case-by-case basis and will depend on the reviewer's judgement of how essential the scripts are to the key results discussed in the manuscript. It will ultimately be up to the editors to decide whether such requests are reasonable.
The desired effect of this pledge is that researchers will be aware that there is a body of potential reviewers who will ask about code availability. Planning to release code related to manuscripts, or making specific commitments to open source in grant proposals, will help avoid problems at a later stage.
If we are collaborating with a lab that has not signed this pledge, should we feel bound by this commitment?
This should be dealt with on a case-by-case basis. As a general rule of thumb, if you are the corresponding author and/or a main developer of the scripts, it is reasonable to insist that the associated scripts be made openly available; if your role on the paper is minor and unrelated to the scripts, then actively encouraging your fellow authors in this direction is advised.