This is the third part of Peter Cameron’s post on Open Publication, Open Data and Open Software (the fourth part of the trilogy, called Open Research Funding, appeared later). If you would like to leave comments for the author, please leave them here.
My thoughts about this were sparked by reading the presidential address to the Royal Statistical Society by Peter Diggle. The title was “Data science and statistics”, and the address covered much more than open software. But one of the points he makes is particularly relevant:
Principally, we learn that a published article is no longer a complete solution to a practical problem. We need our solutions to be implemented in software, preferably open source so that others can not only use but also test and, if need be, improve our solutions. We also need to provide high quality documentation for the software. And in many cases we need to offer an accessible, bespoke user interface.
At a time when some research organisations (even high-status ones) are dispensing with statisticians on the grounds that every researcher has on her desk a computer running Excel, it is necessary to be very clear about what researchers can add. One area where expertise matters is experimental design.
The aim of good experimental design is to extract as much information as possible from an experiment relative to the amount and cost of the resources used. Every researcher knows that using too many resources wastes money, while using too few may produce results which are not significant, which also wastes money. A statistician can not only advise on the amount of resource required, or on the information that can be extracted from a given amount of resource, but can also suggest more efficient designs which, by reducing the variance of parameter estimators, make it more likely that significant effects will be detected. This can lead to improvements in agricultural productivity, healthcare, or many other fields: it is literally a matter of life and death.
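As a small illustration of what “reducing the variance of parameter estimators” can mean (an example of our own, not taken from Diggle’s address): in the straight-line model y = β0 + β1x + ε with independent errors of variance σ², the least-squares slope estimator has variance σ²/S_xx, where S_xx = Σ(x_i − x̄)². The sketch below compares two hypothetical six-run designs on the interval [0, 1]; the function name is illustrative only.

```python
from fractions import Fraction

def slope_variance_factor(xs):
    """Return Var(slope estimate) / sigma^2 = 1 / S_xx for the straight-line model."""
    n = len(xs)
    mean = sum(xs) / n
    s_xx = sum((x - mean) ** 2 for x in xs)
    return 1 / s_xx

# Hypothetical design 1: six equally spaced runs at 0, 1/5, ..., 1.
spread = [Fraction(i, 5) for i in range(6)]
# Hypothetical design 2: three runs at each end of the interval.
ends = [Fraction(0)] * 3 + [Fraction(1)] * 3

v_spread = slope_variance_factor(spread)
v_ends = slope_variance_factor(ends)
print(v_spread, v_ends, v_spread / v_ends)  # 10/7 2/3 15/7
```

Under these assumptions the equally spaced design needs about 15/7 ≈ 2.1 times as many runs to estimate the slope as precisely. The catch is that the end-point design cannot detect curvature; weighing that kind of trade-off is exactly where a statistician’s judgement comes in.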
If statisticians are replaced by computers, better that these computers run well-crafted and well-documented programs than that they force researchers to use standard (and possibly inefficient) designs. Scientists will not read a theoretical result in a statistics journal, but if a well-documented R package solves their problem, they might use it.
Another possible benefit of taking open software more seriously would be that its creation is valued more highly by research assessors. In the UK research assessments, software has always been an allowable research output, but my impression is that it has been regarded as a second-class activity compared to the real business of proving theorems.
I will end with something more personal. My friend and collaborator Donald Preece died last year, and left me with a fairly large package of unpublished material, mainly on two topics: primitive lambda-roots (units of maximal order in the group of units of the integers modulo n), and expressions of the group of units as a direct product of cyclic groups whose generators form an arithmetic progression.
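For readers unfamiliar with the first topic: λ(n), the Carmichael function, is the largest order of any unit modulo n (equivalently, the exponent of the group of units), and a primitive λ-root modulo n is a unit whose order is exactly λ(n). The brute-force sketch below, which assumes n ≥ 2, is meant only to make the definition concrete; it is not the author’s GAP code, and the function names are hypothetical.

```python
from math import gcd

def units(n):
    """Residues in 1..n-1 that are coprime to n (assumes n >= 2)."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def multiplicative_order(a, n):
    """Smallest k >= 1 with a^k = 1 (mod n); a must be a unit mod n."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

def carmichael_lambda(n):
    """In a finite abelian group the exponent equals the largest element order."""
    return max(multiplicative_order(a, n) for a in units(n))

def primitive_lambda_roots(n):
    """Units modulo n whose order is lambda(n), i.e. units of maximal order."""
    lam = carmichael_lambda(n)
    return [a for a in units(n) if multiplicative_order(a, n) == lam]

print(carmichael_lambda(15), primitive_lambda_roots(15))  # 4 [2, 7, 8, 13]
```

When n has a primitive root (n = 2, 4, p^k or 2p^k for an odd prime p), λ(n) = φ(n) and the primitive λ-roots are just the ordinary primitive roots; for example, the sketch returns [3, 5] for n = 7.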
Donald produced a large amount of data on both of these topics, entirely by hand. Indeed, I wrote GAP code for computing with primitive lambda-roots, but I could never persuade Donald to use it; that wasn’t his way. Nevertheless, without him, if I am to publish this material in any reasonable form, it will be necessary for me to check the data computationally. As far as I know, my GAP code has never been used by anyone except me, although I made an effort to document it carefully. For the other problem, I have various one-off bits of code, which I have no intention of re-using; better to start again from scratch.
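To give a sense of what starting again from scratch might involve for the second problem, here is a hedged brute-force sketch based on our own reading of it, not on Donald’s or the author’s formulation. Given n, a, d and k, it tests whether the residues a, a + d, …, a + (k−1)d are units and the group of units modulo n is the internal direct product of the cyclic subgroups they generate. Because the group is abelian, this holds exactly when the product of the generators’ orders equals φ(n) and every unit is a product of powers of the generators. The name is_ap_direct_product is hypothetical, and n ≥ 2 is assumed.

```python
from itertools import product
from math import gcd, prod

def multiplicative_order(a, n):
    """Smallest k >= 1 with a^k = 1 (mod n); a must be a unit mod n."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

def is_ap_direct_product(n, a, d, k):
    """Hypothetical check: is (Z/nZ)^* the internal direct product of the
    cyclic subgroups generated by a, a+d, ..., a+(k-1)d (mod n)?"""
    gens = [(a + i * d) % n for i in range(k)]
    if any(gcd(g, n) != 1 for g in gens):
        return False  # every generator must be a unit
    orders = [multiplicative_order(g, n) for g in gens]
    phi = sum(1 for x in range(1, n) if gcd(x, n) == 1)
    if prod(orders) != phi:
        return False  # orders must multiply to the group order
    # Collect every product g_1^e_1 * ... * g_k^e_k with 0 <= e_i < ord(g_i).
    reached = set()
    for exps in product(*(range(o) for o in orders)):
        x = 1
        for g, e in zip(gens, exps):
            x = x * pow(g, e, n) % n
        reached.add(x)
    # Surjective with |domain| = phi(n), so the product map is a bijection.
    return len(reached) == phi

print(is_ap_direct_product(15, 2, 9, 2))  # True: 2 (order 4) and 11 (order 2)
print(is_ap_direct_product(15, 2, 2, 2))  # False: 4 already lies in <2>
```

The second example shows why comparing orders alone is not enough: 2 and 4 have orders 4 and 2, whose product is φ(15) = 8, but 4 is a power of 2, so the two subgroups overlap.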
This is probably not such an uncommon situation for a pure mathematician. We prove a theorem, which involves checking a few small cases by computer; so we write code to do that, and in the paper we simply announce that we have checked these cases, probably with very little detail.
Electronic publishing should make it easy to include commented code as an appendix to a paper.