Taking your code out for a walk

January 6th, 2009

When I was in college, a friend of mine told me he liked to take his code out for a walk every now and then. By that he meant recompiling and running all of his programs. At the time I thought that was unnecessary. If a program compiled and ran the last time you touched it, why shouldn’t it compile and run now? He simply said that I might be surprised.

Even when your source code isn’t changing, the environment around it is changing. When I was in college, computers didn’t have automatic weekly updates, but they changed often enough that taking your code out for a walk now and then made sense. Now it makes even more sense. See Jon Claerbout’s story along these lines.

CiSE special issue on reproducible research

January 6th, 2009

Computing in Science and Engineering has just come out with a special issue on reproducible research.  (When you first visit the link, you need to click on “vol 11.” The page is doing some fancy JavaScript that makes it impossible to link directly to the issue.)

The following articles on RR are included.

Guest Editors’ Introduction: Reproducible Research

Sergey Fomel, University of Texas at Austin
Jon F. Claerbout, Stanford University

Reproducible Research in Computational Harmonic Analysis
David L. Donoho, Stanford University
Arian Maleki, Stanford University
Inam Ur Rahman, Apple Computer
Morteza Shahram, Stanford University
Victoria Stodden, Harvard University

Python Tools for Reproducible Research on Hyperbolic Problems
Randall J. LeVeque, University of Washington

Distributed Reproducible Research Using Cached Computations
Roger D. Peng, Johns Hopkins Bloomberg School of Public Health
Sandrah P. Eckel, Johns Hopkins Bloomberg School of Public Health

The Legal Framework for Reproducible Scientific Research: Licensing and Copyright
Victoria Stodden, Harvard University

Irreproducible results in neuroscience

December 28th, 2008

See Andrew Gelman’s post Suspiciously high correlations in brain imaging studies.

BioMed Critical Commentary

December 15th, 2008

I just found out about BioMed Critical Commentary. Here’s an excerpt from the site’s philosophy statement.

The current system of scientific journals serves well certain constituencies: the advertisers, the journals themselves, and the authors. It is the underlying philosophy of BioMed Critical Commentary to serve the readers in preference to any other constituency.

In particular, this site could serve as a public forum for criticism that journals are not eager to publish. It could be a good place to discuss specific examples of irreproducible analyses.

Three reasons to distrust microarray results

December 10th, 2008

Even when lab work and statistical analysis are carried out perfectly, microarray experiment conclusions have a high probability of being incorrect for probabilistic reasons. Of course lab work and statistical analysis are not carried out perfectly. I went to a talk earlier this week that demonstrated reproducibility problems coming both from the wet lab and from the statistical analysis.

The talk presented a study that supposedly discovered genes that can distinguish those who will respond to a certain therapy from those who will not. On closer analysis, the paper actually demonstrated that it is possible to distinguish microarray experiments conducted on one day from experiments conducted on another day. That is, batch effects from the lab were much larger than differences between patients who did and did not respond to therapy. I hear that this is typical unless gene expression levels vary dramatically between subgroups.

The talk also discussed problems with reproducing the statistical analysis. As is so often the case, data were mislabeled. In fact, 3/4 of the samples were mislabeled. Simply keeping up with indexes is the biggest barrier to reproducibility. It is shocking how often studies simply did not analyze the data they say they analyzed. This seems like a simple matter to get right; perhaps people give little attention to it precisely because it seems so simple.

So, three reasons to be skeptical of microarray experiment conclusions:

  1. High probability of false discovery
  2. Statistical reproducibility problems
  3. Physical reproducibility problems
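The first reason in the list can be made concrete with a little arithmetic. The sketch below is only an illustration: the gene counts, significance level, and power are hypothetical numbers, not figures from the talk, and it assumes each gene is tested separately with no correction for multiple testing.

```python
def false_discovery_share(n_genes, n_true, alpha, power):
    """Ratio of expected false positives to expected total discoveries.

    n_genes: number of genes tested (hypothetical)
    n_true:  number of genes truly associated with the outcome (hypothetical)
    alpha:   per-gene significance level, with no multiple-testing correction
    power:   chance that a truly associated gene is detected
    """
    false_positives = (n_genes - n_true) * alpha
    true_positives = n_true * power
    return false_positives / (false_positives + true_positives)


# Hypothetical experiment: 20,000 genes, 100 of them real, alpha = 0.05, power = 0.8.
share = false_discovery_share(20_000, 100, 0.05, 0.8)
```

With these made-up inputs there are about 995 expected false positives against 80 expected true positives, so roughly 93% of the reported genes would be false discoveries even if the lab work and analysis were flawless.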

Distributing Reproducible Research

December 2nd, 2008

Most people would agree that reproducible results are important in all areas of science.  I think reproducibility is particularly important in areas of science where replication of an experiment or study—where a similar question is addressed using independent investigators, data, and methodology—is highly unlikely.  Such studies are typically difficult to replicate because of time, money, ethics, or perhaps all three.  In these cases, all we are left with are the data at hand and being able to reproduce the published results from these data is critical.

Much heat has been generated over the question of whether scientists should be forced to make their data and methodology public.  Journals such as Science and Nature have adopted data dissemination policies; the National Institutes of Health requires data sharing plans for some of its grants; and the Office of Management and Budget Circular A-110 requires that data generated under federally sponsored research be made available upon request if those data were used in developing a government agency action.  While the debate over such dissemination policies is highly relevant, I think it can obscure and cause people to overlook an important question related to reproducible research.

One way I sometimes think of this question is as follows: Suppose a collaborator comes to you and says “I desperately want to make my research reproducible.  What should I do?”  I don’t mean to frame this as purely a hypothetical question—I have actually had people ask me this before.

The problem right now is that I don’t think proponents of reproducible research (myself included) have a good answer to this question.  A typical response might be “make the code and data available”.  Yes, but how?  If we cannot come up with a concrete and coherent answer to this question for people who are willing to make their work reproducible, we cannot realistically expect to change the minds of people who are currently unwilling to make their research reproducible.

I think there are two important roadblocks that make it difficult to publish reproducible research.  The first is the lack of a broad toolset that a wide range of researchers can use to assist them in publishing their data and methodology.  There are a number of efforts out there to develop tools, but many of these tools either have important limitations or are only accessible to more sophisticated users (the Sweave/LaTeX combination comes to mind, although it is a great contribution).  A related problem involves getting people to use tools that are already out there.  For example, I believe the use of version control software is a critical aspect of reproducible research and there are many high-quality software packages available for all operating systems.  I personally use git but many others would also fit the bill.  I must say I’ve had limited success convincing people they need to use version control systems.  I think the basic problem is that it involves learning Yet Another Software Package.

The second roadblock for reproducible research is distribution.  Suppose I carefully keep track of all the code I use to analyze my data and am happy to give the code and data to others.  How do I do that?  Many knowledgeable people will set up a web site for themselves and post code and data on their own web pages.  But demanding that everyone create a web site for distributing reproducible research is in my opinion a steep demand.  Many researchers do not have this capability and even if they did, it is not clear to me that web pages are the ideal medium for disseminating reproducible research.  How much data analysis is done in your web browser?

The distribution problem can be addressed by creating some basic infrastructure.  Analogous infrastructure already exists in other domains.  Users of the R statistical system have the Comprehensive R Archive Network (CRAN) which is used to disseminate R packages (add-on functionality) to anyone around the world.  In practice there is no need to interact with the web site with a browser because R itself can fetch the packages from the Archive and install them without the user ever having to change applications.  Similar facilities exist for Perl (CPAN) and TeX (CTAN).  Of course, we cannot expect such resources to appear out of thin air.  Developing a useful archive requires hardware and administrative time.

I have been trying to develop a system for R users that can be used to distribute reproducible research via a central repository.  The software is an R package called ‘cacher’ and the associated repository is what I call the Reproducible Research Archive.  The basic idea of the ‘cacher’ package is to take code that represents a data analysis and cache the code and associated data in a series of key-value databases.  This “data analysis cache” can then be packaged and uploaded to the Archive.  Each cache package is given a unique ID (via SHA-1) so that it can be referenced by others in a global fashion.  On the other side of things, the ‘cacher’ package can download an available cache package and a user can run the code in the package to reproduce the results.

Not all of the abovementioned functionality is complete but many aspects of the ‘cacher’ package are available.  There is also a paper in the Journal of Statistical Software that describes the package in greater detail.  The advantage of the ‘cacher’ system is that R users have relatively little to learn—just a few functions.  Of course, the disadvantage is that it is only available to R users, who are a minority of people conducting data analysis in the world.
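For readers outside R, the general idea described above (store each piece of an analysis in a key-value database under a SHA-1 identifier) can be sketched in a few lines of Python. This is not the ‘cacher’ package or its API; the class name, method name, and the in-memory dictionary standing in for the database are all assumptions made for illustration.

```python
import hashlib
import pickle


class AnalysisCache:
    """Toy content-addressed cache; a hypothetical stand-in, not the cacher API."""

    def __init__(self):
        # In-memory stand-in for a key-value database:
        # SHA-1 hex digest -> pickled result.
        self.store = {}

    def run(self, expression, inputs):
        """Evaluate one analysis expression, reusing a cached result if present.

        The key is the SHA-1 of the expression text plus its pickled inputs,
        so changing either the code or the data yields a different ID.
        """
        payload = expression.encode("utf-8") + pickle.dumps(sorted(inputs.items()))
        key = hashlib.sha1(payload).hexdigest()
        if key not in self.store:
            # eval is used only to keep the sketch short; the expression is
            # assumed to come from a trusted analysis script.
            self.store[key] = pickle.dumps(eval(expression, {}, dict(inputs)))
        return key, pickle.loads(self.store[key])


cache = AnalysisCache()
key, mean = cache.run("sum(xs) / len(xs)", {"xs": [1, 2, 3]})
```

Anyone holding the same expression and inputs computes the same 40-character key, which is what lets a cached result be referenced globally rather than by a file name on one person's machine.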

There are of course other challenges that I haven’t mentioned that will need to be solved before reproducible research goes mainstream. I think the development of the necessary infrastructure (software and distribution media) is just one important challenge that is critical to its adoption because less technical users need to be able to easily “plug-in” to an existing framework without having to build a piece of it themselves.  By learning from experiences in other domains I think we can successfully build this infrastructure and bring reproducible research to a much wider audience.

Roger D. Peng
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health

Seven presentations on RR

November 29th, 2008

Sergey Fomel just told me about a special session on reproducible research at the “Berlin 6 Open Access Conference” in Dusseldorf, Germany. Presentations from the conference are available online.

Sergey Fomel and Sünje Dallmeier-Tiessen gave presentations in geophysics. Patrick Vandewalle and Jelena Kovacevic gave presentations in signal processing. Mark Liberman, Kai von Fintel, and Steven Krauwer gave presentations related to language and technology.

Video of the presentations is available here.

The Fastware project

November 26th, 2008

Thomas Guest has a new blog post Books, blogs, comments and code samples discussing the challenges of writing a book that contains code samples, may be rendered to multiple devices as well as paper, etc. He points to a project by author Scott Meyers called Fastware that explores ways of meeting these challenges. I haven’t had time to explore Fastware yet, but it sounds like it is concerned with some of the same problems that come up in reproducible research.

Biggest barrier to reproducibility

October 31st, 2008

My previous post discussed Keith Baggerly and his efforts as a “forensic bioinformatician.”

In that article, the reporter asks Keith to name the biggest problem he sees in trying to reproduce results.

It’s not sexy, it’s not higher mathematics. It’s bookkeeping … keeping track of the labels and keeping track of what goes where. The thing that we have found repeatedly in our analyses is that it actually is one of the most difficult steps in performing some of these analyses.

I’ve seen presentations where Keith discusses specific bookkeeping errors. Quite often columns get transposed in spreadsheets, so researchers are not analyzing the data they say they are analyzing.
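One modest defense against this kind of bookkeeping error is to pair data with labels by an explicit sample identifier rather than by row or column position. The sketch below is only an illustration under that assumption; the function name and the sample IDs are hypothetical, and it is not drawn from Keith's analyses.

```python
def attach_labels(measurements, labels):
    """Pair each measurement with its label by sample ID, never by position.

    measurements: dict mapping sample ID -> measured value
    labels:       dict mapping sample ID -> group label
    Raises ValueError if the two sets of sample IDs do not match exactly.
    """
    unlabeled = sorted(set(measurements) - set(labels))
    orphaned = sorted(set(labels) - set(measurements))
    if unlabeled or orphaned:
        raise ValueError(
            f"samples without labels: {unlabeled}; labels without samples: {orphaned}"
        )
    return {sid: (measurements[sid], labels[sid]) for sid in measurements}


# Hypothetical data: the label table is in a different order than the data,
# which would silently scramble a position-based pairing.
data = {"S1": 0.42, "S2": 1.37, "S3": 0.88}
groups = {"S3": "responder", "S1": "non-responder", "S2": "responder"}
paired = attach_labels(data, groups)
```

Because the pairing goes through the ID, reordering either table does no harm, and a missing or extra sample raises an error instead of quietly shifting every label by one.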

Forensic bioinformatics

October 30th, 2008

The October 2008 issue of AMSTAT News has an article entitled “Forensic Bioinformatician Aims To Solve Mysteries of Biomarker Studies.” The article is about Keith Baggerly of M. D. Anderson Cancer Center and his efforts to reproduce analyses in bioinformatics papers.

The article quotes David Ransohoff, professor of medicine at UNC Chapel Hill, saying this about Keith Baggerly.

I think Keith is doing a wonderful and needed job … But the fact that we need people like him means that our journals are failing us. The kinds of things that Keith spends time finding out — what [the researchers] actually do — that’s what methods and results are supposed to be for in journals. … We need to figure out how to do science without needing people like Keith.

One of the reasons for lack of reproducibility is that journals press authors for space and so statistics sections get abbreviated. (Why not put the full details online?) Another reason is that bioinformatics articles are inherently cross-disciplinary and it may be that no single person is responsible for or even understands the entire article.

Embedding .NET code in Office documents

October 26th, 2008

I recently heard about some interesting tools from Blue Reference. I haven’t had a chance to try them out yet, but they look promising.

Sweave has received a fair amount of attention with regard to reproducibility because it lets you embed R code in LaTeX. Code stays with the presentation document, reducing the chance of error and increasing transparency. However, the number of people who use R and LaTeX is small, and asking people to learn these two packages before they can do reproducible research is not going to fly. The number of people who use C# and Microsoft Word is orders of magnitude larger than the number of folks who use R and LaTeX.

It looks like Blue Reference’s product Inference for .NET lets .NET programmers do the kinds of things Sweave lets R programmers do, embedding .NET code in Microsoft Office documents. They also make a product, Inference for MATLAB, for embedding MATLAB code in Office documents.

Python developers who don’t think of themselves as .NET developers might want to use Inference for .NET to embed Python code in Word documents via IronPython. Ruby developers might want to use IronRuby similarly.

Programming is understanding

October 8th, 2008

Bjarne Stroustrup’s book The C++ Programming Language begins with the quote

Programming is understanding.

Many times I’ve thought I understood something until I tried to implement it in software. Then the process of writing and testing the software exposed my lack of understanding.

One thing that can make reproducible research difficult is that you have to deeply understand what you’re doing. Making work reproducible may require automating steps that you do not fully understand, and you may not realize that you don’t understand them until you try automating them.

Stated more positively, attempts to make research reproducible can lead to new insights into the research itself.

Related post: Paper doesn’t abort

Musical chairs and reproducibility drills

October 1st, 2008

In a recent interview on the Hanselminutes podcast, Jeff Webb said that if he were to teach a computer science class, he would have the class work on an assignment, then a week later make everyone move over one chair, i.e. have everyone take over the code their neighbor started. Aside from the difficulty of assigning individual grades in such a class, I think that’s a fantastic idea.

Suppose students did have to take over a new code base every week. People who write mysterious code would be chastised by their peers. Hopefully people who think they write transparent code would realize that they don’t. The students might even hold a meeting outside of class to set a strategy. I could imagine someone standing up to argue that they’re all going to do poorly in the class unless they agree on some standards. It would be fantastic if the students would discover a few principles of software engineering out of self-defense.

I had a small taste of this in college. My first assignment in one computer science class was to add functionality to a program the instructor had started. Then he asked us to add the same functionality to a program that a typical student had written the previous semester. As the instructor emphasized, he didn’t pick out the worst program turned in, only a typical one. As I recall, the student code wasn’t terrible, but it wasn’t exactly clear either. This was by far the most educational homework problem I had in a CS class. I realized that the principles we’d been taught about how to write good code were not just platitudes but survival skills. Later my experience as a professional programmer and as a project manager reinforced the same conclusion.

In some environments, it’s not practical to have people switch projects unless it is absolutely necessary. Maybe the code is high quality (and maybe it’s not!) but there is a large amount of domain knowledge necessary before someone could contribute to the code. But at least software developers ought to be able to build each other’s code, even if they couldn’t maintain it.

When I was managing a group of around 20 programmers, mostly working on one-person projects, I had what I called reproducibility drills. These were similar to Jeff Webb’s idea for teaching computer science. I had everyone try to build someone else’s project. These exercises turned out to be far more difficult than anyone anticipated, but they caused us to improve our development procedures.

We later added a policy that a build master would have to extract a project from version control and build it without help from the developer before the project could be deployed. The developer was allowed (required) to create written instructions for how to build the project, and these instructions were to be in a location dictated by convention. The build master position rotated so we wouldn’t become too dependent on one person’s implicit knowledge.

Having a rotating build master was a great improvement, but it lacked some of the benefits of the reproducibility drills. The build master procedure only requires a project to be reproducible before it’s deployed. That is essential, but it could foster an attitude that it’s OK for a project to be in bad shape until the very end. Also, some projects never actually deploy, such as research projects, and so they never go to the build master.
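A drill of this kind can be partly automated. The sketch below is a hypothetical helper, not the procedure our group used: it copies a project into a fresh scratch directory (standing in for a clean checkout from version control) and runs a build command there, so nothing from the developer’s own working environment can leak in through the project folder. The function name and the idea of passing the build command as a list are assumptions.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def reproducibility_drill(project_dir, build_command):
    """Build a project from a clean copy and report whether it succeeded.

    project_dir:   path to the project source (stand-in for a fresh checkout)
    build_command: command and arguments as a list, e.g. ["make", "all"]
    Returns (succeeded, combined output of the build).
    """
    with tempfile.TemporaryDirectory() as scratch:
        clean_copy = Path(scratch) / "checkout"
        shutil.copytree(project_dir, clean_copy)
        result = subprocess.run(
            build_command, cwd=clean_copy, capture_output=True, text=True
        )
        return result.returncode == 0, result.stdout + result.stderr
```

Running something like this on a schedule, rather than only before deployment, would also cover research projects that never reach a build master. It only checks that the build command exits cleanly; it says nothing about whether the written build instructions are understandable to another person, which was much of the point of the drills.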

Medieval project management

August 31st, 2008

I wrote a post on my personal blog recently called Medieval software project management. The post compares software project management to the medieval practice of “beating the bounds,” having young boys walk the perimeter of a parish to memorize the boundaries in order to preserve this information for their lifetimes. Many research projects use a similar strategy, assigning one person to a project for life, depending on that person’s memory rather than capturing project information in prose or in software.

Johanna Rothman wrote a response to the medieval project management post in which she gives good advice on how businesses can avoid such traps. Here’s an excerpt.

Here’s what I did when I was a manager inside organizations, and what I suggest to clients now: make sure a team works on each project. That means no single-person projects, ever. A team to me contains all the people necessary to release a product. Certainly a developer and a tester. Maybe a writer, maybe a release engineer, maybe an analyst. Maybe a DBA. Whatever it takes to release a product, everyone’s on the team. Everyone participates. If they can automate their work and explain it to other people, great. But it’s not a team unless the team can release the product. (Emphasis added.)

It would be terrific progress if more scientific programming were done this way. In theory, science strives for a higher standard. Not only should your team of colleagues be able to reproduce your work, so should anonymous scientists around the world. But in practice, science often has lower standards than business with regard to software development.

Provenance in art and science

August 29th, 2008

Here’s an excerpt from Jon Udell’s interview with Roger Barga explaining the idea of provenance in art and science.

JU: Explain what you mean by provenance.

RB: Think about it in terms of art. For a given piece of art, we’re able to establish through authorities that it’s original, where it came from, and who’s had their hands on it through its lifetime. Provenance for a workflow result is the same thing. Minimally we want to be able to establish trust in a result. If you think about how that happens, it often starts by considering who wrote the workflow. So with Trident you can click on a result and interrogate the history of the workflow: who wrote it, who reviewed it, who revised it, when it first entered the system.