August 19th, 2008
Test-driven software development has much in common with reproducible research. Here’s an excerpt from a talk by Kent Beck, one of the most visible proponents of test-driven development. He says test-driven development isn’t about testing.
Testing really isn’t the point. The point here is about responsibility. When you say it’s done, is it done? Can you go to sleep at night knowing that the software you finished today works and will help and isn’t going to take anything away from people?
You could say similar things about reproducible research. RR is about responsibility, really finishing a project rather than sorta finishing it. Can other people build on top of your work with confidence? Can you build with confidence tomorrow on the work you did today?
Software unit tests exist not only to verify that code is correct, but to ensure that the code stays correct over time. These tests act as tripwires. The hope is that if a future change introduces a bug, a unit test will fail. Again, similar remarks apply to RR. With RR, you’re not just interested in producing a result. You’re also giving some thought to producing a variation on that result with minimum effort and maximum confidence in the future when something changes.
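Here is a minimal sketch of the tripwire idea in Python. The function `trimmed_mean` and its tests are hypothetical illustrations, not anything from Beck's talk. The tests pin down a result that was checked once, so a later change that alters it fails loudly instead of slipping through.

```python
def trimmed_mean(values, trim=1):
    """Hypothetical analysis step: the mean after dropping the `trim`
    smallest and `trim` largest values."""
    if len(values) <= 2 * trim:
        raise ValueError("not enough values to trim")
    kept = sorted(values)[trim:len(values) - trim]
    return sum(kept) / len(kept)


def test_known_result():
    # [1, 2, 3, 4, 100] trimmed by one on each side leaves [2, 3, 4].
    # The answer was worked out by hand once; the test keeps it from drifting.
    assert abs(trimmed_mean([1, 2, 3, 4, 100]) - 3.0) < 1e-12


def test_too_few_values():
    # The tripwire also covers the failure mode, not just the happy path.
    try:
        trimmed_mean([1, 2])
    except ValueError:
        return
    raise AssertionError("expected ValueError for too few values")
```

A test runner such as pytest would pick up the `test_` functions automatically, or they can simply be called by hand. If someone later changes the trimming rule, `test_known_result` is the tripwire that goes off.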
Posted in Uncategorized | No Comments »
August 13th, 2008
Last week Greg Wilson asked me what I would do if I had one hour to teach a group about reproducible research. He said to assume that the group is already convinced of the need for reproducibility.
First, here are some thoughts on what I’d say if the group had not given much thought to reproducibility. I would start impersonal and then become more personal. I’d start by relating some horror stories of how someone else’s work was impossible to reproduce and contained false conclusions. It’s easy to gang up on some third-party researcher, griping about how sloppy someone not in the room was in their research. This plants the idea that at least some people need to think more about reproducibility. Then I’d transition by talking about times when I’ve had difficulty reproducing my own work. Then I would try to convince them that their own work is probably not reproducible or at least not easily reproducible. So my outline would be: they have problems, I have problems, you have problems.
I believe that convincing people of the need to be concerned about reproducibility is most of the problem. If people are highly motivated, they will come up with their own ways to make their work easier to reproduce and they will gladly take advantage of tools they are introduced to.
To Greg’s original question, now what? First I’d expound on the merits of version control systems. You can’t possibly reproduce software if you can’t put your hands on the source code, and you can’t reproduce software as it existed at a particular point in time without revision history. Then I’d emphasize that version control is necessary but not sufficient. When people first understand version control, they tend to think it takes care of all their reproducibility problems when in fact it’s just the first step. I’d share some war stories of projects that have taken many hours to build even though we had all the source code. (If I had a semester rather than an hour, I’d let them experience this for themselves by bringing in some outside projects for them to rebuild, rather than just telling them about it.) I’d also emphasize that it’s not enough to put code in version control: data needs to be versioned as well.
Once they grok version control, I’d discuss automation. When a process is 99% automated and 1% manual, the reproducibility problems come from the 1% that is manual. The principle behind many reproducibility tools is automating steps that are otherwise manual, undocumented, and error-prone. (See Programming the last mile.)
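To make that concrete, here is a minimal sketch in Python of folding the manual 1% into a script. The step names (`load_data`, `clean`, `summarize`) and the numbers are hypothetical placeholders, not part of any tool mentioned here. The idea is only that every step, including the small edits usually done by hand, lives in code that one command runs from start to finish.

```python
import hashlib
import json


def load_data():
    # Hypothetical stand-in for reading a raw data file.
    return [3.2, 4.1, None, 5.0, 4.4]


def clean(rows):
    # The kind of step often done by hand in a spreadsheet:
    # dropping missing values. Doing it in code documents it.
    return [x for x in rows if x is not None]


def summarize(rows):
    return {"n": len(rows), "mean": round(sum(rows) / len(rows), 3)}


def run_pipeline():
    """Run every step in order and return the summary together with a
    fingerprint of the cleaned data, so two runs can be compared."""
    rows = clean(load_data())
    digest = hashlib.sha256(json.dumps(rows).encode("utf-8")).hexdigest()
    return {"summary": summarize(rows), "data_sha256": digest}


if __name__ == "__main__":
    print(json.dumps(run_pipeline(), indent=2))
```

Recording a checksum of the data alongside the result is one simple way to notice when the data, rather than the code, has changed between runs.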
Finally, I’d emphasize the need for auditing. As I pointed out in an earlier post, “You cannot say whether your own research is reproducible. It’s reproducible when someone else can reproduce it.” Again, if I had a semester rather than an hour, I’d let them experience this by having them reproduce each other’s assignments. I can hear it now: “What do you mean you can’t reproduce my homework? It’s all right there!”
Posted in Uncategorized | 2 Comments »
August 7th, 2008
Greg Wilson and I have been discussing the importance of tools in reproducible research lately. Would more people use reproducible research practices if tools made doing so more convenient? Would better tools appear if more people cared about reproducibility?
I believe both statements are true, and I believe Greg does as well. However, he and I have different emphases. Greg says “In my experience, most people won’t adopt a programming practice unless there is at least some basic support for it.” I agree, but I think the biggest obstacle to more widespread reproducibility is that few people realize or care that their work is irreproducible. I think that when more people care about reproducibility, some percentage of them will develop and give away tools and we’ll have enough tool support.
We are not in a chicken-and-egg scenario. It’s not as if Greg is saying first we need tools and I’m saying first we need users. We have both tools and users. There are people who care about reproducibility, and some of them have produced tools that make it easier for others to follow. But not many of these people know each other or know about their tools. I hope that the ReproducibleResearch.org web site and this blog will change this.
It helps to look at the early history of object-oriented programming. Some people were writing object-oriented programs before there were (popular) object-oriented languages. For example, some people were writing object-oriented C before C++ baked support for OO into the language. This was painful, but some pioneers did it. To Greg’s point, the number of programmers writing OO programs took off once there were OO languages with good tool support. To my point, first there were programmers wanting to write OO code; these were the folks who developed the tools and the early adopters of the tools.
Posted in Uncategorized | No Comments »
July 31st, 2008
I heard a line the other day something like this:
When you’re working with your hair on fire, if you see anything that doesn’t look like a bucket of water, you’re not interested.
I think I heard this on the PowerScripting podcast. The context was a discussion about the design of Microsoft’s PowerShell, a shell and scripting environment targeted at system administrators. The idea was that since many sys admins are working with their hair on fire, PowerShell was designed to look like a bucket of water, something that will bring relief rather than yet another thing to learn.
How can we make reproducible research look like a bucket of water? In the long term, even in the not so long term, reproducibility habits can improve productivity and reduce stress. But many people will not be receptive unless they also see short term benefits, the shorter the better.
I think templates are one way reproducibility can look like a bucket of water to someone with their hair on fire. You’ve got an analysis to do? Here’s a template. Fill in your specifics at the top, compile it, and out comes a beautiful report. Along these lines, at M. D. Anderson we’ve created some Sweave templates for microarray data analysis. One of the things I’d like to see happen on the ReproducibleResearch.org web site is a collection of Sweave templates for statistical analysis. If you have anything to contribute, please send a note.
Tags: Sweave
Posted in Uncategorized | No Comments »
July 25th, 2008
Yesterday I wrote about my experience trying out Beamer for writing presentations in LaTeX. Some of the images that I want to include in my presentations are plots produced in R, so one simplification would be to combine Beamer with Sweave. That way I could include code for producing the images directly in my presentation file rather than referencing external image files. Any change to the R code would be automatically reflected in my presentation.
One problem I had when turning a LaTeX Beamer file into an Sweave file was image sizes. When an Sweave file has \documentclass{article}, plots are modestly sized and centered. But when I tried including a plot in an Sweave file with \documentclass{beamer}, the image was so large that it covered up other material on the slide. The solution was to include the following line immediately after the \begin{document} command:
\setkeys{Gin}{width=0.6\textwidth}
(See section 4.1.2, page 14 of the Sweave manual.) This command made the image the size I wanted, but the image was no longer centered. To center the image, I added \begin{center} before and \end{center} after the Sweave figure command. This worked. A sketch of the code is included below.
\documentclass{beamer}
\begin{document}
% Scale every Sweave figure to 60% of the text width
\setkeys{Gin}{width=0.6\textwidth}
\begin{frame}
\frametitle{…}
Slide verbiage …
\begin{center}
<<fig=TRUE, echo=FALSE>>=
# R code …
@
\end{center}
\end{frame}
\end{document}
Tags: Beamer, LaTeX, Sweave
Posted in Uncategorized | No Comments »
July 24th, 2008
Greg Wilson points out in his badge of reproducibility post a couple of things I missed in my previous post announcing the signal processing article.
The first is the reproducible research badge, the green check mark logo on the preprint site. I don’t know who owns that or what their rules are. Perhaps it belongs to EPFL.
The other is the user evaluation form. Users can select one of the following four options:
- I have tested this code and it works
- I have tested this code and it does not work (on my computer)
- I have tested this code and was able to reproduce the results from the paper
- I have tested this code and was unable to reproduce the results from the paper
That is huge. You cannot say whether your own research is reproducible. It’s reproducible when someone else can reproduce it.
Posted in Uncategorized | 1 Comment »
July 23rd, 2008
Patrick Vandewalle, Jelena Kovacevic, and Martin Vetterli have submitted a new paper to IEEE Signal Processing Magazine entitled What, Why and How of Reproducible Research in Signal Processing.
Posted in Uncategorized | 1 Comment »
July 23rd, 2008
The ReproducibleResearch.org web site has added a description of how reproducible research is practiced at Vanderbilt University’s Department of Biostatistics.
Posted in Uncategorized | No Comments »