Thursday, October 04, 2007

Science is About Mistrust

I have mixed feelings about the proliferation of the term "Open Notebook Science".

I started using the term a year ago to describe our UsefulChem project because it had no hits on Google, so it offered an opportunity to start with a fresh definition. There are currently over 43,000 hits for that term, and it is nice to see that the first hit is still the post with the original definition.

The first part of the term, "Open Notebook", is meant to be taken literally. It refers to the ultimate information source used by a researcher to record their work. The fundamental philosophy of ONS is that of "no insider information". That means linking to all raw data associated with experiments and making available all experiments relating to a project under discussion. This includes failed and aborted experiments. (see recent conference on ONS)

I think that if ONS is practiced in this way it is potentially better than our current system of peer-reviewed article publication.

Here is why:

A key aspect of the scientific revolution a few centuries ago was moving from trust in an authority to mistrust of everything and everybody.

Galileo wrote (as reproduced on p. 25 of John Gribbin's "The Fellowship"):

It appears to me that they who in proof of any assertion rely simply on the weight of authority, without adducing any argument [that is, experimental evidence] in support of it, act very absurdly.

In principle, articles reporting on experimental science are supposed to contain enough information for a reasonably competent peer to repeat the work. Speaking from experience in my own field of organic chemistry, experimental sections are often highly condensed. When space was limited in paper journals this may have made some sense. But now that electronic storage is cheap, space is no longer an issue (at least in organic chemistry).

Of course, journals now usually have a supplementary section available online to address this. But the past few times I have tried to debug a reaction using this resource, I have found it insufficiently detailed.

What I really needed was access to the researcher's lab notebook and all associated files to follow specific instances of reactions, not abstracted general procedures. That way I can see what the researcher did and did not do without making so many assumptions. For example: Were the starting materials checked for purity? How exactly was the reaction monitored and what do those spectra or TLC images look like? Was there any solvent left in the product when it was weighed that might account for those impressive yields?

A major flaw in the current scientific publication system is that there is still too much trust. Readers are expected to trust editors to choose appropriate anonymous peers to review submissions. Reviewers trust primary authors to summarize their research results accurately. Primary authors trust their collaborators, students and postdocs to give them accurate information when writing papers.

If we make the laboratory notebook and all associated raw data public we can significantly reduce the amount of trust required to keep this house of cards standing.

The main problem is not so much that people will completely fabricate data, although this does happen. It is more that mistakes get made and corners are cut to get the paper out the door under pressure. And once these errors are in print it is very difficult to get people to correct them, if they are ever discovered.

As a researcher I don't even trust myself. And I shouldn't. Students who have not yet mastered the discipline of keeping a detailed, time-stamped lab notebook log as they carry out and observe their experiments will likely be humbled quickly when they discover how poor human memory is for these tasks.

Time-stamped video and digital photographs can go a long way to reduce the burden on the researcher to record details of their experimental set-up and observation of the reaction over time. This has proved very useful in the past in my group and I would like to see even more systematic use.

There is currently tremendous skepticism about publishing scientific results on the web using social software. If people start citing blog posts that do not link to primary raw data as though those posts provided strong support, then it is going to be difficult to do good science with social software. There is already plenty of that type of thing going on in the peer-reviewed literature, with one article citing another citing another citing "unpublished results".

Now there is nothing wrong with discussing some interesting aspect of ongoing research without linking to primary raw data, but that is not Open Notebook Science in the sense that I have been using it. These could be called "teaser posts". They might be useful for finding collaborators and initiating discussions, but they cannot be used as an alternative to traditional peer-reviewed literature.

I know that there are many who think that peer review is needed to legitimize scientific blog posts. I think that the ability to comment on posts is useful for continuing discussions with the community. But expecting any kind of review system to completely validate research published through any vehicle is not realistic.

The only people truly qualified to judge a piece of research are those who have actually looked at the raw data to see if everything adds up. That takes time, assuming they have access to the data at all. It is unlikely that anyone will do that without being properly motivated - generally only other researchers trying to reproduce the experiment for their own purposes will have a good reason to invest the time. Comments from these individuals would be valuable, but they apply to a very small proportion of all recently published work.

And going forward, I see Open Notebook Science as being a natural way for the scientific process to become more automated, with machines reporting executed protocols on the open web. We have a ways to go to reach that point but I think this is a more likely scenario than expecting machines to learn how to write human-reviewed articles.

The point of science is generating actionable information - at least in a field like synthetic organic chemistry. We have a great opportunity to use these new web tools to do this so much more efficiently than ever before.

(By the way, we still intend to publish our work through conventional channels. Not so much to communicate new information to the community but for all the other reasons that researchers are under pressure to publish.)



At 10:59 PM, Anonymous said...

I think academic journals still have a role to play in open notebook science. I'm not thinking of the potential benefits to the authors such as promotion, tenure, etc. but of the journal's role of presenting new results in a condensed, critiqued and edited form (thank God for editors). Of course with ONS the results will be new only to those who haven't been following the research, but I think that will actually be the majority of readers. I know I read articles about all sorts of interesting things in many different academic fields and there is no way I'd be following everyone's notebooks in those areas. I wouldn't have the energy or the necessary expertise.

So is ONS better than peer-reviewed article publication? Well, I wouldn't say that one is better than the other but they are definitely better acting together, at least for the immediate future.

I will say however, that as new tools that make ONS easier become available, the number of researchers utilizing ONS techniques can only grow (in all fields, not just chemistry) and journals that don't adapt will get left behind.

On a slightly different note, the Galileo quote that I like to give to my students is:
“I do not feel obliged to believe that the same God who has endowed us with sense, reason, and intellect has intended us to forgo their use.”

At 5:54 AM, Blogger Jean-Claude Bradley said...

Thanks for the feedback!

I don't expect people to read the lab notebook as a narrative. The point is that people who are trying to do similar experiments will find relevant information by searching for it on the open web.

For example, a few hours ago, someone found EXP036 using this search.

There is sufficient information there to understand what the student did, including the IR spectra that were being searched. This may or may not be useful to the person who was searching for that information but at least they can make that call with few assumptions.

This is an example of an experiment that will never make it into a peer-reviewed article but was potentially useful to someone because it was indexed in Google.

When I say that ONS can be "better" than the current peer-review system, I speak from the standpoint of an experimental chemist trying to repeat procedures from the literature - there is usually not as much detail as I would like.

For condensed summaries of research I use our blog and link back to relevant experimental pages on the wiki. For example, see this post about a mechanistic discussion.

At 4:15 PM, Blogger J said...

Excellent post!

I agree that open notebooks must link to the raw data otherwise the notebook really isn't even useful to the author of the notebook itself. Presumably, the author of this type of sorta-open-notebook has a second non-public notebook where they keep their raw data, otherwise it's hard to even say they're doing science.

andy/hiro, in my experience, journal editors aren't really doing any "editing". They serve more as gatekeepers, umpires, and referees. I do think that reviewers clean up a paper a little by telling the authors to "tone down" some of the hype that was included to get it past the editor. I agree that book editors contribute quite a bit to the final product.

I agree with andy/hiro that we need academic journals (particularly open access ones), certainly for the foreseeable future. First, because they force people to stop and clarify the question they're really trying to answer. But more importantly, because they are archival. I could read the Watson/Crick double-helix paper 40 years ago, and I'll be able to do so 40 years from now. But if Watson/Crick had published their paper as a blog post, the technologies would have changed a hundred times between then and now, and unless someone kept updating the Watson/Crick blog to the format of the day, the paper could easily have been lost. Web-based technologies have a much greater tendency to disappear. I do think that in the future an open notebook combined with condensed summaries on a blog could replace the current peer-review publication system. But the internet needs a little more robustness first. Or at least scientists would need some sort of PubMed-like central repository to make sure knowledge isn't lost when some internet company (e.g. Netscape) folds.

At 5:12 AM, Blogger Jean-Claude Bradley said...

I don't think you can compare the publication system 40 years ago with the current one, and certainly not about the situation in the future.

First, with zero-cost publication (including publisher-allowed self-archiving) you have the safety of redundancy. It is very easy to back up your entire Wikispace to a zip or HTML format. You can also publish key documents to archives like Nature Precedings, the Internet Archive, etc. Your recorded talks can be archived on YouTube, Google Video, your own server, etc. All this is easy as long as you don't give away your copyright.

Also, even if you only use one provider like Wikispaces and the company does eventually close down, it is unlikely to happen overnight without warning, deleting all its users' data. There will likely be plenty of time to relocate.

But on a more fundamental level, scientific information loses its value quickly. Aside from its historical value, the Watson/Crick article currently has little worth from a pragmatic experimental standpoint. In organic chemistry, I have found that articles that are several decades old are essentially useless: they don't provide key characterization data (like NMR) and don't use modern purification techniques (like chromatography).

Similarly, I suspect that all of the experiments from my group over the past year will be easily reproduced in a weekend by high-throughput automated systems in 10 years, with far superior characterization techniques.



Creative Commons Attribution Share-Alike 2.5 License