Thursday, November 29, 2007

Swarthmore Talk on Open Notebook Science

On Tuesday November 27, 2007 I had the pleasure of speaking at Swarthmore on our UsefulChem project and Open Notebook Science more generally.

Liz Evans and Cheryl Grood from the Swarthmore Sigma Xi Chapter did a wonderful job in rounding up people to have discussions both before and after my talk at dinner. This gave us an opportunity to share teaching experiences with new technologies (blogs, wikis, Second Life, etc.) - something I didn't really get into too deeply during my talk.

The timing was also quite fortunate because I was able to discuss an important new result from our lab (EXP148) obtained just a few days ago. (More on this shortly in a separate blog post).

I had some very thought-provoking conversations with both students and faculty. One of the recurring questions was what format Open Notebook Science would take in various scientific fields. Some disciplines, like mathematics, don't have formal laboratory notebooks like synthetic organic chemistry. But there are still ways of reporting daily progress.

The recording of the talk is now available here.


Monday, November 26, 2007

Support for Cameron's Proposal

Cameron has requested support for his Open Science networking proposal. His deadline is today so here is mine:

With a growing number of Open Science advocates across the world, there is certainly a need for funding to facilitate interaction and collaboration. My research group at Drexel University could certainly make use of such a program and I strongly support the Open Practises E-science Network initiative.

Jean-Claude Bradley
Associate Professor of Chemistry
Drexel University
Philadelphia, PA

Thursday, November 22, 2007

Networking for Open Science Proposal

Cameron Neylon is looking for interest in his idea to fund networking for Open Science:

The UK Engineering and Physical Sciences Research Council currently has a call out for proposals to fund ‘Network Activities’ in e-science. This seems like an opportunity to both publicise and support the ‘Open Science’ agenda so I am proposing to write a proposal to ask for ~£150-200k to fund workshops, meetings, and visits between different people and groups. The money could fund people to come to meetings (including from outside the UK and Europe) but could not be used to directly support research activities. The rationale for the proposal would be as follows.


‘Open Science’ has the potential to radically increase the efficiency and effectiveness of research world wide.


The community is disparate and dispersed with many groups working on different approaches that do not currently interoperate - agreeing some interchange or tagging standards may enable significant progress


Many of those driving the agenda are early career scientists including graduate students and postdocs who do not have independent travel funds and whose PI may not have resources to support attending meetings where this agenda is being developed

There is significant interest from academics, some publishers, software and tool developers, and research funders in making more data freely available but limited consensus on how to take this forward and thus far an insufficient commitment of resources to make this possible in practice

Wednesday, November 21, 2007

The Scientist Best Lab Web Site

Time to vote for the best laboratory web site from the top 10 selected by The Scientist's judging panel (of which I was a member). All of the finalists have strong sites but there can only be one winner....


Tuesday, November 20, 2007

Experimental Uncertainty Principle

Most of us are familiar with the mantra of how science progresses:

A hypothesis can never be completely proved by any finite set of experiments but it can be falsified by a single result.

In mathematical proofs, clear-cut algorithms can usually be applied to prove unequivocally the falsehood of a theorem (notwithstanding Gödel's incompleteness theorems :) ).

But in real research in the physical sciences, that is not exactly how scientists process reports of experimental results. And an important reason is the way results are reported.

Let's pick an example from the Open Access Beilstein Journal of Organic Chemistry.

Here is the full description of the experiment from the supplementary materials page:

To a solution of 5a (196 mg, 0.433 mmol) in CH2Cl2 (1.8 mL) was added p-toluenesulfonic acid (19 mg, 0.11 mmol). After stirring for 0.5 hours at 0 °C, the mixture was concentrated under reduced pressure and purified by flash chromatography on silica gel (eluent: ethyl acetate: P. E. = 1: 3) to provide 7 (152 mg, 100%) as a colorless oil.

I have omitted the characterization information. Let's assume for the moment that it is completely correct.

The question is: if 10 chemists follow this procedure as described, will they get 100% yield of pure product?

I think that it is quite possible that the results will vary wildly, including many complete failures. Here is why:

1) The reaction is carried out at 0 deg C for 30 minutes but the conditions of the work-up are completely unspecified. We don't know the pressure, the temperature of the bath or the duration of the solvent evaporation. The temperature of the rotovap bath will vary wildly from lab to lab, depending on vacuum pressure and personal preference. This is key because the conditions of the work-up (warmer and more concentrated) are much harsher than the reported reaction condition. My guess is that when this gets indexed in a database the reaction conditions will be further stripped of detail and likely end up as 0 C, 30 min.

2) The chromatography step does not specify how much silica to use, the dimensions of the column, the number of fractions, the TLC images of the fractions, the amount of solvent used to load the reaction mixture, etc. It may even be the case that the ratio of solvents was changed over the course of the chromatography - in a situation like this some would use a good solvent like methylene chloride to load the mixture then chase it with a solvent mixture containing a lower ethyl acetate/petroleum ether ratio.

3) A 100% isolated yield after chromatography means that not a single milligram was lost during transfer to the column and that all fractions containing the product were very pure. Ethyl acetate is notorious for increasing apparent product yields because it is sometimes difficult to remove on the vacuum pump. I would like to see the NMRs of the fractions.

This last point also brings up the issue of what the researcher does when confronted with an apparent 101% yield - since this is not chemically plausible it cannot be reported as such. Does the researcher state an assumption that there is a bit of extra solvent and slice off a milligram in the report? We can't tell from the information given in journals.
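To make the arithmetic concrete, here is a rough sketch of how a little trapped solvent shifts an apparent yield. It assumes that the 152 mg reported as 100% is the theoretical mass of product 7, which is my reading of the procedure and not something the authors state; the function names and the solvent masses are hypothetical.

```python
def apparent_yield(isolated_mg, theoretical_mg):
    """Apparent isolated yield in percent, from weighed and theoretical mass."""
    if theoretical_mg <= 0:
        raise ValueError("theoretical mass must be positive")
    return 100.0 * isolated_mg / theoretical_mg


# Assumption: the 152 mg reported as "100%" is used as the theoretical mass.
THEORETICAL_MG = 152.0


def yield_with_residue(product_mg, residue_mg, theoretical_mg=THEORETICAL_MG):
    """Apparent yield when residual solvent is weighed along with the product."""
    return apparent_yield(product_mg + residue_mg, theoretical_mg)


if __name__ == "__main__":
    # Hypothetical cases: (true product mass, trapped solvent mass) in mg.
    for product_mg, residue_mg in [(152.0, 0.0), (152.0, 1.0), (150.0, 2.0)]:
        pct = yield_with_residue(product_mg, residue_mg)
        print(f"{product_mg} mg product + {residue_mg} mg solvent -> {pct:.1f}% apparent")
```

On this scale a single extra milligram is worth about 0.7 percentage points, enough to round up to 101%. The last case shows the opposite problem: 2 mg of lost product masked by 2 mg of solvent still weighs in at an apparent 100%.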

I want to make it clear that I am not picking on the authors for reporting in this way. Within the current norms of the organic chemistry community, this is an acceptable way to report laboratory procedures in peer reviewed journals.

Of course all (or most) of these details should have been recorded in the laboratory notebook. I understand that initially protocols in papers were abbreviated to save on space. But now with unlimited online supplementary materials associated with papers, researchers could scan their notebooks and all associated documents. But that is not required by the chemistry journals that I know and I have not seen it done.

Keep in mind that this is not new work - researchers already have (or should have) all of this as a routine part of doing research. This is one big advantage of Open Notebook Science - very little extra effort required. (Cameron Neylon also has a very nice recent summary of his thoughts on this.)

Any chemist will tell you (if they are honest) that there is almost always a mistake, however small, in every experiment. When everyone agrees to report experiments in a highly abbreviated form, it becomes convenient to finish more quickly and get that all-important paper out the door. Do you completely start over an experiment because you measure 101% apparent yield? Or do you realize that you just don't have time and take a "shortcut" of some type to get that paper out?

All of this would go away if we came clean about our experiments - the good, the bad and the ugly. Let's stop pretending that we did the reaction EXACTLY as stated in the published abbreviated protocol and we might start to get out of this quagmire.

We don't have to change the way we abbreviate experiments - just link to the relevant pages in the laboratory notebook in the supplementary sections of papers.

As chemists try to make sense of the physical world and process results from other researchers, they have to evaluate the meaning of experiments published like this. Instead of processing the information algorithmically, they apply fuzzy logic: more weight is given to results with more proof.

With the limited information provided in this particular experimental description, I would expect that it is possible to get this reaction to work in good yield but I would not question the fundamental laws of nature if some chemists report that it fails completely. If I had access to the laboratory notebook and all raw data, including how the reaction was monitored, I would weigh the evidence of each report quite differently.

The more information one has about an experiment the more confidence one can place in the results. But it would never be possible to have complete confidence in any result, no matter how much information is provided. And because providing more information costs more in terms of time and money, a balance has to be struck.

We might call this the experimental uncertainty principle:

All experimental results are uncertain to some degree. Uncertainty can be reduced with more information but then fewer experiments can be performed with the same resources.

For example, an experiment like EXP064 provides extensive links to monitoring runs after each step in the reaction and provides evidence of the purity of the starting materials. By contrast EXP134 records 4 parallel reactions with only photographs as results. The purpose of the first experiment was to understand the Ugi reaction, while the second aims to quickly identify Ugi reagents that lead to easily purified products. When these reactions get compared, the second carries far less weight than the first - but we only know that by looking at the details in the notebook.

If we expect autonomous agents to contribute to the process of doing science (for example formulating and testing hypotheses), information has to be tagged in such a way that it incorporates a measure of uncertainty.
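Here is a minimal sketch of what such tagging might look like. Everything in it is an assumption made for illustration: the evidence categories, the numerical weights, the report identifiers and the class and function names are all hypothetical, and a real scheme would have to be agreed on by the community.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical evidence weights, chosen only for illustration.
EVIDENCE_WEIGHTS = {
    "photograph": 0.1,
    "reaction_monitoring": 0.4,
    "starting_material_purity": 0.3,
    "product_characterization": 0.2,
}


@dataclass
class ExperimentReport:
    experiment_id: str
    succeeded: bool
    evidence: Set[str] = field(default_factory=set)

    def confidence(self) -> float:
        """Sum the weights of the attached evidence kinds, capped at 1.0."""
        total = sum(EVIDENCE_WEIGHTS.get(kind, 0.0) for kind in self.evidence)
        return min(total, 1.0)


def weighted_support(reports: List[ExperimentReport]) -> Optional[float]:
    """Confidence-weighted fraction of reports saying the reaction worked."""
    total = sum(r.confidence() for r in reports)
    if total == 0:
        return None  # no usable evidence at all
    return sum(r.confidence() for r in reports if r.succeeded) / total


if __name__ == "__main__":
    # Two hypothetical reports on the same reaction.
    documented = ExperimentReport(
        "report-A", True, {"reaction_monitoring", "starting_material_purity"}
    )
    sparse = ExperimentReport("report-B", False, {"photograph"})
    print(documented.confidence(), sparse.confidence())
    print(weighted_support([documented, sparse]))
```

The point is only that a well documented success and a sparsely documented failure do not cancel out: an agent reading both can weigh them by the evidence attached, much as a chemist would after looking at the notebook pages.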

I suspect that it will be easier in many cases (like organic chemistry) to simply redo the experiment under known conditions rather than attempt to get hold of the original notebook.


Sunday, November 18, 2007

Cameron Neylon's ONS Proposal to Collaborate

Cameron is looking for an Open Notebook Science collaborator - any takers in the molecular biology world?

We will send the reagents to anyone who would like to do the experiments along with any further information required. In principle people ought to be able to figure out everything they need from the lab book but this will probably not be the case in practice. The idea here is to see whether this notion of a loose collaboration of groups with different resources and expertise that is driven by the science can work and whether it is a competitive way of doing science.

My criteria in accepting collaborators will be as follows:

Willingness to adopt an Open Notebook Science approach for this experiment (ideally using our lab book system but not necessarily)
Interest in and willingness to engage in the development of the published paper (including proposing and/or carrying out any new experiments that would be cool to include)
Ability to actually carry out the experiment in reasonable time (ideally looking for a couple of months here)

So this is notionally a win-win situation for me. We will be getting on and doing our own thing as well but by working with other groups we may be able to get this paper out more efficiently and effectively. Maybe others will come up with clever experiments that would add to the value of the paper. The worst case scenario is that someone comes along and sees this, copies the results, and publishes ahead of us. The best case scenario is that someone else already working in a similar direction may come across this and propose working together on this.

Sunday, November 04, 2007

Cameron's ONS Talk

Cameron Neylon gave a very thoughtful talk at Drexel on Friday about using blogs to capture the science going on in his group then deciding to open his laboratory notebooks to the world.

He was refreshingly honest about his progress and motivations. For example, at one point he noted that a gel image was missing on one of the posts. Instead of glossing over it, he pointed out how this just makes transparent how difficult it is to properly maintain a laboratory notebook. As long as you don't have to show it to anyone, it is tempting to claim that your lab notebook is better maintained than it really is.

And this is a positive thing - science is messy and, even with the human failings that keep record keeping from being ideal, science gets done. Now if we finally admit to that and are willing to work transparently, we have an opportunity and an incentive to set a higher standard.

That is one of the tangible benefits of Open Notebook Science.

Cameron's talk was recorded and is available here.


CombiUgi Update: the Master Table

It has been a while since my last update on the CombiUgi project. Those who have been following the UsefulChem wiki and mailing list will be aware of all the experiments and discussions but it helps to take a look at the big picture periodically.

My last post described our new focus on trying to make falcipain-2 inhibitors and Rajarshi was kind enough to do some docking runs for us. To keep costs low, we started using a library of Ugi products that we can make just from starting materials that we have in abundance (at least 5 g) in our lab.

Four undergraduate students have joined the group in the past few weeks and have been trained to perform the Ugi reaction in 1 dram vials. We initially started doing these in 1.5 ml Eppendorf tubes but they didn't seal perfectly with methanol and leaked during vortexing.

We are currently focusing on Ugi products which precipitate from the reaction mixture within a few days. We were very fortunate in that our first Ugi products crystallized from methanol. It turns out that not all Ugi products behave that way, but we are hoping that enough of the first few hundred falcipain-2 hits will do so to give us a handful of compounds to test.

Reactions that generate pure products as precipitates are tremendously easier to run and scale up than those requiring a chromatographic purification step. If we can provide a list of Ugi products that can be easily prepared this way, I think that would be useful for other researchers with other applications in mind. Maybe we will identify patterns in different solvents that will enable us to understand (or at least empirically predict) how to induce precipitation for a given Ugi product. This kind of practical laboratory advice would add to the knowledge base on the Ugi reaction provided in a recent Nature Protocols report by Stefano Marcaccini and Tomás Torroba (unfortunately not Open Access).

To keep track of all of this, results are pooled in a GoogleDoc Master Table on the CombiUgi page. Look up the experiment number on the experiment list page to locate the experimental details.


Creative Commons Attribution Share-Alike 2.5 License