A few days ago Andrew Lang suggested to Dustin Sprouse that he submit his thesis to the Reaction Attempts database. Like many undergraduates, Dustin put a lot of time and effort into doing experiments and writing up his results, but didn't have quite enough time to obtain all the results that a traditional publication would have required.
A thesis is an unusual document within the context of scientific communication. Unlike a peer-reviewed paper, it may contain a large number of "failed experiments" and a substantial amount of speculation. Although it is not quite as detailed as a lab notebook, it often includes plenty of raw data and details about how failed or ambiguous experiments proceeded.
In Dustin's case we felt that his thesis provided enough information to include it in Reaction Attempts. In addition, his thesis was accepted by Nature Precedings, which provides a convenient means of citation.
The first component of the Reaction Attempts project is to quickly abstract the most basic information from synthetic organic chemistry reactions. This includes the ChemSpider IDs (CSIDs) of the reactants and target products, along with brief notes about conditions and outcomes. We are especially interested in failed or ambiguous experiments because these have almost no chance of being communicated and indexed by the traditional systems. When attempting to carry out a reaction, it can be just as useful to know what doesn't work, and more specifically how it doesn't work.
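To make this kind of abstracted record concrete, here is a minimal Python sketch of how one attempt might be represented and queried. The `ReactionAttempt` class, its field names, the outcome labels and the integer IDs in the usage example are all hypothetical placeholders. They are not the project's actual schema or real ChemSpider entries.

```python
from dataclasses import dataclass


@dataclass
class ReactionAttempt:
    """One abstracted reaction attempt (hypothetical schema, not the project's own)."""
    reactant_csids: list[int]   # ChemSpider IDs of the reactants
    product_csids: list[int]    # ChemSpider IDs of the target products
    outcome: str                # assumed labels: "success", "failed" or "ambiguous"
    notes: str = ""             # brief notes on conditions and what actually happened


def unsuccessful_attempts(attempts, reactant_csid):
    """Return attempts that used `reactant_csid` and did not clearly succeed."""
    return [
        a for a in attempts
        if reactant_csid in a.reactant_csids and a.outcome in ("failed", "ambiguous")
    ]


# Usage with placeholder IDs (not real ChemSpider records).
attempts = [
    ReactionAttempt([1001, 1002], [2001], "success", "reflux, product isolated"),
    ReactionAttempt([1001, 1003], [2002], "failed", "starting material recovered"),
    ReactionAttempt([1004], [2003], "ambiguous", "mixture by TLC"),
]
for a in unsuccessful_attempts(attempts, 1001):
    print(f"{a.product_csids} {a.outcome}: {a.notes}")
```

Because failed and ambiguous outcomes are stored as ordinary values rather than discarded, a query like this can answer "what has already been tried with this reactant, and how did it go wrong?"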
The second component of the project is dissemination. Because the information is encoded semantically, it can be automatically converted into both human-readable and machine-readable formats.
One human interface is a PDF book (also available as a hard copy), with the option of including only selected reactions by listing the CSIDs of their reactants in the URL. For example, Dustin's reactions can be presented selectively here.
We also have a Reaction Explorer, where reactants or products can be selected from a drop-down menu or via a substructure search.
We also provide live XML feeds so that others can easily create applications from machine-readable data. For example, one could build reaction chains automatically; such chains will arise whenever we enter reactions from multi-step syntheses like Dustin's, which is based on the synthesis of resveratrol.
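Reaction chains of this kind can, in principle, be derived from the reactant and product IDs alone: one reaction follows another whenever one of its reactants is a product of the earlier one. The Python sketch below assumes the reactions have already been parsed out of a feed into a mapping from a label to a pair of (reactant CSIDs, product CSIDs). The actual structure of the XML feeds isn't described here, and the labels and integer IDs are made-up placeholders, not Dustin's actual steps.

```python
from collections import defaultdict


def build_chains(reactions):
    """Link reactions whose products feed later reactions into linear chains.

    `reactions` maps a reaction label to a (reactant_csids, product_csids) pair.
    Returns every maximal path through the product -> reactant links,
    as lists of reaction labels.
    """
    # Index reactions by each reactant they consume.
    consumers = defaultdict(list)
    for label, (reactants, _) in reactions.items():
        for csid in reactants:
            consumers[csid].append(label)

    # A chain starts at a reaction whose reactants are not made by any reaction.
    produced = {csid for _, products in reactions.values() for csid in products}
    starts = [label for label, (reactants, _) in reactions.items()
              if not produced.intersection(reactants)]

    chains = []

    def extend(path):
        _, products = reactions[path[-1]]
        # Follow every reaction that consumes one of this step's products,
        # skipping reactions already on the path to avoid cycles.
        nexts = [n for csid in products for n in consumers.get(csid, []) if n not in path]
        if not nexts:
            chains.append(path)
            return
        for n in dict.fromkeys(nexts):  # de-duplicate while keeping order
            extend(path + [n])

    for label in starts:
        extend([label])
    return chains


# Placeholder multi-step sequence with one side branch.
reactions = {
    "step1": ([1, 2], [3]),
    "step2": ([3, 4], [5]),
    "step3": ([5], [6]),
    "side": ([3], [7]),
}
for chain in build_chains(reactions):
    print(" -> ".join(chain))
```

Branches (here, two different reactions attempted on the product of `step1`) come out as separate chains, which fits a database that deliberately records failed side attempts. One limitation of this simple design: reactions that only form a closed loop, with no entry point, produce no chain.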
I know that Peter Murray-Rust has been very active in automatically abstracting information from chemistry theses. It would be interesting to see how that approach would work on this thesis, especially for the failed experiments. Manually reducing a page or two of text to only its most salient pieces of information required a level of judgement that I imagine would be tricky to automate.