Saturday, January 05, 2008

Tracking Results with Workflow Tables

Following my post about shifting the storage of chemistry experiments to a results-centric model, I received lots of good feedback.

Egon pointed out an ambiguity in specifying the addition of a compound and that is now fixed in RESULT0001, RESULT0002 and RESULT0003. Instead of
ADD methanol (InChIKey=OKKJLVBELUTLKV-UHFFFAOYAX, volume=1 ml)
we now have:
ADD compound (common name=methanol, InChIKey=OKKJLVBELUTLKV-UHFFFAOYAX, volume=1 ml)
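As a rough illustration of why the explicit form is easier for a machine to handle, here is a minimal Python sketch that splits such an action line into a verb, a target and a dictionary of fields. The function name `parse_action` and the regular expression are my own assumptions based only on the single example above; the result files do not define a formal grammar, and values that contain commas would break this simple splitter.

```python
import re

# Assumed shape, inferred from the one example line: VERB target (key=value, key=value, ...)
ACTION_PATTERN = re.compile(r"^(?P<verb>[A-Z]+)\s+(?P<target>\w+)\s*\((?P<fields>.*)\)$")


def parse_action(line):
    """Split an action line into its verb, target and key/value fields."""
    match = ACTION_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized action line: {line!r}")
    fields = {}
    # Naive split on commas; assumes no value contains a comma.
    for pair in match.group("fields").split(","):
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip()
    return {
        "verb": match.group("verb"),
        "target": match.group("target"),
        "fields": fields,
    }


action = parse_action(
    "ADD compound (common name=methanol, "
    "InChIKey=OKKJLVBELUTLKV-UHFFFAOYAX, volume=1 ml)"
)
print(action["fields"]["InChIKey"])
```

With the compound named inside the parentheses, every ADD line has the same target and the identity of the compound is just another field.
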
Peter demonstrated some related work of his using CML to represent reactions taken from experimental sections of published articles. This looks tricky because there is usually a lot of missing information in journal articles but I definitely think it is worth doing. We're using our laboratory notebook (specifically the log sections) so we have reasonably complete information in most cases.

I certainly am interested in using CML to represent our result modules and I appreciate Peter's help in trying to translate some of our modules into CML. Hopefully everything can be specified with the existing components of CML and CMLReact.

But representing the information in a machine-readable format is just one half of the equation. Being able to get information back out with powerful queries is just as important.

Antony's comments about workflows got me to rethink the problem from a slightly different angle. Although the result files that I have been constructing are very flexible, until someone actually populates a database with the data they describe, it will be difficult to get aggregate information back out. The main problem is that comparing two workflows requires lining up the corresponding actions. It is doable but requires some intelligent processing, which is only possible once a database is in place.

However, by sacrificing a bit of the generality, we can gain a lot in the short term. The vast majority of reactions that we've carried out in my lab are just variations on the Ugi synthesis. All Ugi syntheses have an amine, an aldehyde, a carboxylic acid, an isonitrile and a solvent. It turns out that with a series of tables, we can represent all the workflows leading to a result in a way that enables ready comparison and sorting.

The first table records the time of action initiation (normalized to minutes) for each workflow. Since these are absolute times from the start of the experiment, the order of the columns is unimportant. If we were looking for experiments where the aldehyde was added after the amine, we would simply subtract the amine addition time from the aldehyde addition time and look for positive values. Also, reactions involving only the formation of an imine would be a subset of the Ugi reaction, with blanks for acids and isonitriles.
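To make the kind of query concrete, here is a minimal Python sketch of the time table held as a dictionary. The result IDs, column names and times are invented for illustration and are not taken from EXP150. The difference is computed as the aldehyde time minus the amine time, so a positive value means the aldehyde went in later.

```python
# Hypothetical rows of the action-time table: minutes from the start of the
# experiment at which each component was added. None marks a blank cell.
addition_times = {
    "RESULT_A": {"amine": 0.0, "aldehyde": 5.0, "acid": 10.0, "isonitrile": 15.0},
    "RESULT_B": {"amine": 5.0, "aldehyde": 0.0, "acid": 10.0, "isonitrile": 15.0},
    "RESULT_C": {"amine": 0.0, "aldehyde": 2.0, "acid": None, "isonitrile": None},
}


def added_after(times, later, earlier):
    """Return sorted IDs of workflows where `later` was added after `earlier`."""
    hits = []
    for result_id, row in times.items():
        t_later, t_earlier = row.get(later), row.get(earlier)
        if t_later is None or t_earlier is None:
            continue  # a blank cell means that action never happened
        if t_later - t_earlier > 0:
            hits.append(result_id)
    return sorted(hits)


def imine_only(times):
    """Return sorted IDs of workflows with no acid and no isonitrile added."""
    return sorted(
        result_id
        for result_id, row in times.items()
        if row.get("acid") is None and row.get("isonitrile") is None
    )


print(added_after(addition_times, "aldehyde", "amine"))
print(imine_only(addition_times))
```

Because each column holds an absolute time, any pair of actions can be compared this way without knowing the order in which they appear in the original log.
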

The second table records the quantities of compounds (normalized to millimoles) and the third records the duration of time-variable actions (normalized to minutes). Examples of the latter include vortexing and centrifugation durations.
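The normalization itself is simple arithmetic. As a sketch, assuming a neat liquid measured by volume, the conversion to millimoles needs only the density and molar mass; the helper names below are hypothetical, and the methanol constants (about 0.792 g/mL and 32.04 g/mol) are standard reference values rather than numbers from the notebook.

```python
# Conversion factors from a few common time units to minutes.
MINUTES_PER_UNIT = {"s": 1.0 / 60.0, "min": 1.0, "h": 60.0}


def to_millimoles(volume_ml, density_g_per_ml, molar_mass_g_per_mol):
    """Convert a volume of neat liquid to millimoles."""
    grams = volume_ml * density_g_per_ml
    return grams / molar_mass_g_per_mol * 1000.0


def to_minutes(value, unit):
    """Convert a duration in seconds, minutes or hours to minutes."""
    return value * MINUTES_PER_UNIT[unit]


# 1 ml of methanol, using approximate reference values for density and molar mass.
methanol_mmol = to_millimoles(1.0, 0.792, 32.04)
print(round(methanol_mmol, 2))

# A 30 second vortexing step expressed in minutes.
print(to_minutes(30, "s"))
```

Putting every quantity and duration on the same scale is what allows the columns to be sorted and compared directly across workflows.
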

Two additional tables record the identity of the compounds, one using the InChIKey for machine recognition and the other a common name for human use.
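A minimal sketch of how the two identity tables could work together: match on the InChIKey, then report the common name. The result IDs and the second solvent entry are placeholders made up for this example; only the methanol InChIKey comes from the result files above.

```python
METHANOL_KEY = "OKKJLVBELUTLKV-UHFFFAOYAX"  # quoted earlier in this post

# Two parallel tables keyed by result ID and component role (hypothetical rows).
inchikeys = {
    "RESULT_A": {"solvent": METHANOL_KEY},
    "RESULT_B": {"solvent": "PLACEHOLDER-KEY"},  # not a real InChIKey
}
common_names = {
    "RESULT_A": {"solvent": "methanol"},
    "RESULT_B": {"solvent": "placeholder solvent"},
}


def workflows_using(keys, role, inchikey):
    """Return sorted IDs of workflows whose `role` component has this InChIKey."""
    return sorted(
        result_id for result_id, row in keys.items() if row.get(role) == inchikey
    )


def describe(result_id, role):
    """Format one component for a human reader: common name plus InChIKey."""
    name = common_names[result_id][role]
    key = inchikeys[result_id][role]
    return f"{result_id}: {role} = {name} ({key})"


for result_id in workflows_using(inchikeys, "solvent", METHANOL_KEY):
    print(describe(result_id, "solvent"))
```

Keeping the two tables separate means the machine never has to interpret a common name, and a person never has to read an InChIKey unless they want to.
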

I have represented all of the workflows with documented results for EXP150. Links are available to the raw image data on Flickr or JCAMP-DX files for the NMR and IR spectra on our server.

Using GoogleDocs is very nice for this kind of thing. Right-clicking on any cell offers a Google search, which is extremely convenient for the InChIKey. It is also easy to make the data public this way and invite collaborators. (Speaking of which, I need some help to complete the conversion from the wiki to these tables :)


Creative Commons Attribution Share-Alike 2.5 License