Saturday, June 23, 2007

InChIMatic, ChemSpider and UsefulChem

Rich Apodaca wrote about using his InChIMatic service to track molecules in UsefulChem.

Because we use InChIs in blog posts and HTML pages generated automatically from the molecules blog, doing an InChI search in Google is a pretty good way to find molecules of interest to UsefulChem. However, Rich makes the valid point that these pages do not always point to the experiments where they are used.

I was aware of the limitations of using a blog to track molecules when I set it up. Because we were limiting ourselves to a few hundred molecules, the blog served its purpose much as I expected it would.

But now, as we move to manipulating tens of thousands (and soon millions) of molecules, we need to transition to a true database.

I've been working with Tony Williams to use ChemSpider for this purpose. UsefulChem has been a supplier in ChemSpider for several weeks and most of our molecules from the molecules blog have been indexed. In the next few days the first 68,000 molecules from the CombiUgi project should be incorporated as well.

This effectively moves the indexing and searching burden to a free hosted service that is designed to handle it. This is the same logic that I used when choosing Wikispaces to act as our group laboratory notebook.

Let's take a look at an example of how this can work.

Click on the Search button of ChemSpider, then hit "Advanced". Under "Search by Data Source" select UsefulChem. Scroll to the top of the page and select "Search by Structure", then "Draw". Select "Substructure", then draw a furan ring and run the search.

You should get about 10 hits.

Click on 5-methylfurfurylamine to see its record in ChemSpider.

This record can be curated or annotated. I'm hoping we can use this interface to annotate with links to spectra, references, etc. But for now just click on its InChI and you'll get a Google search that finds the molecule on the UsefulChem blogs, Chemical Blogspace and an experiment page (EXP086) where it was used.
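To make the InChI-as-search-key idea concrete, here is a minimal Python sketch that builds a Google search URL for an InChI. The function name is hypothetical, and quoting the InChI as an exact phrase is my assumption about what works best; I have not checked how ChemSpider itself builds its link. The furan InChI is used only as an illustration.

```python
from urllib.parse import quote_plus

def inchi_search_url(inchi):
    """Build a Google search URL for an InChI (hypothetical helper).

    Assumption: wrapping the InChI in double quotes asks for an exact
    phrase match, so the layers of the string stay together.
    """
    phrase = '"' + inchi + '"'
    # quote_plus percent-encodes '=', '/', ',' and the quote characters.
    return "https://www.google.com/search?q=" + quote_plus(phrase)

# Illustration only: the InChI for furan in the version 1 style.
furan_url = inchi_search_url("InChI=1/C4H4O/c1-2-4-5-3-1/h1-4H")
print(furan_url)
```

The same URL can be pasted into a browser, so any page that lists an InChI can link straight to a search for it.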

For that to work well, we need InChIs to be generated for every molecule in every experiment. We've been putting the InChIs in the Tags section of each experiment page, and this is now listed as the highest priority on our Experimental Format page to make sure it gets done quickly.

Note that these InChIs could be scraped fairly easily from every UsefulChem experiment because the experiment pages follow a standard format.
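As a rough sketch of what such scraping might look like, the Python below pulls InChI strings out of the text of a page. It assumes only that each InChI starts with "InChI=1" and contains no whitespace; it does not rely on the actual layout of the Tags section, and the sample text is made up for illustration (ethanol and furan). The function and variable names are hypothetical.

```python
import re

# Assumption: an InChI is a run of non-space characters starting with
# "InChI=1/" (or "InChI=1S/" for standard InChIs). Commas and semicolons
# are legal inside an InChI, so they are not treated as delimiters here.
INCHI_PATTERN = re.compile(r"InChI=1S?/[^\s\"'<>]+")

def extract_inchis(page_text):
    """Return the unique InChIs in page_text, in order of first appearance."""
    seen = set()
    found = []
    for raw in INCHI_PATTERN.findall(page_text):
        # Drop punctuation that belongs to the surrounding sentence or list.
        inchi = raw.rstrip(".,;")
        if inchi not in seen:
            seen.add(inchi)
            found.append(inchi)
    return found

# Made-up sample text, not copied from a real experiment page.
sample_text = (
    "Tags: InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3, "
    "InChI=1/C4H4O/c1-2-4-5-3-1/h1-4H. "
    "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"
)
sample_inchis = extract_inchis(sample_text)
print(sample_inchis)
```

A real scraper would also need to fetch each experiment page and strip its markup first; that part depends on how the wiki serves its pages and is left out here.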

The only issue left to really complete the process is an automated way to add new molecules to ChemSpider. Tony says that will be done soon.


Creative Commons Attribution Share-Alike 2.5 License