Saturday, June 18, 2011

Google Apps Scripts for an intuitive interface to organic chemistry Open Notebooks

Rich Apodaca recently demonstrated how Google Apps Scripts can be added to Google Spreadsheets to enable simple calling of web services for chemistry applications (gChem). Although we have been using web service calls from within a Google spreadsheet for some time (solubility calculation by NMR link #3 and misc chem conversions link #1), the process wasn't as intuitive as it could be because one had to find and then paste lengthy URLs.

Rich's approach enables simply clicking the desired web service from a menu on Google Spreadsheets and these functions have simple names like getSMILES. Andrew Lang has now added several web services from our ONS projects and the CDK. There are now 3 menus to choose from: gChem, gCDK and gONS.


To demonstrate the power of these tools consider the rapid construction of a customized interface to an experiment in a lab notebook (in this example UC-EXP263).

1) Because Andy has added a gONS service to render images of molecules from ChemSpider, consistent reaction schemes can now be constructed from this template by simply typing the names of the reactants and products and then embedding the result in the wiki.



2) Reactant amounts and product yield can then be calculated by simply typing the names of the chemicals. The services that retrieve molecular weight and density are called automatically with the chemical name as input.


3) Typing the name of the solvent then allows easy access to the solubility properties of the reaction components. The calculated concentrations of the reactants and product can be directly compared with their measured maximum solubility. In this experiment the observed separation of the product from the solution is consistent with these measurements.

4) Both experimental and predicted melting points (using Model002) can then be lined up for comparison. A large discrepancy between the two would flag a possible error - in this case good agreement is found. Noting that the product's melting point is near room temperature (53 C) explains why two layers were observed to form during the course of the reaction and why cooling to 0 C induced the product to precipitate. Links to the melting measurements are also provided in column N for easy exploration.

5) Column O provides a quick link to the ChemSpider entries for all compounds and column P provides links to the Reaction Attempts Explorer where, for example, one can explore other reactions where the product was involved. Finally columns Q and R provide one click access to an interactive NMR spectrum of the product, powered by ChemDoodle.

The last few columns still use our older code to call web services but over time these should be added to the gONS collection for convenience.
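
The arithmetic behind steps 2 to 4 above can be sketched in a few lines of Python. This is only an illustration of the calculations the spreadsheet performs once molecular weight, density, solubility and melting point have been retrieved. It is not the gONS code itself: the function names, the reagent values and the 20 C tolerance for flagging a melting point discrepancy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reagent:
    name: str
    mol_weight: float                # g/mol
    equivalents: float               # relative to the limiting reactant
    density: Optional[float] = None  # g/mL; None for a solid weighed by mass


def plan_amounts(reagent, limiting_mmol):
    """Return (mmol, mass in mg, volume in uL or None) for one reagent."""
    mmol = limiting_mmol * reagent.equivalents
    mass_mg = mmol * reagent.mol_weight
    # mg divided by g/mL gives microliters
    volume_ul = mass_mg / reagent.density if reagent.density else None
    return mmol, mass_mg, volume_ul


def concentration_molar(mmol, solvent_ml):
    """mmol per mL is numerically the same as mol per L."""
    return mmol / solvent_ml


def percent_yield(product_mg, product_mw, limiting_mmol):
    """Isolated yield relative to the limiting reactant."""
    return 100.0 * (product_mg / product_mw) / limiting_mmol


def exceeds_solubility(conc_molar, max_solubility_molar):
    """True when a component is expected to come out of solution."""
    return conc_molar > max_solubility_molar


def melting_point_flag(experimental_c, predicted_c, tolerance_c=20.0):
    """True when measured and predicted values disagree by more than the
    tolerance (the 20 C default is an arbitrary choice for this sketch)."""
    return abs(experimental_c - predicted_c) > tolerance_c


if __name__ == "__main__":
    # Hypothetical liquid reagent, not taken from UC-EXP263.
    amine = Reagent("hypothetical amine", mol_weight=100.0,
                    equivalents=1.0, density=0.8)
    print(plan_amounts(amine, limiting_mmol=2.0))
    print(melting_point_flag(experimental_c=53.0, predicted_c=60.0))
```

Comparing the calculated concentration of each component with its measured maximum solubility, as in step 3, then amounts to calling exceeds_solubility once per component.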

The easiest way to experiment with this interface is probably to just make a copy (File -> Make a Copy from the Google Spreadsheet menu). The sheet can then be customized for other applications.


Wednesday, April 28, 2010

Reaction Attempts Book Edition 1 and UsefulChem Archive

I am pleased to report that Andrew Lang and I have published the first edition of the Reaction Attempts book. It currently contains most of the Ugi reactions from the UsefulChem project and is associated with an April 27, 2010 snapshot archive of the entire UsefulChem project, including NMR spectra, spreadsheets, images and the entire lab notebook from Wikispaces.


At 582 pages the printing cost from LuLu amounts to $26.28. Not meant to replace electronic searches, it should prove to be a handy reference book for the lab to quickly browse through what was attempted for a given reactant, what the outcome was and the researcher involved.

We are hoping to include reaction attempts from other groups in future editions. More details can be found in the preface, reproduced below:

Reaction Attempts First Edition

Data Source: the UsefulChem project

Introduction

Open Notebook Science (ONS) refers to the practice of making the full contents of a laboratory notebook and all associated raw data files available in near real time.[1] This represents an opportunity for everyone to benefit from work in progress in an open research group. However, in order to make use of the information, it must be easily discoverable. A simple strategy to increase discoverability is redundancy over multiple communication platforms.

In another project - the Open Notebook Science Solubility Challenge[2] - we published non-aqueous solubility data in the form of physical and downloadable (PDF) books.[3] Although it is possible to search the solubility database using web query interfaces, exploration of a Google Spreadsheet, an XML feed, etc.[4], having a physical copy in the laboratory has proved to be very convenient in several instances. A similar format for reactions will also be useful.

The UsefulChem Project

UsefulChem started in 2005 as an organic chemistry Open Notebook Science project with a main goal of discovering new anti-malarial agents that can be prepared by simple and cheap syntheses.[5] Most of the reactions on UsefulChem are Ugi reactions, which involve the mixing of an amine, aldehyde, carboxylic acid and isonitrile in a solvent at room temperature, generally for a few hours to days.[6] The multicomponent design of the Ugi reaction and the simple reaction conditions make it ideal for exploring large virtual libraries and selecting compounds of interest to make.[7]

Isolation of the Ugi products can be immensely simpler, cheaper and readily scalable if they precipitate in pure form from the reaction mixture. To this end, much of the research in the UsefulChem project focuses on reaction conditions that lead to this outcome.[8] This is in fact the origin of the ONS Solubility Challenge discussed above.[9]

The Reaction Attempts Database

In order to look for patterns in the reaction conditions which led to Ugi product precipitation, the CombiUgiResults Google Spreadsheet was set up.[10] Reactions indexed there can be sorted by precipitation outcome, solvent, reactant, concentration, etc. and links to the laboratory notebook pages can be followed for full details. However, this sheet is designed specifically for Ugi reactions and contains columns specifically for the aldehyde, amine, carboxylic acid and isonitrile.

In order to enable the tracking of other types of reactions, the information in the CombiUgiResults sheet was reformatted into two other sheets: ReactionAttempts[11] (containing reagents and reactants) and RXIDsReactionAttempts[12] (containing reaction conditions and results, such as solvent, concentration of limiting reactant, appearance of a precipitate, yield, etc.). The two sheets are connected via the use of a common ReactionID. This format permits the representation of any type of reaction, with an unlimited number of reactants and products.[13]
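
The two-sheet layout can be illustrated with a minimal Python sketch. The rows below are invented for the example, and the column names (Role, Compound, Solvent, Precipitate) are assumptions rather than the actual spreadsheet headers. Only the idea of a shared ReactionID linking any number of compounds to one set of conditions comes from the description above.

```python
# Illustrative rows only: column names and values are assumptions,
# not copied from the actual spreadsheets.
REACTION_ATTEMPTS = [
    {"ReactionID": 1, "Role": "reactant", "Compound": "benzoic acid"},
    {"ReactionID": 1, "Role": "reactant", "Compound": "benzylamine"},
    {"ReactionID": 1, "Role": "reactant", "Compound": "benzaldehyde"},
    {"ReactionID": 1, "Role": "reactant", "Compound": "tert-butyl isocyanide"},
    {"ReactionID": 1, "Role": "product", "Compound": "Ugi product A"},
    {"ReactionID": 2, "Role": "reactant", "Compound": "benzaldehyde"},
    {"ReactionID": 2, "Role": "reactant", "Compound": "aniline"},
    {"ReactionID": 2, "Role": "product", "Compound": "imine B"},
]

RXIDS_REACTION_ATTEMPTS = [
    {"ReactionID": 1, "Solvent": "methanol", "Precipitate": True},
    {"ReactionID": 2, "Solvent": "ethanol", "Precipitate": False},
]


def join_by_reaction_id(compound_rows, condition_rows):
    """Merge the two sheets into one record per ReactionID."""
    reactions = {
        row["ReactionID"]: {**row, "reactants": [], "products": []}
        for row in condition_rows
    }
    for row in compound_rows:
        record = reactions.get(row["ReactionID"])
        if record is None:
            continue  # compound row with no matching conditions row
        key = "reactants" if row["Role"] == "reactant" else "products"
        record[key].append(row["Compound"])
    return reactions


def reactions_using(reactant, reactions):
    """Return the sorted ReactionIDs in which a compound was a reactant."""
    return sorted(
        rid for rid, rec in reactions.items() if reactant in rec["reactants"]
    )


if __name__ == "__main__":
    merged = join_by_reaction_id(REACTION_ATTEMPTS, RXIDS_REACTION_ATTEMPTS)
    print(reactions_using("benzaldehyde", merged))
```

Because compounds live in their own rows rather than in fixed aldehyde, amine, acid and isonitrile columns, a two-component reaction and a four-component Ugi reaction fit in the same structure.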

By definition, any Open Notebook Science project is a work in progress. The listing of a reaction in this database only means that the researcher attempted or is in the process of attempting it. Whatever the situation, a link to the laboratory notebook page is provided, where the most recent information is available. The philosophy used here is that partial information is always better than no information at all. Thus a researcher investigating the prior use of a particular reactant in a Ugi reaction might find the report that a precipitate was obtained in methanol helpful for designing their own reactions, even if the characterization of the precipitate is still pending. At the very least, knowing that a certain researcher has attempted a similar reaction is enough information for initiating a discussion, which may lead to valuable insights.

Reaction Attempts on ChemSpider

Although SMILES[14] are provided in the spreadsheets, the primary key to identify compounds is the ChemSpider ID (CSID)[15]. This allows us to render molecule images in the book automatically. In the case of the ONS Solubility Challenge book[3], use of the CSID enables a convenient way to calculate various descriptors for displaying values in the book.

In addition, the compounds in the Reaction Attempts database are indexed on ChemSpider as two Data Sources: ReactantsAttemptedReactions and ProductsAttemptedReactions[13]. In this way a substructure search for either reactants or products will identify indexed molecules. Clicking on the Syntheses tab in the ChemSpider record for a selected molecule will then reveal a list of hyperlinks to the relevant laboratory notebook pages.

Organization of the Book

In keeping with the layout of the ONS Solubility Challenge Book, the reactants are listed in alphabetical order. Each entry displays the list of reactions where the reactant was used. This includes a scheme with all reactants and product as well as key metadata: the researcher, reaction type, solvent, limiting reactant concentration, observation of a precipitate, comments and a reference (links to the laboratory notebook page).

In this edition, only Ugi reactions are included. The reaction schemes are laid out in the following order: carboxylic acid, amine, aldehyde and isonitrile. This should allow for easy comparison between schemes within a given record. Reactions where the Ugi product was isolated and characterized are marked with a green check and the percent yield is noted. Since the Ugi products do not have simple common names, they are not included as separate entries. However, all reactions where the synthesis of a specific Ugi product was attempted can be found by looking up the entries for any of the four reactants.

Although this compilation is not exhaustive, it does cover the vast majority of reactions in the UsefulChem project to date. Future editions will include other reactions from UsefulChem and other sources.

Archive

This edition is linked to a snapshot taken on 2010-04-27: the UsefulChem data archive in ZIP[16], DVD[17] and interactive hosted[18] formats, together with the ReactionAttempts (XLS)[19] and RXIDsReactionAttempts (XLS)[20] spreadsheets.

References

1. Open Notebook Science Wikipedia Entry http://en.wikipedia.org/wiki/Open_Notebook_Science
2. Open Notebook Science Solubility Challenge Wiki http://onschallenge.wikispaces.com
3. Bradley, J.-C. First Edition of ONS Solubility Challenge Book UsefulChem Blog (2009)
http://usefulchem.blogspot.com/2009/12/first-edition-of-ons-solubility.html
4. Open Notebook Science Solubility Challenge List of Experiments page http://onschallenge.wikispaces.com/list+of+experiments
5. UsefulChem Wiki http://usefulchem.wikispaces.com
6. Ugi Reaction Wikipedia Entry http://en.wikipedia.org/wiki/Ugi_reaction
7. Dömling, A., & Ugi, I. (2000). Multicomponent Reactions with Isocyanides. Angewandte Chemie International Edition, 39(18), 3168-3210. http://www3.interscience.wiley.com/journal/73500473/abstract.
8. UsefulChem List of Experiments http://usefulchem.wikispaces.com/All+Reactions
9. Bradley, J.-C. Open Notebook Science Challenge UsefulChem Blog (2008)
http://usefulchem.blogspot.com/2008/09/open-notebook-science-challenge.html
10. CombiUgiResults Google Spreadsheet http://spreadsheets.google.com/ccc?key=plwwufp30hfpUERhse9y5Kw
11. ReactionAttempts Google Spreadsheet
http://spreadsheets.google.com/ccc?key=0Ak1R8T6wt4YQdG9NejNLcDNUMkVBVURGM01TR0NxdXc
12. RXIDsReactionAttempts Google Spreadsheet
http://spreadsheets.google.com/ccc?key=0Ak1R8T6wt4YQdGVENVFMWjdzaGd2REJTTnA4RG5vblE
13. Bradley, J.-C. Reaction Attempts on ChemSpider UsefulChem Blog (2010)
http://usefulchem.blogspot.com/2010/03/reaction-attempts-on-chemspider.html
14. SMILES Wikipedia Entry http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification
15. ChemSpider Web Site http://www.chemspider.com/
16. UC archive Drexel server (ZIP) http://showme.physics.drexel.edu/usefulchem/archives/usefulchem2010-04-27.zip
17. UC archive on lulu.com (DVD) http://www.lulu.com/product/dvd/usefulchem-archive/10791847
18. UC interactive hosted format http://showme.physics.drexel.edu/usefulchem/archives/usefulchem2010-04-27/All%20Reactions.html
19. Bradley, J.-C.; Lang, A. Reaction Attempts Reactants and Products. UsefulChem. 2010-04-27.
(Archived by WebCite® at http://www.webcitation.org/5pIsFEbT9)
20. Bradley, J.-C.; Lang, A. Reaction Attempts RXIDs. UsefulChem. 2010-04-27.
(Archived by WebCite® at http://www.webcitation.org/5pIs2eh62)


Tuesday, April 20, 2010

ONS Books Wiki

I recently reported on our use of Nature Precedings to archive different editions of the ONS Solubility Challenge book. One of the advantages is that Precedings automatically alerts visitors if more recent editions exist.

However, today I learned that there is a glitch to this system: it is not possible to link individual versions on Precedings to a corresponding book edition on LuLu. That means that if you find yourself on the Nature Precedings entry and want to order the book from LuLu it isn't obvious at all how to do so.

To resolve this issue once and for all I just created a wiki page (ONSbooks.wikispaces.com) to track every edition of the book. This is actually better because I can also provide links to all the available data archives and blog posts corresponding to each edition.

This is also the page where we will keep track of every edition of other Open Notebook Science books. The next one to be published shortly is for the UsefulChem project.


Saturday, March 27, 2010

Education 2.0: Leveraging Collaborative Tools for Teaching

On March 25, 2010 I presented at the Drexel E-Learning 2.0 Conference on "Education 2.0: Leveraging Collaborative Tools for Teaching". It was an opportunity to update my slides with what I did and learned from the Chemical Information Retrieval course I taught over the Fall 2009 term.

I described using a wiki to organize course content and to allow students to contribute useful resources. Their assignments were also designed to be useful to other students in the class as well as to the general library and chemistry community.

I covered using wikis and other collaborative tools to mentor students doing laboratory research with Open Notebook Science. At the end I provided a quick overview of using games and Second Life for educational purposes.


Saturday, March 20, 2010

Reaction Attempts on ChemSpider

Just as we have done with the Open Notebook Science Solubility Challenge, we are adding more structure to the UsefulChem project.

This is a little bit more difficult because the UC notebook represents mainly chemical reactions, while the ONSC data are simply solubility measurements. Since most of the UC reactions are Ugi reactions, we have been keeping summary data in the CombiUgi Google Spreadsheet, which is completely specialized for this reaction and variations in our reaction conditions. This lets us search or sort by reactant, concentration, solvent, etc. However, we cannot do substructure searching directly using the CombiUgi sheet and we cannot add other types of reactions.

In order to enable substructure searching and add other reactions, Antony Williams has created 2 new data sources in ChemSpider: Attempted Reactions - Reactants and Attempted Reactions - Products. The data represented in the CombiUgi sheet has been restructured into 2 new Google Spreadsheets: RXIDs Reaction Attempts and Reaction Attempts.

Both of these sheets use a common Reaction ID to tie together an unlimited number of reactants and products (Reaction Attempts) and other pertinent reaction conditions (RXIDs Reaction Attempts), such as the concentration of the limiting reagent, the solvent, yield, notes, etc.

Currently only the data in the Reaction Attempts sheet has been imported into ChemSpider. But this alone gives us new functionality: we can perform substructure searches for either reactants or products.

For example, let's say we want to search for all reaction attempts using aromatic carboxylic acids. First we simply do a substructure search on ChemSpider by drawing benzoic acid and selecting Attempted Reactions - Reactants as the Data Source.


This pulls up 8 compounds that were used as a reactant at least once.


Clicking on one of these hits brings us to the ChemSpider entry. Selecting the Syntheses tab in the Data Sources shows links to the lab notebook pages where this compound was used.


The system is configured to accept anything from reactions with fully characterized products to reactions where products were not isolated, or even reactions still in progress. I'm not using the term "failed reaction" because the term has no meaning without the context of the objective of the reaction. In our Ugi reactions we are typically looking for the product to precipitate out. By our criteria, reactions where no precipitate was observed after a few days would be classified as "failed". However, it may well be that product was formed but did not precipitate. Even when product is obtained, some might consider 30% isolated yields to be failures, while others would not. Context is everything in qualifying success.

But even with a clear definition of success, many reactions are simply neither successes nor failures. Reactions in progress fall into that category. The student may have even completed the reaction but not yet analyzed the results. But that doesn't matter so much if the raw monitoring data has been provided.

The general structure of this database means that we can add not only our reactions but those of anybody. Even in cases where someone does not have an Open Notebook, just providing a link to contact information of the researcher could be very useful to start a conversation. In that case the system would function more as a social networking platform - connecting researchers who work on similar molecules.

I don't think people are willing to do extensive write-ups for what they consider to be "failed experiments". However, if all that is requested is the list of reactants and target products, that may not be such a burden if it potentially means connecting up with another researcher who can help, or even starting a new collaboration.

Currently ChemSpider does not take into account the information in the RXIDs Reaction Attempts sheet but we hope to be able to make use of that at some point. That would let us do more sophisticated searches, such as finding any reaction attempt where an aromatic carboxylic acid was reacted with an aliphatic amine in methanol.

Andrew Lang has also provided the information of the 2 spreadsheets as XML:
http://showme.physics.drexel.edu/onsc/Services/OData.svc/Reactions/
http://showme.physics.drexel.edu/onsc/Services/OData.svc/ReactionCompounds/
[Note: if viewing on Firefox select View Source to see all the XML]

We will likely use these live feeds for performing more sophisticated queries and we welcome others to use them for any purpose.
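
For anyone who wants to try the feeds, here is a minimal Python sketch of reading one into a list of rows. It assumes the service returns the standard OData Atom format, where each entry carries its fields inside an m:properties element. The sample feed and its property names (ReactionID, Solvent) are invented for the example and may not match what the service actually returns.

```python
import xml.etree.ElementTree as ET

# Standard Atom and OData (WCF Data Services) namespaces.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Hypothetical sample; the real feed's property names may differ.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:ReactionID>1</d:ReactionID>
        <d:Solvent>methanol</d:Solvent>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:ReactionID>2</d:ReactionID>
        <d:Solvent>ethanol</d:Solvent>
      </m:properties>
    </content>
  </entry>
</feed>
"""


def parse_entries(feed_xml):
    """Return one dict per feed entry, keyed by property name.

    All values are returned as strings, exactly as they appear in the XML.
    """
    root = ET.fromstring(feed_xml)
    rows = []
    for props in root.iterfind("atom:entry/atom:content/m:properties", NS):
        # Tags look like "{namespace}Name"; keep only the local name.
        rows.append({child.tag.split("}", 1)[1]: child.text for child in props})
    return rows


if __name__ == "__main__":
    for row in parse_entries(SAMPLE_FEED):
        print(row)
```

To read a live feed instead of the sample, the XML text could be fetched with urllib.request.urlopen(url).read() and passed to parse_entries unchanged.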


Thursday, April 16, 2009

NASA Open Notebook Science Talk April 09

On April 15, 2009 I had an opportunity to give a talk at the NASA Goddard Space Flight Center. I talked about Open Notebook Science and all of the Web 2.0 tools that we use to operate. There were no chemists in the audience but hopefully the overall patterns of how all the components interconnected made enough sense to be useful.

I had a full hour so this talk is a pretty comprehensive summary of our projects, including the most recent work on the Spectral and ChemTiles games and the automated backing up of Google Spreadsheet documents and semi-automated solubility calculations using web services called from within Google Spreadsheets. All of this work was only possible because of Andy Lang's rapid development efforts. Tony Williams also assisted greatly with the Spectral Game.



We had a very nice conversation over lunch with a few NASA people. I found it interesting that many apparently very different user environments (librarians, educators, molecular biologists, cosmologists, etc.) share very similar needs for Web 2.0 technologies. For example, delicious was lauded as a very convenient alternative to email for sharing content. The distribution of personalities seems to be similar everywhere: a few early adopters within a larger, more skeptical population.

After lunch Emma Antunes gave me a tour of the facilities. Despite the annoying rain to get between buildings, it was well worth it. Here are some of the cool things that I saw:

An enormous room housing very large speakers for testing the effect of vibrations on spacecraft and equipment. Emma stands next to one of the several speakers.



A huge centrifuge for testing the robustness of instruments. Emma said that they were able to put an SUV on there to see how much force was required to tip it over.



I saw one of the satellites for the Solar Dynamics Observatory under construction. The idea of this project is to use the different perspectives from satellites at different positions in orbit around the sun to calculate the direction of solar flares and other potentially detrimental activity on the sun.






Next to the largest clean room in the world, there is a display of the guts of the Hubble telescope. Apparently the astronauts had to fix some components in there that were not designed to be accessible so they had to do a lot of practice on a duplicate before attempting the task in space.


Sunday, August 31, 2008

UsefulChem and CML in Cambridge

This is turning out to be a very productive trip to the UK. I'm currently in Cambridge at Peter Murray-Rust's house with Cameron Neylon and Egon Willighagen. We're in the process of converting one of the Ugi reactions from our recent optimization paper to CML.

Here is the document. We're using the tag UgiChem2CML to discuss it.

Gotta catch a train to London .... more updates later


Creative Commons Attribution Share-Alike 2.5 License