Back from ACS San Francisco Meeting 2010

Last week I co-hosted a symposium with Noel O'Boyle and Andrew Lang on "Visual Analysis of Chemical Data" at the American Chemical Society meeting in San Francisco (March 22-23, 2010). The ACS recorded almost every talk in our symposium and I'll provide the link here when available (they tell me mid April).

Liz Dorland kicked off the symposium with a great keynote presentation covering effective visualization in a number of fields and the special challenges faced in chemistry. There were several talks about QSAR and I particularly enjoyed the one by Edmund Champness, who visualized the confidence in predictions with an intuitive colored molecule map. Geoff Hutchison gave an informative overview of Avogadro.

Perhaps the biggest revelation was the "iTunes for Cheminformatics" project by NIH researcher Ajit Jadhav (leading the team which includes Rajarshi Guha). The alpha version will be available for testing on April 5, 2010 and many of us are eagerly anticipating being able to give it a spin. From what I understand the system will automatically be able to identify scaffolds (fragments) in a collection of molecules and make it easy to search for and filter assay results.

Carmen Drahl covered announcements about new drug candidates on Twitter in minute-by-minute detail. Following the FriendFeed feed for the conference flagged a very interesting post about a Cold Fusion Symposium that was being held. In spite of the notorious lack of wireless availability at ACS conferences, attendees seem to be making do by accessing their social networks on their cell phones.

Andrew Lang and I spoke about Visualizing Chemistry in Second Life - our slides are here - hopefully I'll be able to post the recordings as well soon.

Education 2.0: Leveraging Collaborative Tools for Teaching

On March 25, 2010 I presented at the Drexel E-Learning 2.0 Conference on "Education 2.0: Leveraging Collaborative Tools for Teaching". It was an opportunity to update my slides with what I did and learned from the Chemical Information Retrieval course I taught over the Fall 2009 term.

I described using a wiki to organize course content and to allow students to contribute useful resources. Their assignments were also designed to be useful to other students in the class as well as to the general library and chemistry community.

I covered using wikis and other collaborative tools to mentor students doing laboratory research with Open Notebook Science. At the end I provided a quick overview of using games and Second Life for educational purposes.

Reaction Attempts on ChemSpider

Just as we have done with the Open Notebook Science Solubility Challenge, we are adding more structure to the UsefulChem project.

This is a little bit more difficult because the UC notebook represents mainly chemical reactions, while the ONSC data are simply solubility measurements. Since most of the UC reactions are Ugi reactions, we have been keeping summary data in the CombiUgi Google Spreadsheet, which is completely specialized for this reaction and variations in our reaction conditions. This lets us search or sort by reactant, concentration, solvent, etc. However, we cannot do substructure searching directly using the CombiUgi sheet and we cannot add other types of reactions.

In order to enable substructure searching and add other reactions, Antony Williams has created 2 new data sources in ChemSpider: Attempted Reactions - Reactants and Attempted Reactions - Products. The data represented in the CombiUgi sheet has been restructured into 2 new Google Spreadsheets: RXIDs Reaction Attempts and Reaction Attempts.

Both of these sheets use a common Reaction ID to tie together an unlimited number of reactants and products (Reaction Attempts) and other pertinent reaction conditions (RXIDs Reaction Attempts), such as the concentration of the limiting reagent, the solvent, yield, notes, etc.
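To make the relationship between the two sheets concrete, here is a minimal sketch in Python of how a shared Reaction ID can tie them together. The column names (`rxid`, `role`, `compound`, `solvent` and so on) and the sample rows are assumptions made up for illustration; they are not the actual headers or contents of the Google Spreadsheets.

```python
from collections import defaultdict

# Hypothetical rows standing in for the "Reaction Attempts" sheet:
# one row per compound per reaction, so a reaction can have any
# number of reactants and products.
reaction_attempts = [
    {"rxid": 1, "role": "reactant", "compound": "benzoic acid"},
    {"rxid": 1, "role": "reactant", "compound": "benzylamine"},
    {"rxid": 1, "role": "reactant", "compound": "benzaldehyde"},
    {"rxid": 1, "role": "reactant", "compound": "tert-butyl isocyanide"},
    {"rxid": 1, "role": "product", "compound": "Ugi product (placeholder)"},
    {"rxid": 2, "role": "reactant", "compound": "benzoic acid"},
    {"rxid": 2, "role": "reactant", "compound": "butylamine"},
]

# Hypothetical rows standing in for "RXIDs Reaction Attempts":
# exactly one row of conditions per reaction ID.
rxid_conditions = {
    1: {"solvent": "methanol", "limiting_conc_M": 0.5, "yield_pct": None},
    2: {"solvent": "ethanol", "limiting_conc_M": 1.0, "yield_pct": None},
}


def assemble(attempts, conditions):
    """Group compounds by reaction ID and attach the matching conditions."""
    reactions = defaultdict(
        lambda: {"reactant": [], "product": [], "conditions": {}}
    )
    for row in attempts:
        reactions[row["rxid"]][row["role"]].append(row["compound"])
    for rxid, cond in conditions.items():
        reactions[rxid]["conditions"] = cond
    return dict(reactions)


reactions = assemble(reaction_attempts, rxid_conditions)
print(reactions[1]["reactant"], reactions[1]["conditions"]["solvent"])
```

Keeping compounds and conditions in separate tables means a new reaction type with a different number of components needs no change to the layout, only more rows.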

Currently only the data in the Reaction Attempts sheet has been imported into ChemSpider. But this alone gives us new functionality: we can perform substructure searches for either reactants or products.

For example, let's say we want to search for all reaction attempts using aromatic carboxylic acids. First we simply do a substructure search on ChemSpider, drawing benzoic acid and selecting Attempted Reactions - Reactants as the Data Source.

This pulls up 8 compounds that were used as a reactant at least once.

Clicking on one of these hits brings us to the ChemSpider entry. Selecting the Syntheses tab in the Data Sources shows links to the lab notebook pages where this compound was used.

The system is configured to accept anything from reactions with fully characterized products to reactions where products were not isolated, or even reactions in progress. I'm not using the term "failed reaction" because the term has no meaning without the context of the objective of the reaction. In our Ugi reactions we are typically looking for the product to precipitate out. By our criteria, reactions where no precipitate was observed after a few days would be classified as "failed". However it may well be that product was formed but did not precipitate. Even when product is obtained, some might consider 30% isolated yields to be failures, while others would not. Context is everything in qualifying success.

But even with a clear definition of success, many reactions are simply neither successes nor failures. Reactions in progress fall into that category. The student may have even completed the reaction but not yet analyzed the results. But that doesn't matter so much if the raw monitoring data has been provided.

The general structure of this database means that we can add not only our reactions but those of anybody. Even in cases where someone does not have an Open Notebook, just providing a link to contact information of the researcher could be very useful to start a conversation. In that case the system would function more as a social networking platform - connecting researchers who work on similar molecules.

I don't think people are willing to do extensive write-ups for what they consider to be "failed experiments". However, if all that is requested is the list of reactants and target products, that may not be such a burden, especially if it could mean connecting with another researcher who can help or even starting a new collaboration.

Currently ChemSpider does not take into account the information in the RXIDs Reaction Attempts sheet but we hope to be able to make use of that at some point. That would let us do more sophisticated searches, such as finding any reaction attempt where an aromatic carboxylic acid was reacted with an aliphatic amine in methanol.
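The kind of combined query described here could look something like the following Python sketch. It is only an illustration of the logic: the reaction IDs, the sample data and the pre-assigned structural class tags are all hypothetical, and in practice the class membership would have to come from a real substructure search rather than hand-written labels.

```python
# Hypothetical data: each reaction ID maps to its reactants, and each
# reactant carries structural class tags assigned ahead of time.
reactants = {
    "EXP-A": [
        ("benzoic acid", {"aromatic carboxylic acid"}),
        ("butylamine", {"aliphatic amine"}),
    ],
    "EXP-B": [
        ("benzoic acid", {"aromatic carboxylic acid"}),
        ("aniline", {"aromatic amine"}),
    ],
    "EXP-C": [
        ("benzoic acid", {"aromatic carboxylic acid"}),
        ("butylamine", {"aliphatic amine"}),
    ],
}

# Hypothetical per-reaction conditions (only the solvent is used here).
solvents = {"EXP-A": "methanol", "EXP-B": "methanol", "EXP-C": "ethanol"}


def find_attempts(reactants, solvents, required_classes, solvent):
    """Return IDs of reactions run in `solvent` whose reactants together
    cover every class in `required_classes`."""
    hits = []
    for rxid, compounds in reactants.items():
        if solvents.get(rxid) != solvent:
            continue
        present = set().union(*(tags for _, tags in compounds))
        if required_classes <= present:
            hits.append(rxid)
    return sorted(hits)


matches = find_attempts(
    reactants,
    solvents,
    {"aromatic carboxylic acid", "aliphatic amine"},
    "methanol",
)
print(matches)
```

Filtering on the solvent first keeps the cheaper condition check ahead of the structural one, which would matter more once the structural test is a real substructure match.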

Andrew Lang has also provided the information of the 2 spreadsheets as XML:
[Note: if viewing on Firefox select View Source to see all the XML]

We will likely use these live feeds for performing more sophisticated queries and we welcome others to use them for any purpose.

RSC Sponsors Open Notebook Science Challenge

I am very pleased to report that the Royal Society of Chemistry is sponsoring 5 new $500 awards for the Open Notebook Science Solubility Challenge.

The previous round of 10 awards was sponsored by Submeta, Nature and Sigma-Aldrich. With the final award of that round having been made in December 2009, this is very good timing.

The criteria and rules for the contest have not changed. Students from the US and the UK are generally eligible to participate. See the Rules and Application Form for full details:

All of the solubility measurements will continue to be compiled and distributed in several formats, including a book where biographies and pictures of all the award winners can be found. The most recent edition - with all 10 previous winners - is available here:

I am very grateful to Antony Williams for being instrumental in making this happen.

Peer Review and Science2.0 Talk

On March 15, 2010 I spoke on "Peer Review and Science2.0: blogs, wikis and social networking sites" as a guest lecturer for the “Peer Review Culture in Scholarly Publication and Grantmaking” course at Drexel University. The main thrust of the presentation was that peer review alone is not capable of coping with the increasing flood of scientific information being generated and shared. I argued that, by contrast, providing sufficient proof for scientific findings does scale, and that it weakens the tragedy of the trusted source cascade.

The students were mainly in a technical writing program. Lawrence Souder runs the course and has set up an impressive list of guest speakers this term. I think that these topics are at the core of what it is going to be like to write about science in the next few years. Communication channels and information sources are only going to multiply even more and learning how to navigate this evolving system will require effort and skill.

If they took anything away from my talk, I hope they question all their information sources - even those labeled by a particular group as a "trusted source".

Open Notebook Science Tips

Beth Ritter-Guth just wrote an article for "How To Get Started With Open Notebook Science". She had asked me to list a few tips for doing Open Notebook Science. I didn't quite make her deadline so I thought I would post them here. These are ideas that I have discussed in different talks and documents but never put together in one place.

1) Accept that reporting science in real time is not always pretty. Do your best to avoid and correct mistakes as soon as possible but mistakes and ambiguous results will happen on the way to completing any scientific project. Just be honest about your level of certainty when discussing preliminary results.

2) Provide as much raw data as is reasonable and frame it in such a way that other researchers can understand what you have done and follow your conclusions based on your data without having to ask you questions.

3) Don't wait for the perfect technological solutions before starting to share. General purpose wikis can serve as an excellent starting point for an Open Lab Notebook.

4) Don't wait for the perfect data structuring scheme before starting to share. First share for human readability - you can always restructure the data later for machine readability.

5) Periodically write summaries of your research progress in the form of milestones or significant challenges in a format that non-specialists can understand. A blog is a good platform for this. If you link to specific lab notebook pages from your summaries, experts can always click through to dig deeper.

6) Create snapshot archives of your notebooks and supporting raw data files. You can use these as backups and as a convenient way to cite a particular version of your entire research project.

7) Cite specific lab notebook pages and archives when publishing in peer-reviewed journals.

Updated Chemistry Web Services - now with Density

I mentioned a while back the web services that Rajarshi Guha had set up for us. We are often in need of molecular weight and density data for both solutes and solvents since we rely on an assumption of volume additivity when calculating concentration.
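For readers unfamiliar with the volume additivity assumption, the following Python sketch shows why both molecular weight and density are needed. It assumes the solution volume is simply the solute volume plus the solvent volume. The function name is made up for this example, and the benzoic acid values (MW 122.12 g/mol, density about 1.27 g/mL) are illustrative inputs rather than values returned by the web services.

```python
def molarity_volume_additive(solute_mg, solute_mw, solute_density, solvent_ml):
    """Concentration in mol/L, assuming
    solution volume = solute volume + solvent volume.

    solute_mg      -- mass of solute in milligrams
    solute_mw      -- molecular weight of the solute in g/mol
    solute_density -- density of the solute in g/mL
    solvent_ml     -- volume of solvent in milliliters
    """
    solute_g = solute_mg / 1000.0
    moles = solute_g / solute_mw
    solute_ml = solute_g / solute_density  # g / (g/mL) = mL
    total_l = (solute_ml + solvent_ml) / 1000.0
    return moles / total_l


# Illustrative inputs: 122.12 mg of benzoic acid in 1 mL of solvent.
conc = molarity_volume_additive(122.12, 122.12, 1.27, 1.0)
print(round(conc, 3))
```

Ignoring the solute volume in this example would give 1.0 M, so the correction is far from negligible at high concentrations.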

Since Rajarshi moved to the NIH, the location of the services has changed. We now have the CDK installed on a Drexel server so some of the simple services like MW and SMILES generation are still available there.

However density has been challenging to provide as a service. Experimental density values for solvents are commonly available but the calculated densities of solids are hard to find. ChemSpider is one of the few sources where calculated densities of solids and liquids are freely available. Unfortunately there are currently no ChemSpider density web services.

As an interim solution for the UsefulChem and ONSChallenge projects we have set up a look-up table as a Google Spreadsheet (SolventLookUp) for most solvents of potential interest. Solutes added to our SolubilitiesSum sheet are automatically added to a SoluteLookUp SQL database running at Oral Roberts University and the ChemSpider densities are added there via an automated but slow process.

Andrew Lang has used these resources to provide web services returning densities and other properties or descriptors. These data sources are especially important for the nearly automated production of new editions of the ONS Challenge Solubility Book. This is not a general solution since it only includes compounds of interest to our group and would not scale (at least for licensing reasons) to millions of compounds.

But it does come in handy for us because we can quickly call these services within a Google Spreadsheet to do a variety of useful calculations, minimizing the possibility of error from copying and pasting.

As an example see the following ChemServices sheet. Enter the common name for a solvent or solute and the number of millimoles and the sheet will automatically calculate the corresponding number of milligrams or microliters. [Note that Google Spreadsheets can only handle a maximum of 50 web service calls at a time - a useful trick is to highlight cells after the calculations then copy and "paste as values". Make sure to keep some cells with the web service calls in case you need to do more calculations in the future]
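The arithmetic behind such a sheet is simple enough to sketch in Python. In the sheet, the molecular weight and density are looked up through the web services; here they are passed in directly, and the function names and the methanol values (MW 32.04 g/mol, density about 0.792 g/mL) are assumptions used only for illustration.

```python
def mmol_to_mg(mmol, mw):
    """Millimoles to milligrams: g/mol is numerically the same as mg/mmol."""
    return mmol * mw


def mmol_to_microliters(mmol, mw, density_g_per_ml):
    """Millimoles to microliters: mg divided by g/mL gives microliters."""
    return mmol_to_mg(mmol, mw) / density_g_per_ml


# Illustrative inputs: 1 mmol of methanol.
mass_mg = mmol_to_mg(1.0, 32.04)
volume_ul = mmol_to_microliters(1.0, 32.04, 0.792)
print(round(mass_mg, 2), round(volume_ul, 2))
```

Milligrams are the natural unit for weighing out a solid, while microliters are what you need when dispensing a liquid with a pipette.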

Nature Precedings as an Archiving Tool for ONS Solubility Book

The issue of archiving and citation is a topic that is usually raised whenever I give a talk about Open Notebook Science. We have recently tried to address this using several complementary strategies.

The publication of a book containing a snapshot of all the values obtained from the Open Notebook Science Solubility Challenge has turned out to be a convenient mechanism. By using Lulu, the book can be either downloaded for free as a PDF or ordered as a physical copy for just the printing and shipping charges.

However, Lulu does not have a convenient method of keeping track of different editions of the book and it is unclear how to best cite them.

Nature Precedings solves both of these problems quite nicely. I have uploaded the PDF of each book edition to NP and the versions are automatically linked to each other. In fact if you try to access an older edition, NP pops up a warning that a more recent version is available with the corresponding link (see image below).

Precedings also provides information about how to cite the document, including a DOI for each version. Unfortunately it appears that it can take some time for the DOIs to resolve. Links to different versions can also be formatted like this:

Links to the Lulu version of each book are also provided, which is convenient for anyone who might want to order a physical copy.

At this time Precedings does not accept zip files containing the full archive of the source files for each book version - although a link to the archive is provided in the preface of the book. We have found that our library's DSpace repository is a convenient location for these.

