Thursday, March 26, 2009

Wendy Warr report on ONS from Fall 08 ACS meeting

Wendy Warr has provided a thorough report from the Fall 2008 American Chemical Society meeting in Philadelphia. My talk "Processing drug discovery raw data collaboratively and openly using Open Notebook Science" is summarized on p.10 as part of a special extract of talks and companies.

Wendy has covered a lot of material related to cheminformatics in the full report, and it is worth looking over if you missed the conference.


Wednesday, March 18, 2009

Chemistry publication - making the revolution

Steven Bachrach has just published a very interesting commentary "Chemistry publication - making the revolution" on March 17, 2009 in the new Journal of Cheminformatics (I am on the editorial board).

The article does a great job of highlighting the current state of affairs in chemistry, which is creating a chasm between researchers and the data they could be using. There are suggestions of steps that can be taken by researchers, reviewers and editors. Although Open Notebook Science is mentioned, there are several less intensive steps that can be taken to still benefit the community. Steven mentions the submission of JCAMP-DX files instead of PDFs when submitting spectra as supplementary materials for publications. I agree that this is a very low barrier step that chemists can take to gain immediate benefit - whether they share their spectra or not. At that point it also becomes trivial to contribute spectra to databases like ChemSpider that support the open JCAMP-DX format - hopefully as Open Data (so that they can be used in applications such as our SpectralGame).


Friday, March 13, 2009

CDD's Pay for Privacy Model

Barry Bunin and co-authors Moses Hohman, Kellan Gregory, Kelly Chibale, Peter J. Smith and Sean Ekins just published an article in the March 2009 issue of Drug Discovery Today: Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery.

This is a must-read for the Open Science community involved with drug discovery. Barry reviews several relevant online resources and details the workings of the Collaborative Drug Discovery system.

The point that I thought was most interesting is that Barry reports that CDD now has a policy of allowing unlimited uploading of data by non-subscribers. Researchers pay for privacy, which is a model that I think works well for a number of Web 2.0 applications. For example, Wikispaces works that way. Since most users will likely want to limit their data sharing to selected collaborators, this is a good way to get the word out through groups who don't mind sharing publicly.

There are a few case studies in the article, including our UsefulChem project and our collaboration with Rajarshi Guha at Indiana University and Philip Rosenthal at UCSF to find new anti-malarial compounds. Barry illustrates the point with the assay results for one of our Ugi products:


Thursday, March 12, 2009

Spectral Game Winners

Following up on my last post about the SpectralGame, I have given out 2 molecular model kits to the highest scoring students in my CHEM242 class.

The first winner was Scott Beaudoin with 24 points followed by HaeJi Choi with 17 points.

The current high scores for the Drexel students can be accessed here. High scores from everyone in the world can be found here. The top score is currently 40 by VK.

Andy and Tony continue to fine-tune the operation of the game. The recent introduction of a "Reload Spectrum" button below the molecules prevents the game from stopping prematurely if the spectrum won't load for any reason.

We still welcome contributions of spectra (NMR, IR, UV, MS, etc.) and players!


Andy and Shirley's new ONS Logos

Some new ONS logos are available to specify a researcher's intent in making their lab notebook available online. Options include full Open Notebook Science (All Content - Immediate) or more restrictive forms (Selected Content or Delayed).

Andrew Lang updated his logos, including options for multiple sizes:

Shirley Wu also contributed these:

The logos can be downloaded from the ONS claims website. Clicking on them should link either directly to the lab notebook or to the ONS claims page, where a brief description of the intended meaning can be found.


Tuesday, March 03, 2009

Semi-automated measurement of solubility using NMR

Over the past few days Andrew Lang and I have been discussing ways of streamlining the measurement of non-aqueous solubilities using NMR. Inspired by David Strumfels' VBA code in Excel to automatically measure kinetics, Andy found a way to directly extract the integration values from the H NMR spectra hosted on our server in JCAMP-DX format.

We have set up a Google Spreadsheet (see ONSC-EXP062B for an example) that automatically calculates solubility based on information that the researcher provides. What is required:
  • A link to the NMR spectrum (the HTML file linking to the JCAMP-DX file)
  • Density and molecular weight of the solute
  • Density and molecular weight of the solvent
  • A range in the solute to integrate with the number of corresponding Hs
  • A range in the solvent to integrate with the number of corresponding Hs
The spreadsheet calculates the molar ratio of solute to solvent, then the molarity, by assuming that the volumes of the two components are additive. The volume of solids is typically not available experimentally, but ChemSpider gives a reasonable prediction. We have been using this assumption to convert published solubility values from g solute/100 g solvent to molar. To avoid taxing the server, once the measurements are computed they are stored in a database so that the spectrum does not have to be processed and the calculations performed again.
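The calculation described above can be sketched in a few lines of Python. This is only an illustration of the additive-volume assumption, not the actual spreadsheet formulas: the function names, argument names and units (g/mol for molecular weights, g/mL for densities) are my own choices, and the numbers in the usage note are made up.

```python
def molar_ratio(solute_integral, solute_h, solvent_integral, solvent_h):
    """Moles of solute per mole of solvent, from per-hydrogen NMR integrals."""
    return (solute_integral / solute_h) / (solvent_integral / solvent_h)


def molarity_from_ratio(ratio, solute_mw, solute_density,
                        solvent_mw, solvent_density):
    """Molarity (mol/L), assuming solute and solvent volumes are additive.

    Molecular weights are in g/mol and densities in g/mL (assumed units).
    """
    # Volume in mL of `ratio` mol of solute plus 1 mol of solvent.
    volume_ml = (ratio * solute_mw / solute_density
                 + solvent_mw / solvent_density)
    return ratio / (volume_ml / 1000.0)


def molarity_from_g_per_100g(grams_per_100g, solute_mw,
                             solute_density, solvent_density):
    """Convert g solute / 100 g solvent to mol/L with the same assumption."""
    moles = grams_per_100g / solute_mw
    volume_ml = grams_per_100g / solute_density + 100.0 / solvent_density
    return moles / (volume_ml / 1000.0)
```

For example, with invented values, a solute range integrating to 2.0 for 2 Hs and a solvent range integrating to 30.0 for 3 Hs give `molar_ratio(2.0, 2, 30.0, 3)` of 0.1. Passing that ratio to `molarity_from_ratio` together with the molecular weights and densities then returns the concentration in mol/L.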

The beauty of this approach is that there are no volume measurements. A saturated solution is made, then generally diluted in a deuterated solvent. By contrast, when using an internal standard, the volume of the saturated solution and the volume (or weight) of the standard must be known exactly. It is often difficult to micropipette some solvents, and there is always the possibility of making an error in handling the micropipette. In general, the fewer variables there are, the more likely the results will be reproducible.

This method can save a lot of time, but it is not as automated as it could be. The densities must be looked up manually, although the molecular weight is automatically calculated from the common name using a web service by Rajarshi Guha run directly from within Google Spreadsheets. It also requires students to define solvent and solute ranges manually. All of the input cells are colored green, the output cells red and the intermediate calculations yellow.

However, once a range for a solute or solvent (and corresponding number of hydrogens) has been determined it can be used as a handy default and we will be collecting these and storing them in this sheet.

Does this mean that students don't need to think anymore?

Used properly, this system should actually elevate the level of thinking, in much the same way that the calculator did not remove the need for thought in data analysis. It just removed a lot of the tedium of manually calculating square roots and all of the associated sources of error in manual calculations of that type.

Students should use this tool to handle more measurements - faster - and think about their results in aggregate form. The ability to detect systematic errors becomes an essential skill to be developed. Also students need to spot problematic results quickly, for example where solvent and solute peaks overlap - or where there are baseline anomalies.

Of course, even these last issues of quality control can probably be automated to a large extent, and we will report on this as we go. For example, it is conceivable that the NMR of a solute and solvent can be predicted or looked up automatically (on ChemSpider for example) and probable peak overlaps could be flagged. Software could also probably detect a mislabeling error.
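As a rough idea of what flagging probable peak overlaps might look like, here is a minimal Python sketch. It assumes the solute and solvent peak ranges are already available as (low ppm, high ppm) pairs, whether predicted or looked up; the function names and the example ranges in the usage note are hypothetical.

```python
def ranges_overlap(range_a, range_b):
    """True if two ppm ranges share any chemical shift.

    Each range is a pair of ppm values given in either order.
    """
    low_a, high_a = sorted(range_a)
    low_b, high_b = sorted(range_b)
    return low_a <= high_b and low_b <= high_a


def flag_overlaps(solute_ranges, solvent_ranges):
    """Return every (solute_range, solvent_range) pair that overlaps."""
    return [(s, v)
            for s in solute_ranges
            for v in solvent_ranges
            if ranges_overlap(s, v)]
```

Calling `flag_overlaps([(7.0, 7.5), (3.4, 3.6)], [(3.3, 3.5)])` would flag only the second solute range, since it shares the 3.4 to 3.5 ppm region with the solvent range.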

At this point, we are getting closer to scientific progress by machine-to-machine communication on the free open read/write web. All we would need are a few groups around the world who see the value in endeavors such as this and donate a part of their NMR autosampling time. I am sure we could come up with simple ways of automatically converting files on their local computers to JCAMP-DX format and uploading them. We also need people to make up saturated solutions, but with the decoupling of tasks in this new workflow these don't necessarily have to be the same people who process the NMR spectra.


Sunday, March 01, 2009

Cedric Tchakounte is March09 Submeta ONS Award Winner

Cedric Tchakounte, a Biological Sciences and Biotechnology undergraduate student working under the supervision of Jean-Claude Bradley at Drexel University, is the March 2009 Submeta Open Notebook Science Challenge Award winner. He wins a cash prize from Submeta.

Cedric is focusing on NMR techniques to measure solubility. He has also done several experiments to verify the miscibility of liquid solutes in methanol. See his experiments here:

Six more Submeta ONS Awards will be made during 2009. Submissions from students in the US and the UK are still welcome.
For more information see:


Spectral Game update

The end of February 09 has come and gone and nobody hit the 100 points to win the molecular model kit I announced earlier for the Spectral Game. The highest score in that time period was 75.

Since my CHEM242 class is having 2 tests and one exam in the next 3 weeks I thought I would make the next prize available to them exclusively. The student from that class who scores highest by 9:50 Wednesday March 4, 2009 will win a molecular model kit. That happens to be the end of class on that day. The scores have been reset.

The game has been improved considerably during the past few weeks. A few security flaws were fixed, including modifying what metadata can be viewed via JSpecView and preventing the refresh button from selecting a new set of molecules. The game play was also changed to become increasingly difficult over time, including adding more molecules and a timeout after the first set of ten spectra. This work was a collaboration between Andrew Lang, Antony Williams, Robert Lancashire and myself.

We are very excited by what we have put together so far. There are currently 457 H NMR, 389 C NMR, 11 IR and 29 NIR spectra. This is only possible because of people who submitted their spectra to ChemSpider as Open Data - please keep uploading!

The game has been played 1,824 times, viewing the spectra a total of 8,652 times - with a lot of curation by users. (If you see something wrong with a spectrum you can write a note, and that helps us clean up the database.) We have had 612 unique visitors from 37 different countries - a total of 13,919 page views in just over two weeks!

We now have a wiki with key links relating to the game. I also added the NMR notes from my CHEM242 class and we'll keep collecting resources. This could become a helpful resource to learn about NMR and practice it by playing the game.


Creative Commons Attribution Share-Alike 2.5 License