Thursday, January 29, 2009

The ChemSpider Journal and ChemMantis

The ChemSpider Journal of Chemistry is about to go live. This is not just another chemistry journal. Not only does it offer the option of open peer review in addition to Open Access, but it takes us tantalizingly closer to the promise of Web 3.0: the semantic web. This is achieved through sophisticated markup generated by ChemMantis. The automatic identification of molecules is impressive enough, but it also marks up functional groups, reactions, spectral data and even biological entities.

For an example, consider this article, which was actually a proposal that I wrote with Rajarshi Guha and Tony Williams. Simply hovering over the marked-up text "Ugi reaction" pulls up a brief summary from Wikipedia. What makes this semantic is that the system already knows this is a chemical reaction and not a molecule or a virus.

When you hover over the name of a molecule, it knows to render it as a chemical structure and provide appropriate links.

This makes reading a chemistry article a much richer experience. But another payoff will come when machines are able to associate concepts instead of just text. Most importantly, none of this requires any extra work from authors.

Tony has more information on his blog, and new submissions are welcome.

Solubility on Google Books and the ONS challenge

Andrew Lang has brought something very interesting to my attention: some solubility measurements are available through Google's Library project, which provides access to the full text of books old enough to be in the public domain. This means that certain measurements can be referenced directly and openly, without excluding people (and machines) who don't have subscriptions.

For example, Atherton Seidell compiled extensive collections of solubility information in 1907 and 1919. Specific pages can be referenced directly, which makes it easy to trace the source of each value when it is included in our Open Notebook Science Solubility Challenge.

Consider the solubility of benzoic acid in methanol at room temperature. Using either Rajarshi's drop-down menus or Andy's direct URL method, we find that there are currently five measurements: four from the ONS project and one from the Seidell compilation.

Clicking through the last link takes you directly to the table in the book where the information was obtained.
It is comforting to see that all the values are consistent. But expressing all the measurements as molar concentrations required a bit of fiddling. The Seidell book reports the solubility of benzoic acid in methanol at 23 °C as 71.5 g/100 g solvent. To convert this to molarity, we have to make two assumptions.

First, we assume that the volumes of the solute and solvent are additive. This is clearly not the case, but until someone proposes a better model that's what we will use. Second, we must estimate the density of the solute. For a liquid that is not a problem, but experimental densities of solid organic compounds are scarce. Luckily, ChemSpider provides such estimates; for benzoic acid the estimate is 1.197 g/mL.

We're performing all the calculations directly in the SolubilitySum spreadsheet (columns N-W) and making these assumptions explicit. If anyone has a better model, please let me know.
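For readers who want to check the arithmetic, here is a minimal Python sketch of the conversion under the additive-volume assumption. It is not taken from the SolubilitySum spreadsheet. The Seidell value (71.5 g/100 g methanol) and the ChemSpider density estimate (1.197 g/mL) come from this post; the molecular weight of benzoic acid (122.12 g/mol) and the density of methanol (0.7914 g/mL, a value near 20 °C rather than 23 °C) are my own inputs, and the function name is hypothetical.

```python
def gravimetric_to_molar(g_solute_per_100g_solvent: float,
                         solute_mw: float,
                         solute_density: float,
                         solvent_density: float) -> float:
    """Convert a solubility in g solute / 100 g solvent to mol/L.

    Densities are in g/mL, molecular weight in g/mol. Assumes the volumes
    of solute and solvent are simply additive.
    """
    solute_mass = g_solute_per_100g_solvent  # g of solute dissolved...
    solvent_mass = 100.0                     # ...in 100 g of solvent
    # Additive-volume assumption: solution volume = solute volume + solvent volume
    volume_ml = solute_mass / solute_density + solvent_mass / solvent_density
    moles = solute_mass / solute_mw
    return moles / (volume_ml / 1000.0)


BENZOIC_ACID_MW = 122.12      # g/mol
BENZOIC_ACID_DENSITY = 1.197  # g/mL, ChemSpider estimate
METHANOL_DENSITY = 0.7914     # g/mL near 20 C (assumed input, not from the post)

molar = gravimetric_to_molar(71.5, BENZOIC_ACID_MW,
                             BENZOIC_ACID_DENSITY, METHANOL_DENSITY)
print(f"{molar:.2f} M")  # prints 3.15 M
```

Swapping in a different solvent density (for example a value measured at 23 °C) only changes the solvent volume term, which makes it easy to see how sensitive the result is to that input.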

Ideally, we would like to link to public-domain sources as much as possible when comparing our results with the literature, but that probably won't always be possible. For example, many of the values for 4-nitrobenzaldehyde come from toll-access sources, and the best we can do freely is provide a text citation: Maccarone, E.; Perrini, G. Gazzetta Chimica Italiana vol. 112, p. 447 (1982).

By the way, this particular search nicely showcases Andy's graphical representation of the temperature dependence of solubility in various solvents. Just scroll to the bottom of the results.

Monday, January 26, 2009

Carmen Drahl visits my lab at Drexel

On Wednesday, Jan 21, 2009, I had the pleasure of spending the day with Carmen Drahl from Chemical and Engineering News. She showed up bright and early, an hour before my 9:00 class (CHEM242), which was helpful because we had to get her computer set up on the Drexel wireless network.

That morning I was running a Second Life "race", in which students compete to complete a series of questions about basic material they should recall from last term's organic chemistry course (CHEM241). Nobody completed the quiz before the end of class, so there was no winner. Even so, attempting these quizzes certainly can't hurt as preparation for this week's test. Carmen is to my right in this picture, and the others next to the quiz obelisks are students.

Carmen spent the rest of the day getting an idea of what my students and I do on a typical day. She looked at the equipment we use to measure solubility, and we discussed the Open Notebook Science Challenge in detail. This was a good opportunity to demonstrate the value of our interactive tools for querying the solubility data, mainly developed by Rajarshi Guha and Andrew Lang. Finally, I showed her how I use FriendFeed to collaborate and rapidly share scientific information with colleagues. Her article should make for an interesting read when it comes out in a few weeks.

Friday, January 16, 2009

ONS Solubility Challenge in Teaching Lab

Last year Brent Friesen began experimenting with using his organic chemistry teaching lab at Dominican University to carry out new reactions aimed at making new anti-malarial agents, as part of the UsefulChem project.

This year, Brent is involving his class (CHEM254) in the Open Notebook Science Challenge to measure the solubility of a variety of organic compounds in non-aqueous solvents. He took the time to write up a comprehensive laboratory handout on the topic, so I'm hopeful that this exercise will go smoothly.

As we've discussed repeatedly during the past year, the potential for university teaching labs to contribute real new science while training students is immense. It only requires instructors like Brent to take the time to make appropriate changes to the curriculum.

The lab is scheduled to run Jan 21-23, 2009.

Wednesday, January 14, 2009

Interactive Visualization of ONS Solubility Data

Rajarshi Guha and Andy Lang have been very busy during the past few weeks developing visualization interfaces for the Open Notebook Science Challenge solubility data.

Information in a database is only as useful as the tools to explore it. We are striving to create interfaces that enable synthetic organic chemists to find actionable information in as intuitive a manner as possible.

To search for specific solubility measurements of a given solute in a given solvent (or for all solvents for a given solute), we have been using Rajarshi's handy web interface, which provides drop-down menus of the available selections. The query results link directly back to the relevant laboratory notebook pages for further analysis. (For details of the code, see Rajarshi's blog.)

We now have the ability to look for patterns in the dataset. Solubility measurements can now be plotted in a two-dimensional chemical space using Andy's service. First select a solvent, two molecular descriptors, a solubility cutoff value and the maximum point size from the drop-down menus.

Then explore the chemical space.

In this example the solvent is methanol, with molecular weight plotted on the x axis and ALOGP on the y axis. The size of each point reflects the solubility, averaged over all available valid measurements, up to the maximum point size selected in the menu. Measurements that have been judged invalid are marked as DONOTUSE in the SolubilitySum spreadsheet, so researchers can investigate for themselves why a data point was rejected. For example, insufficient mixing time has been one cause of invalidation.
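The averaging step can be sketched in a few lines of Python. This is only an illustration, not Andy's code: the row layout (solute, solvent, molar solubility, flag) is a hypothetical simplification of the SolubilitySum spreadsheet and the function name is made up. The one behaviour taken from the post is that rows marked DONOTUSE are excluded before averaging.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical row layout: (solute, solvent, solubility in M, flag)
Row = tuple[str, str, float, str]


def average_valid_solubility(rows: list[Row]) -> dict[tuple[str, str], float]:
    """Average the molar solubility of each (solute, solvent) pair.

    Rows flagged DONOTUSE are skipped, so a pair whose measurements were all
    invalidated does not appear in the result at all.
    """
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for solute, solvent, molar, flag in rows:
        if flag.strip().upper() == "DONOTUSE":
            continue  # invalidated measurement, e.g. insufficient mixing time
        grouped[(solute, solvent)].append(molar)
    return {pair: mean(values) for pair, values in grouped.items()}
```

Keeping the invalid rows in the data and filtering them only at query time mirrors the approach described above: nothing is deleted, so anyone can still inspect why a point was rejected.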

As synthetic organic chemists, we are mainly interested in how our reagents and products will behave in a given solvent. The compounds in this chemical space are starting materials for Ugi reactions and are color-coded by functionality: red points are aldehydes and blue points are carboxylic acids. We selected 2 M as the cutoff for point shape because it is a convenient concentration for mixing reagents in the Ugi reaction, especially when considering automation (see the JoVE article). With this selection, disks are below 2 M and diamonds are above.
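The marker encoding just described can be written as a small mapping function. This is a hypothetical sketch, not the code behind Andy's service: the 2 M cutoff and the red/blue colors come from the post, while the grey fallback color, the linear `scale` factor for point size and the treatment of a value of exactly 2 M as a diamond are my own assumptions.

```python
from dataclasses import dataclass

CUTOFF_M = 2.0  # cutoff from the post: a convenient concentration for Ugi reagents
# Colors from the post; any other functional group falls back to grey (assumption).
COLORS = {"aldehyde": "red", "carboxylic acid": "blue"}


@dataclass
class Marker:
    shape: str
    color: str
    size: float


def marker_for(solubility_m: float, functional_group: str,
               max_size: float, scale: float = 1.0) -> Marker:
    """Map an averaged molar solubility to a plot marker.

    Disks are below the cutoff and diamonds at or above it (exactly 2 M is an
    assumption). Size grows linearly with solubility (also assumed) and is
    capped at max_size, like the maximum point size chosen from the menu.
    """
    shape = "disk" if solubility_m < CUTOFF_M else "diamond"
    color = COLORS.get(functional_group, "grey")
    size = min(max_size, scale * solubility_m)
    return Marker(shape, color, size)


# Example: a poorly soluble aldehyde and a very soluble carboxylic acid.
print(marker_for(0.5, "aldehyde", max_size=10.0))
print(marker_for(3.0, "carboxylic acid", max_size=2.0))
```

Encoding the cutoff in the shape and the magnitude in the size means two independent cues survive even when points partly overlap.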

Some interesting insights can be gleaned quickly from this plot. There are three low-solubility disks among a group of diamonds near the top of the plot. The diamonds in this region mainly represent highly soluble aromatic aldehydes. By hovering the mouse over each point we can see the details of each compound and its solubility. Two of the disks turn out to be aromatic nitroaldehydes.

This suggests that methanol may not be a good solvent choice for nitroaldehydes in general. At the very least we can formulate the hypothesis and follow it up with additional measurements or re-investigate the outliers. Such a pattern is difficult to spot when measurements are stuck in tables or, worse, only in lab notebook pages.

One limitation of a 2D plot like this is overlapping points. We can reduce this to some extent by making the points smaller, but that doesn't eliminate the problem in all cases. One trick is to introduce a third molecular descriptor to separate the points in 3D space.

Andy has done this recently in Second Life; here is a picture we took yesterday with our friend Viv on Drexel Island (SLURL). The balls we are sitting on are the solubility points: the larger the ball, the greater the solubility. Clicking on a ball opens a browser window showing the measurements in the laboratory notebook.

Our eventual goal is to provide robust quantitative models to predict non-aqueous solubility. To see where we stand on that front see Rajarshi's recent report.

In the meantime, I think we can continue to provide intuitive tools that let non-theoretical organic chemists play with visualizations of solubility data. Almost all organic reactions are performed in non-aqueous solvents, so solvent selection is a very important part of doing chemistry. This is especially important if one wants to engineer non-chromatographic product isolation.

Tuesday, January 13, 2009

CBCnews Article on Science2.0

Grant Buckler's Jan 8, 2009 article "Science 2.0: New online tools may revolutionize research" just appeared on CBCnews. It includes quotes from Michael Nielsen, Corie Lok, Benoit Pirenne, Eva Amsen and Jean-Claude Bradley. Open Notebook Science gets a mention:
"What we do is we make our lab notebooks open and available to the public pretty much in real time," says Jean-Claude Bradley, an associate professor of chemistry at Drexel University in Philadelphia and a pioneer of open-notebook science who gets about 200 visitors a day to the wiki that contains his lab notes.

Open notebooks let scientists see what others are doing and sometimes help each other with problems, Bradley says.

Sunday, January 04, 2009

Jan 2009 Submeta Open Notebook Science Award Winner Announced

Khalid Mirza, a Ph.D. student with Jean-Claude Bradley at Drexel University, is the January 2009 winner of the Submeta Open Notebook Science Challenge Award, which includes a one-year subscription to Nature magazine and a cash prize. Khalid's contributions included the measurement of non-aqueous solubilities using both evaporation and UV-vis techniques.

Eight more Submeta ONS Awards will be made during 2009. Submissions from students in the US and the UK are still welcome.

Creative Commons Attribution Share-Alike 2.5 License