Monday, September 28, 2009

A First General Solubility Model from ONS Challenge Data

After about a year, the Open Notebook Science Solubility Challenge has produced over 680 measurements, plus about 100 more from the literature. Once repeated measurements are averaged, some erroneous results are discarded and only organic solids are considered (so far all of our liquid solutes have proven to be miscible in our solvents), we are left with 244 unique values.
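The bookkeeping behind "244 unique values" can be sketched roughly as follows. This assumes each measurement is stored as a hypothetical (solute, solvent, value, flagged) record; that layout and the numbers are placeholders, not the ONS Challenge's actual data format or results.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL records: (solute, solvent, solubility in mol/L, flagged_as_erroneous).
# Placeholder numbers only, not ONS Challenge measurements.
records = [
    ("solute_A", "methanol", 0.50, False),
    ("solute_A", "methanol", 0.54, False),
    ("solute_A", "methanol", 2.10, True),   # flagged, so it is discarded
    ("solute_A", "toluene", 0.02, False),
    ("solute_B", "ethanol", 0.31, False),
]

def unique_values(rows):
    """Average repeated measurements for each (solute, solvent) pair,
    skipping rows flagged as erroneous."""
    groups = defaultdict(list)
    for solute, solvent, value, erroneous in rows:
        if not erroneous:
            groups[(solute, solvent)].append(value)
    return {pair: mean(values) for pair, values in groups.items()}

table = unique_values(records)
print(len(table))  # 3 unique (solute, solvent) pairs
```

Averaging per pair is only one reasonable choice; a median would be less sensitive to a single bad run that slipped past the error check.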

Andrew Lang has created a general model (Model003) to predict solubility based on molecular descriptors of both the solutes and the solvents. Previous models, such as Rajarshi Guha's Model002, were built only for selected solvents.

Predictions can be made from this web page by entering the SMILES of the solute and, optionally, the SMILES, dipole moment and dielectric constant of any solvent (Wolfram Alpha and Wikipedia are convenient sources for these). The example shown here uses Boc-glycine as the solute, with diethyl ether as the optional solvent.
The prediction service then obtains the relevant molecular descriptors from the CDK (Chemistry Development Kit) and makes predictions for several common solvents, plus the optional one if requested.
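To make the idea of combining solute and solvent descriptors concrete, here is a minimal sketch of a service like the one described: it concatenates solute descriptors with a solvent's dipole moment and dielectric constant, applies a linear model, and ranks solvents. The functional form, the descriptor names (`logp`, `tpsa`) and all coefficients are hypothetical placeholders; Model003's actual form and descriptor set are not given here. Solvent values are approximate room-temperature figures.

```python
# Approximate dipole moments (debye) and dielectric constants near room
# temperature for a few common solvents.
SOLVENTS = {
    "methanol": (1.70, 32.7),
    "ethanol": (1.69, 24.5),
    "THF": (1.75, 7.6),
    "DMSO": (3.96, 46.7),
    "toluene": (0.36, 2.4),
}

# HYPOTHETICAL coefficients for a linear model of log solubility.
# Placeholders only, not Model003's fitted parameters.
WEIGHTS = {
    "intercept": -1.0,
    "solute_logp": -0.3,
    "solute_tpsa": 0.01,
    "solvent_dipole": 0.4,
    "solvent_dielectric": 0.02,
}

def features(solute, dipole, dielectric):
    """Concatenate solute and solvent descriptors into one feature dict."""
    return {
        "solute_logp": solute["logp"],
        "solute_tpsa": solute["tpsa"],
        "solvent_dipole": dipole,
        "solvent_dielectric": dielectric,
    }

def predict_log_s(solute, dipole, dielectric):
    """Linear model: intercept plus weighted sum of the features."""
    x = features(solute, dipole, dielectric)
    return WEIGHTS["intercept"] + sum(WEIGHTS[k] * v for k, v in x.items())

def rank_solvents(solute, extra=None):
    """Predict for the built-in solvents plus an optional user-supplied one,
    given as (name, dipole, dielectric); return the best solvent first."""
    solvents = dict(SOLVENTS)
    if extra is not None:
        name, dipole, dielectric = extra
        solvents[name] = (dipole, dielectric)
    preds = {n: predict_log_s(solute, d, e) for n, (d, e) in solvents.items()}
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder solute descriptors (not computed from a real SMILES string).
solute = {"logp": 0.5, "tpsa": 75.0}
ranking = rank_solvents(solute, extra=("diethyl ether", 1.15, 4.3))
for name, log_s in ranking:
    print(f"{name:15s} {log_s:6.2f}")
```

Because the solvent enters only through its descriptors, any solvent with known descriptor values can be added at query time, which is what makes a single general model possible.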

If the name of the solute was entered, the service will also report all of the experimental measurements for that solute from the ONS Challenge with links to the lab notebook pages.

There are a few objectives in making this public.

First, we think it might suggest good or bad solvents for a given solute. The dataset is certainly not large enough to provide a truly general prediction of solubility in absolute terms. However, comparing relative values might be helpful in many cases. In the example above for Boc-glycine, the model predicts that toluene would be the poorest solvent, which matches the order of the experimental values, even if the absolute values are not a close match. DMSO, THF, methanol and ethanol are predicted to be good solvents, and this is reflected in the measurements.
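One way to quantify "the order matches even if the absolute values do not" is a rank correlation. The sketch below computes Spearman's rho (the Pearson correlation of the rank vectors, with tied values given their average rank). The predicted and measured numbers are hypothetical placeholders, not the Boc-glycine results.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(predicted, measured):
    """Spearman rank correlation. Undefined (division by zero) if either
    input has all values equal."""
    rp, rm = average_ranks(predicted), average_ranks(measured)
    n = len(rp)
    mp, mm = sum(rp) / n, sum(rm) / n
    cov = sum((a - mp) * (b - mm) for a, b in zip(rp, rm))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_m = sum((b - mm) ** 2 for b in rm)
    return cov / (var_p * var_m) ** 0.5

# HYPOTHETICAL values for four solvents: the absolute numbers disagree,
# but the ordering is identical, so rho is 1.0.
predicted = [1.2, 0.9, 0.4, -0.8]
measured = [0.6, 0.5, 0.1, 0.01]
print(spearman(predicted, measured))
```

A value near 1 means the model orders the solvents correctly, which is the useful property when the goal is picking a good or bad solvent.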

Second, we want to make the model and data public so that other researchers with experience in this area can contribute their own models. We have been working with Marcin Wojnars from TunedIT to make it much easier for models to be submitted. Andy has just converted our dataset to ARFF format and it is available here. We should have more to report on this shortly.
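ARFF is the plain-text format used by Weka: a `@RELATION` line, one `@ATTRIBUTE` line per column, then comma-separated rows after `@DATA`, with `?` marking a missing value. The sketch below writes such a file; the column names and row values are hypothetical and are not the layout of the actual ONS dataset file.

```python
def to_arff(relation, attributes, rows):
    """Serialize rows to ARFF text.

    attributes: list of (name, kind) pairs, kind being "NUMERIC" or "STRING".
    rows: tuples in the same order as attributes; None becomes "?".
    """
    def quote(s):
        # Single-quote names and strings, backslash-escaping quotes.
        return "'" + str(s).replace("\\", "\\\\").replace("'", "\\'") + "'"

    lines = [f"@RELATION {quote(relation)}", ""]
    for name, kind in attributes:
        lines.append(f"@ATTRIBUTE {quote(name)} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        cells = []
        for (_, kind), value in zip(attributes, row):
            if value is None:
                cells.append("?")  # ARFF missing-value marker
            elif kind == "STRING":
                cells.append(quote(value))
            else:
                cells.append(repr(float(value)))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"

# HYPOTHETICAL columns and placeholder rows.
attributes = [("solute", "STRING"), ("solvent", "STRING"), ("solubility_M", "NUMERIC")]
rows = [("solute_A", "methanol", 0.5), ("solute_A", "diethyl ether", None)]
text = to_arff("solubility", attributes, rows)
print(text)
```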

By using molecular descriptors from the solvents we should be able to do predictions for solvent mixtures as well. At some point perhaps we can even include temperature.
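A simple way to describe a solvent mixture with the same descriptors is a mole-fraction-weighted average of each component's values. This linear mixing rule is an assumption made for illustration; real mixture properties such as the dielectric constant generally deviate from it, so treat the result as a first approximation only.

```python
def mixture_descriptors(components):
    """Mole-fraction-weighted average of solvent descriptors.

    components: list of (mole_fraction, {descriptor_name: value}).
    ASSUMES a linear mixing rule, which is only an approximation.
    """
    total = sum(x for x, _ in components)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("mole fractions must sum to 1")
    names = components[0][1].keys()
    return {name: sum(x * desc[name] for x, desc in components) for name in names}

# 50:50 (mole fraction) methanol/toluene, using approximate pure-solvent values.
methanol = {"dipole": 1.70, "dielectric": 32.7}
toluene = {"dipole": 0.36, "dielectric": 2.4}
mix = mixture_descriptors([(0.5, methanol), (0.5, toluene)])
print(mix)
```

The resulting descriptors could then be fed to a model in place of a single solvent's values.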

The current model fits the measurements with this type of distribution:
[Figure: distribution of the model's fit to the measurements]
If we are able to build models automatically in real time after the addition of each data point, we should be able to set up automatic solubility measurement requests to minimize the amount of work it takes to improve each model. This is a step in that direction.
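One common way to choose which measurement to request next is uncertainty sampling with a committee of models: request the solute/solvent pair on which the models disagree most, since measuring it should be most informative. This is an assumed strategy for illustration, not a description of how requests would actually be generated; the models and predictions below are hypothetical placeholders.

```python
from statistics import pvariance

def disagreement(candidate, models):
    """Population variance of the ensemble's predictions for one candidate."""
    return pvariance([model(candidate) for model in models])

def request_queue(candidates, models, k=1):
    """Return the k candidates with the largest ensemble disagreement."""
    return sorted(candidates, key=lambda c: disagreement(c, models), reverse=True)[:k]

# HYPOTHETICAL predictions (log solubility) from two models for three
# unmeasured solute/solvent pairs; placeholders, not ONS data.
model_a = {("solute_A", "toluene"): 0.1, ("solute_A", "DMSO"): 1.0, ("solute_B", "THF"): 0.5}
model_b = {("solute_A", "toluene"): 0.2, ("solute_A", "DMSO"): 1.1, ("solute_B", "THF"): -0.5}
models = [model_a.get, model_b.get]

queue = request_queue(list(model_a), models, k=1)
print(queue)  # [('solute_B', 'THF')]
```

In a real-time loop, each new measurement would be added to the dataset, the models refit, and the queue recomputed.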


Friday, September 25, 2009

Cheminfo Retrieval First Class FA09

I gave my first lecture yesterday (Sept 24, 2009) for my Chemical Information Retrieval course at Drexel. One of my main objectives for the course is to provide the most current information about how to best find and review chemical information.

To this end, I set up a wiki (http://getcheminfo.wikispaces.com) which should become considerably enriched over the course of the term. I invited students to contribute useful links to the resource page, and even before I finished the first lecture they had added several really good ones. I also invite any chemists or librarians to add links to resources we may have missed. Just request to join the wiki to contribute.

The wiki will also be used for students to write a report on a chemical topic making use of cheminfo resources. Right after the lecture I made sure the students joined the wiki and created two pages: one for their report and one for a "research log". The idea is that students will report significant steps in conceptualizing their projects and how they are searching databases. I can then comment directly on their log pages for quick guidance. Anyone with helpful suggestions that I missed is also welcome to comment; again, just request an account on the wiki.

This class has traditionally required a written report. This term I'm adding a twist: the minimum number of words can be reduced somewhat if students elect to incorporate a multimedia or other creative component. To provide examples of what that might look like, I visited Drexel Island on Second Life and demonstrated 3D molecules, interactive NMR spectra and a chemistry museum (from Sandy Adam). There is a lot of chemistry possible on Second Life (see Lang & Bradley). At the end of the tour on the island we visited a wildlife area recently built by Robert Brulle for a project related to environmental science (more on this in a later post). I got a hug from a panda and got sprayed when I tried to pet a skunk, just to give a taste of the kind of fun things that can be constructed in a virtual world. Other projects could involve screencasts, Jmol, games, Facebook, etc. As long as a project requires students to access chemical information, I am pretty open to ideas. Students will work through their ideas on their log pages and the final products will also be available on the wiki. These projects could provide interesting examples for others interested in the topic of chemical information.

At the end of my lecture I provided a brief overview of the NaH oxidation controversy. There really could not be a better example of the importance of staying on top of new communication channels to follow and participate in chemical research. This year the most important of these new tools are probably blogs, wikis and FriendFeed. Next year it might be something else - Google Wave?



Wednesday, September 02, 2009

Jenna Mancinelli is Sept09 Submeta ONS Award Winner

Jenna Mancinelli, working under the supervision of Jean-Claude Bradley at Drexel University, is the September 2009 Submeta Open Notebook Science Challenge Award winner. She wins a cash prize from Submeta.

Jenna used both NMR and the sequential precipitation technique to obtain solubility data. See her experiments here:
http://onschallenge.wikispaces.com/list+of+experiments

One more Submeta ONS Award will be made during 2009. Submissions from students in the US and the UK are still welcome.
For more information see:
http://onschallenge.wikispaces.com
http://onschallenge.wikispaces.com/submetaawards08


Creative Commons Attribution Share-Alike 2.5 License