Saturday, February 26, 2011

ONS Solubility Challenge Book cited in a Langmuir nanotechnology paper

An interesting application of the data from the Open Notebook Science Solubility Challenge has recently been reported in Langmuir: "Enhanced Ordering in Gold Nanoparticles Self-Assembly through Excess Free Ligands" by Cindy Y. Lau, Huigao Duan, Fuke Wang, Chao Bin He, Hong Yee Low and Joel K. W. Yang (Feb 24, 2011).

The context is as follows, and the reference is to Edition 3 of the ONS Solubility Challenge Book.
Although to our best knowledge there lacks literature value of OA solubility in the two solvents, the 10-fold better solubility of 1-otadecylamine (sic), the saturated version of oleylamine, in toluene than hexane is in line with our hypothesis.(33) This increased solubility caused the OA molecules that were originally attached to the AuNPs to gradually detach from the AuNPs, which is supported by our observations in poor AuNP stability and surface-pressure isotherms.
This is a nice application of solubility to understand and control the behavior of gold nanoparticles. It is in line with some of the applications I discussed at a recent Nanoinformatics conference, where I think there is a place for the interlinking of information between solubility and nanotechnology databases.

I have to admit that it is somewhat ironic to see this citation in Langmuir, given the controversy about a year ago (post and FF discussion) regarding the citation of non-traditional literature.

Monday, February 21, 2011

Alfa Aesar melting point data now openly available

A few weeks ago, John Shirley, Global Marketing Manager at Alfa Aesar, contacted me to discuss the Chemical Information Validation results I posted from my 2010 Chemical Information Retrieval class. Our research showed that Alfa Aesar was the second most common source of chemical property information in the class assignment.
We explored some possible ways to collaborate. With our recent report on the use of melting point measurements to predict temperature-dependent solubility curves, the Alfa Aesar melting point collection could prove immensely useful for our Open Notebook Science solubility project.

However, since we are committed to working transparently, the only way we could accept the dataset is if it were shared as Open Data. I am extremely pleased to report that Alfa Aesar has agreed to this requirement and we hope that this gesture will encourage other chemical companies to follow suit.

The initial file provided by Alfa Aesar did not store melting points in a database-ready format: it included ranges, non-numeric characters and entries reporting decomposition or sublimation. One of the benefits we could provide back to the company was cleaning up the melting point field into pure numerical values ready for sorting and other database processing. This processed collection contains 12986 entries. Note that these entries do not necessarily correspond to distinct compounds, since they refer to individual catalog entries that may differ only in purity or packaging.
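The exact cleaning rules applied to the Alfa Aesar file are not described here. As a hedged sketch, one plausible set of rules is: take the midpoint of a range, ignore decomposition or sublimation notes, and reject entries with no number at all. The function name and the midpoint convention are assumptions for illustration, and abbreviated ranges such as "120-2" are not handled.

```python
import re
from typing import Optional

# A signed decimal number such as "122", "-5.5" or "98.0".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
# A range separator between two digits: hyphen, en dash or the word "to".
RANGE_SEPARATOR = re.compile(r"(\d)\s*(?:-|\u2013|to)\s*(\d)")


def clean_melting_point(raw: str) -> Optional[float]:
    """Reduce a free-text melting point entry to one number (degrees C).

    Hypothetical rules, for illustration only:
      * a range such as "122-124" becomes its midpoint (123.0);
      * notes such as "dec." or "subl." and unit symbols are ignored;
      * an entry with no number at all returns None.
    """
    # Turn range separators into spaces so "122-124" is not read as 122 and -124.
    text = RANGE_SEPARATOR.sub(r"\1 \2", raw.lower())
    values = [float(v) for v in NUMBER.findall(text)]
    if not values:
        return None
    if len(values) >= 2:
        # Treat the first two numbers as the bounds of a range.
        return (values[0] + values[1]) / 2.0
    return values[0]
```

For example, "122 - 124 dec." becomes 123.0 and "sublimes" becomes None, which keeps the column purely numeric for sorting.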

For our purposes of prioritizing organic chemicals for solubility modeling and applications, we curated this initial dataset by collapsing redundant chemical compositions and excluding inorganics (including organometallics) and salts. We did retain organosilicon, organophosphorus and organoboron compounds. Because the primary key for all of our projects is the ChemSpider ID (CSID), every compound was assigned a CSID, by deposition in the ChemSpider database where necessary. SMILES were also provided for each entry, as well as a link to the corresponding Alfa Aesar catalog page. This curated collection contains 8739 entries.
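The organic/inorganic cut can be roughly approximated from SMILES alone. The sketch below is a crude, hypothetical heuristic, not the curation procedure actually used: it drops multi-component SMILES (salts, hydrates), drops any bracket atom whose element is outside an allowed list (which removes metals and therefore organometallics), and requires at least one carbon. A real pipeline would use a cheminformatics toolkit rather than regular expressions.

```python
import re

# Elements allowed in the curated organic set; organosilicon, organophosphorus
# and organoboron compounds are kept.
ALLOWED_ELEMENTS = {"C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I", "Si", "B"}

# Element symbol of a bracket atom, e.g. [Na+], [nH], [13CH3], [Si], [C@@H].
BRACKET_ATOM = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def keep_for_solubility_set(smiles: str) -> bool:
    """Heuristic filter: True for single-component organic structures."""
    # A '.' separates disconnected components (salts, hydrates, mixtures).
    if "." in smiles:
        return False
    # Outside brackets, SMILES only permits B, C, N, O, P, S, F, Cl, Br and I,
    # so any metal or other excluded element must appear as a bracket atom.
    for symbol in BRACKET_ATOM.findall(smiles):
        if symbol.capitalize() not in ALLOWED_ELEMENTS:
            return False
    # Require at least one carbon: aliphatic C (but not Cl) or aromatic c.
    return re.search(r"C(?!l)|c", smiles) is not None
```

With this rule benzoic acid (OC(=O)c1ccccc1) is kept, while sodium acetate (CC(=O)[O-].[Na+]) and dimethylmercury (C[Hg]C) are excluded.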

For completeness, we thought it would be useful to merge the Alfa Aesar curated dataset with other collections for convenient federated searches. We thus added the Karthikeyan melting point dataset, which has been used in several studies to build melting point prediction models. This dataset was downloaded from Cheminformatics.org. Although we were able to use most of the structures in that collection, a few hundred were left out because some of the SMILES could not be resolved, perhaps because of differences between the algorithms used by OpenBabel and OpenEye. Hopefully this issue will be resolved in a simple way and the whole dataset can be incorporated in the near future. The resulting curated Karthikeyan collection contains 4084 entries.

Similarly, the smaller Bergstrom dataset was included after processing the original file into a curated collection of 277 drug molecules.

Finally, the melting point entries from the ChemInfo Validation sheet itself, generated by student contributions, were added, bringing the combined collection to 13,436 Open Data melting point values. We believe that this is currently the largest such collection and that it should facilitate the development of completely transparent and free models for the prediction of melting points. As we have argued recently, improved access to measured or predicted melting points is critical to the prediction of the temperature dependence of solubility.

In addition to providing the melting point data in tabular format, Andrew Lang has created a convenient web-based tool to explore the combined dataset. A drop-down menu at the top allows quick access to a specific compound and reports the average melting point as well as a link to the information source. In the case of an Alfa Aesar source, a link to the catalog is provided, where the compound can be conveniently ordered if desired.
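Because every entry carries a CSID, combining the collections and reporting an average per compound amounts to a group-by on that key. A minimal sketch follows; the record type, function name and CSIDs in the example are hypothetical, and the web tool's own implementation is not described in this post.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MeltingPointRecord:
    csid: int           # ChemSpider ID, the shared primary key
    mp_celsius: float   # cleaned numeric melting point
    source: str         # e.g. "Alfa Aesar", "Karthikeyan", "Bergstrom"


def merge_by_csid(*collections: Iterable[MeltingPointRecord]) -> Dict[int, Tuple[float, List[str]]]:
    """Group records from several collections by CSID and return, for each
    compound, the average melting point and the sorted list of sources."""
    grouped: Dict[int, List[MeltingPointRecord]] = defaultdict(list)
    for collection in collections:
        for record in collection:
            grouped[record.csid].append(record)
    return {
        csid: (mean(r.mp_celsius for r in records), sorted({r.source for r in records}))
        for csid, records in grouped.items()
    }
```

Keeping the source list next to the average preserves the link back to where each value came from, as the drop-down menu does.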
In another type of search, a SMARTS string can be entered with an optional range limit for the melting points. For example, 14 hits are obtained for benzoic acid derivatives with melting points between 0 °C and 25 °C. Clicking on an image will reveal its source. (BTW, even if you don't know how to perform sophisticated SMARTS queries, simply looking up the SMILES for a substructure on ChemSpider or ChemSketch will likely be sufficient for most types of queries.)

Preliminary tests on a Droid smartphone indicate that these search capabilities work quite well.

Finally, I would like to thank Antony Williams, Andrew Lang and the people at Alfa Aesar (now added as an official sponsor) who contributed many hours to collecting, curating and coding for the final product we are presenting here. We hope that this will be of value to researchers in the cheminformatics community for a variety of open projects where melting points play a role.


Sunday, February 13, 2011

Predicting temperature-dependent solubility for solvent selection

During the summer of 2010 I reported on the Solvent Selector web service that Andrew Lang and I constructed. The idea was to flag potential solvents with a high solubility for the reactants and a low solubility for the product, so that the work-up would require a simple filtration.
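As a minimal sketch of that flagging rule (not the actual Solvent Selector code), assuming solubilities in mol/L are already available for each solvent: a solvent qualifies if every reactant dissolves at its required concentration while the product stays below a cutoff. The function name and the 0.05 M default cutoff are hypothetical choices for illustration.

```python
from typing import Dict, List


def flag_solvents(solubility: Dict[str, Dict[str, float]],
                  reactant_conc: Dict[str, float],
                  product: str,
                  product_cutoff: float = 0.05) -> List[str]:
    """Return solvents where all reactants dissolve and the product does not.

    solubility maps solvent -> {compound: solubility in mol/L}.
    reactant_conc maps each reactant to the concentration (mol/L) it must reach.
    """
    flagged = []
    for solvent, values in solubility.items():
        # A missing reactant value counts as insoluble (conservative).
        reactants_ok = all(values.get(name, 0.0) >= conc
                           for name, conc in reactant_conc.items())
        # A missing product value counts as soluble, so the solvent is not flagged.
        product_ok = values.get(product, float("inf")) <= product_cutoff
        if reactants_ok and product_ok:
            flagged.append(solvent)
    return flagged
```

Treating missing data pessimistically means a solvent is only flagged when the numbers positively support a simple filtration work-up.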

If available, the Solvent Selector service uses measured solubility values. If not available, it attempts to predict the room temperature solubility using one of two models based on Abraham descriptors.
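For readers unfamiliar with Abraham descriptors, the general shape of such a model can be written down. Abraham solvation models are linear free energy relationships, commonly written as below, where the solute is described by E (excess molar refraction), S (dipolarity/polarizability), A (hydrogen-bond acidity), B (hydrogen-bond basicity) and V (McGowan characteristic volume), and the lowercase coefficients are fitted for a given solvent system. The specific property modeled and the coefficients used in the two Solvent Selector models are not given here, so treat this only as the generic form.

```latex
\log SP = c + e\,E + s\,S + a\,A + b\,B + v\,V
```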

We have now modified the Solvent Selector so that it takes temperature into account. Andrew has inserted a thermometer icon next to each solvent in the report. Clicking it displays a plot covering the entire range of temperatures over which the solvent is a liquid. Curves are provided for each starting material and for the product, and hovering over a data point shows the numerical values.

There are several ways this resource could be used by chemists.

For reactions where some starting materials are not soluble enough at room temperature, the reaction could be carried out at a higher temperature. A higher temperature might also be desirable simply to speed up the reaction. Being able to predict the solubility of the product at that higher temperature would allow the course of the reaction to be monitored by the appearance of a precipitate.

For reactions where the solubility of the product is too high at room temperature, the curves could be used to estimate how low one could cool the reaction mixture without any chance that one of the starting materials would precipitate out. For example, consider the following Ugi reaction.


An optimization study found that methanol and ethanol provided much better yields than THF (see JoVE article). This makes sense from a solubility standpoint: the room temperature solubility of the Ugi product is less than 0.05 M in methanol and ethanol but 0.26 M in THF (Solvent Selector results).


However, clicking on the thermometer icon for the THF entry displays the temperature-dependent solubility curves for that solvent.

By hovering over the curve for the Ugi product, we find that the predicted solubility in THF at -78 °C (conveniently, the temperature of a dry ice/acetone bath) is 0.01 M, while that of the starting material Boc-glycine is 0.65 M, well above the 0.5 M concentration at the start of the reaction. This means that even if the reaction did not take place to a significant extent, we would not expect the starting material to precipitate at -78 °C. Any precipitate should be the pure product.
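The check done by hovering over the curves can also be expressed in closed form. Under the log-linear model described under "How it works", each starting material's curve can be written as log10 S = k0 + k1/T (T in kelvin, k1 < 0), and solving for the temperature at which S equals the initial concentration gives the coldest temperature at which that material should stay in solution. The function name and the (k0, k1) parameters are hypothetical; this is a sketch, not Solvent Selector output.

```python
import math
from typing import Dict, Tuple


def lowest_safe_temperature_k(models: Dict[str, Tuple[float, float]],
                              initial_conc: Dict[str, float]) -> float:
    """Lowest temperature (K) at which every starting material stays dissolved.

    models maps each starting material to (k0, k1) in log10 S = k0 + k1 / T,
    with S in mol/L, T in K and k1 < 0 (solubility rising with temperature).
    """
    limits = []
    for name, (k0, k1) in models.items():
        target = math.log10(initial_conc[name])
        if target >= k0:
            # S only approaches 10**k0 as T grows without bound, so this
            # concentration is never fully dissolved under the model.
            return math.inf
        # Solve k0 + k1 / T = log10(C0) for T.
        limits.append(k1 / (target - k0))
    return max(limits)
```

For the Ugi example, a result below 195 K (-78 °C) for Boc-glycine at 0.5 M would agree with the conclusion that it stays in solution in the dry ice/acetone bath.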

Of course, another obvious application is solvent selection for recrystallization.

How it works

For some time now we have been collecting literature on the temperature dependence of solubility in various systems (live Mendeley collection here; be patient, it might take a minute to display). Although the equations vary with the specific approach, they seem to share the following commonalities:
  1. An assumption is made that miscibility is reached at the melting point of the solute.
  2. The log of the solubility varies linearly with the inverse of the absolute temperature (in kelvin).
I have looked into a few examples that we have of solubility over a temperature range, and these assumptions do seem to hold. This means that with only a room temperature solubility and a melting point for a given solute, the solubility at any temperature can be interpolated or extrapolated. (The concentration at the miscibility point is calculated from the predicted density of the solute divided by its molecular weight.)
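Written out in symbols (the notation is ours, not taken from the cited literature), the two assumptions define a straight line in log10 S versus 1/T passing through the room temperature point (T_rt, S_rt) and the miscibility point (T_mp, S_mp), with all temperatures in kelvin. The factor of 1000 converts a solute density ρ in g/mL and molecular weight M in g/mol into mol/L, which is our unit assumption:

```latex
\log_{10} S(T) = k_0 + \frac{k_1}{T}, \qquad
k_1 = \frac{\log_{10} S_{\mathrm{mp}} - \log_{10} S_{\mathrm{rt}}}{1/T_{\mathrm{mp}} - 1/T_{\mathrm{rt}}}, \qquad
k_0 = \log_{10} S_{\mathrm{rt}} - \frac{k_1}{T_{\mathrm{rt}}}

S_{\mathrm{mp}} = \frac{1000\,\rho}{M}
```

Substituting T = T_rt returns log10 S_rt and substituting T = T_mp returns log10 S_mp, so the line passes through both anchor points.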

Although there are situations where two liquids are not miscible because of extreme dissimilarity (e.g. methanol and hexane), in our experience the first assumption is valid for the most part. Also, we are not correcting for changes in the density of the solute or solvent with temperature. Nevertheless, when only a single solubility measurement in a given solvent and a melting point are known, this simple model may prove useful as a rough guide for reaction design or recrystallization. We'll report on its practicality over time as we put it to use.
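A minimal Python sketch of this interpolation, assuming 25 °C as room temperature, solubilities in mol/L and density in g/mL. The function names are hypothetical, and this is not the code behind the Solvent Selector plots; it simply draws a straight line in log10 S versus 1/T through the room temperature measurement and the miscibility point.

```python
import math


def miscibility_concentration(density_g_per_ml: float, mw_g_per_mol: float) -> float:
    """Concentration (mol/L) of the pure liquid solute: 1000 * density / MW."""
    return 1000.0 * density_g_per_ml / mw_g_per_mol


def predict_solubility(t_celsius: float, s_rt: float, mp_celsius: float,
                       density_g_per_ml: float, mw_g_per_mol: float,
                       t_rt_celsius: float = 25.0) -> float:
    """Solubility (mol/L) at t_celsius from one room temperature value s_rt
    and the melting point, assuming log10(S) is linear in 1/T (T in K)."""
    t = t_celsius + 273.15
    t_rt = t_rt_celsius + 273.15
    t_mp = mp_celsius + 273.15
    if math.isclose(t_mp, t_rt):
        raise ValueError("melting point must differ from the reference temperature")
    s_mp = miscibility_concentration(density_g_per_ml, mw_g_per_mol)
    # Straight line through (1/t_rt, log10 s_rt) and (1/t_mp, log10 s_mp).
    k1 = (math.log10(s_mp) - math.log10(s_rt)) / (1.0 / t_mp - 1.0 / t_rt)
    k0 = math.log10(s_rt) - k1 / t_rt
    return 10.0 ** (k0 + k1 / t)
```

As noted above, no correction is made for the temperature dependence of density, so values far from the two anchor points should be read as rough estimates.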

Thursday, February 10, 2011

The Spectral Game with ChemDoodle

In the summer of 2009, we published an article on the Spectral Game. This game is based on spectra uploaded as Open Data (in JCAMP-DX format) on ChemSpider (currently about 2000 H NMRs and a few C NMRs, IRs and NIRs). Students get points by clicking on the molecule associated with the spectrum on display.

Although this has proved to be a useful tool to teach spectroscopy (especially H NMR), there have been some limitations, which are related to the use of Java (JSpecView) to provide an interactive display of the spectra.

1) Spectra do not display properly on Macs - there are problems with the "right-click" options in JSpecView. It took me a really long time to understand why some of my friends were really unimpressed by JSpecView. When I recognized that they were all Mac users I took a look and it became clear.

2) Spectra do not display at all on smartphones because of the Java components.

I am very happy to report that these issues have been overcome (for the most part) using ChemDoodle. Through a collaborative effort between Kevin Theisen, Andrew Lang, Antony Williams and myself, we now have a non-Java based version of the Spectral Game at SpectralGame.com.



The game plays well on Mac, iPhone and iPad. However, I have seen it fail on two Android devices, so there are still a few kinks to work out. Luckily, I happen to be teaching NMR right now in my organic chemistry course, so my students will be testing the ChemDoodle version extensively.

There are some really nice additional features as well. My favorite is the auto-scaling of the integration line when zooming in. In the JSpecView version, integration is problematic because, when zooming into high-field peaks, the start of the integration line does not reset to zero, so getting a usable measurement requires several rounds of adjusting the integration offset.
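One way to get the behaviour described above, sketched here as an illustration and not as ChemDoodle's actual code, is to recompute a cumulative (trapezoidal) integral over only the points inside the current zoom window, so the curve always starts at zero at the edge of the view. The function name and its normalize flag are hypothetical.

```python
from typing import List, Sequence


def visible_integral(x: Sequence[float], y: Sequence[float],
                     x_min: float, x_max: float, normalize: bool = True) -> List[float]:
    """Cumulative trapezoidal integral over the zoomed window [x_min, x_max].

    The curve always starts at zero at the first visible point; with
    normalize=True it is rescaled so that its last value is 1.0.
    """
    # Keep only visible points, ordered by increasing x (NMR data are often
    # stored from high to low ppm).
    points = sorted((xi, yi) for xi, yi in zip(x, y) if x_min <= xi <= x_max)
    curve: List[float] = []
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        if i > 0:
            x_prev, y_prev = points[i - 1]
            total += 0.5 * (yi + y_prev) * (xi - x_prev)
        curve.append(total)
    if normalize and curve and curve[-1] != 0.0:
        curve = [value / curve[-1] for value in curve]
    return curve
```

Because the running total is rebuilt for each zoom window, no manual integration offset is needed when zooming into a group of peaks.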

Another advantage in the ChemDoodle design is the simplicity of the interface. There are no right-click options: everything available is clearly labeled at all times (toggle integration, reset spectrum and view header information). This makes the game easier to learn and play.

I would especially like to thank Kevin Theisen for being so responsive on the ChemDoodle end. I was skeptical that we would have a playable game for this term but he addressed all of our major issues very quickly.


Creative Commons Attribution Share-Alike 2.5 License