Wednesday, November 26, 2008

First Submeta Open Notebook Science Award Winner

Jenny Hale, a Ph.D. student with Cameron Neylon at the University of Southampton, is the first of ten recipients of the Open Notebook Science Challenge Awards for December 2008. Open to students from the US and the UK who report their solubility measurements publicly as they work, the ONS Challenge Awards consist of a cash prize from Submeta and a one-year subscription to Nature magazine. Jean-Claude Bradley, Associate Professor of Chemistry at Drexel University, manages the award.

Friday, November 21, 2008

What is the Solubility of Vanillin in Methanol?

If we've learned anything in the past few months, we've learned that measuring solubility is really tricky.

The Open Notebook Science Challenge has generated 11 answers so far for the solubility of vanillin in methanol:
Rajarshi Guha has provided an extremely handy web query interface (must use Firefox) to generate these plots. It taps into live data from this GoogleSpreadsheet and links back to the specific experiments that generated the data.

Because we have access to the lab notebook pages, we can see that these measurements are not all equally reliable. Some are based on reports where conditions that later turned out to be important were not reported or controlled. As we learn more about what is important, many of these measurements will probably be removed and replaced with more reliable data.

But in the meantime, we're going to use the best possible estimate of the property that we have available. It lets Rajarshi feed his solubility models and gives us a tight iteration cycle between prediction and experiment. For this purpose, the average value of 3.5 M for the 11 measurements is probably good enough to be part of a training set to allow a rough prediction of solubility. As we get more confident over time we'll improve the model.

Right now, we're not quite ready to do predictions but we should be there soon. The main feedback we're getting now is which compounds we need to focus on to get to that minimum training set (Rajarshi says 50 compounds/solvent and we have about half that number for some solvents). It looks like we'll focus on aromatic aldehydes and aromatic carboxylic acids, mainly because many don't evaporate easily in the SpeedVac (one of the control parameters discussed earlier).

Another advantage of aromatics is that we can use UV spectroscopy to determine solubility without using evaporation. Hopefully in the coming weeks this will confirm what Jenny Hale has concluded today in ONSC-EXP011:
The results of the calculations give the solubility of vanillin in ethanol as 2.48 M and vanillin in methanol as 4.15 M. This finally gives excellent correlation with exp207, which measured the solubilities as 2.5 and 4.19 M respectively.
It appears that some compounds require significant time and agitation to reach saturation. In this last experiment Jenny carefully recorded what happens over the course of adding vanillin to methanol and periodically vortexing. Inspection of her log shows several points where someone might have assessed the solution to be saturated when it was just slow to dissolve. It also makes a case for always wearing safety goggles in the lab :)
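The agreement between ONSC-EXP011 and exp207 quoted above can be put in numbers. The following is only a small sketch: the function name `relative_difference` is an illustrative choice, and the four values are the ones quoted from Jenny's notebook page.

```python
def relative_difference(value, reference):
    """Return |value - reference| / reference as a fraction."""
    return abs(value - reference) / reference

# Values quoted above as (ONSC-EXP011, exp207), both in mol/L.
pairs = {
    "ethanol": (2.48, 2.5),
    "methanol": (4.15, 4.19),
}

for solvent, (new, earlier) in pairs.items():
    pct = 100 * relative_difference(new, earlier)
    print(f"vanillin in {solvent}: {pct:.1f}% apart")
```

Both pairs agree to within about 1%.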

At this point I am becoming more convinced that the solubility of vanillin in methanol is closer to 4.2 M. If that result is consistently obtained by other students and other methods (such as UV) using prolonged mixing times, then we'll remove from the SolubilitiesSum spreadsheet the measurements obtained from experiments where the mixing time was shorter or simply not reported.
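As a rough sketch of that pruning step, assuming each spreadsheet row carried a recorded mixing time: the records, field names and 30-minute threshold below are all invented for illustration and are not taken from the SolubilitiesSum spreadsheet.

```python
from statistics import mean

# Placeholder records (NOT real data). mixing_min is None when the
# notebook page did not report a mixing time.
measurements = [
    {"experiment": "example-1", "solubility_M": 4.2, "mixing_min": 60},
    {"experiment": "example-2", "solubility_M": 3.1, "mixing_min": 5},
    {"experiment": "example-3", "solubility_M": 2.9, "mixing_min": None},
    {"experiment": "example-4", "solubility_M": 4.1, "mixing_min": 45},
]

def keep_well_mixed(records, min_mixing_min):
    """Keep records whose mixing time was reported and is long enough."""
    return [
        r for r in records
        if r["mixing_min"] is not None and r["mixing_min"] >= min_mixing_min
    ]

# The threshold is an assumption; the post does not specify one.
kept = keep_well_mixed(measurements, min_mixing_min=30)
average = mean(r["solubility_M"] for r in kept)
print([r["experiment"] for r in kept], round(average, 2))
```

Rows with a short or unreported mixing time drop out before the average is taken, which is the effect described above.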

The evolution of this project also demonstrates the value of the ongoing open peer review of an open lab notebook. The judges for the ONS challenge have provided feedback about future experiments, questioned assertions, pointed out omissions and suggested additional ways of thinking about the experiments. The contributions from the judges show up in bold in the notebook pages and can be tracked over time by looking at the wiki page history.

Thursday, November 20, 2008

Nature Sponsors Open Notebook Science Challenge

I'm pleased to announce that the Nature Publishing Group will provide one-year subscriptions to the journal Nature to the first three Submeta Open Notebook Science Award winners. The first award is expected to be announced December 1, 2008. The Open Notebook Science Challenge is an open call to crowdsource solubility measurements in non-aqueous solvents. Participating students from the US and the UK who meet eligibility criteria are welcome to apply for one of ten Submeta ONS Awards.

Thursday, November 13, 2008

From ONS to Peer Review: our JoVE Article is Published

Our article "Optimization of the Ugi Reaction Using Parallel Synthesis and Automated Liquid Handling" is now published in the Journal of Visualized Experiments (JoVE). I am very pleased with this because it showcases some interesting approaches to communicating science that were not possible until recently.

First and foremost, this demonstrates that lab notebook pages and blog posts can be used to support claims made in a peer-reviewed article. In a way this isn't drastically new since it has been possible for a while now to cite web pages in the peer-reviewed literature. The key question is whether the reference is appropriate, regardless of its format. When providing a reference for a melting point or spectrum, nothing is more relevant than the lab notebook page where the specific batch of product was obtained and characterized.

Second, we have demonstrated that it is possible to carry out research under Open Notebook Science conditions, write an article openly on a wiki, post it on a pre-print server (like Nature Precedings) and finally publish it in a peer-reviewed journal. No, this won't work with every publisher. But if communicating science openly (beyond the confines of the regular Open Access model) is important to you, there are options out there that don't take anything away from the traditional system of academic validation.

Third, this is a good example of the use of video to enhance the communication of a protocol for a chemical reaction. But this is not a shortcut by any means. The process of writing a script and preparing for the shoot was very time-consuming because we were describing a whole workflow. By contrast, when video is used as raw data to record the details of a specific experiment, it can actually save the time that would otherwise be needed to describe them in text.

Finally, JoVE is an example of an Open Access journal with some Web2.0 capabilities, like the ability to leave comments and label them as agreeing or disagreeing with the authors. The final article can now also serve as a location for continuing the scientific conversation.

Monday, November 10, 2008

ChemSpider Talk at Drexel Nov 12 2008

Antony Williams is giving a talk on ChemSpider at Drexel University at 2:00 PM Wednesday November 12, 2008. (Disque 109, corner of 32nd and Chestnut streets, Philadelphia). Even if you attended his talk at Drexel a few months ago, come back to learn about the newest ChemSpider tools.
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry

There is an increasing availability of free and open access resources for scientists to use on the internet. Coupled with the increasing availability of Open Source software tools, we are in the middle of a revolution in data availability and tools to manipulate these data. However, freedom costs, and in many cases the cost is quality. ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. This presentation will provide an overview of the issue of quality in many chemistry-related databases, approaches to cleaning up the data and how a curated platform can become the centralized hub for resourcing information about chemical entities. This includes experimental and predicted properties, analytical data, publications, suppliers and integrated databases. I will detail three efforts: 1) the curation of chemistry on Wikipedia; 2) an examination of structure integrity on the FDA Daily Med website, a web site of medication content and labeling as found in medication package inserts; 3) recognizing chemical names in documents and providing a platform for structure-based searching of Open Access chemistry literature.


Thursday, November 06, 2008

Google Visualization API on ONS solubility data

Rajarshi has just tweaked his ONS solubility web query interface with Google Visualization tools to display the solubility of solutes in all available solvents in a chart, ranked lowest to highest. This kind of snapshot is perfect for finding possible errors and comparing duplicate runs or measurements using different techniques. The lab notebook page for any suspect measurement can be accessed by scrolling down the page and clicking on the reference link.
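The kind of ranked snapshot described here can be sketched in a few lines. This is not Rajarshi's implementation; it only illustrates the idea, using the vanillin values quoted in the November 21 post above, and the tuple layout is an assumption.

```python
from collections import defaultdict

# (solvent, solubility in mol/L, notebook reference) for vanillin,
# as quoted in the November 21 post.
rows = [
    ("methanol", 4.15, "ONSC-EXP011"),
    ("ethanol", 2.48, "ONSC-EXP011"),
    ("methanol", 4.19, "exp207"),
    ("ethanol", 2.5, "exp207"),
]

# Rank lowest to highest, as in the chart.
ranked = sorted(rows, key=lambda row: row[1])

# Group duplicate runs per solvent and report their spread, which is
# one simple way to spot a suspect measurement.
by_solvent = defaultdict(list)
for solvent, value, _reference in ranked:
    by_solvent[solvent].append(value)
spread = {s: max(values) - min(values) for s, values in by_solvent.items()}

for solvent, value, reference in ranked:
    print(f"{solvent:10s} {value:5.2f} M  ({reference})")
```

Keeping the notebook reference in every row means a suspect value can still be traced back to its page.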

Currently this is only set up for selecting a solute. Give it a spin.

Update: Rajarshi has a lot more details in this post

Wednesday, November 05, 2008

ONS Solubility Web Query

Rajarshi Guha created a web interface for the solubility data that we've been collecting in a GoogleDoc spreadsheet as part of the Open Notebook Science Challenge. There are convenient drop-down menus for solvents, solutes and you can even put in a range of solubilities. Duplicate runs should show up nicely for comparison. The solute molecules are also rendered as images.

In this example, Rajarshi is using an SQL database but it would be convenient to query the RDF data directly using similar types of interfaces.
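To give a feel for what a query behind such drop-down menus might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and `query` helper are hypothetical, since the actual schema is not described here; the rows are the vanillin values from the November 21 post above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per measurement, with its notebook page.
conn.execute(
    "CREATE TABLE solubility "
    "(solute TEXT, solvent TEXT, molar REAL, notebook_page TEXT)"
)
conn.executemany(
    "INSERT INTO solubility VALUES (?, ?, ?, ?)",
    [
        ("vanillin", "methanol", 4.15, "ONSC-EXP011"),
        ("vanillin", "ethanol", 2.48, "ONSC-EXP011"),
        ("vanillin", "methanol", 4.19, "exp207"),
        ("vanillin", "ethanol", 2.5, "exp207"),
    ],
)

def query(conn, solute, solvent, low, high):
    """Return (molar, notebook_page) rows within a solubility range."""
    cursor = conn.execute(
        "SELECT molar, notebook_page FROM solubility "
        "WHERE solute = ? AND solvent = ? AND molar BETWEEN ? AND ? "
        "ORDER BY molar",
        (solute, solvent, low, high),
    )
    return cursor.fetchall()

hits = query(conn, "vanillin", "methanol", 4.0, 4.5)
print(hits)
```

Returning the notebook page with every value keeps each result linked to the experiment that produced it.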

And no matter how many ways we shuffle or reformat data there will always be a link to the original lab notebook page that generated each measurement. There isn't any reason to break the chain of provenance all the way to the peer-reviewed article when it is time.

Tuesday, November 04, 2008

Submeta Open Notebook Science Awards!

I am proud to announce that Submeta is sponsoring TEN $500 (USD) Open Notebook Science awards as part of the ONS challenge to measure the solubility of compounds in non-aqueous solvents. Submeta follows Sigma-Aldrich as a sponsor for the project. Drexel University is managing the award distribution.

The idea was to make this available as widely as possible. However, because of legal issues, we were not able to open it to everyone: only students from the US and UK are eligible (see here for the complete rules).

I will be posting the list of judges shortly on the ONSchallenge wiki. They will be judging the process as much as the product and hopefully enable us to have a true ongoing peer-reviewed Open Notebook in chemistry.

This all came about from a FriendFeed conversation started by Bill Hooker a few months ago.

Drexel iSchool Open Notebook Science Talk

I'll be speaking on November 11, 2008 at 12:30 in Rush-014 at the Drexel iSchool. Based on the audience I'll focus more on the technology aspects of UsefulChem and go light on the chemistry. Thanks to Bob Allen for the invitation.

Open Notebook Science: a stepping stone toward automation of the scientific process

The communication of research in a transparent manner in as close to real time as possible has been termed "Open Notebook Science" (see Wikipedia entry for more details). The basic philosophy is that researchers keep to a minimum the amount of private information. By making available all relevant raw data and making the analysis transparent, the scientific process can migrate from validation heavily based on trust to one dependent upon proof. A case will be made that such an approach, coupled with zero read/write costs on the open web is ideal for further automation of the scientific method using distributed intelligence. As an example the UsefulChem project, an ONS initiative mainly focussed on the synthesis of anti-malarial compounds, will be described. The project makes use of free hosted tools as much as possible so that the infrastructure can be easily replicated by other research groups. For example, CDD is used to store assay results, ChemSpider is used to store characterization information and enable chemical database queries, Wikispaces is used to record the laboratory notebook, Blogger is used to summarize research progress and GoogleDocs are used to store tabular data. Such an open architecture is conducive to productive collaboration between groups of complementary competency. For example, the design, synthesis and testing of novel anti-malarial agents, bringing together groups from Indiana University, Drexel University and UCSF, will be detailed.

