Wednesday, April 28, 2010

Reaction Attempts Book Edition 1 and UsefulChem Archive

I am pleased to report that Andrew Lang and I have published the first edition of the Reaction Attempts book. It currently contains most of the Ugi reactions from the UsefulChem project and is associated with an April 27, 2010 snapshot archive of the entire UsefulChem project, including NMR spectra, spreadsheets, images and the entire lab notebook from Wikispaces.


At 582 pages the printing cost from LuLu amounts to $26.28. Not meant to replace electronic searches, it should prove to be a handy reference book for the lab to quickly browse through what was attempted for a given reactant, what the outcome was and the researcher involved.

We are hoping to include reaction attempts from other groups in future editions. More details can be found in the preface, reproduced below:

Reaction Attempts First Edition

Data Source: the UsefulChem project

Introduction

Open Notebook Science (ONS) refers to the practice of making the full contents of a laboratory notebook and all associated raw data files available in near real time.[1] This represents an opportunity for everyone to benefit from work in progress in an open research group. However, in order to make use of the information, it must be easily discoverable. A simple strategy to increase discoverability is redundancy over multiple communication platforms.

In another project - the Open Notebook Science Solubility Challenge[2] - we published non-aqueous solubility data in the form of physical and downloadable (PDF) books.[3] Although it is possible to search the solubility database using web query interfaces, exploration of a Google Spreadsheet, an XML feed, etc.[4], having a physical copy in the laboratory has proved to be very convenient in several instances. A similar format for reactions will also be useful.

The UsefulChem Project

UsefulChem started in 2005 as an organic chemistry Open Notebook Science project with a main goal of discovering new anti-malarial agents that can be prepared by simple and cheap syntheses.[5] Most of the reactions on UsefulChem are Ugi reactions, which involve the mixing of an amine, aldehyde, carboxylic acid and isonitrile in a solvent at room temperature generally for a few hours to days.[6] The multicomponent design of the Ugi reaction and the simple reaction conditions make it ideal for exploring large virtual libraries and selecting compounds of interest to make.[7]

Isolation of the Ugi products can be immensely simpler, cheaper and readily scalable if they precipitate in pure form from the reaction mixture. To this end, much of the research in the UsefulChem project focuses on reaction conditions that lead to this outcome.[8] This is in fact the origin of the ONS Solubility Challenge discussed above.[9]

The Reaction Attempts Database

In order to look for patterns in the reaction conditions which led to Ugi product precipitation, the CombiUgiResults Google Spreadsheet was set up.[10] Reactions indexed there can be sorted by precipitation outcome, solvent, reactant, concentration, etc. and links to the laboratory notebook pages can be followed for full details. However, this sheet is designed specifically for Ugi reactions and contains columns specifically for the aldehyde, amine, carboxylic acid and isonitrile.

In order to enable the tracking of other types of reactions, the information in the CombiUgiResults sheet was reformatted into two other sheets: ReactionAttempts[11] (containing reagents and reactants) and RXIDsReactionAttempts[12] (containing reaction conditions and results, such as solvent, concentration of limiting reactant, appearance of a precipitate, yield, etc.). The two sheets are connected via the use of a common ReactionID. This format permits the representation of any type of reaction, with an unlimited number of reactants and products.[13]
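As a rough illustration of how the two sheets fit together, here is a minimal Python sketch of joining them on the common ReactionID. Only the ReactionID key comes from the description above. The other column names (role, name, solvent, precipitate) and the sample rows are assumptions made up for the example, not the actual sheet layout.

```python
from collections import defaultdict

# Hypothetical rows mimicking the ReactionAttempts sheet: one row per
# compound taking part in a reaction (column names are assumptions).
participants = [
    {"ReactionID": 1, "role": "reactant", "name": "benzaldehyde"},
    {"ReactionID": 1, "role": "reactant", "name": "benzylamine"},
    {"ReactionID": 1, "role": "product", "name": "Ugi product A"},
    {"ReactionID": 2, "role": "reactant", "name": "benzaldehyde"},
]

# Hypothetical rows mimicking RXIDsReactionAttempts: one row per reaction.
conditions = [
    {"ReactionID": 1, "solvent": "methanol", "precipitate": True},
    {"ReactionID": 2, "solvent": "ethanol", "precipitate": False},
]


def join_on_reaction_id(participants, conditions):
    """Attach the list of participants to each reaction's conditions."""
    grouped = defaultdict(list)
    for row in participants:
        grouped[row["ReactionID"]].append(row)
    return {
        cond["ReactionID"]: {**cond, "participants": grouped[cond["ReactionID"]]}
        for cond in conditions
    }


def reactions_using(reactions, compound):
    """Return the IDs of reactions in which `compound` appears as a reactant."""
    return sorted(
        rid
        for rid, rxn in reactions.items()
        if any(
            p["role"] == "reactant" and p["name"] == compound
            for p in rxn["participants"]
        )
    )


reactions = join_on_reaction_id(participants, conditions)
```

Because each participant is its own row, a reaction can carry any number of reactants and products without changing the layout, which is the point of splitting the data into two sheets.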

By definition, any Open Notebook Science project is a work in progress. The listing of a reaction in this database only means that the researcher attempted or is in the process of attempting it. Whatever the situation, a link to the laboratory notebook page is provided, where the most recent information is available. The philosophy used here is that partial information is always better than no information at all. Thus a researcher investigating the prior use of a particular reactant in a Ugi reaction might find the report that a precipitate was obtained in methanol helpful for designing their own reactions, even if the characterization of the precipitate is still pending. At the very least, knowing that a certain researcher has attempted a similar reaction is enough information for initiating a discussion, which may lead to valuable insights.

Reaction Attempts on ChemSpider

Although SMILES[14] are provided in the spreadsheets, the primary key to identify compounds is the ChemSpider ID (CSID)[15]. This allows us to render molecule images in the book automatically. In the case of the ONS Solubility Challenge book[3], use of the CSID enables a convenient way to calculate various descriptors for displaying values in the book.

In addition, the compounds in the Reaction Attempts database are indexed on ChemSpider as two Data Sources: ReactantsAttemptedReactions and ProductsAttemptedReactions[13]. In this way a substructure search for either reactants or products will identify indexed molecules. Clicking on the Syntheses tab in the ChemSpider record for a selected molecule will then reveal a list of hyperlinks to the relevant laboratory notebook pages.

Organization of the Book

In keeping with the layout of the ONS Solubility Challenge Book, the reactants are listed in alphabetical order. Each entry displays the list of reactions where the reactant was used. This includes a scheme with all reactants and product as well as key metadata: the researcher, reaction type, solvent, limiting reactant concentration, observation of a precipitate, comments and a reference (links to the laboratory notebook page).

In this edition, only Ugi reactions are included. The reaction schemes are laid out in the following order: carboxylic acid, amine, aldehyde and isonitrile. This should allow for easy comparison between schemes within a given record. Reactions where the Ugi product was isolated and characterized are marked with a green check and the percent yield is noted. Since the Ugi products do not have simple common names, they are not included as separate entries. However, all reactions where the synthesis of a specific Ugi product was attempted can be found by looking up the entries for any of the four reactants.

Although this compilation is not exhaustive, it does cover the vast majority of reactions in the UsefulChem project to date. Future editions will include other reactions from UsefulChem and other sources.

Archive

This edition is linked to the UsefulChem data archive, available as a ZIP file[16], a DVD[17] and an interactive hosted archive[18], and to the ReactionAttempts (XLS)[19] and RXIDsReactionAttempts (XLS)[20] spreadsheets, all taken on 2010-04-27.

References

1. Open Notebook Science Wikipedia Entry http://en.wikipedia.org/wiki/Open_Notebook_Science
2. Open Notebook Science Solubility Challenge Wiki http://onschallenge.wikispaces.com
3. Bradley, J.-C. First Edition of ONS Solubility Challenge Book UsefulChem Blog (2009)
http://usefulchem.blogspot.com/2009/12/first-edition-of-ons-solubility.html
4. Open Notebook Science Solubility Challenge List of Experiments page http://onschallenge.wikispaces.com/list+of+experiments
5. UsefulChem Wiki http://usefulchem.wikispaces.com
6. Ugi Reaction Wikipedia Entry http://en.wikipedia.org/wiki/Ugi_reaction
7. Dömling, A., & Ugi, I. (2000). Multicomponent Reactions with Isocyanides. Angewandte Chemie International Edition, 39(18), 3168-3210. http://www3.interscience.wiley.com/journal/73500473/abstract
8. UsefulChem List of Experiments http://usefulchem.wikispaces.com/All+Reactions
9. Bradley, J.-C. Open Notebook Science Challenge UsefulChem Blog (2008)
http://usefulchem.blogspot.com/2008/09/open-notebook-science-challenge.html
10. CombiUgiResults Google Spreadsheet http://spreadsheets.google.com/ccc?key=plwwufp30hfpUERhse9y5Kw
11. ReactionAttempts Google Spreadsheet
http://spreadsheets.google.com/ccc?key=0Ak1R8T6wt4YQdG9NejNLcDNUMkVBVURGM01TR0NxdXc
12. RXIDsReactionAttempts Google Spreadsheet
http://spreadsheets.google.com/ccc?key=0Ak1R8T6wt4YQdGVENVFMWjdzaGd2REJTTnA4RG5vblE
13. Bradley, J.-C. Reaction Attempts on ChemSpider UsefulChem Blog (2010)
http://usefulchem.blogspot.com/2010/03/reaction-attempts-on-chemspider.html
14. SMILES Wikipedia Entry http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification
15. ChemSpider Web Site http://www.chemspider.com/
16. UC archive Drexel server (ZIP) http://showme.physics.drexel.edu/usefulchem/archives/usefulchem2010-04-27.zip
17. UC archive on lulu.com (DVD) http://www.lulu.com/product/dvd/usefulchem-archive/10791847
18. UC interactive hosted format http://showme.physics.drexel.edu/usefulchem/archives/usefulchem2010-04-27/All%20Reactions.html
19. Bradley, J.-C.; Lang, A. Reaction Attempts Reactants and Products. UsefulChem. 2010-04-27.
(Archived by WebCite® at http://www.webcitation.org/5pIsFEbT9)
20. Bradley, J.-C.; Lang, A. Reaction Attempts RXIDs. UsefulChem. 2010-04-27.
(Archived by WebCite® at http://www.webcitation.org/5pIs2eh62)

Tuesday, April 20, 2010

ONS Books Wiki

I recently reported on our use of Nature Precedings to archive different editions of the ONS Solubility Challenge book. One of the advantages is that Precedings automatically alerts visitors if more recent editions exist.

However, today I learned that there is a glitch to this system: it is not possible to link individual versions on Precedings to a corresponding book edition on LuLu. That means that if you find yourself on the Nature Precedings entry and want to order the book from LuLu it isn't obvious at all how to do so.

To resolve this issue once and for all I just created a wiki page (ONSbooks.wikispaces.com) to track every edition of the book. This is actually better because I can also provide links to all the available data archives and blog posts corresponding to each edition.

This is also the page where we will keep track of every edition of other Open Notebook Science books. The next one to be published shortly is for the UsefulChem project.

Thursday, March 04, 2010

Nature Precedings as an Archiving Tool for ONS Solubility Book

The issue of archiving and citation is a topic that is usually raised whenever I give a talk about Open Notebook Science. We have recently tried to address this using several complementary strategies.

The publication of a book containing a snapshot of all the values obtained from the Open Notebook Science Solubility Challenge has turned out to be a convenient mechanism. By using LuLu, the book can be either downloaded for free as a PDF or ordered as a physical copy for just the printing and shipping charges.

However, Lulu does not have a convenient method of keeping track of different editions of the book and it is unclear how to best cite them.

Nature Precedings solves both of these problems quite nicely. I have uploaded the PDF of each book edition to NP and the versions are automatically linked to each other. In fact if you try to access an older edition, NP pops up a warning that a more recent version is available with the corresponding link.

Precedings also provides information about how to cite the document, including a DOI for each version. Unfortunately it appears that it can take some time for the DOIs to resolve. Links to different versions can also be formatted like this:
http://precedings.nature.com/documents/4243/version/1
http://precedings.nature.com/documents/4243/version/2
http://precedings.nature.com/documents/4243/version/3
Links to the Lulu version of each book are also provided, which is convenient for anyone who might want to order a physical copy.

At this time Precedings does not accept zip files containing the full archive of the source files for each book version - although a link to the archive is provided in the preface of the book. We have found that our library's DSpace repository is a convenient location for these.

Friday, February 12, 2010

ONS Solubility Book: Edition 3 with Notebook Archive

Edition 3 (2010-02-11) of the ONS Solubility Challenge book is now available.

We've been trying for some time to find a way to conveniently take a snapshot of our Open Notebooks and all associated raw data files. This could serve as a way to back up all of our work as well as provide a means of finding out the state of knowledge for a project at a given moment in time. There is also a tremendous benefit to confidently using the best of free hosted Web2.0 services out there (e.g. GoogleDocs and Wikispaces) without being concerned with changes in policies or access down the road.

Our recent use of the ONS Challenge Solubility book to periodically create releases of summarized data has opened up a convenient opportunity. And yesterday the last piece of the puzzle fell into place. Through a combination of fairly quick manual and automated tasks, Andrew Lang and I are able to push out a full snapshot of all relevant files and lab notebook pages and associate it with an edition of the book.

As described below, the archive is accessible interactively on a server, as a zip download or as a CD from LuLu. Perhaps we can also find a home on library servers in the future.

More details are provided in the preface for Edition 3 (2010-02-11):
This is the first edition to include a full archive of the ONS Challenge notebook. A space export from Wikispaces provides an initial version of all the HTML pages in the notebook with local hyperlinks to copies of all images and files uploaded onto the wiki. All of the Google Spreadsheets are automatically downloaded as Excel spreadsheets and placed in the same "files" folder as the images. NMR spectra, stored as JCAMP-DX files, are placed in the "spectra" folder. All of the HTML pages are reformatted to provide local references to both Excel spreadsheets and the JCAMP-DX files.

The notebook archive is meant to represent a snapshot of the state of all source documents at the time of the publication of an edition of this book. When used from a server with web services running, clicking on links to the spectra will allow interaction via a browser interface, including zooming in or out and integration of the NMR spectrum. When accessed in stand-alone mode after downloading or directly from a CD, everything will work the same, except that JCAMP-DX files must be opened from JSpecView running on the desktop. Excel files will retain any calculations in the cells of the original Google Spreadsheets but dynamic values generated from calling web services - such as the script that automatically integrates NMR spectra - will be frozen as simple values. However, the link to the web service used will be stored in the cell as a comment. Links to external websites are not crawled and embedded Google Spreadsheets or videos are not copied. These will work but will reflect live data on the web.

The February 11, 2010 version of the notebook archive is available on a hosted site, on a CD or by download.

Saturday, December 12, 2009

First Edition of ONS Solubility Challenge Book

Andrew Lang and I have been working on a book version of the Open Notebook Science Solubility Challenge database. The timing is good since we just awarded the last ONS Challenge Submeta award this month. All of the students, judges and educational partner are included as co-authors. A biography and picture of everyone is included in the book.
Jean-Claude Bradley, Associate Professor of Chemistry at Drexel University
Cameron Neylon, Senior Scientist at the ISIS Pulsed Neutron Source, Rutherford Appleton Laboratory and Lecturer in Chemical Biology at the School of Chemistry at the University of Southampton
Rajarshi Guha, Research Scientist at the NIH Chemical Genomics Center
Antony Williams, Vice President of Strategic Development, ChemSpider at the Royal Society of Chemistry
Bill Hooker, Postdoctoral Researcher in Molecular Biology
Andrew Lang, Professor of Mathematics at Oral Roberts University
Brent Friesen, Associate Professor of Chemistry at Dominican University
and
Tim Bohinski, David Bulger, Matthew Federici, Jenny Hale, Jenna Mancinelli, Khalid Mirza, Marshall Moritz, Daniel Rein, Cedric Tchakounte, and Hai Truong
We selected LuLu as a convenient mechanism to distribute copies. This 6 x 9 inch black and white soft cover edition is available for $5.96, which just covers the printing and shipping charges. Other formats are possible - such as a larger hardcover in color - but these are much more expensive. We thought it would be good to start with the most affordable version and look at other options later. The electronic version of the book is available for free on LuLu.

We were inspired by the style of the solubility book published by Atherton Seidell in 1919, freely available on Google Books. The compound entries are listed in alphabetical order, with tables of compound data and solubilities. We included data that we found to be useful for practical applications, including predicted density, room temperature phase and the solubility in molarity, mole fraction and g/100g solvent. References link to lab notebook pages or literature references.

Andy found a way to create the fully formatted book in an almost completely automated way, pulling the data directly from the Solubilities Summary and other Google spreadsheets and querying ChemSpider. The preface and biographies of the students, judges and educational partner are also automatically pulled in from Google Docs. With this system in place, it will be straightforward to publish future editions with the most updated information frequently.

This was also a good opportunity to make use of the WebCite service. It enables us to link the book to a frozen version of the Solubilities Summary sheet archived as an Excel spreadsheet. This format retains all the formulas and hyperlinks in the original Google Spreadsheet.

The preface further explains the scope of the book and project:

The Open Notebook Science Solubility Challenge

Solubility is an important consideration for many chemistry applications. Synthetic chemists usually use a solvent to perform reactions and knowledge of the solubility of the starting materials or products can be very useful to pick an appropriate solvent. Analytical chemists can use solubility to design separation techniques and factor in dynamic range considerations. Physical chemists can create and evaluate their models of how molecules interact in the solubilization and precipitation processes.

Solubility data can be obtained from a variety of online and offline sources. As with all chemical data, it can be a challenge to evaluate reported measurements. Some databases offer no references while others provide citations to peer reviewed journal articles. Given the choice, more weight is generally given to the latter. This is reasonable in most cases because more information about the purity of compounds and the methods used are available in peer-reviewed articles.

However, the information for how a specific measurement was obtained within a journal article is not generally provided. General methods are provided but the raw data for a specific measurement are typically not published. Peer review is not intended to validate individual measurements - its function is to ensure that the authors made appropriate conclusions based on their processed datasets and the state of knowledge in the field.

The Open Notebook Science Challenge was initiated in the fall of 2008 as the result of a discussion on a train in the UK between Jean-Claude Bradley and Cameron Neylon.[1,2] The concept was very simple: create a crowdsourcing opportunity for the chemistry community to contribute solubility measurements under Open Notebook Science conditions. This method of publication entails providing immediate public access to the chemist's laboratory notebook, as well as all raw data used to compute the measurements.[3,4]

On Sept 3, 2008 the first ONSC measurements were recorded by Bradley and Neylon at the University of Southampton in Neylon's laboratory.[5] The project was soon sponsored by Submeta, offering ten $500 awards for students in the US or the UK who best recorded how they performed their experiments.[6] Furthermore, the first 3 winners also received one year subscriptions to Nature magazine, thanks to a sponsorship from the Nature Publishing Group.[7] Sigma-Aldrich supported the contest by donating chemicals upon request.[8]

Students were evaluated by a group of judges who convened once a month to deliberate the next award. Judges also provided feedback to the students by commenting on their lab notebook pages directly on the wiki. Their expertise ranged from chemistry to mathematics, spectroscopy and molecular biology.

Techniques

Participants in the ONS Challenge were not required to use a specific method to measure solubility - although they were required to properly document their experiments and analyses. Due to its simplicity, most measurements in the past year were made using the SAMS NMR technique, requiring no volume measurement or calibration curves.[9] Two assumptions are made with this method. The first is that the volumes of solute and solvent are additive, with the error becoming negligible at low solubility values. The second is that NMR integration values are proportional to the amount of solvent and solute. Some deviations from this have been observed for default NMR parameters and in later experiments long relaxation times are introduced into the protocol (D1 = 50s).[10]
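To make the two assumptions concrete, here is a minimal Python sketch of how a molarity could follow from NMR integrals alone. This is not the SAMS script itself. The function name, its parameters and the per-proton normalization of the integrals are assumptions for illustration; only the additive-volume and proportional-integration assumptions come from the text above.

```python
def molarity_from_nmr(i_solute, h_solute, i_solvent, h_solvent,
                      mw_solute, d_solute, mw_solvent, d_solvent):
    """Estimate solute molarity (mol/L) from NMR integrals.

    i_* : integral of a chosen peak for solute / solvent
    h_* : number of protons giving rise to that peak
    mw_*: molecular weight in g/mol
    d_* : density in g/mL
    """
    # Assumption 2: integrals are proportional to amounts, so the
    # per-proton integral ratio equals the solute/solvent mole ratio.
    mole_ratio = (i_solute / h_solute) / (i_solvent / h_solvent)

    # Assumption 1: volumes are additive. Work per mole of solvent.
    v_solvent_ml = mw_solvent / d_solvent
    v_solute_ml = mole_ratio * mw_solute / d_solute

    return 1000.0 * mole_ratio / (v_solvent_ml + v_solute_ml)
```

No volume is ever measured here: both volumes are derived from the mole ratio, molecular weights and densities, which is why the additive-volume assumption matters most at high solubility.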

Data Curation

Since an Open Notebook approach is used in this work, those interested in the validity of the measurements can assess the methods used - both for the preparation of saturated solutions and the raw data from the measurements. Over time, values in the database are likely to improve and possibly some errors may be uncovered and corrected. However, on the whole, we feel that the values provided in this work should be of use to chemists trying to gain an appreciation of solubility for most applications. This is especially the case for values that are not obtainable from any other source.

When clearly erroneous data points are discovered, they are flagged in the database as "DONOTUSE". This way interfaces with the dataset can ignore these values while allowing anyone to investigate why the data points were flagged. This might happen when early experiments did not allow for sufficient mixing or when NMR D1 relaxation times were not long enough to fully integrate peaks of interest. Out of 681 reported measurements, 51 are currently marked in this way. A shared Google Spreadsheet is used to collect and curate the dataset. This allows easy data entry while providing a simple way to interrogate the database for visualization applications via the Google API.[11]
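An interface that respects the flag might look something like the following Python sketch. Only the "DONOTUSE" marker comes from the text. The "flag" column name and the rows are placeholders invented for the example and are not real measurements.

```python
# Placeholder rows, not real data; the "flag" column name is an assumption.
measurements = [
    {"solute": "solute A", "solvent": "methanol", "molarity": 1.0, "flag": ""},
    {"solute": "solute A", "solvent": "methanol", "molarity": 9.0, "flag": "DONOTUSE"},
    {"solute": "solute B", "solvent": "ethanol", "molarity": 0.5, "flag": ""},
]


def usable(rows):
    """Keep only rows that are not marked DONOTUSE."""
    return [row for row in rows if row.get("flag") != "DONOTUSE"]


def flagged(rows):
    """Return the excluded rows so the reason for flagging can be reviewed."""
    return [row for row in rows if row.get("flag") == "DONOTUSE"]


kept = usable(measurements)
excluded = flagged(measurements)
```

Flagged rows stay in the dataset rather than being deleted, so anyone can still trace back to the notebook page and see what went wrong. With the counts given above, 681 - 51 = 630 measurements would pass such a filter.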

Literature data and format conversions

An additional 400 solubility measurements from the literature are included in the database. These generally correspond to compounds that are structurally identical or similar to the compounds measured by the ONS Challenge participants. These values are averaged in with the values from the participants, with appropriate references provided. In order to compare values, conversions from molar fraction or g solute/100g solvent to molarity were made by assuming that the volumes are additive and obtaining the density of the solutes in most cases from the predicted values in ChemSpider.[12]
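The conversions described above can be sketched in a few lines of Python. This is an illustration of the additive-volume assumption, not the code used to build the database. The function and parameter names are made up for the example.

```python
def molarity_from_mole_fraction(x, mw_solute, d_solute, mw_solvent, d_solvent):
    """Convert solute mole fraction x to molarity (mol/L).

    Molecular weights in g/mol, densities in g/mL. Volumes of solute
    and solvent are assumed to be additive.
    """
    # Basis: 1 mol of solution = x mol solute + (1 - x) mol solvent.
    volume_ml = x * mw_solute / d_solute + (1.0 - x) * mw_solvent / d_solvent
    return 1000.0 * x / volume_ml


def molarity_from_g_per_100g(grams, mw_solute, d_solute, d_solvent):
    """Convert g solute per 100 g solvent to molarity (mol/L).

    Uses the same additive-volume assumption.
    """
    moles = grams / mw_solute
    volume_ml = grams / d_solute + 100.0 / d_solvent
    return 1000.0 * moles / volume_ml
```

Both functions need the solute density, which is why a predicted value has to be pulled in when no measured density is at hand.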

For the convenience of chemists with diverse applications, all three formats are provided. For the cases where solutes are miscible with the solvent, the molarity reported is simply that of the pure solute, as calculated from its density. The practical interpretation of this is that solutions of any molarity below this value can be prepared.
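For the miscible case, the upper limit can be sketched as the molar concentration of the neat solute. The helper below is a hypothetical illustration with assumed units of g/mol and g/mL, not a function from the project.

```python
def pure_solute_molarity(mw_solute, d_solute):
    """Molarity (mol/L) of the neat solute: density (g/mL) over MW (g/mol)."""
    return 1000.0 * d_solute / mw_solute
```

This is the value the mole fraction conversion approaches as the mole fraction goes to 1, since the solvent volume term then vanishes.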

In the process of converting units and averaging heterogeneous data sources, no attempt has been made to track significant figures. Those interested in any information about the precision of measurements should consult each individual data source. This may not be an easy task for measurements only carried out once and where factors such as the quality of spectral peaks and baselines are not optimal.

This collection will be most valuable for those who do not require highly precise measurements for their applications. For example, synthetic chemists can easily use rough estimates of solubility to select appropriate solvents for a reaction. In any case, one would be wise to consider all measurements as provisional, regardless of the source. As more data are collected, subsequent editions of this book will adjust values accordingly.

Searching the database

The values in this database can be accessed and filtered in various ways. More information is available at the ONS Challenge wiki[13] and Chapter 16 of the book "Beautiful Data".[14]

Database version

Archived as Excel Spreadsheet by WebCite on December 11, 2009.[15]

References

[1] Bradley, JC Open Notebook Science Challenge, UsefulChem blog (2008) http://usefulchem.blogspot.com/2008/09/open-notebook-science-challenge.html
[2] Open Notebook Science Challenge Wikipedia entry http://en.wikipedia.org/wiki/Open_Notebook_Science_Challenge
[3] Bradley, JC Open Notebook Science, Drexel CoAS E-Learning Blog (2006) http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html
[4] Open Notebook Science Wikipedia entry http://en.wikipedia.org/wiki/Open_Notebook_Science
[5] Bradley, JC; Neylon, C UsefulChem Experiment 207 http://usefulchem.wikispaces.com/Exp207
[6] Bradley, JC Submeta Open Notebook Science Awards, UsefulChem Blog (2008) http://usefulchem.blogspot.com/2008/11/submeta-open-notebook-science-awards.html
[7] Bradley, JC Nature Sponsors Open Notebook Science, UsefulChem Blog (2008) http://usefulchem.blogspot.com/2008/11/nature-sponsors-open-notebook-science.html
[8] Bradley, JC Sigma-Aldrich First Official Sponsor of Open Notebook Science Challenge, UsefulChem Blog (2008) http://usefulchem.blogspot.com/2008/09/sigma-aldrich-first-official-sponsor-of.html
[9] Bradley, JC Semi-Automated Measurement of Solubility, UsefulChem Blog (2009) http://usefulchem.blogspot.com/2009/03/semi-automated-measurement-of.html
[10] Bradley, JC NMR Integration Progress for Solubility Measurements, UsefulChem Blog (2009) http://usefulchem.blogspot.com/2009/06/nmr-integration-progress-for-solubility.html
[11] Bradley, JC Interactive Visualization of ONS Solubility Data, UsefulChem Blog (2009) http://usefulchem.blogspot.com/2009/01/interactive-visualization-of-ons.html
[12] ChemSpider database http://www.chemspider.com
[13] ONS Challenge List of Experiments Page http://onschallenge.wikispaces.com/list+of+experiments
[14] Bradley, J.-C.; Guha, R.; Lang, A.S.I.D.; Lindenbaum, P; Neylon, C.; Williams, A.J. & Willighagen, E. Chapter 16: Beautifying Data in the Real World from Beautiful Data. O'Reilly Media, Eds: Segaran, T. & Hammerbacher, J. (2009)
[15] Bradley, Jean-Claude; Lang Andrew. Solubilities Summary Sheet. Open Notebook Science Challenge. 2009-12-11. URL:http://spreadsheets.google.com/pub?key=plwwufp30hfq0udnEmRD1aQ&output=xls. Accessed: 2009-12-11. (Archived by WebCite® at http://www.webcitation.org/5lx5ry3BV)


Creative Commons Attribution Share-Alike 2.5 License