Thursday, August 26, 2010

Open Notebook Science in Drug Discovery at Opal Event

I presented "Open Notebook Science in Drug Discovery" on August 24, 2010 at a panel on Industry and Academia, part of the Opal Event "Drug Discovery: Easing the Bottleneck".
I only had about 15 minutes to present so I could not go into much detail but I did want to highlight the most recent work Andrew Lang and I (also with Peter Li from ChemTaverna) carried out involving solubility prediction and web services. Most of the attendees were from industry and I appropriately used the recent GSK malaria data sharing to introduce the talk. It is clear that there is a role for Open Science in drug discovery and I think that industry involvement will continue to increase in this area.

My co-panelist Rathindra Bose from Ohio University presented on his group's development of a novel cancer treatment compound based on platinum. He made the point that academic research complements that from industry by being able to explore more speculative hypotheses. The dominant hypothesis for the mechanism of action of platinum-based drugs is binding with DNA. By exploring alternative scenarios, his group found an active platinum drug that does not bind with DNA.

During the preceding session on the Emergence of Biologics in Drug Discovery, Albert Giovanella from the University of Pennsylvania School of Medicine gave a particularly enlightening talk about comparing biologics with small molecule drugs. Although biological drugs tend to have less toxicity, the overall cost to bring them to market is still quite high and their cost to the consumer may be so high as to limit their impact. It looks like it will not be generally easy to translate new biomedical knowledge to a widespread impact on human health.

Monday, August 23, 2010

ChemTaverna Workflows of ONS Web Services now on MyExperiment

I'm pleased to report that one of the collaborations initiated at the Berkeley Open Science conference last month is progressing very well.

Carole Goble introduced me to Peter Li who runs the ChemTaverna project. The idea was to use Taverna to construct workflows using the web services developed by Andrew Lang for our Open Notebook Science projects: UsefulChem and the ONS Solubility Challenge.

Peter quickly created several workflows to demonstrate what is possible. Here is a workflow that uses a Google Spreadsheet as input. SMILES for amines, carboxylic acids, aldehydes and isonitriles are entered in the appropriate columns. The workflow first creates a virtual library of Ugi products from all possible combinations of reactants. Then each product is submitted to a web service that predicts the solubility in methanol, the most common solvent for Ugi reactions.

The resulting spreadsheet can then be sorted by predicted solubility to recommend products that are more likely to precipitate from the reaction mixture. In this particular example, Ugi products derived from boc-glycine are predicted to have a low solubility in methanol. The least soluble compound is predicted to have a solubility of only 0.07 M.

In this library, Ugi products derived from boc-methionine are predicted to be too soluble to precipitate. For example, this Ugi product has a predicted solubility of 3.7 M.
(note: ChemSpider has a tendency to draw the minor tautomer for some amides and carbamates)
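
The two steps described above (enumerate every reactant combination, then rank by predicted solubility) can be sketched in a few lines of Python. This is only an illustration, not the Taverna workflow itself: the function names are made up, the reactant labels are placeholders for the SMILES strings held in the spreadsheet, and the prediction web service is replaced by a plain lookup table supplied by the caller. The sketch also does not build the product structures.

```python
from itertools import product


def enumerate_ugi_combinations(amines, acids, aldehydes, isonitriles):
    """Return every (amine, acid, aldehyde, isonitrile) combination as a tuple."""
    return list(product(amines, acids, aldehydes, isonitriles))


def rank_least_soluble_first(combinations, predicted_solubility):
    """Sort combinations by predicted solubility in mol/L, lowest first.

    `predicted_solubility` is a hypothetical dict standing in for the
    solubility prediction web service.
    """
    return sorted(combinations, key=lambda combo: predicted_solubility[combo])


# Placeholder reactant labels; the real spreadsheet holds SMILES strings.
library = enumerate_ugi_combinations(
    amines=["amine_1", "amine_2"],
    acids=["boc-glycine", "boc-methionine"],
    aldehydes=["aldehyde_1"],
    isonitriles=["isonitrile_1"],
)
print(len(library))  # number of combinations: 2 * 2 * 1 * 1
```

Products at the top of the ranked list are the candidates most likely to precipitate from methanol.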

There are a few issues to take into consideration in order to use this particular workflow:

1) This will only work on Taverna Workbench 2.1.2 with these plug-ins installed. At some point it will be made to work on Taverna Workbench 2.2 and uploaded onto MyExperiment. The workflow used here is currently available here.

2) The SMILES in the input Google Spreadsheet must be written in the format of the current example (aldehyde, amine and isonitrile groups on the left and carboxylic acid groups on the right)

3) All of the Ugi products in the virtual library must already exist in ChemSpider. Otherwise, the solubility predictions will fail because of missing descriptors as discussed previously.

Peter has uploaded simpler workflows onto MyExperiment that are compatible with the current version of Taverna Workbench (v2.2).

First, the generation of Ugi product libraries from reactant SMILES in a Google Spreadsheet is available here.

Another workflow handles the prediction of Abraham descriptors.
A third workflow predicts the solubility of a given solute in a given solvent.

The main rationale for incorporating web services derived from our Open Notebook Science projects into Taverna is leverage. MyExperiment already benefits from a vigorous community of developers in the bioinformatics arena. With the growth of the ChemTaverna initiative, the integration of cheminformatics and bioinformatics workflows should become seamless.

Making our solubility and chemical reaction web services available in formats that are convenient for others increases the chances that our work will actually be useful. It also makes it easier for us to leverage the resources made available by others for our own applications in drug discovery and reaction design.

Essentially this means that we have extended the reach of the information cascade triggered by the recording of an experiment in a laboratory notebook and a very simple abstraction process to represent that experiment in a semantically addressable format.

Wednesday, August 18, 2010

ONS Challenge Solubility Data used in COSMO-RS Tutorial

Update: Andreas Klamt pointed out to me that the best source of his software is the COSMOtherm package from COSMOlogic.

It is really interesting to see how our solubility data from the ONS Challenge are being used. In this example, our measurements for vanillin in various solvents are being compared to predicted values in a tutorial for the commercial software package COSMO-RS.

Monday, August 16, 2010

Green Solvent Metric on Solvent Predictor

In the spirit of contributing to Peter Murray-Rust's initiative to collect Green Chemistry information, Andrew Lang and I have added a green solvent metric for 28 of the 72 solvents we include in our Solvent Selector service. The scale represents the combined contributions for Safety, Health and Environment (SHE) as calculated by ETH Zurich.

For example consider the following Ugi reaction solvent selection. Using the default thresholds, 6 solvents are proposed and 5 have SHE values. Assuming there are no additional selection factors, a chemist might start with ethyl acetate with a SHE value of 2.9 rather than acetonitrile with a value of 4.6.

Individual values of Safety, Health and Environment for each solvent are available from the ETH tool. We are just including the sum of the three out of convenience.
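
As a rough illustration of how the summed SHE value might be used to order candidates, here is a short Python sketch. It assumes that a lower SHE value is preferable, which is how the ethyl acetate versus acetonitrile example above reads. The function name is hypothetical and the third entry is a placeholder for a solvent that has no SHE value yet.

```python
def rank_by_she(she_scores):
    """Order solvent names by SHE score, lowest first; unscored solvents go last."""
    scored = sorted(
        (name for name, score in she_scores.items() if score is not None),
        key=lambda name: she_scores[name],
    )
    unscored = sorted(name for name, score in she_scores.items() if score is None)
    return scored + unscored


candidates = {
    "acetonitrile": 4.6,   # value quoted in the post
    "ethyl acetate": 2.9,  # value quoted in the post
    "solvent without a SHE value": None,  # placeholder entry
}
print(rank_by_she(candidates))
```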

Note that the license for using the data from this tool requires citing this reference:

Koller, G., U. Fischer, and K. Hungerbühler (2000). Assessing Safety, Health and Environmental Impact during Early Process Development. Industrial & Engineering Chemistry Research 39: 960-972.

Friday, August 06, 2010

The Reaction Attempts Solvent Selector

The ONS Solubility Challenge and the Reaction Attempts project have now been integrated with code written by Andrew Lang to the point that recommendations for solvents are just a click away.

First use the Reaction Attempts Explorer, using either the drop-down menus or the substructure search as described previously. When a reaction of interest is identified, just click on the link for "Optimal Solvent Prediction".

The service will then provide a summary of solubility measurements and predictions, organized by the default criteria of minimum 0.3 M solubility of reactants, maximum 0.03 M solubility of the product and maximum solvent boiling point of 100 C. Liquid reactants (or reactants with melting points within 15 C of room temperature) are excluded since these generally have a high enough solubility in most solvents.

In the case of the Ugi reaction in this example, only the solubility of boc-glycine and the product are considered.

The results are color-coded. In this case 14 solvents are coded green, indicating that all criteria were met. The fifteenth solvent is coded yellow, indicating that one of the criteria was not met - in this case the boiling point of 205 C is outside of the limit of 100 C. High boiling point solvents are not optimal for quickly obtaining the product as a dry solid after filtering. This criterion can be changed in the input fields at the top of the page. It is also possible to change the number of times the product is washed there. This will only change the estimated yield, which is based on carrying out the reaction at the concentration of the least soluble reactant, up to 1 M.
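
The default criteria and the color coding can be approximated with the Python sketch below. It is not Andrew Lang's code: the function names are invented, the reactant and product solubilities in the example are made-up numbers, and the post only describes the green (all criteria met) and yellow (one criterion missed) cases, so anything else is returned as "unspecified". Liquid reactants are assumed to have been removed from the input already.

```python
def missed_criteria(reactant_solubilities, product_solubility, boiling_point,
                    min_reactant=0.3, max_product=0.03, max_boiling_point=100.0):
    """List which default criteria a solvent fails (solubilities in M, bp in C)."""
    missed = []
    if any(s < min_reactant for s in reactant_solubilities):
        missed.append("reactant solubility")
    if product_solubility > max_product:
        missed.append("product solubility")
    if boiling_point > max_boiling_point:
        missed.append("boiling point")
    return missed


def colour_code(missed):
    """Green if nothing was missed, yellow if exactly one criterion was missed."""
    if not missed:
        return "green"
    if len(missed) == 1:
        return "yellow"
    return "unspecified"  # the post does not describe this case


# Made-up solubilities; only the 205 C boiling point comes from the post.
failed = missed_criteria([1.0], 0.01, boiling_point=205)
print(failed, colour_code(failed))
```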

Three columns are generated for the product and each reactant. The column on the right is the average of all measurements, as recorded in the SolubilitiesSum Spreadsheet. The middle column is a solubility prediction based on Abraham descriptors derived from experimental values, as described and used in the ONS Solubility Challenge book. The column on the left contains predictions from the Abraham001 model, which is based on calculated molecular descriptors only.

The numbers in bold represent the best solubility value available for each solvent. If a measurement is known, that will be the number used. If no measurement is available, the experimental Abraham descriptor model is used. If neither is available, the predictions from the Abraham001 model are used.
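
That order of preference amounts to a simple fallback chain. A minimal Python sketch, with a hypothetical function name and argument names, and using `None` to mean "no value available":

```python
def best_available_solubility(measured=None, abraham_experimental=None,
                              abraham001=None):
    """Return (source, value) for the preferred solubility, or (None, None).

    Preference order: measurement, then the model based on experimental
    Abraham descriptors, then the Abraham001 model.
    """
    for source, value in (
        ("measured", measured),
        ("abraham_experimental", abraham_experimental),
        ("abraham001", abraham001),
    ):
        if value is not None:
            return source, value
    return None, None


# Illustrative numbers only, not taken from the SolubilitiesSum Spreadsheet.
print(best_available_solubility(abraham_experimental=0.8, abraham001=0.2))
```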

From the list of solvents in the green section we find ethanol and acetonitrile. Both of these solvents were tried (as mixtures with methanol) in the optimization of this reaction (Bradley et al JoVE 2008) and provided good to intermediate results. THF was found to give low yields for this reaction and it scores at #51 in the yellow section, with a high solubility of the product accounting for the missed criterion.

One should keep in mind that this is just a tool to flag potentially interesting solvents. Common chemical sense needs to be used as well. For example, acetone and butanone are listed in the green section but these are incompatible with the Ugi reaction since they would compete with the aldehyde.

Note that the predictive models are way off in some cases. For example the Abraham001 model dramatically underestimates the solubilities of boc-glycine in the green section, while the measured Abraham descriptor model does much better for these cases. We will prioritize our next solubility measurements to try to improve the models - or at least understand what types of compounds are most likely to yield useful solubility estimates from these models.

In addition to being called from the Reaction Attempts Explorer, the Solvent Selector can be used for any compounds that have ChemSpider IDs. Simply separate the CSIDs with the pipe character:

After modifying the criteria and hitting update, the new criteria are conveniently represented in the URL in this format, which makes it easy to share a specific search with anyone:

It is even possible to use the service listing just one compound's CSID - this is useful for quickly comparing the measured solubilities with predictions from both models:

Monday, August 02, 2010

Berkeley Open Science Summit 2010 Notes

I just returned from the Open Science Summit held at Berkeley July 29-31, 2010.

There certainly was an impressive list of presenters as well as attendees. Many of the talks were quite good, although several on the last day were more about closed collaborations than Open Science. During these presentations the assumption that patents are required to exploit discoveries in health care was repeated. This was in sharp contrast to the second day's session on gene patents, where IP protection was shown to stifle innovation and the exploitation of discoveries.

A refreshing exception to this pattern on the last day was Andrew Hessel's presentation on the Pink Army Cooperative. Andrew's strategy to cure cancer is based on the idea of customizing drugs for each individual affected by the disease. Since each drug is only applicable to one individual, the approach of expensive clinical trials doesn't apply. Since he is not interested in generating a profit from selling the drugs, IP protection doesn't apply either, which allows him to make every part of the drug design process, including genetic analysis, publicly available. It wasn't clear if such an approach would be legal in the US but he did mention going to another country if necessary. Although he didn't currently have cancer, he did indicate that he might have need of this technology one day by pulling out a pack of cigarettes in the middle of his talk.

Unfortunately my panel on Open Data was canceled at the last minute due to time management problems (see FF discussion on how it happened). However, I did have a chance to generally catch up with old friends (Carmen Drahl, Joanna Scott, Cameron Neylon, Jack Park).

I also discussed some promising collaborations with several people:

1) CoLab. I spoke at length with DJ Sprouse and Casey Stark about their system for scientific collaboration. We will try to represent one solubility experiment from the ONS Challenge notebook and one organic synthesis experiment from the UsefulChem notebook to see how the information can be represented within CoLab. There may be some opportunities to visualize raw data in new ways - perhaps using non-Java tools to interact with JCAMP-DX spectra.

2) IPzero Principles. I continued with Lisa Green a conversation started with John Wilbanks and Thinh Nguyen at Creative Commons about coming up with a series of simple recommendations for ensuring that an Open Notebook can effectively prevent the patenting of inventions within an area of interest to the Open Science community.

3) Open Chemistry Reactions. I had the chance to discuss our Reaction Attempts database with Peter Murray-Rust over breakfast on Saturday. He also showed me how he is using Oscar to extract chemical reaction information from various documents. Peter suggested that we pool together our data for a demonstration in September at the London Science Online Conference. Reaction Attempts will cover the reactions done in the UsefulChem and the Todd group's Open Notebooks. Peter will extract information from both patents and Acta Crystallographica.

4) ChemTaverna. I was pleased to learn from Carole Goble that Taverna is extending its coverage to cheminformatics applications with the ChemTaverna project. I had just mentioned that we would be interested in revisiting Taverna for creating virtual libraries of organic compounds and filtering them based on predicted solubilities in various solvents. This would allow us to contribute cheminformatics workflows to MyExperiment. Carole put me in touch with the project leader Peter Li at the University of Manchester.

Creative Commons Attribution Share-Alike 2.5 License