Wednesday, May 30, 2007

CombiUgi and Closing the Open Science Loop

A few weeks ago I asked Rikesh to kick off the CombiUgi project by creating lists of commercially available boc-protected amino acids, aldehydes, primary amines and isonitriles. He is now done, and links to purchase each compound are provided, in addition to the SMILES codes.

What we need to do now is generate the list of 90,000 Ugi products resulting from their combination (not including enantiomers and diastereomers and the free amines resulting after boc deprotection). This can probably be done fairly easily with VBA in Excel but if anyone wants to pitch in using their favorite software that would be appreciated.
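As a rough sketch of that enumeration step in Python rather than VBA: the four lists below hold only a few placeholder SMILES (not Rikesh's actual lists), and the function name `enumerate_ugi_inputs` is made up for illustration. It only pairs up the four components for each product. Writing the SMILES of the Ugi product itself would need a reaction template, which is not attempted here.

```python
from itertools import product
from math import prod

# Placeholder component lists (assumptions for illustration only);
# the real lists are the ones compiled for the CombiUgi project.
boc_amino_acids = ["CC(C)(C)OC(=O)NCC(=O)O"]       # boc-glycine
aldehydes = ["O=Cc1ccccc1", "CC=O"]                # benzaldehyde, acetaldehyde
amines = ["Cc1ccc(CN)o1", "NCc1ccccc1"]            # 5-methylfurfurylamine, benzylamine
isonitriles = ["[C-]#[N+]Cc1ccccc1"]               # benzyl isonitrile


def enumerate_ugi_inputs(acids, aldehydes, amines, isonitriles):
    """Yield one (acid, aldehyde, amine, isonitrile) tuple per Ugi product."""
    return product(acids, aldehydes, amines, isonitriles)


combos = list(enumerate_ugi_inputs(boc_amino_acids, aldehydes, amines, isonitriles))

# The library size is simply the product of the four list lengths.
expected_size = prod(
    len(components)
    for components in (boc_amino_acids, aldehydes, amines, isonitriles)
)
assert len(combos) == expected_size
print(f"{len(combos)} component combinations")
```

With the full lists, the same call would produce one row per Ugi product, and the count should match the product of the four list lengths.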

Although we have not been successful in cyclizing our Ugi products to diketopiperazines, from what we learned, I expect that we should be able to make any of these 90,000 compounds in less than a week (including shipping time).

By indexing these compounds in relevant search engines (I am working with Chemspider to make this happen) as UsefulChem molecules available upon request (and justification), we have an opportunity to close the loop on a practical Open Science project.

By the loop I mean a complete iteration: from hypothesis, to deciding which compounds to make, to actually making them and getting test results. These results will confirm or force a modification of the hypothesis, and the cycle goes through another iteration, hopefully closer to producing a useful outcome (a good drug lead compound, for example).

I imagine that this loop operates in a lot of research groups. But doing the work under Open Science conditions lets it evolve in new ways. First of all, the direction of progress is determined by the collaborators that elect to participate in the process, not necessarily scientific objectives.

An example of that is our recent shift from the testing of our compounds as anti-malarial agents to testing them as tumor inhibitors simply because Dan Zaharevitz from NCI contacted me and suggested that we submit our compounds.

Right after we started to submit our compounds, Dan left this message:

The folks at Indiana have done a lot of cool stuff that is well worth looking at. One thing they have running in a preliminary form is a service that predicts a compound's activity in cell lines in the screen. This compound is predicted to be inactive in the cell lines in the prediction. I actually don't think that is a bad result. We probably should put up a place to discuss screens and screening strategy, but essentially a prediction tools such as this summarizes what is known. A compound that is predicted to be inactive, but turns out to be active is much more likely to show you something new and interesting than a compound that is predicted to be active and is active.

So that's the last piece that closes the loop. This web service will predict the activity of the compounds generated by the CombiUgi algorithm and rank them. The flagged compounds will be identified, synthesized, and then tested via NCI's assays for tumor cell inhibition.
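To make the last step of the loop concrete, here is a minimal sketch in Python of the comparison between prediction and assay that Dan describes. Everything in it is hypothetical: the `Candidate` record, the compound names and the True/False values are invented for illustration, and the prediction service and the NCI assays are treated as black boxes that simply hand back a yes or no.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str                               # hypothetical compound label
    predicted_active: bool                  # assumed output of the prediction service
    measured_active: Optional[bool] = None  # filled in once assay results return


def unexpected_actives(candidates: List[Candidate]) -> List[Candidate]:
    """Tested compounds that were predicted inactive but turned out active."""
    return [
        c for c in candidates
        if c.measured_active is True and not c.predicted_active
    ]


# Invented example data, not real predictions or assay results.
candidates = [
    Candidate("cmpd-1", predicted_active=True, measured_active=True),
    Candidate("cmpd-2", predicted_active=False, measured_active=True),
    Candidate("cmpd-3", predicted_active=False, measured_active=False),
    Candidate("cmpd-4", predicted_active=True),  # not yet tested
]

for c in unexpected_actives(candidates):
    print(f"{c.name}: predicted inactive, found active")
```

Compounds returned by `unexpected_actives` are the ones that would force a change to the hypothesis and feed the next iteration.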

My group's core expertise is the synthetic component. As far as we are concerned, the other two processes are black boxes. And for scientists involved in the computation and testing, our synthesis operation is probably a black box. But by doing everything in the open, we hope to allow other researchers to propose other models and create derivative loops of their own.

We'd love to do the same for the anti-malarial assays but we have not found an established system in place like NCI that will do substrate screening routinely at no cost (except shipping of course).

Is it becoming clearer why I think the scientific process can be automated in novel and useful ways with the progressive adoption of Open Science?

Friday, May 25, 2007

Research Remix and Open Data

Heather Piwowar has recently started a blog, Research Remix, to discuss Open Data and Open Science.

This blog is an experiment. I’m starting my PhD literature review on the topic of biomedical data sharing and reuse, and thought it would be appropriate to do it out in the open.

She pointed out an extremely interesting study about data disclosure:

A few years ago, as I expressed frustration due to lack of a reply from a corresponding author, a professor summarized his experience: one third of authors do not reply when contacted, one third reply but are not able or willing to supply requested data, and one third reply and do supply the information.

Journal articles often remind me of court cases. Without lying, a good writer will make their case as simple and as compelling as possible for the editors and reviewers. There would be nothing wrong with that system if all the messy raw data were also made available. I am including here all of the related "failed and ambiguous experiments" that cast a shadow on the simplicity of explanation in the article.

Science is a messy process. We sometimes get into trouble by pretending that it isn't. That may be the issue with the reluctance to share.

Thanks to Bill Hooker for the link!

Thursday, May 24, 2007

Second Life Best Practices in Education Conference

Well my poster is up (#12) at the Second Life Best Practices in Education Conference 2007. I'm displaying some pics about showing research results and teaching organic chemistry using Second Life on Drexel and Second Nature islands.

Beth, Eloise and Neo have been working like crazy to get this done and there are now 1000 registrants!

I'll be coming in and out over the day tomorrow at my poster. I'm right next to Max Chatnoir's Gene Pool poster. I'll see you there!

Need Another Mechanism

According to the hypothesis I posted a few days ago, exposing one of our Ugi products missing a methyl group on the furan to 50% TFA/CDCl3 should still cause the furfuryl group to cleave.

But looking at the monitoring H NMRs of EXP101 up to 30 hours there is not even a trace of a reaction, including cyclization to a diketopiperazine.

This is getting interesting. What will it do with a group other than methyl (like t-butyl) at the furfuryl 5 position?

Monday, May 21, 2007

Totally Retrosynthetic

Last week Sivappa Rasapalli contacted me about making his ideas in organic chemistry public. He has written research proposals on synthetic strategies for natural product synthesis. (And it turns out some of his ideas involving furans are quite relevant to one of our pressing immediate concerns.)

He has been looking for an academic position and was wondering if he was better off being open about his ideas or keeping them secret, outside of submissions to search committees.

It is a good question, isn't it?

From the standpoint of plagiarism, let's see.

Unregistered documents sent to small groups of people working in the same area


Documents registered with third-party time stamps and efficiently indexed by the most popular search engine in the world

Which situation would you rather be in if a case of plagiarism needed to be settled? Can you imagine how embarrassing it would be to get caught doing that?

Anyone who knows me can guess what I suggested. Shiva has created a wiki and blog:
My Research Proposals at
Totally Retrosynthetic at

He tells me that he will continue to post his ideas there and, when he is in a position to do it, his research results. If he does, this is exactly what I had in mind when creating UsefulChem: a completely free and hosted model to carry out Open Science that can be replicated by anyone, anywhere, overnight.

Please help me welcome Shiva to the chemical blogosphere!

Saturday, May 19, 2007

Missing Methyl Mystery Mechanism

I think I have a plausible mechanism for the missing methyl problem we've been trying to solve recently.

After basification, the amine component drags everything except for the methylfurfuryl group into the organic phase. For example, Khalid is showing that the Ugi product A generates free amine B pretty clearly by MS and NMR (EXP097). (Other examples: EXP065 EXP067 EXP070 EXP091)

Since the methyl group disappears along with the furan ring and methylene, my best guess is that C forms intractable products that get lost in the baseline during H NMR monitoring.

Also, it turns out that the mysterious doublets formed during the monitoring of these reactions correspond to the coupling between the benzylic H and the amide H in B. I would not have expected to see coupling under the rapid exchange conditions of 50% TFA in CDCl3. (See the methanol-d4 exchange discussion in EXP091 for assignment support.)

If this mechanism is correct, I would expect it to take place for 5-methylfurfurylamine (EXP073), its acetate (EXP081) and its boc-gly amide (EXP092). However, none of those experiments generated the methyl loss scenario. Maybe the rate is just faster for these Ugi products for some reason.

Any advice from furan experts out there?

Chemical Blogspace Tags



benzyl isonitrile

trifluoroacetic acid

t-butyl trifluoroacetate


Sunday, May 13, 2007

Microsoft eScience Conference Oct07

The 2007 Microsoft eScience Workshop at RENCI looks really interesting for us and anyone else doing work in cheminformatics or science automation.

October 21-23, 2007
The Friday Center for Continuing Education
UNC-Chapel Hill
100 Friday Center Drive
Chapel Hill NC 27599-1020

It is no longer possible to do science without doing computing.

The use of computers creates many challenges as it expands the realm of the possible in scientific research and many of these challenges are common to researchers in different areas. The insights gained in one area may catalyze change and accelerate discovery in many others. This workshop is explicitly cross-disciplinary, with the goal of bringing together scientists from different areas to share their research and experiences of how computing is shaping their work, providing new insights and changing what can be done in science. The focus is on the research, and the technologies that make that research possible.

We would like to invite contributions from any area of eScience; examples include:

Modeling of natural systems
Knowledge discovery and merging datasets
Science data analysis, mining, and visualization
Healthcare and biomedical informatics
High performance computing in science
Innovations in publishing scientific literature, results, and data
The impact of eScience on teaching and learning
Applying novel information technologies to disaster management
Robotics in science
Scientific challenges with no obvious computing solutions

Thanks to BBGM!

Thursday, May 10, 2007

UsefulChem on Drexel Island

Today, I added a little section for UsefulChem on the northeast section of the Drexel Island in Second Life. There are slides from my presentation at the ACS on Open Notebook Science as well as a pic of one of our Ugi products docking in enoyl reductase. The 3D structure of the molecule is also floating there, ready to be rotated and inspected.

We're now working on getting the docking visualized in full 3D space. Eloise is helping with that now.

Don't be shy - come visit! (slurl)


Sunday, May 06, 2007

More Talk on Open Notebooks

There is an editorial discussing Open Notebook Science in last week's Nature, "Share your lab notes", repeated on one of the Nature blogs, Nautilus.

The article has stirred up some good discussion on a number of blogs, mainly on Bora's Blog Around the Clock and Pimm.

Bjorn makes some pretty dire predictions about all this in a comment on Bora's blog:

In times of ever more limited funding and more and more competition, open science will not emerge. Researchers now have to fight not only for scientific results, but for their own livelihoods and that of their families. The more funding gets cut, the more it needs to be restricted on topics deemed important. More researchers will accumulate in such "hot areas", making them even more important. Nothing will be shared in such a situation, but instead you will see a rise in scientific misconduct

The ability to do science in a fully transparent and open format may be one of the most important functions of tenure at this time in scientific history. Because of that we can operate without reservation and avoid risking our livelihood.

If we do a good job the next wave of open researchers won't have to risk as much without the security of tenure.

Friday, May 04, 2007

Cell Article about Science Blogging

Laura Bonetta's article "Scientists Enter the Blogosphere" just came out in Cell (Volume 129, Issue 3, 4 May 2007, Pages 443-445). UsefulChem got a mention:

Jean-Claude Bradley and his students at Drexel University are experimenting with a live open lab notebook on his blog Useful Chemistry and wiki. The blog discusses and analyzes results, with links to the raw data on the wiki.

Bradley's group writes down the experimental plan, the results as raw data, observations, then conclusions—every detail a scientist would include in a lab notebook except that the information is available on the Web for everyone to see and comment on. “We don't just put things that work but also failed experiments. We thought that if we cannot use the data maybe others will find a use for them,” says Bradley. People have come to Useful Chemistry looking for the boiling point of a given compound or a chemical reaction. “It is encouraging to see that,” says Bradley. “Part of what we wanted to do was put small bits of information out there that might be useful.” He has not yet tried to publish any of the data on his blog but says he will soon be in a position to do so. He is well aware that most top-tier journals have guidelines precluding publication of anything that has already been reported, regardless of its format.
It warms the cockles of my heart to see incoming links from ScienceDirect to our blogs because it means the chasm separating social software from tradition has been bridged one more time.

Wednesday, May 02, 2007

Going to Science Foo Camp

I just got an invitation to attend Science Foo Camp in August 07, a unique meeting organized by Nature, O'Reilly and Google. Based on what I heard from last year's attendees this will be an amazing opportunity to bounce ideas around.

I'd like to hear more from others who are going or who attended last year.
As before, we will be inviting around 200 people who are doing particularly interesting work in a wide range of scientific disciplines, as well as in areas of technology and culture that influence, and are influenced by, science. The aim is to encourage cross-fertilization of ideas, creating a unique opportunity to explore topics that transcend traditional boundaries. Of course, senior colleagues from Nature, O'Reilly, and Google will also be present.


Creative Commons Attribution Share-Alike 2.5 License