Sunday, August 31, 2008

UsefulChem and CML in Cambridge

This is turning out to be a very productive trip to the UK. I'm currently in Cambridge at Peter Murray-Rust's house with Cameron Neylon and Egon Willighagen. We're in the process of converting one of the Ugi reactions from our recent optimization paper to CML.

Here is the document. We're using the tag UgiChem2CML to discuss it.

Gotta catch a train to London .... more updates later

Tuesday, August 26, 2008

Happy Accidents: A Must-Read for Open Scientists

I usually limit my book reviews to Goodreads or Shelfari but this one deserves much more attention.

In Happy Accidents: Serendipity in Modern Medical Breakthroughs; When Scientists Find What They're NOT Looking for, Morton Meyers reviews examples of the unpredictability of scientific progress.

This could just be a collection of interesting anecdotes - and some of the stories are truly fascinating. My favorite is probably the discovery of platinum compounds for the treatment of cancer. It came about from the accidental electro-dissolution of a platinum electrode during an experiment studying the effect of electricity on cell cultures!

But Meyers goes further and uses these examples to make larger observations about the way science operates today in both academia and industry. A quote from the preface foreshadows the tone of the book:
The dominant convention of all scientific writing is to present discoveries as rationally driven and to let the facts speak for themselves. This humble ideal has succeeded in making scientists look as if they never make errors, that they straightforwardly answer every question they investigate. It banishes any hint of blunders and surprises along the way. Consequently, not only the general public but the scientific community itself is unaware of the vast role of serendipity in medical research. Typically, a discoverer may finally admit this only towards the end of his or her career, after the awards have been received.
And starting on page 304:
An applicant for a research grant is expected to have a clearly defined program for a period of three to five years. Implicit is the assumption that nothing unforeseen will be discovered during that time and, even if something were, it would not cause distraction from the approved line of research. Yet the reality is that many medical discoveries were made by researchers working on the basis of a fallacious hypothesis that led them down an unexpected fortuitous path.
The peer review system forces investigators to work on problems others think are important and to describe the work in a way that convinces the reviewers that results will be obtained. This is precisely what prevents funded work from being highly preliminary, speculative or radical. How can a venture into the unknown offer predictability of results? (my emphasis)
Indeed the basic process of peer review demands conformity of thinking and disdains a maverick's approach.
What it comes down to is this: Who on a review committee is the peer of a maverick? (my emphasis)
The fact that some of us in the Open Science community are discussing this does not mean that we are advocating for the abolition of peer review or the NIH. We are not that naive. We still submit proposals and manuscripts for publication in peer-reviewed journals (although given a choice we probably would pick an Open Access journal over one running on a paid subscription model).

The point is what we do in addition to all those traditional processes.

We can share our failed experiments. We can share our research plans. We can discuss science freely admitting what we don't know. We can record our talks at closed meetings and make them public. We can initiate and participate in serious scientific conversations going on in the blogosphere without worrying about everyone's title and rank.

Basically, we can collaborate in ways that are most conducive to serendipitous discoveries. The free social software, databases and other infrastructure now available make this information exchange easier than ever.

The key question for a researcher today: to hoard or not to hoard?

To me, it seems likely that data hoarders will find it more and more difficult to claim priority for a contribution when competing against loose associations of open collaborators motivated by insatiable curiosity.

Some of the folks from the funding side are getting it. Take a look at SubMeta.

Monday, August 25, 2008

The Fall 2008 ACS meeting ends

It is always nice for a major conference to happen on home turf. Last week the American Chemical Society held its fall meeting in Philadelphia.

I finally got to meet my Second Life collaborator Andrew Lang during our first talk on Monday, August 18, 2008. We presented on what is now possible for chemistry in Second Life. There are now several easy-to-use tools for chemistry and molecular biology. For example, Andy has created a tool that displays protein surfaces from a PDB file using a lightweight sculpted prim. Take a look at the slides from the presentation for a quick overview.

I gave another talk on Monday about Second Life and Social Media: Networking Goldmine or Time Sink? It was a nice opportunity for me to talk about major success stories (Bora Zivkovic, Beth Ritter-Guth and Deepak Singh) as well as specific examples from my own experience. Sandy Adam from Sigma-Aldrich and Andy Lang contributed their own stories at the end. The take-home message was that if you treat these platforms as a means of participating in your scientific community, you're likely to get more out of them than you put in.

On Wednesday I spoke about Open Notebook Science and the value of raw data in drug discovery. The timing was perfect, since I had just finished analyzing the correlation of our docking predictions with biological assays against falcipain-2 and Plasmodium falciparum. (data here)

Saturday, August 23, 2008

Tony Williams Drexel Visit

Tony Williams stopped by Drexel on August 21, 2008. After a nice chat with Martin Walker over lunch, Tony presented a demo of ChemSpider. The timing was not great because this was concurrent with the ACS meeting in Philadelphia. However, Tony recorded the session and it is available here as a Flash screencast.

Tony also took the opportunity to make an announcement of some new text mark-up features about to be made available on ChemSpider. For a brief video and description see his blog post.

I met Tony when I was a graduate student at the University of Ottawa and he was running the NMR facilities there. Even though he had not touched an NMR in 12 years, he clearly has not lost his touch: he got us out of a serious jam with the acquisition of carbon spectra on our Varian instrument. Just like old times.

Once you get to know Tony you'll appreciate why so many of us are willing to support and put our trust in ChemSpider.

Monday, August 11, 2008

Thoughts on SciFoo 2008

Now that SciFoo 08 is over, here are my thoughts.

On Saturday I had the chance to present an Open Notebook Science session with Cameron Neylon and Antony Williams. It was a small session, which was good in a way because it made it easier for everyone to engage in the discussion. I was especially pleased that Noel Gorelick from Google was there and took the time to speak with us afterwards.

Google is serious about making science more open and we were lucky to have a first look at the Google Research Data project in action. The idea is to provide massive storage space for scientific data, including version control and all the other indexing benefits one would expect. This will allow anyone to generate a stable url to a particular dataset. Very exciting stuff for us, especially for managing the docking data generated by our collaborator Rajarshi Guha.

Like last year there was a lot of discussion about Open Science but it seemed that this year there was more focus on the positive and actually getting things done. Probably the most visible moment of transformation erupted from Chris Patil at the closing session, where he vowed to make his lab notebooks public and organize a collaboration of open research on aging.

My session on Second Life was very well attended and, for the most part, went smoothly. Several people told me that they finally understood how this technology could be useful. Second Life generally requires an experienced user to demonstrate its value in order for a first time user's experience to be positive. Andrew Lang was kind enough to join us online and we were able to show our 5D representation of a dataset, making molecules and the SciFooLivesOn area.

One of the most important opportunities for me was a chance to spend 7-8 hours with Tony and Cameron on Friday to thoroughly discuss specific technical implementations of Open Notebook Science. ChemSpider will shortly unveil some very powerful technology to help us and others increase the impact of our work. More on that as it materializes...

There were so many inspiring sessions. Jane McGonigal described her quest to make video games process real scientific information and we discussed a possible application in UsefulChem to convert laboratory logs to machine readable formats. Stan Williams gave a mesmerizing presentation of how memristors, solid state devices that function like neurons, could be assembled to approach the computational power of biological brains.

The conference certainly sparked a lot of new ideas and promising collaborations. Now begins the Darwinian process of seeing which ones blossom.


Thursday, August 07, 2008

Off to SciFoo - My Sessions

I'm heading off to SciFoo. Here are my contributions to the SciFoo sessions wiki:

Following up on Antony Williams and Cameron Neylon, I would like to participate in a session on Open Notebook Science. The idea here is a little different from the concept of Open Data, which can mean different things to different people. For example, a public database of boiling points would certainly qualify as Open Data, but with access to ONS you would get the lab notebook page of the researcher who measured a certain boiling point. You could then examine the exact conditions reported and judge the likelihood of the boiling point being accurate. Jean-Claude Bradley

I would like to set up a live session using Second Life sometime Saturday morning. After SciFoo last year we had numerous presentations on Second Nature Island (see SciFooLivesOn wiki for details). Just prior to this meeting, 23andMe did a presentation. It would be great to have a few people from the conference as well as outside participate. I can demonstrate some data visualization work (in collaboration with Andy Lang) in chemistry to show the benefits of the virtual platform. I'll start a thread on my FriendFeed to report more details of the time and content. Jean-Claude Bradley

Wednesday, August 06, 2008

Scribd as a Repository for Proposals and Science Docs

Since Nature Precedings has tightened its policies and no longer accepts proposals, I have been looking for some alternative PDF repositories.

I think it is very important to have a convenient way to cite these documents on platforms with third-party timestamps and all the bells and whistles of web2.0 - ratings, comments, easy sharing tools, etc. Open Science is not just about what we are doing but also where we're headed.

Scribd seems to be a good solution. I've posted my last proposal to the Gates foundation there, in addition to the SCIEnCE site.

Tuesday, August 05, 2008

ChemSpider seminar at Drexel

Antony Williams, President of ChemZoo, will be giving a talk in the Chemistry department at Drexel on "ChemSpider – Weaving a Web of Chemistry". There should be plenty of time for an interactive session and discussion.

When: Thursday August 21, 2008 2:00-3:00 PM
Where: Disque 109 (32nd and Chestnut Streets, Philadelphia)

Abstract:
Access to free chemistry resources is revolutionizing access to data and knowledge for both chemists and the public at an unprecedented rate. Whether the access be via Wikipedia articles, via the PubChem or PubMed systems or any of the Open Access publishers there is an increase in the availability of information just a few keystrokes away. ChemSpider is a new resource to provide access to chemistry-related information and has been built with a structure-centric focus and the application of community-based curation to enhance the quality of public domain data. The ChemSpider platform presently holds information for almost 22 million chemicals and provides semantic links to over 130 different data sources. This presentation will provide an overview of ChemSpider, a vision of future directions and a hands-on tutorial on how to use the system.

Monday, August 04, 2008

Solubility and 96-well Ugi plates

Following up on the objectives I outlined last week for modeling solubility and Ugi product precipitation, I have created one spreadsheet for solubility measurements and another for 96-well plates to quickly screen Ugi product precipitation.

Both spreadsheets are open for crowdsourcing. Anyone requesting a solubility or Ugi reaction must include their name and rationale. We'll prioritize the runs based on chemical availability and fit with ongoing projects.

The solubility runs will be carried out by evaporating 1 ml of a saturated solution of the given compound and solvent using the SpeedVac in the Owens lab.
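The arithmetic behind such a run can be sketched as follows. This is only an illustration and assumes the residue left after evaporation is weighed in milligrams and is pure solute; the function name, its parameters and the numbers in the example are hypothetical and not taken from our spreadsheets.

```python
def solubility_from_residue(residue_mg, molar_mass_g_per_mol, volume_ml=1.0):
    """Estimate solubility from the residue left after evaporating a saturated solution.

    Assumes the residue is pure solute (no trapped solvent) and that
    volume_ml is the volume of saturated solution that was evaporated.
    Returns a tuple (mg per mL, mol per L).
    """
    if residue_mg < 0 or molar_mass_g_per_mol <= 0 or volume_ml <= 0:
        raise ValueError("mass must be non-negative; molar mass and volume must be positive")
    mg_per_ml = residue_mg / volume_ml
    # (mg/mL) / (g/mol) = mmol/mL, which is numerically equal to mol/L
    molar = mg_per_ml / molar_mass_g_per_mol
    return mg_per_ml, molar


# Hypothetical example: 50 mg of residue from 1 mL, compound of molar mass 250 g/mol
mg_per_ml, molar = solubility_from_residue(50.0, 250.0)
print(f"{mg_per_ml:.1f} mg/mL, {molar:.2f} M")
```

Reporting both mg/mL and molarity keeps the values comparable across compounds of different molar mass, which should matter for any later modeling.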

Unless otherwise stated, the Ugi screening runs will consist of 50 microliters from 2 M methanol solutions of the specified amine, aldehyde, acid and isonitrile. The 96-well plates we are using hold about 300 microliters and have a flat transparent bottom ideal for scanning or photography. Precipitate detection will take place 16 hours after mixing. As I discussed previously, this standardization should facilitate modeling and speed up testing.
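As a quick sanity check on these numbers, here is a rough sketch of the per-well arithmetic. It assumes that 50 microliters of each of the four stock solutions goes into the same well and that the volumes are simply additive; the function and parameter names are hypothetical.

```python
def ugi_well_plan(stock_molar=2.0, aliquot_ul=50.0, n_components=4, well_capacity_ul=300.0):
    """Work out total volume and per-component concentration for one screening well.

    Assumes equal aliquots of each stock solution and additive volumes.
    """
    total_ul = aliquot_ul * n_components
    # Dilution: each component is diluted by the combined volume of all aliquots
    final_molar = stock_molar * aliquot_ul / total_ul
    # mol/L times microliters gives micromoles
    micromoles_each = stock_molar * aliquot_ul
    return {
        "total_ul": total_ul,
        "final_molar": final_molar,
        "micromoles_each": micromoles_each,
        "fits_in_well": total_ul <= well_capacity_ul,
    }


plan = ugi_well_plan()
print(plan)
```

Under those assumptions the four aliquots fill about two thirds of a 300 microliter well, and each component ends up at a quarter of its stock concentration.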

I briefed our new undergraduate researchers Aneta and Cedric last week and they will start with the solubility runs.

Friday, August 01, 2008

The BCCE, research discussions and good friends

I've just returned from the Biennial Conference on Chemical Education 2008.

I wasn't able to record my first talk on Communicating Results from Undergraduate Research because my computer crashed. However, I repeated many of the same concepts in my second talk on Open Notebook Science and Cheminformatics.

It was very fortunate that the BCCE was held in Bloomington at Indiana University this year because it was a great opportunity for me to meet up with Rajarshi Guha, David Wild and Amar Flood.

Amar and I discussed Open Notebook Science with his lab people and it may make sense to do this for some of their projects. We set up a wiki to explore that possibility.

Rajarshi and I discussed at length our collaboration on the prediction of Ugi precipitates and docking against falcipain-2. It is certainly easier to pore over the relevant papers and online documents when face to face.

These are the outcomes:

1) We're going to separate the problem of predicting the solubility of Ugi products from the problem of generating the best enzyme inhibitors. Our initial plan was to try to make the top ranked Ugi products for a given enzyme and hope that we generate enough precipitates from those results for Rajarshi to model. We're just not getting enough positive results that way to generate a reliable model in a reasonable amount of time. To stack the deck in our favor we're going to start with a well behaved Ugi reaction (EXP099) and modify the reagents one at a time.

2) We're going to separate the performance of the Ugi reaction from the solubility of Ugi products in various solvents. Once we have Ugi products in hand in pure form from a reaction in methanol we will simply measure their solubility in other solvents, starting with ethanol, acetonitrile, THF and toluene. Low solubility will not guarantee that the Ugi reaction will proceed smoothly to produce a precipitate when carried out in a given solvent but it will certainly be a great starting point.

3) We're going to actually measure the solubility instead of just noting soluble or insoluble, as we have been doing in our Ugi master table. This will make it much easier for Rajarshi to come up with a robust model. Kevin Owens is already set up with a SpeedVac in his lab and that will help immensely. Rajarshi made the point that models predicting solubility in non-aqueous systems are needed and could be quite helpful to the chemistry community. He will be using the crystallographic data of our precipitates in his calculations. More on this later...

4) We're going to require the reactions to be easily amenable to automation, even if we can carry them out manually sometimes. For example, some of the reactions had starting materials that were not very soluble in methanol by themselves, even though they went into solution when combined with the other starting materials and then generated a product. This is interesting behavior but extremely inconvenient for automation because we can't make up stock solutions of reagents at 2 M concentration in methanol.

5) We're going to require the reactions to be fast. Some reactions required several days to complete. These will now be considered to be negative for precipitation at the 16 hour mark.
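
The "modify the reagents one at a time" strategy from outcome 1 can be sketched as a small enumeration. This is only an illustration: the reagent labels below are placeholders, not the actual components of EXP099, and the function name is hypothetical.

```python
COMPONENTS = ("amine", "aldehyde", "acid", "isonitrile")


def one_at_a_time(base, alternatives):
    """List reactions that differ from the base reaction in exactly one component.

    base: dict mapping each Ugi component role to a reagent label.
    alternatives: dict mapping a role to a list of candidate replacement labels.
    """
    variants = []
    for role in COMPONENTS:
        for reagent in alternatives.get(role, []):
            if reagent == base[role]:
                continue  # skip candidates identical to the base reagent
            variant = dict(base)
            variant[role] = reagent
            variants.append(variant)
    return variants


# Placeholder labels only; not real reagents
base = {"amine": "amine_0", "aldehyde": "aldehyde_0", "acid": "acid_0", "isonitrile": "isonitrile_0"}
alternatives = {"amine": ["amine_1", "amine_2"], "acid": ["acid_1"]}
for reaction in one_at_a_time(base, alternatives):
    print(reaction)
```

Keeping three of the four components fixed in every run means any change in precipitation can be attributed to the single reagent that was swapped.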

