Saturday, January 26, 2008

Andre Brown's Talk on Science 2.0

From Bora Zivkovic's A Blog around the Clock:

Here is a video of SPARC-ACRL Forum '08 on 12 January, 2008 at the Pennsylvania Convention Center in Philadelphia:

The SPARC-ACRL Forum at ALA '08 entitled "Working with the Facebook generation: Engaging students views on access to scholarship." Panelists discuss the merits of student activism, patent reform, blogs as a communication medium for scientists, and students as active members of a discussion about the right to access information for scholarly work. Features Andre Brown, Nelson Pavlosky, Stephanie Wang, and Kimberly Douglas as panelists.

Pay particular attention to Andre Brown and minutes 42-55 as he talks about science blogs and Science 2.0, including mentions of all the usual suspects (Jean-Claude Bradley, Rosie Redfield, Reed Cartwright, Bill Hooker, Peter Suber and me):



SPARC-ACRL Forum '08 from Matt Agnello on Vimeo.

Andre Brown does a really good job of summarizing the key points of the potential of social software in communicating primary research.

UsefulChem gets a mention around minute 50 and Gus Rosania is used as an example of how blogging about science can facilitate collaboration (minute 52).


Friday, January 25, 2008

We Have Anti-Malarial Activity!

The results are in.

Jiri Gut from the Rosenthal group has run 2 of our Ugi products and they both show inhibition of falcipain-2 (EXP165) and Plasmodium falciparum (EXP166) in the micromolar range.

To put this in context, the activities are roughly 2 orders of magnitude lower than those of the positive control used for the enzyme inhibition and of chloroquine for the parasite.

But it is a start. And we have officially closed the Open Science Loop for the malaria project, meaning that we have openly documented the docking results from Rajarshi Guha (D-EXP014), our syntheses (EXP148, EXP150) and testing (EXP165, EXP166) in the Rosenthal group.

We can't tell much about the validity of Rajarshi's docking model from the results of two compounds but as more data come in the situation should become clearer.

However, Jiri did make this interesting observation:
The food vacuole abnormality, which is indicative of cysteine protease inhibition was not observed in the parasites, suggesting other mode of action.



Monday, January 21, 2008

Back from Science Blogging 2008

Like last year, the North Carolina Science Blogging conference was a hit.

I moderated a session on public scientific data with Xan Gregg. Both of our talks were recorded and are available here. (I used SciVee this time to store the screencast and was even able to use their supplementary document option to store the mp3 that Feedburner properly processed in the podcast feed.)

We spent about half the session presenting, then there was an active discussion. (Hopefully someone got some decent audio on that part and I'll post a link here if possible.) The usual issues of scalability, findability, fundability, scooping, academic validation and government policy came up. No resolution on any of these of course, but that's not the point of the gathering. Some new contacts were made and maybe that will lead to some progress on the Open Science front.

Speaking of new contacts, I was quite pleased to meet Moshe Pritsker from JoVE. He said that his camera people would come to my lab to record some experiments. Any students in the UsefulChem lab who would like to get involved - let's discuss it.

Having the chance to touch base with friends like Bill Hooker, Deepak Singh and Antony Williams was certainly a bonus.

I also enjoyed Hemai Parthasarathy's session in the morning on Open Science. She did a great job in moderating the discussion, which was really a brainstorming session on how the scientific publication process could and should evolve.

Bora has a comprehensive list of pre and post-conference blogging, pics and videos.


Wednesday, January 16, 2008

Crowdsourcing Drug Development

Yesterday I had the privilege of attending a workshop at the NIH on the National Cancer Institute Clinical Development of Small Molecules:
This one-day workshop will provide specialized training and information to NCI-supported investigators who plan to undertake clinical development of novel concepts and who are directly involved with implementing translational clinical research. Individuals will benefit from the opportunity for direct interaction with FDA’s Center for Drug Evaluation and Research and NCI’s Developmental Therapeutics Program senior staff.
I am grateful to Dan Zaharevitz for the invitation to the workshop yesterday and for a visit to his screening labs later today.

Dan has a vision that the drug discovery and development process could benefit tremendously through openness at all stages by facilitating communication between all parties involved from discovery through clinical testing.

For example, if discussions about formulation, ease of scale-up and toxicity occurred early on, perhaps a more efficient overall process would result. Our selection of Ugi products that crystallize from the reaction mixture and that are obtainable from cheap commercially available starting materials is an attempt at anticipating the scale-up and cost factors down the road.

Of course, this is what Open Notebook Science is all about - getting feedback from those with expertise at the earliest possible moment. The scale-up consideration I mentioned is an obvious factor from the perspective of an organic chemist. But there are other parameters that could benefit immensely from input from the drug development community.

Formulation is a big one. So far I haven't placed any restrictions or requirements on functional groups in our Ugi virtual libraries. Most are not going to be very soluble in water and I don't have a good feel for how important that is. At the workshop, it was repeated several times that a water soluble compound is preferable but it is certainly not a deal breaker. For IV delivery, liposomal formulations are an option, as are some other FDA approved co-solvent systems. Right now we're shipping powders for our assays. It is my understanding that these are usually taken up in DMSO. But we don't have to do that - I have some experience with the preparation of liposomes and it would not be too much of a hassle to ship our compounds formulated in that way. There are issues of stability that have to be considered but these are manageable.

Now another approach would be creating compounds that are water soluble from the start. We could do that by introducing tertiary amines as a second functionality on our starting materials but that would severely reduce the size of our virtual libraries. Another strategy would involve using boc-protected amino acids, which we know work well in the Ugi reaction. The problem there is that they would have to be deprotected and that would reduce the convenience of the preparation.

Then there is the issue of optimizing by metabolites. Perhaps we should not be docking only the Ugi products but also their likely metabolic products. An expert in pharmacodynamics could certainly provide valuable input here. Anticipating toxicity is another consideration.

These are just examples of the types of conversations that we could be having well before any compounds get to in vivo or clinical trials.

And that is what Open Notebook Science is all about. Instead of waiting for our paper on the synthesis of new inhibitors of a certain enzyme to appear in print, give us some feedback while the experiment is being done.

There are definitely some obstacles to overcome in achieving this type of transparency and collaboration, but technology is not the bottleneck. Even finding collaborators is no longer the key issue, with people like Dan Zaharevitz, Rajarshi Guha, Egon Willighagen, Gus Rosania, Philip Rosenthal, Tsu-Soo Tan, Cameron Neylon, Antony Williams, Kevin Owens, Peter Murray-Rust, and others stepping up to contribute what they can.

I think the main issue now is convincing a funding organization that this is a model worthy of support. Maybe NSF will be one of them.


Chemistry Crowdsourcing Pre-proposal Posted

I posted our pre-proposal to NSF's Cyber-Enabled Discovery and Innovation program on Nature Precedings:
We used GoogleDocs to write most of the document but near the end I had to switch over to Word to get the formatting right. It is really a shame that GoogleDocs doesn't export properly to Word (or even pdf). There are huge spaces between paragraphs and the images don't come out the same size, which completely messes up the page length. And of course with proposals the formatting has to be perfect to adhere to their requirements (including the Arial font required by NSF).

Still I really like the collaborative features of GoogleDocs (especially the version tracking). It is much better to have a single version of a document available online to all collaborators rather than emailing around attachments. Hopefully someday Google will get the formatting issues fixed.

Friday, January 11, 2008

Rosania Blog

A few weeks ago I announced our new collaborator Gus Rosania's new wiki: 1CellPK.

Since that time Gus and his group have been tremendously active.

His report on drug design and formulation is a must read for students thinking of landing a job in that area.

Now Gus has moved the discussion over to the 1CellPK blog, where he also reports on his activities in Second Life. His lab has a home on Nature's island and is well worth a visit.

I encourage all of our UsefulChem collaborators to subscribe to his blog and continue the discussions started there.

Wednesday, January 09, 2008

Scientific American Science 2.0 Article

Mitch Waldrop has written an informative piece on the Science 2.0 movement in Scientific American:

Science 2.0: Great New Tool, or Great Risk?

Consistent with the content of the article, Mitch invites feedback:
Welcome to a Scientific American experiment in "networked journalism," in which readers—you—get to collaborate with the author to give a story its final form.

The article, below, is a particularly apt candidate for such an experiment: it's my feature story on "Science 2.0," which describes how researchers are beginning to harness wikis, blogs and other Web 2.0 technologies as a potentially transformative way of doing science. The draft article appears here, several months in advance of its print publication, and we are inviting you to comment on it. Your inputs will influence the article’s content, reporting, perhaps even its point of view.

So consider yourself invited. Please share your thoughts about the promise and peril of Science 2.0.—just post your inputs in the Comment section below. To help get you started, here are some questions to mull over:

* What do you think of the article itself? Are there errors? Oversimplifications? Gaps?
* What do you think of the notion of "Science 2.0?" Will Web 2.0 tools really make science much more productive? Will wikis, blogs and the like be transformative, or will they be just a minor convenience?
* Science 2.0 is one aspect of a broader Open Science movement, which also includes Open-Access scientific publishing and Open Data practices. How do you think this bigger movement will evolve?
* Looking at your own scientific field, how real is the suspicion and mistrust mentioned in the article? How much do you and your colleagues worry about getting “scooped”? Do you have first-hand knowledge of a case in which that has actually happened?
* When young scientists speak out on an open blog or wiki, do they risk hurting their careers?
* Is "open notebook" science always a good idea? Are there certain aspects of a project that researchers should keep quiet, at least until the paper is published?
UsefulChem got a mention:
Unfortunately, this kind of technical safeguard does little to address a second concern: Getting scooped and losing the credit. "That's the first argument people bring to the table," says Drexel University chemist Jean-Claude Bradley, who created his independent laboratory wiki, UsefulChem, in December 2005. Even if incidents are rare in reality, Bradley says, everyone has heard a story, which is enough to keep most scientists from even discussing their unpublished work too freely, much less posting it on the Internet.

However, the Web provides better protection than the traditional journal system, Bradley maintains. Every change on a wiki gets a time-stamp, he notes, “so if someone actually did try to scoop you, it would be very easy to prove your priority--and to embarrass them. I think that's really what is going to drive open science: the fear factor. If you wait for the journals, your work won't appear for another six to nine months. But with open science, your claim to priority is out there right away."

Under Bradley's radically transparent "open notebook" approach, as he calls it, everything goes online: experimental protocols, successful outcomes, failed attempts, even discussions of papers being prepared for publication. "A simple wiki makes an almost perfect lab notebook," he declares. The time-stamps on every entry not only establish priority, but allow anyone to track the contributions of every person, even in a large collaboration.

Bradley concedes that there are sometimes legitimate reasons for researchers to think twice about being so open. If work involves patients or other human subjects, for example, privacy is obviously a concern. And if you think your work might lead to a patent, it is still not clear that the patent office will accept a wiki posting as proof of your priority. Until that is sorted out, he says, "the typical legal advice is: do not disclose your ideas before you file."

Still, Bradley says the more open scientists are, the better. When he started UsefulChem, for example, his lab was investigating the synthesis of drugs to fight diseases such as malaria. But because search engines could index what his team was doing without needing a bunch of passwords, "we suddenly found people discovering us on Google and wanting to work together. The National Cancer Institute contacted me wanting to test our compounds as anti-tumor agents. Rajarshi Guha at Indiana University offered to help us do calculations about docking--figuring out which molecules will be reactive. And there were others. So now we're not just one lab doing research, but a network of labs collaborating."


Campus Technology Article about Molecules in Second Life

Linda Briggs wrote a nice article in Campus Technology about using Second Life to teach and highlighted the chemistry application that I used last term:
Creating Life-Size Molecules in Second Life

A Conversation with Drexel University's Jean-Claude Bradley

1/9/2008

By Linda L Briggs

CT: Conversely, what are some things that work really well in Second Life?

JCB: One thing new that I've done this term is have students do a project in Second Life.

CT: Yes, you recently wrote in your blog that one of your students created a life-size model of a molecule as part of that. That sounded really cool.

JCB: Right. To be able to stand next to a molecule that is as tall as you are, and to have your teacher be able to walk around it with you and comment,... that's pretty useful.

[....]

CT: Do you have advice for instructors who want to integrate Second Life into their course?

JCB: You should have a really good reason to do it. The best advice is to find another teacher who is actually using it, and try to experience what the student is experiencing. You'll get some ideas and advice from that. I was just talking to another teacher an hour ago who might be doing some things in Second Life. She's also an organic chemistry teacher. I told her, just send your students to Drexel Island; have them interact with my students, click around on the quizzes, and if you think it might make sense, you can spawn off from that.

A lot of people have bad experiences in Second Life because they don't have a good reason for going there. It's like having people go to the Internet without a Web address. You want to be guided. That's the best possible scenario.

It's just another tool. I wouldn't teach exclusively on Second Life. We have WebCT Blackboard; I have my wiki; I have my blogs; and those things all have their strengths. You've got to leverage them all.
Drexel Island also got a mention in Matt Villano's article in Campus Technology: 13 Tips for Virtual World Teaching


Saturday, January 05, 2008

Tracking Results with Workflow Tables

Following my post about shifting the storage of chemistry experiments to a results-centric model, I received lots of good feedback.

Egon pointed out an ambiguity in specifying the addition of a compound and that is now fixed in RESULT0001, RESULT0002 and RESULT0003. Instead of
ADD methanol (InChIKey=OKKJLVBELUTLKV-UHFFFAOYAX, volume=1 ml)
we now have:
ADD compound (common name=methanol, InChIKey=OKKJLVBELUTLKV-UHFFFAOYAX, volume=1 ml)
Peter demonstrated some related work of his using CML to represent reactions taken from experimental sections of published articles. This looks tricky because there is usually a lot of missing information in journal articles but I definitely think it is worth doing. We're using our laboratory notebook (specifically the log sections) so we have reasonably complete information in most cases.

I certainly am interested in using CML to represent our result modules and I appreciate Peter's help in trying to translate some of our modules into CML. Hopefully everything can be specified with the existing components of CML and CMLReact.

But representing the information in machine-readable format is just one half of the equation. Being able to get information back out with powerful queries is just as important.

Antony's comments about workflows got me to rethink the problem from a slightly different angle. Although the result files that I have been constructing are very flexible, until someone actually populates a database with the data they describe, it will be difficult to get aggregate information back out. The main problem is that comparing two workflows requires lining up the corresponding actions. It is doable but requires some intelligent processing, only possible once a database is in place.

However, by sacrificing a bit of the generality, we can gain a lot in the short term. The vast majority of reactions that we've carried out in my lab are just variations on the Ugi synthesis. All Ugi syntheses have an amine, an aldehyde, a carboxylic acid, an isonitrile and a solvent. It turns out that with a series of tables, we can represent all the workflows leading to a result in a way that enables ready comparison and sorting.

The first table records the time of action initiation (normalized to minutes) for each workflow. Since these are absolute times from the start of the experiment, the order of the columns is unimportant. If we were looking for experiments where the aldehyde was added after the amine, we would simply subtract the amine addition time from the aldehyde addition time and look for positive values. Also, reactions involving only the formation of an imine would be a subset of the Ugi reaction with blanks for acids and isonitriles.
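The ordering query on the time table can be sketched with plain dictionaries standing in for the spreadsheet. The workflow names and times below are invented for illustration, and the column names (amine, aldehyde, acid, isonitrile) are an assumed layout rather than the actual table headers. A blank cell is represented as None.

```python
# Hypothetical time table: minutes from the start of each workflow at which
# each addition began. None marks a blank cell (component never added).
time_table = {
    "workflow_A": {"amine": 15.0, "aldehyde": 19.0, "acid": 45.0, "isonitrile": 60.0},
    "workflow_B": {"amine": 20.0, "aldehyde": 5.0, "acid": 40.0, "isonitrile": 55.0},
    "workflow_C": {"amine": 10.0, "aldehyde": 12.0, "acid": None, "isonitrile": None},
}


def added_after(table, later, earlier):
    """Return the workflows in which `later` was added after `earlier`.

    The test is (time of later) - (time of earlier) > 0, so column order
    in the table plays no part.
    """
    hits = []
    for workflow_id, times in table.items():
        t_later = times.get(later)
        t_earlier = times.get(earlier)
        if t_later is None or t_earlier is None:
            continue  # one of the components was never added
        if t_later - t_earlier > 0:
            hits.append(workflow_id)
    return hits


def imine_only(table):
    """Return workflows with blanks for both the acid and the isonitrile."""
    return [
        workflow_id
        for workflow_id, times in table.items()
        if times.get("acid") is None and times.get("isonitrile") is None
    ]


print(added_after(time_table, "aldehyde", "amine"))
print(imine_only(time_table))
```

Because every time is measured from the start of the experiment, the same two functions cover any pair of components without reordering columns.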

The second table records the quantities of compounds (normalized to millimoles) and the third records the duration of time-variable actions (normalized to minutes). Examples of the latter include vortexing and centrifugation durations.
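As a rough sketch of the normalization itself, here is one way to convert masses, liquid volumes and durations. The function names are made up for this example, and the molar masses and density are approximate handbook values supplied as assumptions (they are not recorded in the tables), so check them before relying on the output.

```python
def mass_to_mmol(mass_mg, molar_mass_g_per_mol):
    """Milligrams divided by g/mol gives millimoles directly."""
    return mass_mg / molar_mass_g_per_mol


def volume_to_mmol(volume_ul, density_g_per_ml, molar_mass_g_per_mol):
    """Microlitres times g/mL gives milligrams, then convert as a mass."""
    return mass_to_mmol(volume_ul * density_g_per_ml, molar_mass_g_per_mol)


def to_minutes(value, unit):
    """Normalize a duration given in s, min or h to minutes."""
    seconds_per_unit = {"s": 1, "min": 60, "h": 3600}
    return value * seconds_per_unit[unit] / 60


# Assumed constants: benzylamine ~107.15 g/mol and ~0.981 g/mL,
# crotonic acid ~86.09 g/mol.
benzylamine_mmol = volume_to_mmol(54.6, 0.981, 107.15)
crotonic_acid_mmol = mass_to_mmol(43.0, 86.09)

print(round(benzylamine_mmol, 2), round(crotonic_acid_mmol, 2))
print(to_minutes(30, "s"), to_minutes(5.5, "min"))
```

With these assumed constants, both quantities come out at about half a millimole, which is the kind of value that would go into the quantities table.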

Two additional tables record the identity of the compounds, one using the InChIKey for machine recognition and the other a common name for human use.

I have represented all of the workflows with documented results for EXP150. Links are available to the raw image data on Flickr or JCAMP-DX files for the NMR and IR spectra on our server.

Using GoogleDocs is very nice for this kind of thing. Right clicking on any cell offers a Google search, which is extremely convenient for the InChIKey. It is also easy to make the data public this way and invite collaborators. (Speaking of which, I need some help to complete the conversion from the wiki to these tables :)


Thursday, January 03, 2008

Modularizing Results and Analysis in Chemistry

Chemical research has traditionally been organized in either experiment-centric or molecule-centric models.

This makes sense from the chemist's standpoint.

When we think about doing chemistry, we conceptualize experiments as the fundamental unit of progress. This is reflected in the laboratory notebook, where each page is an experiment, with an objective, a procedure, the results, their analysis and a final conclusion optimally directly answering the stated objective.

When we think about searching for chemistry, we generally imagine molecules and transformations. This is reflected in the search engines that are available to chemists, with most allowing at least the drawing or representation of a single molecule or class of molecules (via substructure searching).

But these are not the only perspectives possible.

What would chemistry look like from a results-centric view?

Let's see with a specific example. Take EXP150, where we are trying to synthesize a Ugi product as a potential anti-malarial agent and identify Ugi products that crystallize from their reaction mixture.

If we extract the information contained here based on individual results, something very interesting happens. By using some standard representation for actions we can come up with something that looks like it should be machine readable without much difficulty:
  • ADD container (type=one dram screwcap vial)
  • ADD methanol (InChIKey=OKKJLVBELUTLKV-UHFFFAOYAX, volume=1 ml)
  • WAIT (time=15 min)
  • ADD benzylamine (InChIKey=WGQKYBSKWIADBV-UHFFFAOYAL, volume=54.6 ul)
  • VORTEX (time=15 s)
  • WAIT (time=4 min)
  • ADD phenanthrene-9-carboxaldehyde (InChIKey=QECIGCMPORCORE-UHFFFAOYAE, mass=103.1 mg)
  • VORTEX (time=4 min)
  • WAIT (time=22 min)
  • ADD crotonic acid (InChIKey=LDHQCZJRKDOVOX-JSWHHWTPCJ, mass=43.0 mg)
  • VORTEX (time=30 s)
  • WAIT (time=14 min)
  • ADD tert-butyl isocyanide (InChIKey=FAGLEPBREOXSAC-UHFFFAOYAL, volume=56.5 ul)
  • VORTEX (time=5.5 min)
  • TAKE PICTURE
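To illustrate how little it takes to read lines like these by machine, here is a minimal parsing sketch. The regular expression and the parse_action function are hypothetical, written only for this example; they are not an accepted standard for experimental actions, and they would need extending for compound names that contain parentheses or commas.

```python
import re

# Verbs taken from the action lines in this post; the pattern is an assumption.
ACTION_RE = re.compile(
    r"^(?P<verb>ADD|WAIT|VORTEX|CENTRIFUGE|DECANT|TAKE PICTURE|TAKE NMR)"
    r"(?:\s+(?P<subject>[^(]+?))?"          # optional subject, e.g. "methanol"
    r"(?:\s*\((?P<params>[^)]*)\))?$"       # optional "(key=value, ...)" block
)


def parse_action(line):
    """Turn one action line into a dict with verb, subject and params."""
    text = line.strip().lstrip("•").strip()
    match = ACTION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized action: {line!r}")
    params = {}
    if match.group("params"):
        for pair in match.group("params").split(","):
            # Split on the first "=" only, so values may contain "=".
            key, _, value = pair.partition("=")
            params[key.strip()] = value.strip()
    return {
        "verb": match.group("verb"),
        "subject": match.group("subject"),
        "params": params,
    }


step = parse_action(
    "• ADD crotonic acid (InChIKey=LDHQCZJRKDOVOX-JSWHHWTPCJ, mass=43.0 mg)"
)
print(step["verb"], step["subject"], step["params"])
```

Values such as "43.0 mg" are kept as strings here; splitting number from unit is left as a separate step.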



It turns out that for this CombiUgi project very few commands are required to describe all possible actions:
  • ADD
  • WAIT
  • VORTEX
  • CENTRIFUGE
  • DECANT
  • TAKE PICTURE
  • TAKE NMR
By focusing on each result independently, it no longer matters if the objective of the experiment was reached or if the experiment was aborted at a later point.

Also, if we recorded chemistry this way we could do searches that are currently not possible:
  • What happens (pictures, NMRs) when an amine and an aromatic aldehyde are mixed in an alcoholic solvent for more than 3 hours with at least 15 s vortexing after the addition of both reagents?
  • What happens (pictures, NMRs) when an isonitrile, amine, aldehyde and carboxylic acid are mixed in that specific order, with at least 2 vortexing steps of any duration?
I am not sure if we can get to that level of query control, but ChemSpider will investigate representing our results in a database in this way to see how far we can get.
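As a rough idea of what such queries could look like once the steps are structured, here is a sketch over a list of step records. The workflow is loosely modeled on the EXP150 steps listed earlier in this post, with the WAIT steps left out. The "role" annotations (amine, aldehyde and so on) are an assumption, since the action lines only carry names and InChIKeys, and the function names are invented for illustration.

```python
# Hypothetical structured workflow with assumed role annotations.
workflow = [
    {"verb": "ADD", "role": "solvent"},
    {"verb": "ADD", "role": "amine"},
    {"verb": "VORTEX", "seconds": 15},
    {"verb": "ADD", "role": "aldehyde"},
    {"verb": "VORTEX", "seconds": 240},
    {"verb": "ADD", "role": "acid"},
    {"verb": "VORTEX", "seconds": 30},
    {"verb": "ADD", "role": "isonitrile"},
    {"verb": "VORTEX", "seconds": 330},
    {"verb": "TAKE PICTURE"},
]


def add_index(steps, role):
    """Position of the first ADD step for `role`, or None if absent."""
    for index, step in enumerate(steps):
        if step["verb"] == "ADD" and step.get("role") == role:
            return index
    return None


def added_in_order(steps, roles):
    """True if every role was added, in exactly the given relative order."""
    positions = [add_index(steps, role) for role in roles]
    if any(position is None for position in positions):
        return False
    return positions == sorted(positions)


def vortex_count(steps):
    """Number of VORTEX steps of any duration."""
    return sum(1 for step in steps if step["verb"] == "VORTEX")


def vortex_seconds_after(steps, roles):
    """Total vortexing time after the last of `roles` has been added."""
    positions = [add_index(steps, role) for role in roles]
    if any(position is None for position in positions):
        return 0
    last = max(positions)
    return sum(s["seconds"] for s in steps[last + 1:] if s["verb"] == "VORTEX")


order = ["amine", "aldehyde", "acid", "isonitrile"]
print(added_in_order(workflow, order) and vortex_count(workflow) >= 2)
print(vortex_seconds_after(workflow, ["amine", "aldehyde"]) >= 15)
```

A real database would run these tests across many workflows and return the linked pictures and NMRs for the matches.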

Note that we can't represent everything using this approach. For example, observations made in the experiment log don't show up here, nor does anything unexpected. Therefore, at least as long as we have human beings recording experiments, we're going to continue to use the wiki as the official lab notebook of my group. But hopefully I've shown how we can translate from freeform to structured format fairly easily.

Now one reason I think that this is a good time to generate results-centric databases is the inevitable rise of automation. It turns out that it is difficult for humans to record an experiment log accurately. (Take a look at the lab notebooks in a typical organic chemistry lab - can you really reproduce all those experiments without talking to the researcher?)

But machines are good at recording dates and times of actions and all the tedious details of executing a protocol. This is something that we would like to address in the automation component of our next proposal.

Does that mean that machines will replace chemists in the near future? Not any more than calculators have replaced mathematicians. I think that automating result production will leave more time for analysis, which is really the test of a true chemist (as opposed to a technician).

Here is an example of an analysis module making a simple point, useful to the chemistry community, and linking back to result modules that ultimately link back to the original experiment in the online laboratory notebook:
Context: obtaining precipitates in the CombiUgi project

Ugi reactions in methanol where the solution is supersaturated with Ugi product may give false negatives for precipitation. For example, a Ugi product rapidly crystallized at the 17th hour (RESULT0003) after addition of all reagents, while appearing as a clear solution at the 15th hour (RESULT0002). It is therefore recommended that the vials be submitted to vortexing (15 s) prior to taking a picture.
We'll be recording these analysis and result modules on UsefulChem wiki pages:
We'll be using InChIKeys for compact unambiguous identification of molecules (and convenient indexing in Google) and the terms in this post for action options. Anyone is free to automatically incorporate these in a database, as long as attribution is provided. (If anyone knows of any accepted XML for experimental actions let me know and we'll adopt that.)

I think this takes us a step from freeform Open Notebook Science closer to the chemical semantic web, something that both Cameron Neylon and I have been discussing for a while now.


Creative Commons Attribution Share-Alike 2.5 License