There is a new kid on the cheminformatics block.

ChemSpider Beta went live on Saturday March 24, 2007 with over 10 million compounds. Anyone using other free online molecular databases (like eMolecules or Chemistry Search Lookup Service) should definitely give this one a spin.

At this time, it is possible to use the system not only to find molecules in databases but also to predict molecular properties (like density and boiling point), which can come in very handy.

But the best reason for keeping an eye on this one is that it is led by Tony Williams. I have known Tony since my graduate student days at the University of Ottawa, where he was running our NMRs. Let's just say he's a little obsessive about the projects he takes on and likes to push the envelope. That usually makes things very interesting.

ChemSpider is still under construction and new features are getting added daily. Make sure to provide some feedback on the site for new features that you would like to see added. Projects such as these are exactly what we need to lubricate free and open science.

Also make sure to subscribe to the accompanying Spinneret Chemistry Webzine written by David Bradley.

I am on the advisory board representing the academic side.

Wednesday, March 28, 2007

Communicating Chemistry at ACS

Yesterday I presented my second talk, this time on the use of blogs and wikis to do laboratory research. This was under the Chemical Education symposium: Communicating Chemistry. Most of the talks were about teaching so it was perhaps not the best audience but that doesn't concern me as much since I started recording my talks at conferences.

And if I hadn't presented there I would have missed meeting Thomas Poon, whose high quality organic chemistry pre-lectures I have used in my classes as an extra resource for a while. I would have also probably missed Michelle Francl's talk about her podcasting adventures in Quantum Chemistry. She has had her podcasts downloaded 250 000 times so far!

In the afternoon I headed over to the Evolving Network of Scientific Communication symposium. I heard Joanna Scott again, this time discussing in more detail Nature's experience with social software, like Connotea. She kept repeating that Nature is not just a journal article publisher and is serious about experimenting with other forms of science communication. From what I have seen they are certainly one of the most aggressive publishers on this front.

Joanna also invited me to join the Nature island to continue the work we started on using Second Life for chemistry research and teaching. The following images were taken from here: slurl. (Beth is posing next to the quiz in the first pic)

Marketplace Segment on Open Notebook Science

The NPR interview on Open Science I discussed two weeks ago has aired and is now available.

I think it was very well balanced. The positive aspects of not losing failed experiments were weighed against the difficulties in publishing in some journals and of deriving profit.

Monday, March 26, 2007

Second Life at the ACS and Quizzes

Yesterday, I gave my first talk at the March07 ACS meeting on Teaching Organic Chemistry with Blogs and Wikis. The screencast is now available.

It was part of a symposium on Using Social Networking Tools to Teach Chemistry organized by Harry Pence and Andrea Gay. Joanna Scott gave a most interesting talk about Nature's experimentation with Second Life and the great possibilities for communicating research work. Harry is also involved with Second Life. Indeed I met him in world by accident a few days ago!

Largely because of Beth Ritter-Guth's tireless dedication to implementing educational opportunities in Second Life, I am finally coming around and seeing the potential for teaching and research. With the help of Eloise Pasteur (SL name), Beth has created an adaptation of the EduFrag quizzes for Second Life.

The rules are the same but the interface is very different from Unreal Tournament. Click on the obelisk to get the quiz started. Four images will appear and only one will be correct. Click on the correct one to go to the next set. Clicking on an incorrect image will make you start over. If you make it past the 20th room you will be rewarded with a picture of my cat yawning.

The material in the current quiz is on introductory organic chemistry (Lewis structures, Newman projections, nomenclature, etc.) and I will make good use of it in the class I am teaching next week.

Give it a try and let me know how it works. The quiz is located in the Open Notebook Science building on Eduisland (slurl). We'll be adding more material related to UsefulChem there shortly (thus the use of the Blue Obelisk).

Taking the Long Way to Chicago ACS

Well I finally made it to Chicago for the American Chemical Society meeting on Saturday.

The bad news is that my 9:35 flight from Philly was canceled and I had to take another that ended up leaving at 17:00.

The good news is that I was stranded in the airport with Bruce Ganem from Cornell. As luck would have it, Bruce is also working on multi-component reactions, including the Ugi reaction and his version of a 5-component Ugi involving nitrile and organometallic components. He received the ACS award for Creative Invention yesterday.

Needless to say, I was eager to get his opinion on our little puzzle of the disappearing methyl. He was reminded of the acid-catalysed hydrolysis of furans leading to the opening of the ring. This might open the door to deprotonating the methyl as shown below. Lots of things could happen from this point, including cyclizations. When I have more time I'll see how we could get that pair of doublets, with one of them massively deshielded at 8.4 ppm. I am wondering if that peak stays there after neutralization. Time will tell.

More on the ACS shortly....

I hope to meet up with some of you at the conference.
There is a chemistry blogging meeting at 13:00 today and a Blue Obelisk dinner at 19:00 on Tuesday.

Wednesday, March 21, 2007

Disappearing Methyls

Ever since we isolated our Ugi products, we've been trying to cyclize them to the diketopiperazines. As described by Hulme, we are trying to effect an intramolecular transamidation catalyzed by trifluoroacetic acid (TFA). Instead of dichloroethane we are generally using CDCl3 so that we can monitor the reaction by NMR.

The first step of removing the Boc group seems to proceed very smoothly, with the appearance of t-butyl trifluoroacetate at 1.61 ppm within a few minutes or hours, depending on the concentration of TFA. Here is an H NMR spectrum from Khalid's EXP070, 13 min after addition of 50% TFA in CDCl3 showing complete deprotection:

After 2 days at room temperature, a clean conversion to another set of peaks is completed. Whatever is going on, this is not the desired diketopiperazine. For one thing the furan methyl group is gone.

There also appears to be one missing methylene and the furan protons have disappeared as well. A pair of mysterious doublets appears at 8.4 and 5.6 ppm.

Alicia observes a very similar course for her reaction (EXP065). Upon washing with water, nothing remains in the organic phase. This suggests that all the components are trifluoroacetate salts, most likely still in one piece.

Does anybody have any idea what is going on?

(If you look at the NMR spectra on the wiki with JSpecView, use IE; Firefox is still unstable)

Tuesday, March 20, 2007

Code for Open Content

After my post on searching Google for documents that are free to share, Egon asked me about how to indicate that on our sites.

Wikispaces does it automatically and looking at the HTML it looks like you just need to add this tag (or whatever Creative Commons License you choose):

<a rel="license" href="">Creative Commons Attribution Share-Alike 2.5 License</a>

I've now added this to the template file of most of my blogs on Blogger.
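If you want to check that a page really advertises its license this way, a few lines of script can look for rel="license" links. Here is a minimal sketch using Python's standard html.parser module. The class and function names are my own, and the example.org address is only a placeholder, not a real license URL.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect the href of every <a> or <link> tag marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rel_values = (attributes.get("rel") or "").lower().split()
        if "license" in rel_values:
            self.license_urls.append(attributes.get("href"))


def find_license_urls(html):
    """Return the license URLs declared in an HTML string."""
    finder = LicenseLinkFinder()
    finder.feed(html)
    return finder.license_urls


# Placeholder markup: the href is hypothetical, not the real license address.
page = '<p>Text</p><a rel="license" href="https://example.org/license">Some License</a>'
print(find_license_urls(page))
```

A page without the tag simply returns an empty list, which makes it easy to scan a batch of blog templates and see which ones still need it.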

Monday, March 19, 2007

NPR interview on Open Notebook Science

Last week I had the pleasure of getting interviewed by Janet Babin at the WHYY studio in Philly. Janet is putting together a piece on Open Notebook/Open Source Science for her Marketplace series on NPR.

It was encouraging to see how much interest is being generated on this topic lately, especially in the popular media.

If you have listened to her pieces, such as the one on MIT's OpenCourseWare initiative, you would appreciate the pains to which she goes to provide a balanced perspective.

So it was interesting to see the issues that she asked me to address, based on her interviews with other parties. One of the concerns expressed about Open Notebook Science is that scientists would not want their raw data available to others without their interpretations.

I am used to dealing with questions of intellectual property rights, priority and impeding the ability to publish in prestigious journals. But the idea that it may be useless (or even perceived as irresponsible) to publish raw data without full analysis by the head scientist of a group is probably also an important barrier to the adoption of Open Notebook Science, or at least more open forms of science.

When my group publishes experiments on UsefulChem, the general order is typically: the experimental plan, the log, the results as raw data, observations then conclusions. Error correction based on feedback occurs at all points in the process.

So by default we almost always have much of our raw data available without interpretation for long periods. And probably most of that information will never get interpreted (at least by us) because we don't need it to meet the narrow objectives of our experiments now or in the future.

But it is essential for these raw data to be available openly to humans and automated agents if we want Science2.0 to explode. (The data also need to be tagged and formatted properly to truly leverage automation - but more on that later)

The evolution of an experiment page is messy. Doing science is messy. There are errors to correct and faulty assumptions to confront and remove as we get more information and analyze an experiment.

I think that learning about science almost exclusively through polished journal articles can be discouraging, especially to new students. Like attorneys, scientists tend to write papers (at least the good ones) with arguments using selective evidence to support a clear point. There is nothing wrong with this, and I think that humans need this type of format much of the time to process new information. However, this approach leaves a lot out about how science actually gets done.

For example, if a chemist has developed a new reaction, the typical way to publish it is to try the reaction under different conditions and with different reactants. In principle this is very simple: do the reactions, fill in a table with yields then publish. In practice, at least in the organic chemistry labs where I have done my time, it usually does not work that way. Yields will vary between people and sometimes the reaction just won't work for a reason that never gets elucidated.

So what is the actual yield that will be reported? The best one? The worst one? The average? If you use the average, do you remove outliers? If some of the product was spilled, do you still take that yield into consideration or completely scrap that run?
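To make the ambiguity concrete, here is a small sketch of how much the "reported yield" can move depending on that choice. The five yields are invented for illustration and are not UsefulChem data, and the outlier rule (drop anything more than 15 points from the median) is an arbitrary assumption of mine, one of many that could be defended.

```python
from statistics import mean, median

# Hypothetical isolated yields (%) for five runs of the same reaction.
# These numbers are made up for illustration only.
yields = [62.0, 58.0, 65.0, 21.0, 60.0]


def trimmed_mean(values, tolerance):
    """Mean of the values lying within `tolerance` of the median.

    This is only one arbitrary way of deciding what counts as an outlier.
    """
    centre = median(values)
    kept = [v for v in values if abs(v - centre) <= tolerance]
    return mean(kept)


summary = {
    "best": max(yields),
    "worst": min(yields),
    "average": mean(yields),
    "average without outliers": trimmed_mean(yields, tolerance=15.0),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

With these made-up numbers the same five runs could be reported as anything from 21% to 65%, and dropping the single low run moves the average from 53.2% to 61.25%. None of those figures is dishonest on its own, which is exactly the problem.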

Every scientist who has written a paper has had to make a decision about what to do with the ambiguity of raw scientific information. And every day that the paper is not written and submitted because of ambiguity, the world doesn't know about it.

When we report our raw laboratory logs and data, we are not concerned about a number that will show up in a table (eventually hopefully) in a printed journal. We are concerned about truthfully reporting what we did, observed and thought at that time. There is no carpet under which to sweep ambiguity. All scientists should be doing that in their laboratory notebook. By sharing it in real time the world can benefit immediately.

Coincidentally with my NPR interview, I just finished Diane Rehm's wonderful autobiography "Finding my Voice". In that book she talks about the evolution of talk radio in the 70s, including her pioneering efforts. People were discovering a new way to communicate and nobody really knew where it would end up. I think we are in a similar position now with science and new web technologies.

Sunday, March 18, 2007

Chemistry Wide Open Column on SB

I now have a column on SB called Chemistry Wide Open.

As you may guess from the title, I'll be discussing issues related to chemistry and Open Science. The challenge here is to post for a more general audience but I'll likely repost or restructure selected content from my blogs, mainly from this one and Drexel CoAS E-Learning.

But who knows? Maybe there are some hard core organic chemists on there that might appreciate some NMR problems. We'll see from the comments what makes sense.

Thursday, March 15, 2007

Searching for Open Access Chemistry on Google

The way people search for and find chemistry information is always in flux. Right now, Open Access is a hot topic (e.g. Open Source Archivangelism post) and it is interesting to see how those seeking OA sources are connecting with those who choose to share information in that way.

This morning I noticed from our SiteMeter referrals that someone had found some of our experiments (EXP019) involving anisaldehyde by doing a Google search with the advanced features set to include only documents with usage rights set to "free to use or share". A nice benefit of using Wikispaces as our lab notebook is that the Creative Commons Attribution Share-Alike license is applied by default to all content using the basic free account.

Looking at some of the other hits from that search provides an interesting sample of Open Access sources in chemistry currently being used:

Sunday, March 11, 2007

Chemical Heritage Foundation Talk on Open Science

I'll be giving a talk and participating in a panel about Open Source/Open Notebook Science on April 17, 2007 at the Chemical Heritage Foundation in Philadelphia.

See Registration info here

Thursday, March 08, 2007

Walkingshaw Science Web 2.0 Talk

Andrew Walkingshaw has a nice presentation on the application of Web 2.0 concepts in science, and more specifically in chemistry.

A transcript is also available, which is always convenient when pressed for time.

Monday, March 05, 2007

UsefulChem on eMolecules and Structure Searching

As some of you know, most of the structures in the UsefulChem molecules blog are cataloged in eMolecules under the supplier "Bradley Lab at Drexel University". Just check the Specific Supplier option to do substructure or exact match searching of our database. This is probably the most convenient way to search but the downside is that the database is only updated every few months. There are also currently errors of missing double bonds for some compounds that will get fixed at the next update. According to eMolecules, in the future the updating will be far more frequent.

eMolecules has also set up a direct link to our database here. It is possible for users to register to save their searches or just click on the Search Catalog button. More sophisticated searches can be done here, including browsing through molecules in grids. Again, there are some errors in here as well that should be fixed in the next eMolecules database update.

To access the most updated database of our molecules, Dave has set up SMARTS search capability on our server using OpenBabel. It is possible to do a quick substructure search just using a SMILES representation of the molecule. A really convenient way to get a SMILES code is to draw the structure on eMolecules then copy and paste. SMARTS can enable searching with a lot more control but just using the SMILES code is probably good enough for most searches. For example "C=N" shows all of the imines on the UsefulChem molecules blog. The link to this search is on item #2 of the UsefulChem Wiki front page.

Saturday, March 03, 2007

Automated Reaction Kinetics using Excel VBA, JCAMP, and Java

I have developed a software package which allows us to take data from our Varian UNITY INOVA-300 NMR and plot the concentrations of various species in the course of a reaction. The NMR data (in JCAMP format) are decompressed and combined into a BLOCK file which the Excel VBA program reads and plots. Concentration versus time plots, which then can be used in kinetics studies, are automatically generated. For more information on this software and how to use it, click here, or to download the information as a Word document, here. The software itself (which contains the documentation in Word format) can be downloaded as a zip file from here. For a test suite of data, including output files generated by the software, click here.
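For readers who have not looked inside a JCAMP file, the sketch below shows roughly what reading one involves. It is not the Excel VBA package described above, just an illustrative Python stand-in that assumes the data are already decompressed (plain numbers in the XYDATA table) and evenly spaced between FIRSTX and LASTX. The sample spectrum, the function names and the integration window are all hypothetical.

```python
def parse_jcamp(text):
    """Parse JCAMP-DX text whose XYDATA table is uncompressed.

    Returns (header, xs, ys). Compressed data forms are NOT handled here.
    """
    header = {}
    y_values = []
    in_table = False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # "$$" starts a comment
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            if label == "END":
                break
            in_table = (label == "XYDATA")
            header[label] = value.strip()
        elif in_table:
            numbers = [float(token) for token in line.split()]
            y_values.extend(numbers[1:])  # first number on each row is X
    yfactor = float(header.get("YFACTOR", 1.0))
    ys = [y * yfactor for y in y_values]
    first_x = float(header["FIRSTX"])
    last_x = float(header["LASTX"])
    n = len(ys)
    step = (last_x - first_x) / (n - 1) if n > 1 else 0.0
    xs = [first_x + i * step for i in range(n)]
    return header, xs, ys


def integrate_region(xs, ys, low, high):
    """Sum the intensities of all points with low <= x <= high."""
    return sum(y for x, y in zip(xs, ys) if low <= x <= high)


# Tiny made-up spectrum, for illustration only.
SAMPLE = """\
##TITLE=hypothetical test spectrum
##JCAMP-DX=4.24
##DATA TYPE=NMR SPECTRUM
##XUNITS=HZ
##YUNITS=ARBITRARY UNITS
##FIRSTX=0.0
##LASTX=5.0
##XFACTOR=1.0
##YFACTOR=2.0
##NPOINTS=6
##XYDATA=(X++(Y..Y))
0.0 1 2 3
3.0 4 5 6
##END=
"""

header, xs, ys = parse_jcamp(SAMPLE)
print(header["TITLE"], integrate_region(xs, ys, 1.0, 3.0))
```

Repeating the integration over the same window for each spectrum in a time series, and dividing by the integral of a reference peak, would give the kind of relative concentration versus time data that a kinetics plot needs.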

Creative Commons Attribution Share-Alike 2.5 License