Useful Chemistry: 2010-02

Friday, February 26, 2010

Robert Grubbs Webinar on March 2, 2010

Honeywell Nobel Interactive Studio will host an interactive seminar with Robert Grubbs at 11:00 ET on March 2, 2010. Sign up here. Questions can be submitted via email, Twitter, Facebook or Orkut.

2005 Nobel Laureate in Chemistry, Robert Grubbs, will discuss how the availability of a catalyst that promotes scrambling of the fragments of a carbon-carbon double bond by a metathesis reaction has led to a variety of commercial applications including the production of tough polymers and highly functionalized pharmaceuticals.

Labels: Grubbs, metathesis, nobel, organic chemistry, webinar

Monday, February 22, 2010

Science Commons Symposium Thoughts

UPDATE: the recording of my talk is here, following Cameron Neylon. Also see other sessions.

The Science Commons Symposium held at the Microsoft Campus in Redmond on Feb 20, 2010 turned out to be the best conference I have attended in the past year. Hope Leman and Lisa Green did a fantastic job of lining up an eclectic group of speakers and making sure that everything ran smoothly. Chris Pirillo provided streaming video of the talks and the liveblogging on FriendFeed and Twitter was pretty active. The recordings will be made available shortly.

It was utterly captivating from start to finish. Cameron Neylon started us off with "Science in the Open: Why do we need it? How do we do it?" by outlining the tremendous opportunities of doing science more openly while remaining aware of the obstacles. I followed up with a specific Open Science implementation "Using Free Hosted Web2.0 Tools for Open Notebook Science", including the recent work I did with Andrew Lang on creating snapshot archives of a notebook with source files.

Antony Williams followed with "ChemSpider: Collecting and Curating the World’s Chemistry with the Community", convincingly demonstrating the power of crowdsourcing to curate Open Data. Peter Murray-Rust then covered "Open Data and how to achieve it", pointing out the role of an embargo period in getting people to start exposing their data. All of these presentations made the symposium fairly chemistry-centric but I don't think the audience minded - and there were a few chemists in the audience.

After lunch Heather Joseph from SPARC talked about "Is Open Access the “New Normal”?". She focused on the role of policy change in supporting OA, for example how NIH-funded work is required to be OA within 12 months of publication. Stephen Friend blew a lot of minds with his talk on "Setting Expectations: Need for Distributed Tasks and Evolving Disease Models". I'm not quite sure I completely get his network approach compared to our current disease models of targeting a specific receptor but I am sure I'll come across it again since it depends on the processing of (vast amounts of) Open Data.

Peter Binfield proudly recounted the achievements of PLoS ONE, of course including the article-level metrics: "PLoS ONE and article-level metrics – A case study in the Open Access publication of scholarly journals". I didn't agree with his call for converting all the metrics to a single number for academic performance reporting - but that did lead to a vigorous discussion on FriendFeed.

Finally John Wilbanks from Science Commons delivered the keynote. It was a mesmerizing overview of what is needed to make Open Science more productive and the importance of working at the bottleneck. He described how the CC0 license offers an elegantly simple way of making data available as if it were in the public domain, regardless of the laws in various countries. He also showed his current work on trying to make automatic licenses for processes under patent protection and material transfer agreements.

Brian Glanz has provided a detailed summary of all the sessions, including a wealth of links to slides and additional information.

My slides:

Science Commons Open Notebook Science Talk

View more presentations from Jean-Claude Bradley.

#scspn

Labels: microsoft, open data, open notebook science, science commons, symposium

Friday, February 19, 2010

Support Open Data by endorsing the Panton Principles

If you care about Open Data take a few seconds today to endorse the Panton Principles. There are also logos there to label your work as Open. Some fit nicely in the navigation bar of a wiki.

Science is based on building on, reusing and openly criticising the published body of scientific knowledge.
For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open.

Friday, February 12, 2010

ONS Solubility Book: Edition 3 with Notebook Archive

Edition 3 (2010-02-11) of the ONS Solubility Challenge book is now available.

We've been trying for some time to find a way to conveniently take a snapshot of our Open Notebooks and all associated raw data files. This could serve as a way to back up all of our work as well as provide a means of finding out the state of knowledge for a project at a given moment in time. There is also a tremendous benefit to confidently using the best of free hosted Web2.0 services out there (e.g. GoogleDocs and Wikispaces) without being concerned with changes in policies or access down the road.

Our recent use of the ONS Challenge Solubility book to periodically create releases of summarized data has opened up a convenient opportunity. And yesterday the last piece of the puzzle fell into place. Through a combination of fairly quick manual and automated tasks, Andrew Lang and I are able to push out a full snapshot of all relevant files and lab notebook pages and associate it with an edition of the book.

As described below, the archive is accessible interactively on a server, as a zip download or as a CD from LuLu. Perhaps we can also find a home on library servers in the future.

More details are provided in the preface for Edition 3 (2010-02-11):

This is the first edition to include a full archive of the ONS Challenge notebook. A space export from Wikispaces provides an initial version of all the HTML pages in the notebook with local hyperlinks to copies of all images and files uploaded onto the wiki. All of the Google Spreadsheets are automatically downloaded as Excel spreadsheets and placed in the same "files" folder as the images. NMR spectra, stored as JCAMP-DX files, are placed in the "spectra" folder. All of the HTML pages are reformatted to provide local references to both Excel spreadsheets and the JCAMP-DX files.

The notebook archive is meant to represent a snapshot of the state of all source documents at the time of the publication of an edition of this book. When used from a server with web services running, clicking on links to the spectra will allow interaction via a browser interface, including zooming in or out and integration of the NMR spectrum. When accessed in stand-alone mode after downloading or directly from a CD, everything will work the same, except that JCAMP-DX files must be opened from JSpecView running on the desktop. Excel files will retain any calculations in the cells of the original Google Spreadsheets but dynamic values generated from calling web services - such as the script that automatically integrates NMR spectra - will be frozen as simple values. However, the link to the web service used will be stored in the cell as a comment. Links to external websites are not crawled and embedded Google Spreadsheets or videos are not copied. These will work but will reflect live data on the web.

The February 11, 2010 version of the notebook archive is available on a hosted site, on a CD or by download.
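The preface describes HTML pages being reformatted so that links to spreadsheets and spectra point at local copies. The scripts actually used are not shown here, so what follows is only a minimal sketch of how such a link-rewriting step might look. The file extensions, the extension-to-folder mapping, the example URL and the function names are all assumptions made for illustration; only the "files" and "spectra" folder names come from the preface.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed mapping from file extension to archive folder. Only the folder
# names "files" and "spectra" come from the preface; the extensions are
# illustrative guesses.
LOCAL_FOLDERS = {
    ".xls": "files",
    ".png": "files",
    ".jpg": "files",
    ".jdx": "spectra",
    ".dx": "spectra",
}

# Matches href="..." and src="..." attributes written with double quotes.
LINK_PATTERN = re.compile(r'(href|src)="([^"]+)"')


def localize_link(url: str) -> str:
    """Return a local relative path for archived file types.

    Any other URL (for example an external website) is returned unchanged,
    mirroring the note that external links are not crawled.
    """
    name = PurePosixPath(urlparse(url).path).name
    folder = LOCAL_FOLDERS.get(PurePosixPath(name).suffix.lower())
    if folder is None:
        return url
    return f"{folder}/{name}"


def localize_html(html: str) -> str:
    """Rewrite every href/src attribute in the page through localize_link."""
    return LINK_PATTERN.sub(
        lambda m: f'{m.group(1)}="{localize_link(m.group(2))}"', html
    )


if __name__ == "__main__":
    # Hypothetical page fragment; the URLs are made up for the example.
    page = (
        '<a href="http://example.org/wiki/file/view/exp42.jdx">NMR</a> '
        '<a href="http://example.org/about">About</a>'
    )
    print(localize_html(page))
```

A regular expression keeps the sketch short. A real export with varied markup would probably be handled more safely with an HTML parser.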

Labels: archive, book, lab notebook, lulu, onschallenge, snapshot, solubility

Monday, February 08, 2010

Funding Agencies and Open Science

I've been invited to participate in a panel discussion on "New tools in research, teaching, and publishing" on May 24, 2010 at the annual PI meeting for the Integrative Graduate Education and Research Traineeship (IGERT) program at NSF. After speaking with program manager Vikram Jaswal, I feel encouraged that funding agencies are interested in exploring the emerging role of Open Science and related novel communication channels for facilitating scientific progress.

The role that funding agencies can play in Open Science has been the subject of some discussion in the blogosphere. One view is that they can require more openness as a condition of funding. The NIH's requirement that papers resulting from its funding be made Open Access within 12 months of publication is a step in that direction. There is a debate about whether this should be extended to Open Data - even to the point of Open Notebook Science, where even failed experiments would be shared for the scientific community to learn from.

I tend to prefer the carrot to the stick. I think that funding agencies could value plans for "sharing beyond the norms" in proposals without imposing strict requirements. In the long run OS will succeed because each stakeholder (researcher, funder, publisher, etc.) acts out of selfish motives. I believe that the most effective way to stimulate this selfishness is to show concrete examples of practice and benefits.

Funding agencies should see the benefits of OS as a higher ROI - in terms of knowledge gained and shared with the scientific community - as well as the wider population ultimately footing the bill. A perceived downside of higher transparency might be the greater difficulty in fueling hype cycles. Most things aren't as pretty up close and science is no exception. If you measure success as the absence of failure and ambiguity then increased transparency is going to be a problem. Most experiments are failures of some sort (as the saying goes - if you're not failing you're not trying hard enough). But failed or successful - both categories of results can be useful to others if they are made available in a way that they can be discovered easily. Funding agencies can help transparency by making it clear that the whole truth is more valuable than a subset of the truth presented in a way that might be conveniently misleading.

This doesn't mean that you can't put your best foot forward and give a slick PowerPoint presentation to guide your audience. It is ok to construct an easily digestible narrative of your research. It is ok to distill your work down to key conclusions. It isn't necessary to confuse your audience with every ambiguous result and unanswered question.

But - in addition to the streamlined version of your work - if you provide all the details of the failures and ambiguities for those who can benefit from further exploration of what you have done - there is a great potential for accelerating the scientific process. For a funding agency OS can mean a bigger bang for the buck.

Labels: funding, IGERT, NSF, open notebook science, open science
