Monday, June 07, 2010

IGERT NSF panel on Digital Science

On May 24, 2010 I was part of a panel in Washington for the NSF IGERT annual meeting. As I mentioned previously, it is encouraging to find that funding agencies are paying more attention to the role of new forms of scholarship and dissemination of scientific information.

My co-panelists included Janet Stemwedel, who talked about the role of blogging in an academic career; Moshe Pritzker, who made a case for using video to communicate protocols in life sciences; and Chris Impey, who demonstrated applications of clickers and Second Life in the classroom.

We only had 10 minutes each to speak so the presentations were basically highlights of what is possible. Still, it was enough to stimulate a vigorous discussion with the audience. There was a bit of controversy about the examples I used to demonstrate the limitations of peer review in chemistry. People can misinterpret what we are trying to do with ONS - it certainly doesn't include bringing down the peer review system (not that we could anyway). But we have to face the situation that peer review does not validate all the data and statements in a paper. It operates at a much higher level of abstraction. Providing transparency to the raw data should work in a synergistic way with the existing system.

My favorite part of the conference was easily Seth Shulman's talk on the "Telephone Gambit". Ever since reading his book, I have been using the story of how carefully reading Bell's lab notebook has forced us to revise the generally accepted notion of how the telephone was invented. Seth's presentation was truly captivating because he explained not only what was done but also what motives were at work to deceive and obfuscate. This cautionary tale is still very much relevant to science and invention today - and highlights how transparency can guard against this type of outcome.


Monday, February 08, 2010

Funding Agencies and Open Science

I've been invited to participate in a panel discussion on "New tools in research, teaching, and publishing" on May 24, 2010 at the annual PI meeting for the Integrative Graduate Education and Research Traineeship (IGERT) program at NSF. After speaking with program manager Vikram Jaswal, I feel encouraged that funding agencies are interested in exploring the emerging role of Open Science and related novel communication channels for facilitating scientific progress.


The role that funding agencies can play in Open Science has been the subject of some discussion in the blogosphere. One view is that they can require more openness as a condition of funding. The NIH's requirement to make papers resulting from its funding Open Access within 12 months of publication is a step in that direction. There is a debate about whether this should be extended to Open Data - or even to Open Notebook Science, where failed experiments are also shared for the scientific community to learn from.

I tend to prefer the carrot to the stick. I think that funding agencies could value plans for "sharing beyond the norms" in proposals without imposing strict requirements. In the long run OS will succeed because each stakeholder (researcher, funder, publisher, etc.) acts out of self-interest. I believe that the most effective way to engage that self-interest is to show concrete examples of practice and benefits.

Funding agencies should see the benefits of OS as a higher ROI - in terms of knowledge gained and shared with the scientific community, as well as with the wider population ultimately footing the bill. A perceived downside of higher transparency might be the greater difficulty in fueling hype cycles. Most things aren't as pretty up close, and science is no exception. If you measure success as the absence of failure and ambiguity, then increased transparency is going to be a problem. Most experiments are failures of some sort (as the saying goes - if you're not failing, you're not trying hard enough). But failed or successful, both categories of results can be useful to others if they are made available in a way that lets them be discovered easily. Funding agencies can help transparency by making it clear that the whole truth is more valuable than a subset of the truth presented in a way that might be conveniently misleading.

This doesn't mean that you can't put your best foot forward and give a slick PowerPoint presentation to guide your audience. It is ok to construct an easily digestible narrative of your research. It is ok to distill your work down to key conclusions. It isn't necessary to confuse your audience with every ambiguous result and unanswered question.

But if, in addition to the streamlined version of your work, you provide all the details of the failures and ambiguities for those who can benefit from further exploration of what you have done, there is great potential for accelerating the scientific process. For a funding agency, OS can mean a bigger bang for the buck.


Wednesday, December 17, 2008

NSF proposal: Crowdsourcing Chemistry and Modeling using Open Notebook Science

On December 8, 2008 I submitted the pre-proposal "Crowdsourcing Chemistry and Modeling using Open Notebook Science" with Rajarshi Guha and Antony Williams to the NSF CDI program.

Last year we submitted to the same initiative and the reviewer comments were positive for the most part. The main criticism was the lack of a more fully developed computational component. I think we've addressed that this year by including Rajarshi and his plans to carry out modeling of the non-aqueous solubility data and Ugi reaction optimization.

We also have the ONS Challenge in place, and the sponsorship by Submeta, Nature and Sigma-Aldrich should help.

I posted the PDF version of the proposal on Scribd, linked to it from Noam Harel's SCIEnCE wiki and put up a text version on the ONSC wiki. In some ways proposals can be more important than papers for connecting collaborators and gaining an appreciation of where science is headed. Ironically, the only people who normally see proposals (the reviewers) are typically a research group's closest competitors, so making them public makes sense. It could also help funding agencies connect with researchers.

I think it would be helpful to have a Web2.0 database of research proposals. The SCIEnCE project aims to do this but doesn't currently have a structured interface. I created a "Research Proposal" group on Scribd that is open for anyone to drop in proposals. That gives us the standard Web2.0 functionalities like commenting, visitor count, favorites, etc. One of the most convenient features of this strategy is that it provides an RSS feed for new submissions. I've added this feed to my FriendFeed account.
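For anyone who would rather pull that feed programmatically than through FriendFeed, here is a minimal sketch using the feedparser library (the feed URL below is a placeholder, not the actual address of the Scribd group):

```python
import feedparser  # third-party library: pip install feedparser

# Placeholder URL - substitute the RSS feed of the Scribd "Research Proposal" group.
FEED_URL = "https://www.scribd.com/rss/research-proposal-group"

feed = feedparser.parse(FEED_URL)

# List newly posted proposals with their links.
for entry in feed.entries:
    print(entry.title)
    print("  " + entry.link)
```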


Sunday, October 26, 2008

There are no facts: my position at NSF eChem workshop

I recently attended an NSF workshop on eChemistry: New Models for Scholarly Communication in Chemistry in Washington (Oct 23-24, 2008). The group consisted of about a dozen members, including publishers, social scientists, librarians and chemists. For background, this was the mandate:
Many scholarly communities have embraced new web-based models for disseminating the results of their research. These models include open access to formal publications and "gray literature", access to primary data and the tools to manipulate and visualize that data, interactive peer review, and integration with on-line discussion tools such as blogs and wikis. According to their advocates these new models make the scholarly process more transparent and substantially improve the opportunities for examination, re-use, and enhancement of new results.

This workshop will focus on Chemists who have generally been indifferent or resistant to these web-based models and to open access. By and large they continue to publish results in journals to which access is restricted to subscribers and reuse is limited by copyright. This lack of interest may have a number of origins including the different funding methods available to chemistry, the prevalence of industry participation and associated opportunities for profit from results, concerns about confidentiality and privacy, the possibility of longer term use of the data by their originators, or other aspects of the social and political organization of research in chemistry. The workshop will bring together experts from the chemistry, information science, open access, and science and technology studies communities to examine the multiple factors that influence adoption of new scholarly communication models.

The outcomes of the workshop will be reported in a white paper that will be made publicly available via this web site. The report will provide funding agencies, including the National Science Foundation and the JISC in the UK, with suggestions for targeted research programs that further examine the issues discussed at the workshop and that improve the communication and dissemination mechanisms that underlie chemistry scholarship (and internet-based scholarship in general).
Although the final report will be made publicly available in a few months, the presentation materials will not be. After some discussion, I was permitted to liveblog the meeting under the Chatham House Rule: Day 1, Day 2.

Of course individual participants may share their own presentations - here is mine. I can also share the research-process scenario that Jane Hunter typed up based on discussions within our sub-group (Jane, Jeremy Frey and myself).

My position statement and my main contribution to the workshop revolved around Open Notebook Science and its role in making the scientific process better through transparency. This is an extension of a statement I made a year ago on the importance of replacing trust with proof.

There are no facts in science - only measurement embedded within assumptions.

There are properties that have been determined so many times by different researchers and different techniques that we can treat a narrow range of values by consensus as if they were absolute facts. An example would be considering the boiling point of methanol at 1 atm to be 65C within one degree of accuracy. For most purposes that will suffice, as long as we understand the source of our confidence.

The problem arises when we treat rarely measured properties as facts simply because they are printed in peer-reviewed articles or tables in books. We teach our students not to trust numbers in Wikipedia but have no problem if they can cite a reference in a peer-reviewed journal, even without thoroughly analyzing the experimental sections.

We delude ourselves into thinking that we can appreciate the uncertainty in the value of a property simply by taking multiple measurements, averaging them and reporting a standard deviation. That is actually a useful thing to do, as long as we remember that it captures only random errors and completely ignores systematic errors, which are possibly very common for infrequently measured properties.
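To make the distinction concrete, here is a minimal sketch with hypothetical numbers (not taken from the notebook) showing how replicate measurements capture random scatter but are completely blind to a shared systematic offset, such as losing part of a volatile solute during evaporation:

```python
import statistics

# Hypothetical replicate solubility measurements (molar) of the same compound.
# Every replicate suffers the same systematic loss, so the scatter looks
# reassuringly small even though each value is badly biased low.
true_solubility = 3.6                  # assumed "true" value, for illustration
systematic_loss = 3.5                  # bias shared by every replicate
random_scatter = [0.02, -0.01, 0.03]   # small run-to-run noise

replicates = [true_solubility - systematic_loss + r for r in random_scatter]

print("replicates:", replicates)
print("mean  =", round(statistics.mean(replicates), 3), "M")
print("stdev =", round(statistics.stdev(replicates), 3), "M")
# The tiny standard deviation says nothing about the 3.5 M systematic error.
```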

What is the solubility of 4-chlorobenzaldehyde in chloroform? UsefulChem experiment EXP208 reports it to be 0.07 molar. It was measured only once, but I think duplicate runs would have come out pretty close to that. It might have slipped under the radar if it had not been measured in parallel with other chemically similar aromatic aldehydes, whose values were all much greater than 1 molar. It just didn't make sense, so we looked at the conditions reported in the experiment and the boiling points of all the compounds - this one had the lowest value (214 C at 1 atm). The pressure had not been recorded during the course of the experiment, but when empty the Speed-Vac could go as low as 0.1 Torr, which would reduce the boiling point to close to room temperature.
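A rough Clausius-Clapeyron estimate supports that explanation. The sketch below assumes a round 50 kJ/mol enthalpy of vaporization - a guess used only for illustration, not a measured value for 4-chlorobenzaldehyde:

```python
import math

R = 8.314           # gas constant, J/(mol*K)
dH_vap = 50_000     # assumed enthalpy of vaporization, J/mol (rough guess)

T1 = 214 + 273.15   # reported normal boiling point of 4-chlorobenzaldehyde, K
P1 = 760.0          # Torr (1 atm)
P2 = 0.1            # Torr, the lowest pressure the empty Speed-Vac can reach

# Clausius-Clapeyron: ln(P2/P1) = -(dH_vap/R) * (1/T2 - 1/T1)
inv_T2 = 1 / T1 - (R / dH_vap) * math.log(P2 / P1)
T2 = 1 / inv_T2

print("Estimated boiling point at 0.1 Torr: %.0f C" % (T2 - 273.15))
# About 10 C with these assumptions - close enough to room temperature to
# explain losing the aldehyde during evaporation.
```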

The next most volatile compound in this group was 2,6-dichlorobenzaldehyde. Its boiling point was calculated by ChemSpider to be 239 C at 1 atm, which is reasonable based on the 4-chloro analog. But here's an interesting twist - the reported boiling point is 165 C on this MSDS sheet. It should be simple enough to see if that is an error by clicking through to the lab notebook page that generated that MSDS sheet... oh wait... MSDS sheets don't require proof, just this handy disclaimer: "We have not verified this information, and cannot guarantee that it is up-to-date." It also looks mighty trustworthy: "the page is maintained by the Safety Officer in Physical Chemistry at Oxford University". I'm not knocking Oxford - this is standard practice for the flow of chemical information in the current culture.

The bottom line is that 2,6-dichlorobenzaldehyde didn't evaporate off - we get a value of 3.4 M in chloroform. Now is it possible that some of it evaporated under the conditions of that experiment? Maybe, but it's my call that we're going to use that number for now as a good enough approximation for our model. Your application might have different requirements. At least you have the information available in the Open Lab Notebook to make the call.

The solubility of 4-chlorobenzaldehyde in chloroform was measured again, this time monitoring the pressure and minimizing time on the Speed-Vac. The pressure varied over the course of the evaporation, making it impossible to summarize neatly in the experimental section of a paper. The measurement was done in duplicate in EXP209 and comes out at 3.61 molar with a standard deviation of 0.02. That isn't a fact, but under these circumstances it is a good enough number to treat as one and use for our model. We'll see how it plays out when different researchers use different techniques.



Creative Commons Attribution Share-Alike 2.5 License