Sunday, October 26, 2008

There are no facts: my position at NSF eChem workshop

I recently attended an NSF workshop on eChemistry: New Models for Scholarly Communication in Chemistry in Washington (Oct 23-24, 2008). The group consisted of about a dozen members, including publishers, social scientists, librarians and chemists. For background, this was the mandate:
Many scholarly communities have embraced new web-based models for disseminating the results of their research. These models include open access to formal publications and "gray literature", access to primary data and the tools to manipulate and visualize that data, interactive peer review, and integration with on-line discussion tools such as blogs and wikis. According to their advocates these new models make the scholarly process more transparent and substantially improve the opportunities for examination, re-use, and enhancement of new results.

This workshop will focus on Chemists who have generally been indifferent or resistant to these web-based models and to open access. By and large they continue to publish results in journals to which access is restricted to subscribers and reuse is limited by copyright. This lack of interest may have a number of origins including the different funding methods available to chemistry, the prevalence of industry participation and associated opportunities for profit from results, concerns about confidentiality and privacy, the possibility of longer term use of the data by their originators, or other aspects of the social and political organization of research in chemistry. The workshop will bring together experts from the chemistry, information science, open access, and science and technology studies communities to examine the multiple factors that influence adoption of new scholarly communication models.

The outcomes of the workshop will be reported in a white paper that will be made publicly available via this web site. The report will provide funding agencies, including the National Science Foundation and the JISC in the UK, with suggestions for targeted research programs that further examine the issues discussed at the workshop and that improve the communication and dissemination mechanisms that underlie chemistry scholarship (and internet-based scholarship in general).
Although the final report will be made publicly available in a few months, the presentation materials will not be. After some discussion, I was permitted to liveblog the meeting under the Chatham House Rule: Day 1, Day 2.

Of course individual participants may share their own presentations - here is mine. I can also share the scenario of the research process Jane Hunter typed up based on discussions from our sub-group between her, Jeremy Frey and myself.

My position statement and my main contribution to the workshop revolved around Open Notebook Science and its role in making the scientific process better through transparency. This is an extension of a statement I made a year ago on the importance of replacing trust with proof.

There are no facts in science - only measurement embedded within assumptions.

There are properties that have been determined so many times by different researchers and different techniques that we can treat a narrow range of values by consensus as if they were absolute facts. An example would be considering the boiling point of methanol at 1 atm to be 65C within one degree of accuracy. For most purposes that will suffice, as long as we understand the source of our confidence.

The problem arises when we treat rarely measured properties as facts simply because they are printed in peer-reviewed articles or tables in books. We teach our students not to trust numbers in Wikipedia but have no problem if they can cite a reference in a peer-reviewed journal, even without thoroughly analyzing the experimental sections.

We delude ourselves into thinking that we can appreciate our uncertainty in the value of a property simply by taking multiple measurements, averaging them and reporting a standard deviation. That is actually a useful thing to do if we remember that we are measuring only random errors and completely ignoring systematic errors, which are possibly very common in infrequently measured properties.
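To make the distinction concrete, here is a minimal sketch in Python. All of the numbers are made up for illustration (the "true" value, the offset and the noise level are assumptions, not data from any experiment). It simulates repeated measurements that share one systematic offset and shows that the standard deviation reflects only the random scatter.

```python
import random
import statistics


def simulate_measurements(true_value, systematic_offset, random_sd, n, seed=0):
    """Return n simulated readings: true value + a fixed offset + Gaussian noise."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        true_value + systematic_offset + rng.gauss(0.0, random_sd)
        for _ in range(n)
    ]


# Hypothetical numbers, chosen only to illustrate the point.
true_value = 3.60  # the value we would like to know
measurements = simulate_measurements(
    true_value, systematic_offset=-1.50, random_sd=0.02, n=20
)

mean = statistics.mean(measurements)
spread = statistics.stdev(measurements)  # sample standard deviation

print(f"mean = {mean:.2f}, standard deviation = {spread:.3f}")
print(f"actual error in the mean = {mean - true_value:.2f}")
```

The replicates agree with each other very well, so the standard deviation is tiny, yet the mean is far from the true value. Nothing in the replicate statistics reveals the offset; only a different technique, a different researcher or a close look at the experimental conditions can.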

What is the solubility of 4-chlorobenzaldehyde in chloroform? UsefulChem experiment EXP208 reports it to be 0.07 molar. It was measured only once but I think duplicate runs would have come out pretty close to that. It might have slipped under the radar if it had not been measured in parallel with other chemically similar aromatic aldehydes with values all much greater than 1 molar. It just didn't make sense so we looked at the conditions reported in the experiment and the boiling points of all the compounds - this one had the lowest value (214 C at 1 atm). The pressure had not been recorded during the course of the experiment but when empty the Speed-Vac could go as low as 0.1 Torr, which would reduce the boiling point close to room temperature.
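The effect of reduced pressure on the boiling point can be roughed out with the Clausius-Clapeyron equation. The sketch below is only an order-of-magnitude check, not what was done in the experiment: it assumes a constant enthalpy of vaporization estimated from Trouton's rule (about 88 J/(mol K)), and the function name is made up for illustration.

```python
import math

R = 8.314  # gas constant, J/(mol*K)
TROUTON_ENTROPY = 88.0  # J/(mol*K); Trouton's rule, an assumption rather than a measured value


def boiling_point_at_pressure(normal_bp_c, pressure_torr, reference_torr=760.0):
    """Estimate the boiling point (in C) at a reduced pressure.

    Uses the integrated Clausius-Clapeyron equation with a constant
    enthalpy of vaporization guessed from Trouton's rule.
    """
    t1 = normal_bp_c + 273.15          # normal boiling point in kelvin
    dh_vap = TROUTON_ENTROPY * t1      # J/mol, rough estimate
    # ln(P2/P1) = -(dH/R) * (1/T2 - 1/T1), solved for 1/T2
    inv_t2 = 1.0 / t1 - R * math.log(pressure_torr / reference_torr) / dh_vap
    return 1.0 / inv_t2 - 273.15


# 4-chlorobenzaldehyde: 214 C at 1 atm, Speed-Vac as low as 0.1 Torr when empty
estimate_c = boiling_point_at_pressure(214.0, 0.1)
print(f"estimated boiling point at 0.1 Torr: {estimate_c:.0f} C")
```

Under these assumptions the estimate drops from 214 C to roughly room temperature or below, which is consistent with the compound evaporating along with the solvent. The real pressure during the run was not recorded, so this is a plausibility check and nothing more.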

The next most volatile compound in this group was 2,6-dichlorobenzaldehyde. Its boiling point was calculated by ChemSpider to be 239C at 1 atm, which is reasonable based on the 4-chloro analog. But here's an interesting twist - the reported boiling point is 165C on this MSDS sheet. It should be simple enough to see if that is an error by clicking through to the lab notebook page that generated that MSDS sheet... oh wait... MSDS sheets don't require proof, just this handy disclaimer: "We have not verified this information, and cannot guarantee that it is up-to-date." It also looks mighty trustworthy: "the page is maintained by the Safety Officer in Physical Chemistry at Oxford University". I'm not knocking Oxford - this is standard practice for the flow of chemical information in the current culture.

The bottom line is that 2,6-dichlorobenzaldehyde didn't evaporate off - we get a value of 3.4 M in chloroform. Now is it possible that some of it evaporated under the conditions of that experiment? Maybe, but it is my call that we're going to use that number for now as a good enough approximation for our model. It is possible that your application might have a different requirement. At least you have the information available in the Open Lab Notebook to make the call.

The solubility of 4-chlorobenzaldehyde in chloroform was measured again, this time monitoring the pressure and minimizing time on the Speed-Vac. The pressure varied over the course of the evaporation, making it impossible to neatly summarize in the experimental section of a paper. The measurement was done in duplicate in EXP209 and comes out at 3.61 molar with a standard deviation of 0.02. That isn't a fact but a good enough number under these circumstances to pretend it is and use it for our model. We'll see how it plays out when we have different researchers and use different techniques.



At 4:09 PM, Blogger McDawg said...

As far as I know (please correct me if I am wrong), this is the first example that I'm aware of of someone liveblogging an event under Chatham House rules.

Keep breaking them boundaries Jean-Claude !! - Much appreciated.

At 1:44 AM, Blogger Jean-Claude Bradley said...

Someone may have done it 2 years ago at SciFoo, when that was Chatham rule by default for the sessions. The key point of Chatham is non-attribution, not silence - at least that is what we agreed at the NSF workshop. This was an unusual situation for me at a meeting and I think most people would prefer attribution for their thoughts. But I had to respect the chair's directive after group discussion.

At 9:37 AM, Anonymous Anonymous said...

I tried to do this at a workshop in Edinburgh earlier in the year (here and similar entries). It's all a bit vague though. I can see the point of Chatham House rules but it does, as JC says, seem to rule out proper attribution, which is counterproductive.



Creative Commons Attribution Share-Alike 2.5 License