Friday, October 31, 2008

RDF triples for Open Notebook Science solubility data

A recent FriendFeed conversation about a blog post on the Triumvirate of Scientific Data prompted Pierre Lindenbaum to create an RDF representation of our solubility data that validates into triples.

I'll admit that I don't have much experience with RDF but I can see how the triples are connected and it is clear that this is a format very suitable for machine readability. It is nice that the provenance of the data is clearly indicated by links to the original lab notebook pages (for example EXP207) with all the details about how the solubility measurements were carried out.

In terms of information flow we go from a lab notebook wiki to a GoogleDoc spreadsheet to RDF. The GoogleDoc should only contain data that has passed my inspection, so treating these properties as "facts" in the RDF document is not unreasonable.
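
To make that flow a little more concrete, here is a rough Python sketch of how one spreadsheet row could be turned into subject-predicate-object triples, with one triple pointing back to the notebook page. The `ex:` prefix, the predicate names and the `measurement_to_triples` helper are all made up for illustration and are not the vocabulary in Pierre's document; the values are the EXP209 result for 4-chlorobenzaldehyde in chloroform.

```python
# Hypothetical vocabulary: the "ex:" prefix and the predicate names are
# placeholders for illustration, not the terms used in the real RDF document.
def measurement_to_triples(measurement_id, solute, solvent, molarity, experiment):
    """Return (subject, predicate, object) tuples for one solubility measurement."""
    subject = f"ex:{measurement_id}"
    return [
        (subject, "ex:solute", solute),
        (subject, "ex:solvent", solvent),
        (subject, "ex:concentrationMolar", molarity),
        # Provenance: the notebook page where the measurement was recorded.
        (subject, "ex:recordedIn", f"ex:{experiment}"),
    ]


triples = measurement_to_triples(
    "measurement1", "4-chlorobenzaldehyde", "chloroform", 3.61, "EXP209"
)
for s, p, o in triples:
    print(s, p, o)
```

The point of the last triple is that a machine reading the value can always follow it back to the page describing how it was measured.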

It will be interesting to see what can be done with this.

As Pierre mentions in the FF conversation, in order to view the triples as shown below, copy the RDF document as text and paste it into this RDF validation service.

Thursday, October 30, 2008

ONS in 15 minutes

In the middle of my trip to the UK my hard drive crashed. It now looks like I won't be able to recover any of the recordings of the talks I gave. That's the bad news.

The good news is that I recently had the opportunity to give a fairly abbreviated summary of our Open Notebook Science and malaria work for a mini-symposium of faculty research in the Chemistry department at Drexel.

Screencasts and video recordings of talks are useful for many people but dedicating an hour of linear time is a tall order these days.

It is handy to have a 15 minute version available as mp3 or video.


Blogosphere and ONS in Nature today

Maxine Clarke did a brief write-up coming out today in Nature 455, xi (30 October 2008) on our Open Notebook Science work:

Is your science ready for total transparency? Jean-Claude Bradley, a chemist at Drexel University in Philadelphia, Pennsylvania, works on the synthesis of new antimalarial compounds using Open Notebook Science — a practice that makes all aspects of experiments and lab notebooks publicly available online.

During a question-and-answer session on the Sceptical Chymist blog, Nature Chemistry associate editor Neil Withers asked him when he last did an experiment in the lab.

The timing of that question was very fortunate since I have not done a full experiment in the lab myself in a while, except during my visit last month to Cameron Neylon's lab in Southampton.


Sunday, October 26, 2008

There are no facts: my position at NSF eChem workshop

I recently attended an NSF workshop on eChemistry: New Models for Scholarly Communication in Chemistry in Washington (Oct 23-24, 2008). The group consisted of about a dozen members, including publishers, social scientists, librarians and chemists. For background, this was the mandate:
Many scholarly communities have embraced new web-based models for disseminating the results of their research. These models include open access to formal publications and "gray literature", access to primary data and the tools to manipulate and visualize that data, interactive peer review, and integration with on-line discussion tools such as blogs and wikis. According to their advocates these new models make the scholarly process more transparent and substantially improve the opportunities for examination, re-use, and enhancement of new results.

This workshop will focus on Chemists who have generally been indifferent or resistant to these web-based models and to open access. By and large they continue to publish results in journals to which access is restricted to subscribers and reuse is limited by copyright. This lack of interest may have a number of origins including the different funding methods available to chemistry, the prevalence of industry participation and associated opportunities for profit from results, concerns about confidentiality and privacy, the possibility of longer term use of the data by their originators, or other aspects of the social and political organization of research in chemistry. The workshop will bring together experts from the chemistry, information science, open access, and science and technology studies communities to examine the multiple factors that influence adoption of new scholarly communication models.

The outcomes of the workshop will be reported in a white paper that will be made publicly available via this web site. The report will provide funding agencies, including the National Science Foundation and the JISC in the UK, with suggestions for targeted research programs that further examine the issues discussed at the workshop and that improve the communication and dissemination mechanisms that underlie chemistry scholarship (and internet-based scholarship in general).
Although the final report will be made publicly available in a few months, the presentation materials will not. After some discussion, I was permitted to liveblog the meeting under the Chatham House Rule: Day 1, Day 2.

Of course individual participants may share their own presentations - here is mine. I can also share the scenario of the research process Jane Hunter typed up based on discussions from our sub-group between her, Jeremy Frey and myself.

My position statement and my main contribution to the workshop revolved around Open Notebook Science and its role in making the scientific process better through transparency. This is an extension of a statement I made a year ago on the importance of replacing trust with proof.

There are no facts in science - only measurement embedded within assumptions.

There are properties that have been determined so many times by different researchers and different techniques that we can treat a narrow range of values by consensus as if they were absolute facts. An example would be considering the boiling point of methanol at 1 atm to be 65C within one degree of accuracy. For most purposes that will suffice, as long as we understand the source of our confidence.

The problem arises when we treat rarely measured properties as facts simply because they are printed in peer-reviewed articles or tables in books. We teach our students not to trust numbers in Wikipedia but have no problem if they can cite a reference in a peer-reviewed journal, even without thoroughly analyzing the experimental sections.

We delude ourselves into thinking that we can appreciate our uncertainty of the value of a property simply by taking multiple measurements, taking an average and reporting standard deviation. That is actually a useful thing to do if we remember that we are measuring random errors and completely ignoring systematic errors, which are possibly very common in infrequently measured properties.
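
A small Python sketch shows why the standard deviation alone says nothing about systematic error. The readings and the offset below are invented for illustration and do not come from the notebook; the second list is the first one shifted by a constant amount, as would happen if every run lost material by the same mechanism.

```python
from statistics import mean, stdev

# Made-up readings for illustration only; they are not from the notebook.
readings = [3.59, 3.61, 3.63]

# The same readings with a constant systematic offset applied to every run.
offset = -0.50
biased = [r + offset for r in readings]

print(f"unbiased: mean={mean(readings):.2f} sd={stdev(readings):.2f}")
print(f"biased:   mean={mean(biased):.2f} sd={stdev(biased):.2f}")
```

Both sets report the same spread even though one mean is off by half a unit, so a tight standard deviation only tells us the runs agree with each other.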

What is the solubility of 4-chlorobenzaldehyde in chloroform? UsefulChem experiment EXP208 reports it to be 0.07 molar. It was measured only once but I think duplicate runs would have come out pretty close to that. It might have slipped under the radar if it had not been measured in parallel with other chemically similar aromatic aldehydes with values all much greater than 1 molar. It just didn't make sense so we looked at the conditions reported in the experiment and the boiling points of all the compounds - this one had the lowest value (214 C at 1 atm). The pressure had not been recorded during the course of the experiment but when empty the Speed-Vac could go as low as 0.1 Torr, which would reduce the boiling point close to room temperature.

The next most volatile compound in this group was 2,6-dichlorobenzaldehyde. Its boiling point was calculated by ChemSpider to be 239 C at 1 atm, which is reasonable based on the 4-chloro analog. But here's an interesting twist - the reported boiling point is 165 C on this MSDS sheet. It should be simple enough to see if that is an error by clicking through to the lab notebook page that generated that MSDS sheet... oh wait... MSDS sheets don't require proof, just this handy disclaimer: "We have not verified this information, and cannot guarantee that it is up-to-date." It also looks mighty trustworthy: "the page is maintained by the Safety Officer in Physical Chemistry at Oxford University". I'm not knocking Oxford - this is standard practice for the flow of chemical information in the current culture.

The bottom line is that 2,6-dichlorobenzaldehyde didn't evaporate off - we get a value of 3.4 M in chloroform. Now is it possible that some of it evaporated under the conditions of that experiment? Maybe, but it is my call that we're going to use that number for now as a good enough approximation for our model. It is possible that your application might have a different requirement. At least you have the information available in the Open Lab Notebook to make the call.

The solubility of 4-chlorobenzaldehyde in chloroform was measured again, this time monitoring the pressure and minimizing time on the Speed-Vac. The pressure varied over the course of the evaporation, making it impossible to neatly summarize in the experimental section of a paper. The measurement was done in duplicate in EXP209 and comes out at 3.61 molar with a standard deviation of 0.02. That isn't a fact but a good enough number under these circumstances to pretend it is and use it for our model. We'll see how it plays out when we have different researchers and use different techniques.

Friday, October 17, 2008

Journal of Cheminformatics

Set to come out within the next few months, the Journal of Cheminformatics has it all for the Open Access hungry mob:

Dr David J Wild, Assistant Professor of Informatics at Indiana University, USA

Chemistry Central announces the imminent launch of the Journal of Cheminformatics.

The Journal of Cheminformatics is devoted to the dissemination of new and original knowledge in all branches of cheminformatics and molecular modelling including:

  • chemical information systems, software and databases, and molecular modelling
  • chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases
  • computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques

Chemistry Central publishes peer-reviewed open access research in chemistry.

All research articles published by Chemistry Central are made freely and permanently accessible online immediately upon publication to ensure effective communication of research findings.

High standards are maintained through full and stringent peer review.

Authors who publish original research in Chemistry Central retain copyright over their work.

Chemistry Central is an independent publishing service operated by BioMed Central - the leading life science open access publisher.

Sceptical Chymist Reactions Interviews

Neil Withers just posted my interview on Nature's Sceptical Chymist blog. It was a fun one to do because the questions are atypical of what I normally get asked. It came at a good time because he asked both Cameron and myself about the last time we did some labwork. About a month ago we worked together in Cameron's lab in Southampton.

Also a great opportunity to get some plugs in for the Open Notebook Science Challenge :)

Wednesday, October 08, 2008

Open Notebook Science on Wikipedia

Andy Lang re-created the Open Notebook Science page on Wikipedia a few days ago. The last time we tried, over a year ago, the page was quickly deleted as a neologism and redirected to the Open Data page.

The page initially got marked for deletion again but this time strong support from the FriendFeed crowd saved us. We still have to work it a bit but I think it should stay. Many thanks to Cameron Neylon, Michael Nielsen, Richard Akerman, Deepak Singh, Bill Hooker, Neil Saunders, Daniel Mietchen and others.

Tuesday, October 07, 2008

Web 2.0 in Science: Success or Failure?

Timo Hannay recently gave a talk "Scientific Researchers and Web 2.0: Social Not Working?", which is reproduced in this Nascent blog post.

This is a sobering review of the state of social software in science and he lists several roadblocks to its widespread adoption. It is important to counterbalance the almost unavoidable hype that emerges from the enthusiasm of those energized by a movement.

However, it can be a tricky endeavor to attempt to define success or failure, especially within systems that are evolving rapidly.

Are you a failure if you only get 10% of your proposals funded? What about a telemarketer who has a 95% failure rate of making a sale from dialing the phone? Are you a failure if you send a paper to Nature and get turned down 90% of the time?

The way I see it, Web2.0 technologies are just communication vehicles and should be measured using similar metrics to the telephone, email, lunch meetings, conferences, talking to somebody during a flight, etc.

You don't decide to use a telephone based on an analysis of the number of people on the other side of the line - you use it when you need to communicate. And sometimes that communication may be intended primarily for your future self. I absolutely agree with Ben Good that you should blog even if there is a chance that nobody will read it:
Well, as one graduate student that continues to blog even though only 2 or 3 people read most of my posts (namely my Dad and occasionally, if she is bored, my wife), I feel compelled to say that yes, some people and even some scientists will continue to blog even if no one ends up listening at all.

For me, keeping a blog is a very convenient way to write up my annual reports and keep track of my progress. But, as a bonus, if others read it and give me feedback or collaborate, all the better. And this is where the sweetest part of the icing is found. As Rich Apodaca mentions, it comes down to jobs, funding and collaborations.

I have personally experienced very good examples of that. My recent trip to the UK (generously funded by Cameron Neylon) would have never happened without my active participation with Web2.0 tools. The same goes for the last paper (submitted to JoVE - Precedings version here) and proposal that I submitted.

The best reason for blogging is self-interest.

Thursday, October 02, 2008

NISO meeting on Open Research Data Standards

I spent the day in Baltimore yesterday at the National Information Standards Organization. We discussed the role of standards in Research Data, with a large focus on Open Data (see meeting blog). The FriendFeed discussion is here. A publication will result from this and I'll link to it when available.

A lot of the discussion revolved around the citation of datasets. My own view (and something that Cameron Neylon champions as well) is that a good way to encourage sharing of data is to make saving datasets convenient and part of the researcher's workflow. I recommended 4 simple options:
  1. use the open JCAMP-DX format for XY datasets (e.g. spectra) and Robert Lancashire's JSpecView for easy manipulation in a browser
  2. use GoogleDocs
  3. use Google DataSets
  4. do Open Notebook Science using your favorite tool (we use Wikispaces)
Then, following the Southampton Resolution on Open Science, after your paper comes out share these datasets.
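
Part of the appeal of option 1 is that JCAMP-DX is plain text built from `##LABEL=value` records, so even a few lines of code can pull out the metadata. The Python sketch below reads only those header records and skips the data lines; the `read_jcamp_header` helper and the sample file contents are made up for illustration, and a real reader such as JSpecView handles far more of the format.

```python
def read_jcamp_header(text):
    """Collect '##LABEL=value' records from JCAMP-DX text into a dict."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # data lines and anything else are skipped
        label, value = line[2:].split("=", 1)
        # Drop any trailing '$$' comment on the record.
        value = value.split("$$", 1)[0].strip()
        records[label.strip().upper()] = value
    return records


# Invented sample contents, not a real spectrum.
sample = """##TITLE=example spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##NPOINTS=3
##XYDATA=(X++(Y..Y))
4000 0.91 0.90 0.88
##END=
"""

header = read_jcamp_header(sample)
print(header["TITLE"], header["XUNITS"], header["NPOINTS"])
```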

