Monday, April 04, 2011

ACS and ACRL presentations on web services and trust in science

Update: the recording of my ACS talk on Rapid Dissemination of Chemical Information for people and machines using Open Notebook Science is now available here.

On March 30 and 31, 2011, I presented two related talks: the first remotely for the American Chemical Society (ACS) Meeting and the second in Philadelphia at the meeting of the Association of College and Research Libraries (ACRL).

In the ACS talk "Rapid Dissemination of Chemical Information for people and machines using Open Notebook Science", I spoke in detail for the first time about the results of the open modeling that Andrew Lang and I carried out on our open dataset of melting points. We began building that collection with the Alfa Aesar dataset, which was recently made public.

We used Skype and Google Presenter, with the help of Peter Murray-Rust on site at the conference, and I think it went fairly well. Henry Rzepa had a good question about whether polymorphism could be responsible for the different melting points reported by various sources. I don't think that is the problem in most of these cases, but we can certainly spend some time investigating the reports of polymorphism where the information is available.

One of the big problems is that we don't know the history of the sample used for a melting point from most sources, such as chemical vendor sites. At least in journal articles we might be told which solvent was used to crystallize the sample. If multiple sources agree on a certain melting point and there is one outlier, I think it is reasonable to assume that the common melting point is likely to correspond to the thermodynamically favored polymorph. This might not be correct in all cases but, without the means to discover more about the sample histories, I think it makes sense to proceed in this way. Since we don't consider polymorphism in our modeling, there is an implicit assumption that, where polymorphism does occur, we are dealing with the thermodynamically most stable form.
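To make the "multiple sources agree, one outlier" reasoning concrete, here is a minimal Python sketch. It is not the procedure we actually used to curate the dataset; the function name `consensus_melting_point`, the 2 °C agreement tolerance and the example values are all assumptions chosen for illustration. It groups reported values that lie close together, takes the median of the largest group as the consensus, and flags everything else as an outlier. If no group is clearly the largest, it declines to pick one.

```python
from statistics import median


def consensus_melting_point(values_c, tolerance_c=2.0):
    """Return (consensus, outliers) for melting points reported in deg C.

    Hypothetical helper: the tolerance of 2 deg C is an assumed value,
    not one taken from the melting point study.
    """
    if not values_c:
        return None, []

    ordered = sorted(values_c)

    # Group sorted values into clusters; a gap larger than the
    # tolerance between neighbours starts a new cluster.
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] <= tolerance_c:
            clusters[-1].append(value)
        else:
            clusters.append([value])

    # Largest cluster first (the sort is stable, so ties keep value order).
    clusters.sort(key=len, reverse=True)
    largest = clusters[0]

    # If two clusters tie for largest there is no clear consensus,
    # so report every value as unresolved rather than guess.
    if len(clusters) > 1 and len(clusters[1]) == len(largest):
        return None, ordered

    outliers = sorted(v for cluster in clusters[1:] for v in cluster)
    return median(largest), outliers


if __name__ == "__main__":
    # Made-up values: three sources agree, one disagrees.
    print(consensus_melting_point([121.0, 122.0, 121.5, 95.0]))
```

Note that a flagged outlier is only a candidate for follow-up: it could be a data entry error, an impure sample or a genuine second polymorph, and the sketch cannot tell these apart.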

My ACRL talk "Is there a role for Trust in Science?" focused more on the Chemical Information Validation study and its outcomes. There were several good questions at the end. One particularly good comment addressed my speculation that, within a few years, the open models in most of the useful chemical spaces will be good enough that it will be as easy to Google a melting point or a solubility as it is now to get driving directions. The question was: weren't we just transferring trust from one information source to another, namely these models? I don't think the concept of trust applies in these cases because the training sets, the descriptors and the performance of the models are (and will be) open. This is in sharp contrast with most commercial software generating predictions for solubility and melting points, which are generally black boxes because the training set, the model or the descriptors are not open.

Creative Commons Attribution Share-Alike 2.5 License