Tuesday, October 26, 2010

Elizabeth Brown's guest lecture for ChemInfo Retrieval

Elizabeth Brown from the Binghamton University Libraries presented on "Web 0.0/1.0/2.0/3.0 and Chemical Information" on October 21, 2010 as a guest lecturer for my fifth class on Chemical Information Retrieval this term.

Beth made an interesting analogy with art to illustrate the differences between these communication platforms. What struck me during her presentation was the similarity between the current state of the semantic web (Web 3.0) and the state of computerized searching in chemistry when I was a graduate student in the early 90s. I had the chance to do just one substructure search at the time and it had to be done through an expert librarian. The search had to be carefully planned because of the expense.

From what I recall, the perception at the time was that computerized searching was impractical and perhaps even unnecessary. After all, "the way" to search for chemical information was to spend a weekend in the library systematically going through Chemical Abstracts books. This had "worked" for a long time for the chemistry community and doing things differently was considered superfluous and even wasteful.

Today, "the way" to search for chemical information is to use expensive databases to perform a targeted search and extract the information from mainly toll access peer reviewed journals. If you are off the academic grid you have to rely on free information on the web which is rarely associated with a chain of provenance. As my students will attest, even with access to the best tools, it still takes a lot of time to find the information and compare it for consistency.

The unfamiliarity with computerized searching in the early 90s parallels the common attitude towards the semantic web today. This is understandable because its availability is limited and people don't understand what to do with the tools that are available. The web services that we provide (derived from other services from ChemSpider and other sources) are usable by anyone who can open up a Google Spreadsheet and copy and paste a URL, but it will take time for people to understand how to incorporate these into their workflows.

I think that in 10 years the semantic web will simply be part of the infrastructure. Currently, when you type a word in a browser or word processing software the system knows enough to alert you to possible misspelling by underlining a word. Similarly, in the near future, typing or specifying in some way a chemical compound will automatically pull in all relevant measured or calculated properties and provide suggestions in the context of a chemical reaction under consideration. Access to information will be free and unencumbered and the chain of provenance will be clear.

Instead of taking hours to find and process property data for single compounds, I think students in the future will be handling large libraries of compounds, looking for the best synthetic targets for their applications.

Friday, October 15, 2010

Dynamic links to private tagged Mendeley collections

My close collaborators and I have been using Mendeley as a convenient way to share PDFs of journal articles. Not all of us have access to the same libraries so links are not enough - we need the full documents. We also use Dropbox as a redundancy but Mendeley allows tagging and recording notes, which is very handy for everyone in the group.

Now that Mendeley is providing an API, Andrew Lang has written code that significantly leverages the information in our private ONS collection. We can now create public links that return the most updated results for specific tags, including combinations of multiple tags (which I don't think you can do on Mendeley itself). For example, the following link returns all articles in the ONS collection tagged with "science2.0" and "chemistry":
The results include the available information from Mendeley, including the title, authors, journal citation, DOI, URL, tags and the abstract. Because this information is public the PDFs can't be provided, but the hyperlinks make it as convenient as possible.
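Under the hood, a multi-tag query like this amounts to fetching the collection's documents and keeping only those whose tag sets contain every requested tag. Here is a minimal sketch of that filtering step in Python; the document dictionaries are made-up stand-ins for what the Mendeley API would return, and the field names are assumptions for illustration:

```python
# Sketch of multi-tag filtering over a fetched document list.
# In the real service, the list would be populated by API calls
# to the private collection; here we hard-code a toy example.

def filter_by_tags(documents, wanted_tags):
    """Return the documents whose tag list contains every wanted tag."""
    wanted = set(wanted_tags)
    return [doc for doc in documents
            if wanted.issubset(set(doc.get("tags", [])))]

docs = [
    {"title": "Paper A", "tags": ["science2.0", "chemistry"]},
    {"title": "Paper B", "tags": ["science2.0"]},
    {"title": "Paper C", "tags": ["chemistry", "ugi"]},
]

hits = filter_by_tags(docs, ["science2.0", "chemistry"])
print([d["title"] for d in hits])  # ['Paper A']
```

Because the intersection happens client-side after the full collection is retrieved, adding more tags narrows the report without any extra API traffic.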

At the end of the report the full list of all available tags for the ONS collection is provided. A more refined or different search can be done immediately simply by checking boxes and hitting the submit button. Because the tags are controlled by the users of the private collection, these links can be useful when discussing an ongoing project and referring to a very specific topic. For example, we have been collecting examples of articles where a Ugi reaction is carried out and the product precipitates. This link provides an updated report on that very narrow topic:
There are still two major limitations to this service:

1) The search is very slow (can take a minute or two) because there is no way currently to use the Mendeley API to selectively return results based on tags. Every search requires initially returning all results for the collection (currently a few hundred).

2) Notes are currently not returned. If the API is updated to include these the usefulness would increase dramatically. For example in the results for the above query I took notes of the conditions involved in the Ugi precipitate for each paper. With the current format, one has to read each paper to find the relevant information.

Progress on our Mendeley related services will be posted on the ONSwebservices wiki.


Thursday, October 07, 2010

Drexel Chemistry Mini-Symposium on Bradley Lab

Every year the chemistry department at Drexel gives faculty the opportunity to present their research to incoming students in 10-minute slots. On September 30, 2010 I presented on "Open Notebook Science for Malaria Drug Discovery and Solubility Modeling". I think such a short format is good for keeping student attention. Recording it also provides a handy link to use for other purposes. Most people just don't have time for 30-60 minute presentations.


The Meaning of Data panel at a class on the Rhetoric of Science

Update: Lawrence Souder provided the audio for the presentation and panel questions.

On October 6, 2010 I had the pleasure to participate in a panel discussion at Lawrence Souder's class on the Rhetoric of Science. I gave a brief presentation on "The Meaning of Data" to kick off the discussion. I provided a scientist's perspective of data - with the basic idea that at best data are evidence. Data cannot be treated as irrefutable facts since there is always some uncertainty and many assumptions must be made in their interpretation. I also argued that, although uncertainty cannot be eliminated completely, transparency goes a long way to reducing it.

The students in the class did not have a science background and I think it was informative for them to explore a different perspective from their other exposure to science through popular media or even textbooks. The term "fact" is thrown around a lot in these information sources to simplify but it doesn't reflect how scientists think about data. We discussed how unbelievably wrong the scientific details in movies and TV shows such as NCIS and House can be. Nevertheless, one student pointed out that these shows can be effective in attracting people to science - as long as it was understood that the details were probably incorrect.

We recognized that, because of market forces, science portrayed in the popular media has a strong tendency to be exaggerated or oversimplified, resulting in the phenomenon of "hype". A related effect can distort the way scientists communicate with each other, where there is a perception that exposing ambivalence in the form of seemingly contradictory data may not be in the researcher's best interest. This is a strong deterrent to the general adoption of transparency.

We explored the possible evolution of scientific communication as new tools such as blogs and wikis become increasingly used by scientists. Of course the issue of claims of priority was brought up and I discussed how this issue was handled in my own research work.

We debated whether these new forms of communication would alter the language used in scientific communication - even in traditional journals. I think that with the advent of the semantic web many researchers will start to write in a way that is understandable to humans as well as machines, their new target audience. One student remarked that the new generation coming up is very used to texting and that type of succinct communication is certainly in line with machine readability.

The assigned reading for the class involved a study of the language used by the Nobel laureates who discovered the buckyball to describe their research:
"In Praise of Carbon, In Praise of Science: The Epideictic Rhetoric of the 1996 Nobel Lectures in Chemistry" by Christian Casper. This paper demonstrates that both personality and the perceived target audience can dramatically affect the language and focus that scientists use to explain their research.

Wednesday, October 06, 2010

Cheminfo Retrieval Classes 1 and 2 in 2010

My first Chemical Information Retrieval class for the Fall of 2010 took place on Sept 23, 2010. This is the second time that I've taught the class as sole instructor and it was certainly convenient to have last year's wiki to build upon. The assignments are the same, so it was helpful to be able to point students to last year's submissions as examples.

The key message from my introductory lecture was that it can be really difficult to find usable chemical information and that there are no shortcuts like relying on a true trusted source - those don't exist. I showed a few examples of emerging models - Open Access, Open Notebook Science, Collaborative Competition (like pharma companies sharing some drug data openly) and other Open Science initiatives.

I also announced that we would be doing something new in the Science 3.0 theme (the semantic web). One of the assignments involves collecting 5 values from the literature for each of 5 properties for a compound of the student's choice. In addition to adding these values on the wiki, we will collect them in a format that is friendly to machines: a ChemInfo Validation Google Spreadsheet. Andrew Lang has agreed to help with adapting our previous code for solubility to creating web services for this application. For example, we can have a service that reports the mean and standard deviation for a particular property and chemical. Another could produce statistics for a given data source or compare peer reviewed vs non peer reviewed sources, etc. Since it will be possible to call these web services from within a Google Spreadsheet or Excel, it should enable much more sophisticated analysis of the data related to the "validity" of chemical information as it exists today.
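As a rough illustration of the statistics such a service would report, here is a sketch of the mean and standard deviation computed over collected literature values for one chemical and property. The records, compound names, and values are made up for the example; the real data would come from the ChemInfo Validation spreadsheet:

```python
# Hypothetical collected measurements: (chemical, property, value).
from statistics import mean, stdev

records = [
    ("benzene", "melting point (C)", 5.5),
    ("benzene", "melting point (C)", 5.4),
    ("benzene", "melting point (C)", 5.6),
    ("toluene", "melting point (C)", -95.0),
]

def property_stats(records, chemical, prop):
    """Mean and sample standard deviation for one chemical/property pair."""
    values = [v for c, p, v in records if c == chemical and p == prop]
    return mean(values), stdev(values)

m, s = property_stats(records, "benzene", "melting point (C)")
print(f"mean={m:.2f} stdev={s:.2f}")  # mean=5.50 stdev=0.10
```

A spread of literature values with a large standard deviation would flag exactly the kind of inconsistency the assignment is meant to surface.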

I didn't record the first lecture but I have the slides below:
During the second lecture on September 30, 2010 I spent most of the time showing students how to use Beilstein Crossfire, SciFinder and ChemSpider to find values for chemical properties. The recording for the second lecture is available below:


Creative Commons Attribution Share-Alike 2.5 License