Monday, December 20, 2010

Visualizing Social Networks in Open Notebooks

Increasing the role of automation in the scientific process has long been a fundamental objective of Open Notebook Science. The automatic discovery of new connections in open scientific work is potentially a very important contribution to this end.

Visualizing social networks within and between Open Notebooks is certainly a good first step. Luckily, our Reaction Attempts project has already abstracted the key elements of organic chemical reactions within a collection of Open Notebooks. This means that creating connection maps between people and chemicals can be attempted with reliable and semantically unambiguous database sources.

The Reaction Attempts database records the identity of reactants and products as ChemSpiderIDs for each reaction within a collection of notebooks. The name of the researcher, the solvent, the yield (when available) and a few other key identifiers are also recorded.

We are very fortunate that Don Pellegrino, an IST student at Drexel, has selected the analysis of networks within Open Notebooks as part of his Ph.D. work. He has started to report his progress on our wiki and is eager to receive any feedback as the work progresses (his FriendFeed account is donpellegrino).

Don's first report is available here. He is using the Open Source software Gephi for visualization and has provided all of the data and code on the associated wiki page. (Also see Tony Hirst's description of mapping ONS work, which provided some very useful insights.) Don has provided a detailed report of his findings, but I think the most important one can be seen in the global plot below.
This represents a map connecting people through chemicals. The large top right structure represents the connections within the UsefulChem project and the main circle represents the activity of graduate student Khalid Mirza who was the most active on this project. The crescent structure to the right of the circle represents other students - mainly undergraduates - who worked with the same chemicals as Khalid.

At the top left there are 3 isolated small networks, representing completely separate projects: the sodium hydride (NaH) oxidation study, the work of Dustin Sprouse and the work of Sebastian Petrik. I'll be posting about Sebastian's work in a future post.

Near the bottom middle there is another small network connected to the main group by a single link mediated by 2,2-dimethoxyethylamine.
This represents the overlap between Open Notebooks (Wolfle from the Todd group and Mirza from the Bradley group) that I mentioned previously.

I think that automatically discovering such connections as they occur could be a really useful outcome of this network analysis work. For example, the researchers could be alerted by email that a new potentially interesting overlap between their projects now exists. This could accelerate new collaborations.
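As a rough illustration of what such automatic discovery might look like, here is a minimal Python sketch. It is not Don's code and does not use the real Reaction Attempts schema: the field names, notebook names, researcher names and ChemSpider IDs are all made-up placeholders. It simply maps each chemical to the people who used it and reports chemicals that appear in more than one notebook.

```python
from collections import defaultdict

# Hypothetical records: the field names and ChemSpider IDs are placeholders,
# not the actual Reaction Attempts schema or real identifiers.
attempts = [
    {"notebook": "NotebookA", "researcher": "Researcher 1", "chemicals": [101, 102, 103]},
    {"notebook": "NotebookA", "researcher": "Researcher 2", "chemicals": [102, 104]},
    {"notebook": "NotebookB", "researcher": "Researcher 3", "chemicals": [103, 105]},
]


def cross_notebook_overlaps(records):
    """Map each chemical used in more than one notebook to its users.

    Returns {chemical_id: sorted list of (notebook, researcher) pairs}.
    """
    users = defaultdict(set)
    for rec in records:
        for chem in rec["chemicals"]:
            users[chem].add((rec["notebook"], rec["researcher"]))

    overlaps = {}
    for chem, who in users.items():
        notebooks = {notebook for notebook, _ in who}
        if len(notebooks) > 1:  # the chemical bridges separate notebooks
            overlaps[chem] = sorted(who)
    return overlaps


overlaps = cross_notebook_overlaps(attempts)
for chem, who in sorted(overlaps.items()):
    names = ", ".join(f"{researcher} ({notebook})" for notebook, researcher in who)
    print(f"Chemical {chem} links: {names}")
```

In this toy data only chemical 103 is shared across notebooks, so it is the only one reported. An alerting service could run a check like this whenever new reactions are recorded and email the researchers involved.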

A key challenge in Don's work is to figure out the right questions so that the results will be genuinely useful and novel to the researchers involved and the research community. I'm optimistic that he will succeed. As a separate outcome, just learning how researchers collaborate and record their work over time is bound to be interesting.

For a description of Don's planned work over the next several months take a look at his full Thesis Proposal: "Proposal of a System and Methods for Integrating Literature and Data".


Monday, December 13, 2010

Mirza PhD defense on the Ugi reaction for anti-malarial screening

My student Khalid Baig Mirza defended his Ph.D. thesis at Drexel University on December 6, 2010. It was nice to see all the pieces come together - even though there are still a few intriguing puzzles left to be fully resolved. Like most research projects, every answer generates several interesting additional questions and Khalid did a good job of showing what these key issues are for his work.

In his presentation, Khalid first discusses Open Notebook Science and his contribution to the sodium hydride oxidation controversy. Then he describes the UsefulChem project, involving the use of the Ugi reaction as an approach to synthesizing new anti-malarial agents, including a few unexpected side reactions and challenges. Finally he presents an overview of the ONS Solubility Challenge and its application to organic synthesis.



Sunday, December 05, 2010

Dana Vanderwall on Cheminformatics at Drexel

Dana Vanderwall, Associate Director of Cheminformatics at Bristol-Myers Squibb, presented for my last Chemical Information Retrieval class on December 2, 2010.

The first part covered "Cheminformatics & The evolving relationship between data in the public domain & pharma" and included a general discussion of modern drug discovery and the details of a malaria dataset recently released from the pharmaceutical industry to the public.
The second part described a project based on "Molecular Clinical Safety Intelligence", where tracking side effects from approved drugs can help in the design of new drugs.
It was a very nice way to close out the course, showing very practical applications of the concepts we covered over the term. The recording is available below.


Creative Commons Attribution Share-Alike 2.5 License