Visualizing social networks within and between Open Notebooks is certainly a good first step. Luckily, our Reaction Attempts project has already abstracted the key elements of organic chemical reactions within a collection of Open Notebooks. This means that creating connection maps between people and chemicals can be attempted with reliable and semantically unambiguous database sources.
The Reaction Attempts database records the identity of reactants and products as ChemSpiderIDs for each reaction within a collection of notebooks. It also records the name of the researcher, the solvent, the yield (when available) and a few more key identifiers.
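To make the shape of such a record concrete, here is a minimal Python sketch of how one reaction attempt might be represented and turned into person-to-chemical links. The class name, field names, researcher labels and ID numbers are all hypothetical placeholders chosen for illustration; they are not the actual schema or contents of the Reaction Attempts database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReactionAttempt:
    """One reaction attempt (hypothetical field names, not the real schema)."""
    notebook: str
    researcher: str
    reactant_csids: Tuple[int, ...]   # ChemSpiderIDs of the reactants
    product_csids: Tuple[int, ...]    # ChemSpiderIDs of the products
    solvent: Optional[str] = None
    yield_percent: Optional[float] = None  # None when no yield was recorded


def person_chemical_edges(attempts):
    """Return the set of (researcher, ChemSpiderID) pairs seen in the attempts."""
    edges = set()
    for attempt in attempts:
        for csid in attempt.reactant_csids + attempt.product_csids:
            edges.add((attempt.researcher, csid))
    return edges


# Placeholder data: the IDs below are made up, not real ChemSpiderIDs.
attempts = [
    ReactionAttempt("notebook-A", "researcher-1", (1001, 1002), (2001,), "methanol", 45.0),
    ReactionAttempt("notebook-A", "researcher-2", (1001,), (2002,)),
]
edges = person_chemical_edges(attempts)
```

Each pair in `edges` is one link between a person and a chemical, which is the kind of edge list a tool such as Gephi can take as input once it is written out to a file.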
We are very fortunate that Don Pellegrino, an IST student at Drexel, has selected the analysis of networks within Open Notebooks as part of his Ph.D. work. He has started to report his progress on our wiki and is eager to receive any feedback as the work progresses (his FriendFeed account is donpellegrino).
Don's first report is available here. He is using the open source software Gephi for visualization and has provided all of the data and code on the associated wiki page. (Also see Tony Hirst's description of mapping ONS work, which provided some very useful insights.) Don has provided a detailed report of his findings, but I think the most important can be seen in the global plot below.
This represents a map connecting people through chemicals. The large structure at the top right represents the connections within the UsefulChem project, and the main circle represents the activity of graduate student Khalid Mirza, who was the most active on this project. The crescent structure to the right of the circle represents other students - mainly undergraduates - who worked with the same chemicals as Khalid.
At the top left there are 3 small isolated networks, representing completely separate projects: the sodium hydride (NaH) oxidation study, the work of Dustin Sprouse and the work of Sebastian Petrik. I'll be writing about Sebastian's work in a future post.
Near the bottom middle there is another small network connected to the main group by a single link mediated by 2,2-dimethoxyethylamine.
This represents the overlap between Open Notebooks (Wolfle from Todd group and Mirza from Bradley group) that I mentioned previously. I think that automatically discovering such connections as they occur could be a really useful outcome of this network analysis work. For example, the researchers could be alerted by email that a new potentially interesting overlap between their projects now exists. This could accelerate new collaborations.
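As a rough illustration of how such overlaps might be detected, the Python sketch below groups usage records by chemical and reports pairs of researchers from different notebooks who share one. The function name, the `(notebook, researcher, ChemSpiderID)` record layout and the sample data are assumptions made for this example, not Don's actual method, and the email alert itself is left out.

```python
from collections import defaultdict
from itertools import combinations


def find_overlaps(usages):
    """Find chemicals shared by researchers working in different notebooks.

    usages: iterable of (notebook, researcher, csid) tuples (hypothetical layout).
    Returns a list of (csid, researcher_1, researcher_2) tuples.
    """
    users_by_chemical = defaultdict(set)
    for notebook, researcher, csid in usages:
        users_by_chemical[csid].add((notebook, researcher))

    overlaps = []
    for csid, users in sorted(users_by_chemical.items()):
        for (nb1, r1), (nb2, r2) in combinations(sorted(users), 2):
            if nb1 != nb2:  # only links that cross notebooks are of interest
                overlaps.append((csid, r1, r2))
    return overlaps


# Placeholder data: notebook names, researcher names and IDs are made up.
usages = [
    ("notebook-A", "researcher-1", 101),
    ("notebook-A", "researcher-2", 101),
    ("notebook-B", "researcher-3", 101),
    ("notebook-B", "researcher-3", 202),
]
overlaps = find_overlaps(usages)
```

Running a check like this each time new reactions are recorded, and comparing the result with the previous run, would be one simple way to spot a newly formed link between two projects.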
A key challenge in Don's work is to figure out the right questions so that the results will be genuinely useful and novel to the researchers involved and the research community. I'm optimistic that he will succeed. As a separate outcome, just learning how researchers collaborate and record their work over time is bound to be interesting.
For a description of Don's planned work over the next several months take a look at his full Thesis Proposal: "Proposal of a System and Methods for Integrating Literature and Data".
It might also be interesting to compare this sort of network analysis with co-author and citation network analyses?
I'm not sure what you mean by this, Tony - how would you define "co-author" and "citation" within the context of ONS?
Defining co-authorship and citation within the ONS context is an interesting problem. There may be a few ways to operationalize the data for these concepts. The cases where multiple researchers are listed in the notebook entry could be analogous to co-authorship in literature. Citation may be a harder analogy to establish, but perhaps compounds could be treated as references and the use of a compound in a subsequent reaction could be treated as a citation.