Monday, October 29, 2007

Drug Design on the Open Web

A few months ago I started working with Mesa Analytics and Computing as a consultant on an SBIR project aiming to provide new tools for drug design. Although the business model component is confidential, a large part of the project involves the use and creation of freely available online tools for educational or other purposes.

This is a great example of a for-profit company aiming to provide significant value in the form of free services for the chemistry and biology communities. ChemSpider is another example.

What we would like to do is provide an intuitive interface for someone to perform some QSAR and docking work.

Mitch Chapman has provided a detailed description of a test dataset we'll use for the QSAR example. The advantage of using this source is that Rajarshi Guha has already created a publicly available service that we can use for performance comparison.

We are updating a "working scenario" to think about how this could all work and identify which pieces are missing. Hopefully we can put together a prototype for Phase I and find the right partners to get something robust constructed during Phase II.

This is all taking place on the UsefulChem wiki and we welcome contributions and suggestions from everyone.


Wednesday, October 24, 2007

Back from ASIST Open Science Panel

Yesterday (Oct 23, 2007) I participated in a panel on Open Science and Science Blogging at the ASIST conference in Milwaukee.

All three presentations are available here: streaming Flash (63:44).

Following a brief introduction by Phil Edwards,

  • Bora Zivkovic kicks off with his presentation "The Many Flavors of Science Blogging" ppt followed by mine
  • Jean-Claude Bradley "UsefulChem: An Open Notebook Science Project" ppt (starts at 18:35)
  • then Janet Stemwedel's "Social and Scientific Implications of Scientific Blogging" ppt (starts at 41:30).

Bora provided an overview of science blogging with plenty of good examples. I provided details of how we use blogs and wikis to do drug development and collaborate openly with the UsefulChem project. Janet wrapped up touching on the importance of blogs that chronicle the experiences of researchers and how the system works.

A few people took some really good notes:
Christina Pikas
Ken Varnum
Stephanie Willen Brown

This conference was a great opportunity to get together with Bora, Janet and Christina Pikas over sausages and sauerkraut. Hopefully we'll meet up again at Bora's NC Science Blogging conference in January.

The half-hour question session was also recorded and I'll provide a link when it is available.


Sunday, October 21, 2007

JSpecView Java Fix

Two months ago, I reported a serious problem with the new Java update and JSpecView.

Although we could band-aid the situation by uninstalling the latest Java update, most people didn't know that and would have reached dead links when trying to open our NMR files via a browser.

Robert Lancashire has now provided a fix and UsefulChem spectra are now back up. ChemSpider spectra also depend on JSpecView and are now available as well.

(taken from EXP132)


Open Science at ASIST in Milwaukee

Well I'm off to Milwaukee tomorrow for the ASIST conference.

I am participating in a panel on Open Science on Tuesday, October 23 at 8:30. Anyone can add questions for us to answer on the wiki. Here is an abstract with a few questions already posted:

Opening Science to All: Implications of Blogs and Wikis for Social and Scholarly Scientific Communication (SIG STI, SIG BWP)
Bora Zivkovic, Jean-Claude Bradley, Janet Stemwedel, Phillip Edwards and K.T. Vaughan

A growing number of scientists are turning to Web2.0 communication tools such as blogs and wikis to provide open channels for their social and scholarly discourse. Because of these tools, scientists are increasingly able to share data, results, and analysis of research (scholarly communication) with distant, and sometimes unknown peers, and are also able to enter the realm of scientific commentary (social communication) with the general public. While many science bloggers focus on purely social commentary on science, others include conference announcements and reports, book reviews, brief discussion of “failed” experiments, and non-publishable research findings. Within this environment there is a strong awareness that readers include – and may preferentially be – non-scientists, perhaps even nonspecialist skeptics about established theories. This session is not only concerned with presenting a state of the blog for science communication, but also with thinking about the impact of “plain English” science writing on both society and on science.

Question: Is it really a good thing to let anyone who thinks they have a scientific breakthrough have access to free, open, public, Googleable media?
Question: What if I make a mistake in my data, never fix it, no one catches it, and then someone dies because a medical decision was based on my "findings"? Isn't this exactly why we have formal peer review in formal publications?
Question: Who is the audience for science blogs and wikis anyway? Scientists or laypeople?
Question: Can you get published if you've already posted your results to your blog/wiki?

I hope to see some of you there!


Wednesday, October 17, 2007

BioMed Search

I just came across BioMed Search via Attila's Pimm blog.

What it does is simple but very effective - it finds images from a few bio sources like BioMed Central. So for quickly locating visually-rich information, like examples of inhibitors docking to enoyl reductase, BioMed Search is a no-brainer.

Tuesday, October 16, 2007

Science Bloggers SFLO Session

Berci led yesterday's (Oct 15, 2007) SciFoo Lives On session on Science Bloggers.

Sandra Porter, Bora Zivkovic and I (UsefulChem) discussed why and how we blog about science.
Bora really had a lot of problems with his video card and I was impressed that he persisted to deliver a great presentation and participate. I've been there so I know how frustrating that can be. Unfortunately a lot of people assume that is the typical Second Life experience and give up.

Berci live blogged the event here and did a tremendous job in representing the flow of the session. The transcript and other links are here.


Wednesday, October 10, 2007

ONS Friendly Labs

Following up on some discussions with Jeremiah Faith and Cameron Neylon, I thought it might be useful to set up a place on the NodalPoint wiki to list PI policies on Open Notebook Science (and other forms of Open Science more broadly) in their labs.

Cameron quickly followed up with a Dabble application to do something similar.

I think there is a need for such "match-up services". Students and postdocs who feel strongly about sharing their research openly would certainly prefer working in laboratories where that type of thing is tolerated, if not outright encouraged or required.

Similarly, PIs who feel strongly about Open Science would be thrilled to find students and postdocs sharing their mindset.

Check out (and contribute to) the NodalPoint and Dabble initiatives if you can.


Friday, October 05, 2007

Cameron Neylon ONS Talk at Drexel

I am very pleased to announce a talk by Cameron Neylon at Drexel next month:

A Beginner’s Guide to Open Science
(not for beginners but by beginners)

2:00 Friday November 2, 2007
Disque 109, Drexel University
32nd and Chestnut streets, Philadelphia, PA

Cameron Neylon, STFC Rutherford Appleton Laboratory and School of Chemistry, University of Southampton

The modern biochemistry or molecular biology laboratory generates large quantities of data that are generally stored across multiple computers attached to multiple instruments. Much of this data is never published and the majority languishes on old computers and is ultimately lost. At a local level this is a frustration for investigators who will often struggle to obtain specific pieces of data produced in their own laboratory. On a larger scale this is becoming a much more serious issue with the obligation of researchers to funding bodies to both preserve research data and make it available to other users increasingly becoming a formal condition of publicly funded grants. Systems are required that can capture and preserve data along with sufficient information and metadata to make it possible for others to use this data.

In parallel with this a movement is growing within the research community that advocates greater openness in providing both the raw data from published studies as well as making available the large quantities of data that are never published. The logical extreme of this approach is Open Notebook Science [1], pioneered at Drexel University [2], where the researcher’s laboratory notebook is made available on the internet as it is recorded. Achieving the aims of Open Notebook Science also requires systems which can capture data and provide it in a useful format. In addition these systems must make the data visible to relevant online searches.

We are developing and using an electronic laboratory notebook based on a Blog format to capture experimental data in a biochemistry laboratory [3,4]. Within the system each sample is recorded in a single post. Analysis and manipulations of the sample are recorded in separate posts with links back to the input sample and forward to any products. All the information is made immediately available on the Web as it is recorded. The Blog engine has been specially built in house and has a number of features designed to enable and encourage the effective capture of data and metadata in the environment of a biochemistry laboratory. I will describe the Blog system and our evolving approach to capturing metadata as well as the process of integrating this with other web services to provide an open environment for recording work in the laboratory, laboratory materials, and validated procedures. The challenges and problems encountered in reconciling the twin aims of capturing data and making it available and readable will also be discussed along with the similarities and differences emerging between different approaches to Open Notebook Science [2,5,6].
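The linking scheme described in the abstract (one post per sample, separate posts for each procedure, with links back to inputs and forward to products) can be pictured with a minimal Python sketch. This is only an illustration under my own assumptions and not the Southampton blog engine: the `Post` class, its field names, the helper functions and the post IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Post:
    """One notebook post: either a sample or a procedure (hypothetical model)."""
    post_id: str
    kind: str  # "sample" or "procedure"
    inputs: List[str] = field(default_factory=list)    # links back to input samples
    products: List[str] = field(default_factory=list)  # links forward to products


def record_procedure(notebook: Dict[str, Post], post_id: str,
                     inputs: List[str], products: List[str]) -> Post:
    """Record a procedure post and create a sample post for each new product."""
    for sample_id in inputs:
        if sample_id not in notebook or notebook[sample_id].kind != "sample":
            raise KeyError(f"unknown input sample: {sample_id}")
    for product_id in products:
        notebook.setdefault(product_id, Post(product_id, "sample"))
    post = Post(post_id, "procedure", list(inputs), list(products))
    notebook[post_id] = post
    return post


def provenance(notebook: Dict[str, Post], sample_id: str) -> List[str]:
    """Walk the links back from a sample to every procedure that led to it."""
    trail: List[str] = []
    for post in notebook.values():
        if post.kind == "procedure" and sample_id in post.products:
            trail.append(post.post_id)
            for parent in post.inputs:
                trail.extend(provenance(notebook, parent))
    return trail
```

Starting from a notebook that holds a single sample post, two calls to `record_procedure` build a chain, and `provenance` then follows the back-links from the final product to the procedures that produced it. The sketch assumes the links never form a cycle.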



Thursday, October 04, 2007

Science is About Mistrust

I have mixed feelings about the proliferation of the term "Open Notebook Science".

I started using the term a year ago to describe our UsefulChem project because it had no hits on Google and so it offered an opportunity to start with a fresh definition. There are currently over 43 000 hits for that term and it is nice to see that the first hit is still the post with the original definition.

The first part of the term, "Open Notebook", is meant to be taken literally. It refers to the ultimate information source used by a researcher to record their work. The fundamental philosophy of ONS is that of "no insider information". That means linking to all raw data associated with experiments and making available all experiments relating to a project under discussion. This includes failed and aborted experiments. (see recent conference on ONS)

I think that if ONS is practiced in this way it is potentially better than our current system of peer-reviewed article publication.

Here is why:

A key aspect of the scientific revolution a few centuries ago was moving from trust in an authority to mistrust of everything and everybody.

Galileo wrote (as reproduced on p. 25 of John Gribbin's "The Fellowship"):

It appears to me that they who in proof of any assertion rely simply on the weight of authority, without adducing any argument [that is, experimental evidence] in support of it, act very absurdly.

In principle, articles reporting on experimental science are supposed to contain enough information for a reasonably competent peer to repeat the work. Speaking from experience in my own field of organic chemistry, experimental sections are often highly condensed. When space was limited in paper journals this may have made some sense. But now with electronic storage being cheap that is not an issue (at least in organic chemistry).

Of course, journals now usually have a supplementary section available online to address this. But the past few times I have tried to debug a reaction using this resource I have found it insufficiently detailed.

What I really needed was access to the researcher's lab notebook and all associated files to follow specific instances of reactions, not abstracted general procedures. That way I can see what the researcher did and did not do without making so many assumptions. For example: Were the starting materials checked for purity? How exactly was the reaction monitored and what do those spectra or TLC images look like? Was there any solvent left in the product when it was weighed that might account for those impressive yields?

A major flaw in the current scientific publication system is that there is still too much trust. Readers are expected to trust editors to choose appropriate anonymous peers to review submissions. Reviewers trust primary authors to summarize their research results accurately. Primary authors trust their collaborators, students and postdocs to give them accurate information when writing papers.

If we make the laboratory notebook and all associated raw data public we can significantly reduce the amount of trust required to keep this house of cards standing.

The main problem is not so much that people will completely fabricate data, although this does happen. It is more that mistakes get made and corners are cut to get the paper out the door under pressure. And once these errors are in print it is very difficult to get people to correct them, if they are ever discovered.

As a researcher I don't even trust myself. And I shouldn't. Students who haven't yet mastered the discipline required to keep a good detailed timed lab notebook log as they execute and observe will likely be humbled quickly when they realize how poor human memory reveals itself to be for these tasks.

Time-stamped video and digital photographs can go a long way to reduce the burden on the researcher to record details of their experimental set-up and observation of the reaction over time. This has proved very useful in the past in my group and I would like to see even more systematic use.

There is currently tremendous skepticism associated with publishing scientific results on the web using social software. If people start citing blog posts that do not link to primary raw data for support, in a manner implying that there is strong support, then it is going to be difficult to do good science with social software. There is already plenty of that type of thing going on in the peer-reviewed literature, with one article citing another citing another citing "unpublished results".

Now there is nothing wrong with discussing some interesting aspect of ongoing research without linking to primary raw data, but that is not Open Notebook Science in the sense that I have been using it. These could be called "teaser posts" and might be useful for finding collaborators and initiating discussions but they cannot be used as an alternative to traditional peer-reviewed literature.

I know that there are many who think that peer review is needed to legitimize scientific blog posts. I think that the ability to comment on posts is useful to continue discussions with the community. But expecting any kind of review system to completely validate research published using any vehicle is not realistic.

The only people truly qualified to judge a piece of research are those who have actually looked at the raw data to see if everything adds up and that takes time, assuming they have access to it. It is unlikely that anyone will do that without being properly motivated - generally only other researchers trying to reproduce the experiment for their own purposes will have a good reason to invest the time. Now comments from these individuals would be valuable but that applies to a very small proportion of all recently published work.

And going forward, I see Open Notebook Science as being a natural way for the scientific process to become more automated, with machines reporting executed protocols on the open web. We have a ways to go to reach that point but I think this is a more likely scenario than expecting machines to learn how to write human-reviewed articles.

The point of science is generating actionable information - at least in a field like synthetic organic chemistry. We have a great opportunity to use these new web tools to do this so much more efficiently than ever before.

(By the way, we still intend to publish our work through conventional channels. Not so much to communicate new information to the community but for all the other reasons that researchers are under pressure to publish.)


Tuesday, October 02, 2007

3D Periodic Table in Second Life

Further adding to the set of chemistry tools in Second Life, Hiro Sheridan has created a 3D periodic table with rotating atoms. Although not directly proportional, the relative sizes of the spheres are in the correct order. Clicking on them provides basic information about the corresponding element.

The 3D periodic table is available on the Chemistry Corner on Drexel Island (SLURL).


Wired Article on Dark Data

Tom Goetz wrote a thoughtful article "It's Time to Free the Dark Data of Failed Scientific Experiments" in Wired this week.

So what happens to all the research that doesn't yield a dramatic outcome —or, worse, the opposite of what researchers had hoped? It ends up stuffed in some lab drawer. The result is a vast body of squandered knowledge that represents a waste of resources and a drag on scientific progress. This information — call it dark data — must be set free.


There are some islands of innovation. Since 2002, the Journal of Negative Results in Biomedicine has offered a peer-reviewed home to results that go negative or against the grain. Earlier this year, the journal Nature started Nature Precedings, a Web-based forum for prepublication research and unpublished manuscripts in biomedicine, chemistry, and the earth sciences. At Drexel University, chemist Jean-Claude Bradley practices "open notebook" science — chronicling his lab's work and sharing data via blog and wiki. And PLoS is planning an open repository for research and data that is otherwise abandoned.

The main focus of the article is on results that don't make it to an article because they are not interesting enough. "Failed Experiments" in this sense are those that do not uncover a hoped-for correlation or, in synthetic organic chemistry, those where the desired product is not obtained.

However, there are many more shades of Dark Data. One large category often downplayed consists of experiments aborted because of mistakes and accidents. For example in EXP096, the product was spilled and lost. But all of the spectra and data collected up to that point are still perfectly usable for someone wanting to repeat this or a similar experiment. That is the reason researchers don't tear out pages from their lab notebooks when accidents happen. The same logic applies to Open Notebook Science, where the audience extends to the whole world.

Thanks to Attila for posting an early report.


Creative Commons Attribution Share-Alike 2.5 License