Monday, May 25, 2009

ONS talk at AI conference in July 2009

I've been invited to talk at the IJCAI'09 Workshop on Abductive & Inductive Knowledge Development in Pasadena on July 12, 2009. This will be a great opportunity to focus on what might become possible on the machine side of Open Notebook Science.

Although Ross King (of Robot Scientist fame) won't be there in person, his collaborator Oliver Ray will be giving a talk on their project. That should be quite interesting.

My abstract:
The Role of Openness in Scientific Automation: a case for Open Notebook Science

The use of Open Notebook Science to collect and make publicly available non-aqueous solubility measurements and the synthesis of anti-malarial agents will be described. ONS involves the real-time sharing of all experiments and associated raw data by a community of collaborators who are geographically distributed and may never have communicated using channels other than these shared projects. Monthly cash prizes are awarded to participating students by means of the ONS Challenge Submeta Awards for solubility measurements. The laboratory notebook pages are recorded on a public wiki and the solubility measurements, including relevant calculations, are stored in public Google Spreadsheets. A combination of ChemSpider, the GoogleDoc visualization API and web services is used to enable flexible searching and display of desired subsets of the data.

The use of such a distributed and open platform with virtually zero read/write costs for the communication of science creates new opportunities for rapid collaboration. By using a redundant information dissemination system, channels that are more human friendly can be integrated with those that are more geared to machine readability. For example, a publicly editable Google Spreadsheet tied to the operation of a robotic liquid handling system opens up the possibility of integrating crowdsourced intelligence with human workflows. In another example, web services called from within a publicly editable Google Spreadsheet to perform calculations on NMR spectra can be integrated readily with manually executed steps to accelerate progress and minimize the possibility of errors.

The advantages and disadvantages of ONS and related bottom-up Open Science strategies will be discussed. The key concerns revolve around intellectual property, trust, reference-ability, publication in traditional academic vehicles and other implications for collaborations.


Thursday, May 14, 2009

Two new ONS Challenge Judges: Andrew Lang and Troy Milliken

Bill Hooker and I have been contacting the chemistry departments in the US to recruit new students to the Open Notebook Science Challenge. As a result, I have the pleasure of reporting that today we have a new judge: Troy Milliken from Jackson State University. Troy is a polymer chemist and will look for promising students he can supervise as they contribute to our solubility measurements.

Andrew Lang from Oral Roberts University also agreed to participate as a judge this month. As a mathematician and creator of many of our web services to calculate and display solubility measurements, Andy is in a unique position to provide valuable feedback to our students.

This brings our total judge count to 7.


Baseline correction for automated integration of NMR JCAMP-DX files

We were initially getting a surprisingly large solubility measurement for one of our solutes (spectrum from ONSC-EXP077). After investigating, it was clear that the discrepancy originated from a peak with a sloping baseline.


The web service was integrating all of the area beneath the peak, including the raised baseline. After we discussed the situation with Andrew Lang, he modified his code to exclude the area beneath a linearly sloping baseline. Andy's code is Open Source and made available here, including detailed instructions for anyone wishing to implement it themselves. This modified web service is included in the most recent Google Spreadsheet template for semi-automated measurement of solubility using NMR.
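To make the idea concrete, here is a minimal sketch of that kind of correction. It is not Andy's code: the function name is hypothetical, and it assumes the integration window is given as ascending x values with matching intensities, and that the baseline is the straight line joining the first and last points of the window.

```python
def integrate_with_linear_baseline(x, y):
    """Return (corrected_area, raw_area) for one integration window.

    Hypothetical helper, not the ONS web service. Assumes x is ascending
    and that the baseline is the straight line joining the two end points.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) points of equal length")
    # Raw area under the curve by the trapezoidal rule.
    raw_area = sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
        for i in range(len(x) - 1)
    )
    # Area under the sloping baseline: a single trapezoid across the window.
    baseline_area = (x[-1] - x[0]) * (y[0] + y[-1]) / 2.0
    return raw_area - baseline_area, raw_area


# A triangular peak (height 4) sitting on a baseline that rises from 1 to 3.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.5, 6.0, 2.5, 3.0]
corrected, raw = integrate_with_linear_baseline(x, y)
```

In this toy window the raw area is three times the corrected one, which is the kind of inflation that would show up as an unexpectedly large solubility.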

The progress of science is a clumsy walk toward a non-attainable ideal of full understanding and control. With every experiment we have to re-question what we think we know as variables change. This is why I am so passionate about Open Notebook Science and having all stakeholders interact at the level of the individual experiment. We caught this issue early and we were able to deal with it because our computational collaborator was responsive and engaged with the chemists in the details.


Saturday, May 09, 2009

Leaders and Pushers in Open Science projects

Fred Zimny just posted an interesting piece on open scientific collaborations: Your (r)evolution will be digitized: online tools for radical collaboration — DMM

The article tries to give a balanced view of radical sharing and our Open Notebook Science projects (ONSchallenge and UsefulChem) get a mention.

Some good discussions about what it takes to have an open collaboration succeed have emerged from it. Deepak Singh has posted about the requirement for a benevolent dictator. There is also a healthy discussion on FriendFeed where Andrew Lang argues for an organizer.

I think that open projects function very much like any other projects, with the advantage of it being easier for people to join in and make use of the results.

People often mention leadership as the key ingredient to making projects work. Leadership is associated with that nebulous concept of vision, a type of flexible long range planning that probably does work best when coming from a single individual.

Leaders with a clear vision are known for giving inspiring speeches and presentations. But vision by itself does not get things done. For that you need pushers - people who relentlessly push themselves and others to execute.

In the book "The Dip", Seth Godin explores the value of strategic perseverance and quitting. In anything worth doing there is a period after initial enthusiasm and before one can see the light at the end of the tunnel where most people quit - this is the dip. Godin argues that the key to being successful is being able to tell the difference between a true dip and a pit leading nowhere. Winners strategically quit the dead ends and persevere through the dips. That is exactly what pushers do.

People often think that successful leaders attract followers - people who are subservient. In my experience successful projects result from a collaboration of colleagues who share common values. Within the group there may be individuals with less experience who can best contribute by trusting those with more experience and making a firm commitment to learn quickly so that they can initiate contributions that count.

Of course leaders have to be pushers themselves. But since people have a limited ability to maintain simultaneous goals with equal urgency, it is helpful for collaborating colleagues to act as pushers in a complementary fashion.

To give a few concrete examples of this, consider our Open Notebook Science projects.

Andrew Lang has been a close collaborator for a long time and has written code that enables us to visualize our solubility results (with Rajarshi Guha) and process NMR files automatically. The project would be missing key components without Andy pushing to make things happen. But Andy has also initiated other high impact actions that are unrelated to writing code: our ONS Wikipedia entry, recruiting David Bulger at ORU to do solubility measurements and adding our measurements to common chemicals in Wikipedia - which has ended up being a popular portal to our data.

Bill Hooker, in addition to writing in depth about Open Science, has recently stepped up to help with emailing all of the chemistry departments in the US to get some more students to participate in the ONS Challenge. Shirley Wu volunteered to assist Andy in making ONS logos. Brent Friesen included the solubility challenge as part of his sophomore organic chemistry lab at Dominican University.

Cameron Neylon has done a tremendous amount - recently he pushed to get a group of us to publish a chapter in the upcoming O'Reilly Media book Beautiful Data. Organizing my trip to the UK last fall was another major accomplishment that he made happen. Cameron speaks extensively about Open Notebook Science and although there is a significant overlap in our objectives, he has a clear vision about what needs to happen that focuses on slightly different - and complementary - priorities.

There are many other people and examples that I could have used but I think those highlight the point I am trying to make about open collaborations. Pushers make things happen without being asked and that keeps projects alive.

My most pressing objectives often involve making sure key lab experiments get done and results processed into a usable format, including publications. Sometimes my collaborators need to push me about other issues and I am generally appreciative of that. It is when you have people that don't share your values pushing you that conflict arises.

As I mentioned previously, the point of open collaborations is the shared experience with others with similar values. At the end of the day it is their opinion that matters most.


Monday, May 04, 2009

Streamlining automated solubility measurements with NMR JCAMP-DX files

Two months ago I reported on a protocol for measuring solubility using NMR JCAMP-DX files and a web service set up by Andrew Lang called from within a Google Spreadsheet. Things were going well until David at ORU was a little too productive and crashed the server with too many requests.

Andy had to change the way the script worked and used this as an opportunity to make the service more broadly usable. It turns out that the compressed JCAMP-DX files produced by different NMR instruments are not created with exactly the same standards. A way to address that issue is to convert the files to an uncompressed XY format. Unfortunately, there was a glitch in JSpecView which created XY formatted spectra displaying in Hz instead of the standard ppm.
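For readers who run into a spectrum expressed in Hz, the conversion to ppm is a simple division by the spectrometer frequency in MHz. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the x values are taken to be offsets in Hz from the 0 ppm reference, and the 500 MHz frequency in the usage lines is an example value, not our instrument's. It is not how JSpecView or the ONS web service handles the axis.

```python
def hz_to_ppm(offsets_hz, spectrometer_mhz):
    """Convert frequency offsets (Hz from the 0 ppm reference) to ppm.

    Hypothetical helper for illustration: ppm = offset in Hz / observe
    frequency in MHz.
    """
    if spectrometer_mhz <= 0:
        raise ValueError("spectrometer frequency must be positive")
    return [hz / spectrometer_mhz for hz in offsets_hz]


# Example values only: a 500 MHz proton spectrum.
ppm_axis = hz_to_ppm([0.0, 500.0, 3630.0], 500.0)
```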

Now all of these issues have been resolved and the process is simpler than ever. Robert Lancashire fixed the glitch in the April 26, 2009 release of JSpecView. And Andy not only made his integration web service work for the new release but also created another service to display JCAMP-DX spectra directly from the DX file (see here for an example). In the past students had to create an associated HTML file to display JCAMP-DX files and this was just another point in the process to introduce errors and slow things down.

The new process for the semi-automated measurement of solubility (SAMS) using NMR is as follows:

1) Make a saturated solution in a given solvent (sonicate for at least 30 mins - more on this in a separate post)
2) Transfer about 0.1 mL to an NMR tube with some compatible deuterated solvent (for locking)
3) Take the NMR spectrum and export the JCAMP-DX file (on our machines these are in a compressed format)
4) Open the initial JCAMP-DX files in JSpecView and save as JCAMP-DX XY format
5) Upload the converted file to the ONSC server in the spectra folder
6) Fill out the requested information in the SAMS spreadsheet and you have the solubility calculation (first open the SAMStemplate and save as a copy with a new name)
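As a rough illustration of what the final calculation in step 6 involves, here is a sketch of one common way to estimate concentration from integrated NMR peaks. It is not the formula in the SAMS spreadsheet. The function name and parameters are hypothetical, and it assumes a dilute solution in which the dissolved solute does not change the volume, so that the solvent's molarity equals that of the pure solvent.

```python
def estimate_molarity(solute_integral, solute_protons,
                      solvent_integral, solvent_protons,
                      solvent_density_g_per_ml, solvent_mw_g_per_mol):
    """Estimate solute concentration (mol/L) from two integrated NMR peaks.

    Hypothetical sketch, not the SAMS spreadsheet formula. Assumes the
    solute's own volume is negligible (dilute approximation).
    """
    if min(solute_protons, solvent_protons, solvent_integral,
           solvent_density_g_per_ml, solvent_mw_g_per_mol) <= 0:
        raise ValueError("protons, solvent integral, density and MW must be positive")
    # Integral per proton is proportional to the number of moles.
    mole_ratio = (solute_integral / solute_protons) / (
        solvent_integral / solvent_protons)
    # Moles of pure solvent in one litre.
    solvent_molarity = solvent_density_g_per_ml * 1000.0 / solvent_mw_g_per_mol
    return mole_ratio * solvent_molarity


# Example values only: a 2-proton solute peak against the CH3 peak of methanol
# (density 0.791 g/mL, molecular weight 32.04 g/mol).
molarity = estimate_molarity(1.0, 2, 30.0, 3, 0.791, 32.04)
```

For concentrated solutions the dilute approximation breaks down, since the solute then takes up a real share of the volume, so this sketch would overestimate the molarity in those cases.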

We can now easily finish processing the backlog of measurements that the ONSchallenge participants have been obtaining and record them in the SolubilitySum spreadsheet for querying.


Creative Commons Attribution Share-Alike 2.5 License