Saturday, September 30, 2006

More Changes to usefulchem-molecules Feed and CMLRSSReader

Recent changes to the usefulchem-molecules feed and CMLRSSReader:



Menu programs can now be added to both the Feed and Item menus, via entries in the configuration file.  These entries have the form

<MenuPrograms>
<FeedPrograms>
<FeedProgram>
<RunString></RunString>
<MenuName></MenuName>
</FeedProgram>
<FeedProgram>
<RunString></RunString>
<MenuName></MenuName>
</FeedProgram>
</FeedPrograms>
<ItemPrograms>
<ItemProgram>
<RunString></RunString>
<MenuName></MenuName>
</ItemProgram>
</ItemPrograms>
</MenuPrograms>

These programs receive as respective parameters the saved (Java serialized) feed and item files, and can use the APIs in the CMLRSSReader software to read these files.  For an example, run the software with Run.Menu.bat instead of Run.bat.
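To make the structure of these entries concrete, here is a minimal sketch in Python that reads a filled-in <MenuPrograms> block and lists what would appear on each menu. The run strings and menu names are hypothetical and are not taken from the CMLRSSReader distribution; the sketch only illustrates the layout of the configuration, not how CMLRSSReader itself parses it.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment: the run strings and menu names below
# are made up for illustration only.
CONFIG = """
<MenuPrograms>
  <FeedPrograms>
    <FeedProgram>
      <RunString>java ListFeedTitles</RunString>
      <MenuName>List item titles</MenuName>
    </FeedProgram>
  </FeedPrograms>
  <ItemPrograms>
    <ItemProgram>
      <RunString>java ShowItemFields</RunString>
      <MenuName>Show molecule fields</MenuName>
    </ItemProgram>
  </ItemPrograms>
</MenuPrograms>
"""


def menu_programs(xml_text):
    """Return {'feed': [(menu name, run string), ...], 'item': [...]}."""
    root = ET.fromstring(xml_text)
    result = {"feed": [], "item": []}
    for key, path in (("feed", "FeedPrograms/FeedProgram"),
                      ("item", "ItemPrograms/ItemProgram")):
        for entry in root.findall(path):
            # findtext returns None when the child element is missing
            name = (entry.findtext("MenuName") or "").strip()
            run = (entry.findtext("RunString") or "").strip()
            result[key].append((name, run))
    return result


if __name__ == "__main__":
    for menu, entries in menu_programs(CONFIG).items():
        for name, run in entries:
            print(f"{menu} menu: {name!r} runs {run!r}")
```

Each <FeedProgram> or <ItemProgram> pairs one menu label with one command, so adding a menu entry means adding one more such element to the matching list.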



Flexibility has been added in the content displayed in the item pages.  Instead of displaying all the molecule fields in a feed, specified fields can be omitted.  This is desirable when a field contains data not suitable for display; for example, the text of a blog (which is now, in fact, contained in the usefulchem-molecules feed).  Also, the <description> contained in the item can be displayed instead of the molecule fields; or both (or neither!) can be displayed.  The configuration entries are

<IncludeItemDescription></IncludeItemDescription>
<IncludeMoleculeFields></IncludeMoleculeFields>
<OmittedFields>
<OmittedField></OmittedField>
<OmittedField></OmittedField>
</OmittedFields>
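How these three settings might combine can be sketched as follows. This is an illustration in Python under stated assumptions, not the CMLRSSReader code: the function name, the field names (SMILES, MW, BlogText) and the order in which description and fields are shown are all hypothetical.

```python
def render_item(description, molecule_fields, include_description,
                include_molecule_fields, omitted_fields):
    """Return the lines an item page would show under the given settings."""
    lines = []
    if include_description:
        lines.append(description)
    if include_molecule_fields:
        omitted = set(omitted_fields)
        for name, value in molecule_fields.items():
            # Skip any field listed as an <OmittedField>
            if name not in omitted:
                lines.append(f"{name}: {value}")
    return lines


if __name__ == "__main__":
    # Hypothetical item: the field names are assumptions for illustration.
    fields = {"SMILES": "CCCC=O", "MW": "72.11", "BlogText": "long blog text"}
    for line in render_item("butyraldehyde", fields,
                            include_description=True,
                            include_molecule_fields=True,
                            omitted_fields=["BlogText"]):
        print(line)
```

Setting both flags to false gives an empty page body, which corresponds to the "neither" case.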



The entries in the <PostDownload> and <PostProcess> hooks should now be file names containing a list of programs to be run, not the programs themselves.  This gives greater flexibility in configuring these programs.  Finally, another hook has been added, the <Cycle> hook.  This hook runs whenever CMLRSSReader checks a feed, regardless of whether a new version of the feed is downloaded and processed or not.
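The hook behaviour described above can be sketched in Python. Several things here are assumptions, since the post does not specify them: that a hook file lists one command per line, that blank lines are ignored, and that the <Cycle> hook runs after the other two. The function names are hypothetical.

```python
import subprocess


def parse_hook_lines(text):
    """Return the commands in a hook file, assuming one command per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_hook_file(path):
    """Run each command listed in the hook file, in order; return exit codes."""
    with open(path, encoding="utf-8") as handle:
        commands = parse_hook_lines(handle.read())
    return [subprocess.run(cmd, shell=True).returncode for cmd in commands]


def hooks_for_check(new_version_downloaded):
    """Name the hooks that fire for one check of a feed.

    Cycle fires on every check; the other two only when a new version of
    the feed was downloaded and processed. The ordering is an assumption.
    """
    hooks = []
    if new_version_downloaded:
        hooks += ["PostDownload", "PostProcess"]
    hooks.append("Cycle")
    return hooks


if __name__ == "__main__":
    print(hooks_for_check(False))
    print(hooks_for_check(True))
```

Keeping the program list in a separate file means the list can be changed without touching the main configuration file.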

Friday, September 29, 2006

Chemical Blog Review

We just got a nice mention in a cefic review article on chemistry blogs (although our number of visitors is only about half of that reported).
There are new professional channels for interaction emerging from the blog. Jean-Claude Bradley and colleagues set up http://usefulchem.blogspot.com/ that is an attempt to provide an ‘open source science in chemistry’. The idea is to provide a place to post specific problems in chemistry that need to be solved. It has related sites for posting useful molecules, a wikipedia site and an experiments site. The site gets around 1000 visitors a week at present, but its community is rising steadily.

Thursday, September 28, 2006

ArgusLab

I just saw a post on the SynapticLeap by Anatoly Chernyshev about a docking program that sounds perfect for our needs:
If you need a quick assessment of the docking mode for a molecule, the best tool IMHO is ArgusLab (http://www.arguslab.com). It's free, runs under Win and has very good user interface. The tutorial is also supplied, so that average person can learn the software in one evening. It can dock molecular libraries without user intervention. I also found its performance comparable to more advanced packages (Dock, AutoDock). Perhaps, it will not show all 'best hits', but it, definitely, discards the wrong ones.
Let's find out how good it is. Does anybody out there have experience with ArgusLab?

Wednesday, September 27, 2006

Defining what UsefulChem does

There has been a discussion recently on the Blue Obelisk mailing list about the definition of terms relating to Open Source Science. Some of our group members will be talking about UsefulChem at upcoming meetings and it would be a good idea to see what kind of misunderstandings can arise. Here is a suggestion for how to be more clear.

Chemical Structure Lookup Service

CSLS looks like another very comprehensive search tool for chemical structures that we should implement. It boasts access to 26 million compounds via 80 databases. It will be interesting to see how it compares with Emolecules. It can take a wide range of formats as input but does not appear to have the substructure search capability of Emolecules.

Thanks to David Bradley for finding this gem.

Saturday, September 23, 2006

Developments With CMLRSSReader

The CMLRSSReader software I discussed earlier has had several important developments. Most importantly, it now tracks new and modified items as feeds are updated. New/updated items and feeds are considered "unread" and displayed, at the top of the item list, in bold, while old, "read" items are in plain text. Menu selections allow both items and entire feeds to be marked as either read or unread.
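The read/unread bookkeeping can be sketched in a few lines of Python. This is only an illustration of the idea, assuming each item has a stable identifier and that "modified" means its content differs from the previously stored copy; the function names and the UC-style identifiers in the example are hypothetical, and CMLRSSReader itself is written in Java.

```python
def update_read_state(old_items, new_items, read_ids):
    """Return the ids that remain "read" after a feed update.

    old_items and new_items map item id -> item content. An item stays
    read only if it was marked read before and its content is unchanged;
    new and modified items become unread.
    """
    still_read = set()
    for item_id, content in new_items.items():
        if item_id in read_ids and old_items.get(item_id) == content:
            still_read.add(item_id)
    return still_read


def display_order(new_items, read_ids):
    """List unread items first, keeping feed order within each group."""
    # sorted() is stable, and False (unread) sorts before True (read)
    return sorted(new_items, key=lambda item_id: item_id in read_ids)


if __name__ == "__main__":
    old = {"UC0001": "a", "UC0002": "b"}
    new = {"UC0001": "a", "UC0002": "b, edited", "UC0003": "c"}
    read = update_read_state(old, new, {"UC0001", "UC0002"})
    for item_id in display_order(new, read):
        marker = " " if item_id in read else "*"  # * stands in for bold
        print(marker, item_id)
```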


Also, I am now storing processed feeds by Java serialization. This has two advantages. First, feeds can now be loaded faster than by re-parsing the entire feed file itself. More importantly, all information about the feed can be stored this way, including, for example, which items have been marked read. This has allowed me to add another program "hook" to the configuration, the <PostProcess> hook. Post-process programs can read the serialized feed (although they must be written in Java to do so).

A working example of what can be done with this new feature can be found here, which lists all new/modified items in the usefulchem-molecules RSS feed. Remember that this feed is regenerated every time the usefulchem-molecules blog is updated, so this page will always reflect the most recent changes to this blog.

For more details on this, and work to date in UsefulChem automation, see my summary at the UsefulChem Wiki (cheminfo).

Thursday, September 21, 2006

spectra in JCAMP

James has been trying to get our NMR data in a more manipulatable format so that we can zoom in and amplify peaks at will. He finally did get some data off our 500 MHz Varian instrument in JCAMP format. For example, see this file for crude butyraldehyde.

We are still looking for a good viewer that will let us manipulate the spectra within the browser, including integration. Right now, after downloading, the butyraldehyde spectrum can be viewed with JDXview.

Any suggestions? (And no, I don't want to use Chime, because MDL requires printing out a contract and mailing it in for approval.)

Sunday, September 17, 2006

Final Thoughts on ACS

Well, the 2006 Fall ACS meeting in San Francisco is now over and I am back in Philly, catching up on things.

My talk on UsefulChem was the last talk of the last day, Thursday. Although the slot was not optimal, a few people stuck around to the bitter end. I also recorded my talk and made it available here.

The most productive part of this conference was probably the discussions I had over lunch with Peter Murray-Rust, Henry Rzepa and Geoff Hutchison. We brainstormed some ways to move cheminformatics forward. Peter suggested that we set up a literature review blog from a few organic chemists. With Tenderbutton calling it quits, it couldn't hurt to inject some more organic chemistry into the blogosphere. Certainly, we are eager to extend the CMLRSS feeds from UsefulChem and see if CMLReact makes sense to use.

One of the more exciting projects brought up at the conference was OSCAR, the robot that reads chemistry journals looking for errors and extracting chemical information (like NMR data). If the copyright issues can be overcome, I think that this approach has huge potential for quickly populating the open chemical data resources.

Peter also did a nice write-up of UsefulChem on his blog.

Tuesday, September 12, 2006

Cheminformatics at ACS

Here is a little mid-conference report from the American Chemical Society meeting (ACS fall 2006) in San Francisco. The Sunday CINF session on Cyberinfrastructure in Chemistry, Information and Education: New Emerging Technologies was the most interesting for me.

Evan Bolton gave a nice overview of PubChem. The slide that struck me the most was the graph showing the exponential rise of compounds and visitors during the past 12 months. There are now almost 13 million compounds in that database. From an organic chemistry perspective PubChem has been of limited use to us because it does not generally have links to synthetic or spectral data for compounds, although the biological data is likely to become more useful as we move to the in vitro testing of our anti-malarial compounds. However, using the LinkOut feature in Entrez to annotate records may provide a mechanism to start doing that. PubChem is a very important project for chemical Open Source Science and its continued explosive growth is very encouraging.

Jeremy Frey talked about his CombeChem project, which involves learning how chemists plan and execute experiments, in order to develop software and hardware to assist with that. I learned that Jeremy had published their experiments automatically to a blog but that their server is currently down. When he brings this back up I'll be sure to report on it in detail. We had the opportunity to discuss the possibility of collaborations over lunch. We both have an interest in developing anti-malarial agents and he has some docking software that may prove to be quite helpful to determine the affinity of some of our compounds for enoyl reductase. One of the problems is that the licensing terms of commercial software make it difficult to share the results of the calculations openly. What we would like to do is run docking experiments as a web service at some point. That would be very useful for Open Source Science. Jeremy also gave a talk on Tuesday about his e-malaria project, where he has students try to design anti-malarial compounds.

Rene Deplanque alerted us to Chemgaroo and the Chemgapedia. Most of it is in German but they have translated the organic chemistry section to English and it is free for academic use. I've added it to the resources for my class this fall.
Christoph Steinbeck gave an interesting overview of the development of the NMRShiftDB to make NMR spectra openly available to everyone. It is very challenging to set up and run these volunteer based efforts.

Peter Murray-Rust talked about moving towards Open Data and Open Access. He described a system in place where X-ray crystal structures are deposited and automatically become public after some time. That gives the researcher time to publish by traditional channels, while making it easy to obtain permission to make the data public by default. Peter wanted to give several demos, including the new version of Bioclipse. However, not having access to the internet made that difficult.

Open access chemistry journal

A new open access outlet for chemists' peer-reviewed research was launched today: Chemistry Central Journal. Publisher BMC says the journal is the first international open access journal covering all of chemistry and will publish its first issue early in 2007.

Bryan Vickery, speaking today at the journal's launch, held during the ACS meeting, said, "I am delighted by the number of noted chemists and scientists who have agreed to join the Editorial Advisory Board of the journal from the outset." Among them is 1996 Nobel laureate Robert Curl. "I think open access journals are a great idea and am delighted to join this venture as a member of the Editorial Advisory Board," he said.

Read more...

Saturday, September 09, 2006

Molecules Blog Format

When creating and editing posts on the UsefulChem molecules blog, remember to adhere to the formatting requirements so that Dave's script can process the information correctly. Currently, only the SMILES: text is recognized but very soon the other fields listed on the format page will be passed through.
Note that the SMILES information is needed by the script to generate the rest of the information such as InChI, Emolecules search, MW, etc. If you know the UC number of a compound the easiest way to find this information is via the drop down page. You can link to any of the pages with the processed molecular info by following the link titled like this "View file UC0192.html in full window mode."
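As an illustration of why the formatting matters, here is a minimal Python sketch of how a script might pick the SMILES string out of a post. It is not Dave's script: it assumes the field appears on its own line as "SMILES:" followed by the string, and that the label is case-sensitive. The actual rules are on the format page.

```python
import re

# Assumed layout: the label "SMILES:" at the start of a line, then the string.
SMILES_LINE = re.compile(r"^[ \t]*SMILES:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def extract_smiles(post_text):
    """Return the first SMILES string found in the post, or None."""
    match = SMILES_LINE.search(post_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    post = "Name: butyraldehyde\nSMILES: CCCC=O\n"
    print(extract_smiles(post))
```

A post that omits the label, or puts other text on the same line after the string, would not match under these assumptions, so nothing downstream (InChI, MW, Emolecules search) could be generated for it.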

Wednesday, September 06, 2006

UsefulChemistryMolecules.html

The individual entries in the Useful Chemistry Molecules blog can now be viewed at UsefulChemistryMolecules.html.  This page has links to both the original blog entries for the usefulchem-molecules blog and pages containing the Jmol applet to view the molecule.  The pages are automatically generated from the same software which generates the usefulchem-molecules RSS feed.

Monday, September 04, 2006

NMRShiftDB API

Here is another gem from Rich Apodaca: a hack for an API to NMRShiftDB.

This is exactly the kind of thing we will be implementing for the automatically processed info from the UsefulChem Molecules blog. We currently process the SMILES code into an image, InChI, MW, Jmol and commercial sources from Emolecules. Automatically generating spectral info is also high on that list, especially H NMR. Unfortunately, the NMRShiftDB is still fairly small with a little over 22,000 spectra and we usually have to use other sources for our molecules of interest. But this situation will likely improve over time and we will certainly contribute our spectra.

Saturday, September 02, 2006

Ugi Reaction in Water

In our lab group meeting on Friday Sept 1, 2006 with Khalid, James and Lin, we discussed the status of our effort to use the Ugi reaction to synthesize a library of potential anti-malarial compounds.

Lin's attempt to synthesize a diketopiperazine with piperonal, 5-methylfurfurylamine, Boc-glycine and benzyl isocyanide yielded a fraction 19D-F4B that is consistent with the desired product but still has too many impurities after two attempts at chromatography. We'll have to see if it is worth one more column or starting again and separating the Ugi condensation from the cyclization step.

James has tried to isolate the Ugi product before cyclization, using 3,4-dihydroxybenzaldehyde, 5-methylfurfurylamine, Boc-glycine and benzylisocyanide. We discussed his NMR for 21B-F1 at length. The point of contention was whether the peaks between 5-6 ppm could correspond to the furan ring. Unfortunately we probably won't be able to do C NMR or MS for a few weeks as the instruments are repaired. Khalid will try to run a COSY to clarify this. James's attempt to cyclize the putative Ugi product 21B-F1 did not result in a diketopiperazine.

One of my concerns was that Chris Hulme has already stated that the Ugi reaction works well for aliphatic aldehydes so it may be that James's attempt with the aromatic aldehyde was doomed to failure. The presence of phenols has also been brought up as a potential problem.

At this point I suggested that we go back to doing a simpler, preferably published synthesis of a specific Ugi product for practice. We talked about using acetaldehyde, which is usually available as an aqueous solution. Khalid thought that water should accelerate the Ugi reaction. It turns out that this morning I was checking the Sitemeter hits to UsefulChem and someone from Hungary had tried searching for Ugi and water. Following this search on Google quickly pulled up the Wikipedia entry for the Ugi reaction and Khalid's hypothesis was indeed confirmed. There are some nice links in that summary that I think our group needs to invest some time reading.

It turns out that one of the authors on that Ugi in water article is Mike Pirrung, one of my postdoc supervisors in the early 90s. Small chemical world.

Murray-Rust Blog

Peter Murray-Rust has started the CML blog. He is a pioneer of Chemical Markup Language and a strong advocate of open source science. This should prove to be a great resource for our cheminformatics efforts.

Thanks to Egon for the link.

Creative Commons Attribution Share-Alike 2.5 License