Saturday, December 31, 2005


I thought Chmoogle was nice when I blogged about it a few weeks ago.
But QueryChem takes open content chemical searching to the next level and here is why:

1) Just like Chmoogle, you can use an editor to draw a molecule or type the SMILES code as input. But you can also add text queries to fine-tune the results. For example, typing "CAS" in the text box pulls up only those hits where the CAS number is likely to be listed.

2) This is the big one: the results in QueryChem take you directly to the pages of the commercial suppliers. In a Chmoogle search, the results only take you to the general company URL, where you have to do the search over again.

3) A QueryChem search does a lot of work for you. It figures out the possible names for a compound then throws that back into Google or Google Scholar and then shows where those names appear in the results. That saves a lot of manual labor.

4) Compound analogs also show up and the threshold of similarity can be set in the search.

With all of these advantages, QueryChem is now my first choice for single search open content chemical information. For ongoing monitoring of our UsefulChem project I am still going to use CAS number searches in MSN (exportable via OPML) because QueryChem does not yet provide RSS feeds for searches.

Another feature that I would love to see in QueryChem is the ability to form a URL of a given search.

Update: Justin Dale Klekota from QueryChem has just enabled forming a URL for a search. For example, the search below, formed by searching the SMILES code for glycolaldehyde and "CAS", can be called up by clicking on this link. He also informs me that they are working on RSS feeds for searches. How is that for responsive!

Friday, December 30, 2005

aldehyde problem

It looks like we'll be able to get the amines and BOC-protected amino acids for the Ugi synthesis of the diketopiperazine anti-malaria compounds.

But we have not yet found a commercial source for the aldehyde components.
This catechol aldehyde is one of the more common required aldehydes. After looking at several alternatives, the cheapest and most direct route seems to be a pinacol style rearrangement of the amino alcohol adrenaline. Racemic adrenaline is pretty cheap, although the yield is only about 20% in this Korean paper.

These are the concerns I have:

1) The Korean paper reports the heating of perchloric and glacial acetic acids, which I really don't want to do because of the explosive hazard. I am wondering if sulfuric acid will work just as well. The mechanism only involves acid catalysis so I don't see why not.

2) Because the Ugi reaction is carried out in methanol, I am hoping that the phenolic groups will not have to be protected. If they do have to be protected then we'll probably have to make an acetal that will come off during BOC deprotection at the end.

Any comments/ideas/chemicals that would help us?

Monday, December 26, 2005

SMILES lookup table for malaria37

Thanks to Ruslan, we have a lookup table for the first hundred compounds in the malaria37 batch. The first 6 have been processed, so anyone contributing to this project should feel free to add the rest in the format used here. For some reason the SMILES codes provided by Ruslan were all set to the R configuration - I removed the "@@" characters to wipe out the stereochemical information that was not given with the original compounds.
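For anyone who wants to repeat the stereochemistry cleanup on the remaining compounds, here is a minimal sketch in Python. The function name strip_stereo is hypothetical, and the sketch assumes the only stereo information present is tetrahedral chirality written with "@" or "@@"; it does not touch double-bond stereo marks ("/" and "\").

```python
def strip_stereo(smiles: str) -> str:
    """Drop tetrahedral chirality marks from a SMILES string.

    Removing every "@" covers both "@@" and "@", so a centre written as
    [C@@H] or [C@H] becomes a plain [CH]. Assumption: "@" never appears
    in the input for any other purpose.
    """
    return smiles.replace("@", "")


# Alanine written with an explicit chiral centre, used only as an illustration.
flat = strip_stereo("N[C@@H](C)C(=O)O")
```

The brackets around the former chiral centre are kept, which matches the "[CH]" atoms in the lookup table.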

I first tried to use the SMILES code as the name of the molecule but that made double checking against the original data too hard. So the format is just "malaria37-1". I have already uploaded all the gif files. Just link to them from the blog using this format: ""

Here is the lookup table:

1 "N1([CH](C(N[CH](C1=O)CCCC)=O)Cc1cc(c(cc1)O)O)C[CH]([CH](C(O)=O)Br)Br"
2 "N1([CH](C(N[CH](C1=O)CCSC)=O)Cc1cc(c(cc1)O)O)Cc1oc(cc1)C"
3 "N1([CH](C(N[CH](C1=O)CCSC)=O)[CH](c1cc(c(cc1)O)O)O)CC(=O)C"
4 "N1([CH](C(N[CH](C1=O)Cc1ccc(cc1)I)=O)CCCCNC)Cc1ccc(c(c1)O)O"
5 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)Cc1cc(c(cc1)O)O)Cc1ncccc1"
6 "N1([CH](C(N[CH](C1=O)Cc1ccccc1)=O)CCCN)CC[CH]([CH](CO)O)O"
7 "N1([CH](C(N[CH](C1=O)CC)=O)Cc1cc(c(cc1)O)O)Cc1oc(cc1)C"
8 "N1([CH](C(N[CH](C1=O)Cc1sccc1)=O)CCCCN)Cc1ccc(c(c1)O)O"
9 "N1([CH](C(N[CH](C1=O)CCC)=O)[CH](c1cc(c(cc1)O)O)O)CC(=O)C"
10 "N1([CH](C(NCC1=O)=O)Cc1cc(c(cc1)O)O)C[CH]([CH](C(O)=O)Br)Br"
11 "N1([CH](C(N[CH](C1=O)CCSC)=O)Cc1cc(c(cc1)O)O)CCCC(O)=O"
12 "N1([CH](C(N[CH](C1=O)CCl)=O)Cc1cc(c(cc1)O)O)Cc1sccn1"
13 "N1([CH](C(NCC1=O)=O)CCC(=O)O)Cc1ccc(c(OC)c1)O"
14 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)Cc1cc(c(cc1)O)O)CC(O)=O"
15 "N1([CH](C(N[CH](C1=O)Cc1ccc(cc1)F)=O)CCCCN)Cc1ccc(c(OC)c1)O"
16 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)CC(=O)O)CCCc1ccc(c(OC)c1)O"
17 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)Cc1cc(c(cc1)O)O)Cc1oc(CC)cc1"
18 "N1([CH](C(N[CH](C1=O)[CH](CC)C)=O)CCOCCN)Cc1ccc(c(c1)O)O"
19 "N1([CH](C(NCC1=O)=O)Cc1cc(c(cc1)O)O)Cc1nc(ccc1)C"
20 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)Cc1cc(c(cc1)O)O)Cc1nc(ccc1)C"
21 "N1([CH](C(N[CH](C1=O)Cc1ccc(cc1)I)=O)CCCN)CC[CH]([CH](CO)O)O"
22 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)CC(=O)O)CCCc1ccc(c(OC)c1)O"
23 "N1([CH](C(N[CH](C1=O)CCSCC)=O)Cc1cc(c(cc1)O)O)Cc1oc(c(c1)C)C"
24 "N1([CH](C(N[CH](C1=O)Cc1cc(ccc1)O)=O)CCN)Cc1ccc(cc1)O"
25 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)[CH](c1cc(c(cc1)O)O)O)CC(=O)C"
26 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)CCCN)C[CH]([CH](CO)O)O"
27 "N1([CH](C(N[CH](C1=O)[CH](c1ccccc1)C)=O)CCCCN)Cc1ccc(c(c1)O)O"
28 "N1([CH](C(N[CH](C1=O)CCCN)=O)CO)Cc1ccc(c(c1)O)O"
29 "N1([CH](C(N[CH](C1=O)CC)=O)Cc1cc(c(cc1)O)O)Cc1occc1"
30 "N1([CH](C(N[CH](C1=O)CC)=O)Cc1cc(c(cc1)O)O)Cc1oc(cc1)Br"
31 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)CCCN)CC[CH]([CH](CO)O)O"
32 "N1([CH](C(N[CH](C1=O)Cc1ccc(cc1)Cl)=O)Cc1cc(c(cc1)O)O)CC(=O)C"
33 "N1([CH](C(N[CH](C1=O)CCCCCC)=O)Cc1cc(c(cc1)O)O)Cc1oc(CC)cc1"
34 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)Cc1cc(c(cc1)O)O)C[CH]([CH](C(O)=O)Br)Br"
35 "N1([CH](C(N[CH](C1=O)CCl)=O)CCC(=O)N)Cc1ccc(c(c1)O)O"
36 "N1([CH](C(N[CH](C1=O)CSCc1ccccc1)=O)CCCN)Cc1ccc(c(c1)O)O"
37 "N1([CH](C(N[CH](C1=O)Cc1ccccc1)=O)CCCN)Cc1ccc(c(c1)O)O"
38 "N1([CH](C(N[CH](C1=O)CCCC)=O)Cc1cc(c(cc1)O)O)Cc1oc(cc1)Br"
39 "N1([CH](C(N[CH](C1=O)[CH](c1ccccc1)C)=O)CCCCNC)Cc1ccc(c(OC)c1)O"
40 "N1([CH](C(N[CH](C1=O)CC)=O)CCC(=O)O)Cc1ccc(c(c1)O)O"
41 "N1([CH](C(N[CH](C1=O)Cc1ccc(cc1)I)=O)CCCN)Cc1ccc(c(OC)c1)O"
42 "N1([CH](C(N[CH](C1=O)CN)=O)Cc1cc(c(cc1)O)O)CC(=O)C"
43 "N1([CH](C(N[CH](C1=O)CCSC)=O)Cc1cc(c(cc1)O)O)Cc1oc(c(c1)C)C"
44 "N1([CH](C(N[CH](C1=O)CCSCC)=O)Cc1cc(c(cc1)O)O)Cc1occc1"
45 "N1([CH](C(N[CH](C1=O)Cc1ccccc1)=O)CCCN)Cc1ccc(c(OC)c1)O"
46 "N1([CH](C(N[CH](C1=O)CCSC)=O)Cc1cc(c(cc1)O)O)Cc1ncccc1"
47 "N1([CH](C(N[CH](C1=O)CCC)=O)CC[CH](CN)O)Cc1ccc(c(c1)O)O"
48 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)Cc1cc(c(cc1)O)O)C[CH]([CH](C(O)=O)Br)Br"
49 "N1([CH](C(N[CH](C1=O)CCl)=O)Cc1cc(c(cc1)O)O)Cc1occc1"
50 "N1([CH](C(N[CH](C1=O)[CH](c1ccccc1)C)=O)CCCN)Cc1ccc(c(OC)c1)O"
51 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)Cc1cc(c(cc1)O)O)Cc1oc(cc1)C"
52 "N1([CH](C(N[CH](C1=O)CCN)=O)CO)CCCc1ccc(c(OC)c1)O"
53 "N1([CH](C(N[CH](C1=O)CCl)=O)Cc1cc(c(cc1)O)O)Cc1oc(CC)cc1"
54 "N1([CH](C(NCC1=O)=O)CCCN)Cc1ccc(c(OC)c1)O"
55 "N1([CH](C(N[CH](C1=O)[CH](c1ccccc1)C)=O)CCCN)CC[CH]([CH](CO)O)O"
56 "N1([CH](C(NCC1=O)=O)CC[CH](CN)O)CCCc1ccc(c(OC)c1)O"
57 "N1([CH](C(N[CH](C1=O)C(C)C)=O)CC[CH](CN)O)Cc1ccc(c(c1)O)O"
58 "N1([CH](C(N[CH](C1=O)[CH](O)C)=O)Cc1cc(ccc1)O)C[CH](CO)O"
59 "N1([CH](C(N[CH](C1=O)CC=C)=O)Cc1cc(c(cc1)O)O)Cc1oc(CC)cc1"
60 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)CCC(=O)O)Cc1ccc(c(OC)c1)O"
61 "N1([CH](C(N[CH](C1=O)CC)=O)Cc1cc(c(cc1)O)O)Cc1oc(CC)cc1"
62 "N1([CH](C(N[CH](C1=O)CCSCC)=O)Cc1cc(c(cc1)O)O)Cc1ncccc1"
63 "N1([CH](C(N[CH](C1=O)CCC)=O)Cc1cc(c(cc1)O)O)CC(=O)C"
64 "N1([CH](C(N[CH](C1=O)CCl)=O)[CH](c1cc(c(cc1)O)O)O)CC(=O)C"
65 "N1([CH](C(N[CH](C1=O)CCCC)=O)Cc1cc(c(cc1)O)O)Cc1occc1"
66 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)CCC(=O)N)CC[CH]([CH](CO)O)O"
67 "N1([CH](C(N[CH](C1=O)CCCCCC)=O)[CH](c1cc(c(cc1)O)O)O)CC(=O)C"
68 "N1([CH](C(N[CH](C1=O)CCSC)=O)Cc1cc(c(cc1)O)O)CC(O)=O"
69 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)Cc1cc(c(cc1)O)O)CC(=O)C"
70 "N1([CH](C(N[CH](C1=O)CCC)=O)Cc1cc(c(cc1)O)O)C[CH]([CH](C(O)=O)Cl)Cl"
71 "N1([CH](C(NCC1=O)=O)CC(=O)N)Cc1ccc(c(c1)O)O"
72 "N1([CH](C(N[CH](C1=O)CC)=O)Cc1cc(c(cc1)O)O)C[CH]([CH](C(O)=O)Cl)Cl"
73 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)Cc1cc(c(cc1)O)O)Cc1occc1"
74 "N1([CH](C(N[CH](C1=O)CCC)=O)Cc1cc(c(cc1)O)O)Cc1oc(cc1)C"
75 "N1([CH](C(N[CH](C1=O)c1ccccc1)=O)CCCCN)CCCc1ccc(c(OC)c1)O"
76 "N1([CH](C(N[CH](C1=O)CCO)=O)Cc1cc(c(cc1)O)O)CC(=O)C"
77 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)CCC(=O)N)C[CH]([CH](CO)O)O"
78 "N1([CH](C(N[CH](C1=O)CC)=O)Cc1cc(c(cc1)O)O)C[CH]([CH](C(O)=O)Br)Br"
79 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)Cc1cc(c(cc1)O)O)Cc1sccn1"
80 "N1([CH](C(N[CH](C1=O)C(C)(C)C)=O)CC[CH](CN)O)Cc1ccc(c(c1)O)O"
81 "N1([CH](C(N[CH](C1=O)CCCCCC)=O)Cc1cc(c(cc1)O)O)Cc1occc1"
82 "N1([CH](C(N[CH](C1=O)CN)=O)CCN)Cc1ccc(c(c1)O)O"
83 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)Cc1cc(c(cc1)O)O)CCCC(O)=O"
84 "N1([CH](C(N[CH](C1=O)Cc1ccc(cc1)I)=O)CCCCN)Cc1ccc(c(c1)O)O"
85 "N1([CH](C(N[CH](C1=O)CCCCNC)=O)Cc1cc(c(cc1)O)O)CC(=O)C"
86 "N1([CH](C(N[CH](C1=O)CC[S](=O)C)=O)CCN)Cc1ccc(c(c1)O)O"
87 "N1([CH](C(N[CH](C1=O)CCCC)=O)Cc1cc(c(cc1)O)O)Cc1oc(c(c1)C)C"
88 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)Cc1cc(c(cc1)O)O)CC(O)=O"
89 "N1([CH](C(N[CH](C1=O)C)=O)Cc1cc(c(cc1)O)O)Cc1nc(ccc1)C"
90 "N1([CH](C(N[CH](C1=O)CCCCCC)=O)Cc1cc(c(cc1)O)O)Cc1oc(cc1)C"
91 "N1([CH](C(N[CH](C1=O)CSC)=O)Cc1cc(c(cc1)O)O)Cc1oc(c(c1)C)C"
92 "N1([CH](C(N[CH](C1=O)CCCC)=O)Cc1cc(c(cc1)O)O)Cc1sccn1"
93 "N1([CH](C(N[CH](C1=O)CCCN)=O)Cc1cc(c(cc1)O)O)CC(=O)C"
94 "N1([CH](C(N[CH](C1=O)CC(C)C)=O)Cc1cc(c(cc1)O)O)Cc1sccn1"
95 "N1([CH](C(N[CH](C1=O)CCSCC)=O)Cc1cc(c(cc1)O)O)Cc1nc(ccc1)C"
96 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)CCCN)C[CH]([CH]([CH](CO)O)O)O"
97 "N1([CH](C(N[CH](C1=O)C(C)(C)C)=O)CCC(=O)O)Cc1ccc(c(c1)O)O"
98 "N1([CH](C(N[CH](C1=O)CCSC)=O)Cc1cc(c(cc1)O)O)Cc1oc(CC)cc1"
99 "N1([CH](C(N[CH](C1=O)CC1CCCCC1)=O)CC[CH](CN)O)C[CH]([CH](CO)O)O"
100 "N1([CH](C(NCC1=O)=O)Cc1cc(c(cc1)O)O)CCCC(O)=O"
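For the programmers: a rough sketch of how the table above could be read into a dictionary keyed by the "malaria37-N" names. The function name parse_lookup and the prefix parameter are assumptions for illustration, and the parser only assumes each row is a number followed by a SMILES code in double quotes.

```python
import re

# One row of the table: an index, whitespace, then a quoted SMILES code.
ROW_RE = re.compile(r'^\s*(\d+)\s+"([^"]+)"\s*$')


def parse_lookup(text, prefix="malaria37"):
    """Map names like 'malaria37-13' to their SMILES codes.

    Lines that do not look like table rows are skipped silently.
    """
    table = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            table[f"{prefix}-{match.group(1)}"] = match.group(2)
    return table


# Two rows copied from the table above.
sample = '''13 "N1([CH](C(NCC1=O)=O)CCC(=O)O)Cc1ccc(c(OC)c1)O"
54 "N1([CH](C(NCC1=O)=O)CCCN)Cc1ccc(c(OC)c1)O"'''
table = parse_lookup(sample)
```

Keying on the short name rather than the SMILES code keeps the double checking against the original data easy.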

Friday, December 23, 2005

isocyanide substitutes

Here is an update to the Ugi synthesis. It turns out that 1-isocyanocyclohexene is tough to find commercially. I only identified a potential source from a German company. This problem was addressed in this paper,

The solution phase synthesis of diketopiperazine libraries via the Ugi reaction: Novel application of Armstrong's convertible isonitrile. Christopher Hulme*, Matthew M. Morrissette, Francis A. Volz and Christopher J. Burns. Tetrahedron Letters, Volume 39, Issue 10, 5 March 1998, Pages 1113-1116

where they tried alternatives. Benzyl isocyanide seemed to work well and it is obtainable from Sigma-Aldrich fairly cheaply.

Wednesday, December 21, 2005

Ugi DKP synthesis for malaria

I found a nice little review of diketopiperazine syntheses. One of the methods involved a one-pot solution-phase synthesis (Ugi reaction) with a BOC-protected amino acid, an amine and an aldehyde. I have drawn the scheme below in such a way that the R1, R2 and R3 groups are positioned in the same way as in our file of target compounds.

The original article is: The solution phase synthesis of diketopiperazine libraries via the Ugi reaction: Novel application of Armstrong's convertible isonitrile. Christopher Hulme*, Matthew M. Morrissette, Francis A. Volz and Christopher J. Burns. Tetrahedron Letters, Volume 39, Issue 10, 5 March 1998, Pages 1113-1116

This should be much simpler and much cheaper than the solid phase synthesis I proposed earlier. Since the reaction is carried out in methanol, my guess is that there is no need to protect alcohols in any of the starting materials. Does anyone know if phenols would be a problem in the Ugi reaction?

The Synaptic Leap

The Synaptic Leap is another initiative to coordinate open source biomedical research. It looks like they are using Drupal to run a collection of communities. Just as we have, they have identified malaria as an obvious first choice. It will be very interesting to see how all of these efforts co-evolve and leverage each other.

I'll update these open science sites in the UsefulChem wiki as well.

malaria solid support DKP synthesis

Here is a more explicit general scheme (based on this) showing the general synthesis of the diketopiperazines (DKP) of our malaria project requiring 2 BOC-protected amino acids and one aldehyde. If any alcohols, thiols or amines are present they need to be protected (e.g. benzyl, FMOC, etc.) and deprotected in the last step.

Sunday, December 18, 2005

E-Malaria Project

The e-malaria project is an effort similar to our own, aiming to process chemical information with students and automation to find and produce novel anti-malaria compounds. It is run from Southampton University. Hopefully the projects will synergize.

Thursday, December 08, 2005

UsefulChem Wiki

I have created a Wiki to keep a concise and updated summary of the major project that we are pursuing in the UsefulChem blog. As we are getting more volunteers, it will be easier for them to get the bigger picture and understand how they can contribute immediately.

Wednesday, December 07, 2005

Chmoogle accommodates automation

I just received word from Craig James at Chmoogle that they will modify their Terms of Use to accommodate our request for automated queries:

"You may perform an unlimited number of searches on Chmoogle from a standard web browser application which is under your immediate personal control. If you use an automated system such as a script or program to search Chmoogle, you may only access the first one hundred pages (1000 compounds) for each distinct query, and you may not access more than one thousand pages (10000 compounds) in any twenty-four hour period."

This is very nice. Note, however, that they do not make an API available.

Also they will be incorporating the Sigma-Aldrich catalogue in a few weeks - that will take Chmoogle to a new level of usefulness for automated queries.
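For anyone scripting against Chmoogle, the limits quoted above could be enforced on our side with something like the following sketch. The class name PageBudget and its interface are hypothetical, and it reads the terms conservatively: every page fetch, even a repeat, counts toward the 1000 pages per 24 hours, and any page number beyond 100 for a query is refused.

```python
import time
from collections import deque


class PageBudget:
    """Client-side guard for the automated-query limits quoted above."""

    MAX_PAGE = 100            # only the first one hundred pages per query
    PER_DAY = 1000            # at most one thousand pages per 24 hours
    WINDOW = 24 * 60 * 60     # length of the rolling window in seconds

    def __init__(self, clock=time.time):
        self._clock = clock       # injectable so the guard can be tested
        self._stamps = deque()    # times of fetches inside the window

    def allow(self, page_number):
        """Return True and record the fetch if it stays within the limits."""
        now = self._clock()
        # Forget fetches that have aged out of the 24-hour window.
        while self._stamps and now - self._stamps[0] >= self.WINDOW:
            self._stamps.popleft()
        if not 1 <= page_number <= self.MAX_PAGE:
            return False
        if len(self._stamps) >= self.PER_DAY:
            return False
        self._stamps.append(now)
        return True
```

A script would call allow(page) before each request and pause or stop when it returns False.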

Sunday, December 04, 2005

diketopiperazine synthesis found

After a bit of digging, I found a published synthesis for a library of diketopiperazines with the same substitution pattern as in our anti-malaria candidates. My initial guess was pretty close and the published synthesis answered some of my concerns.
1) They do use a reductive amination step but not at the last step to avoid having to do the cyclization with a secondary amine.
2) The cyclization is not used to cleave from the solid support. Instead the dipeptide is freed then cyclized in refluxing toluene.

Details can be found on page 571 of this review article. The original article is here: Gordon, D.W. and Steele, J. Bioorg. Med. Chem. Lett. 1995, 5, p. 47.

Next step on this project: Find a commercial source for the 2 BOC-protected amino acids, the aldehyde, the resin and necessary reagents. Place each one in the molecule blog.

OpenBabel 2.0 release

For the programmers on the UsefulChem project, this should be a really useful resource. Open Babel 2.0 has just been released. It is an extremely comprehensive chemical data conversion tool that has both a command-line and a GUI interface. And it is all free and open source.

As a reminder, what we would like to be able to do is have agents automatically complete information for molecules dumped in our molecule blog. I am asking our other volunteers to manually complete these until we get the automation worked out. The most useful piece of information is probably the commercial availability and price but we also want the other identifiers like the CAS#, InChI, SMILES, etc. so that they will get indexed and found by other researchers. I don't think OpenBabel can generate gifs but ChemSketch can be used for that.
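One small check an agent (or a volunteer) could run on any CAS# before posting it is the standard check-digit test: the last digit must equal the weighted sum of the other digits, counted from the right with weights 1, 2, 3, ..., modulo 10. Here is a minimal sketch; the function name is_valid_cas is hypothetical, and passing the test only shows the number is well formed, not that it belongs to the right compound.

```python
import re

# CAS numbers have the form NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if the string is a well-formed CAS number with a correct check digit."""
    match = CAS_RE.match(cas.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


# Water (7732-18-5) is used here only as a familiar test case.
ok = is_valid_cas("7732-18-5")
```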

It would be nice if an RSS reader could be used to detect new conversions to be performed as new molecules are dropped in the blog, probably usually as SMILES. Does anyone have any insight into doing that easily? Is anyone with CMLRSS experience interested?
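As a starting point for discussion, here is a rough sketch of the detection step using only the Python standard library. It assumes a plain RSS 2.0 feed in which each post is an item with a guid (or link), a title and a description; the function name new_items and the sample feed are made up for illustration and are not the actual molecule blog feed.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml, seen):
    """Return feed items whose id is not yet in `seen`, and record them.

    `seen` is a set of ids kept between polls (for example, saved to disk).
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        key = item.findtext("guid") or item.findtext("link") or ""
        if key and key not in seen:
            seen.add(key)
            fresh.append({
                "id": key,
                "title": item.findtext("title", default=""),
                "description": item.findtext("description", default=""),
            })
    return fresh


# Hypothetical feed with two posts; the SMILES in the descriptions are placeholders.
SAMPLE_FEED = """<rss version="2.0"><channel><title>molecules</title>
<item><guid>post-1</guid><title>malaria37-1</title><description>CCO</description></item>
<item><guid>post-2</guid><title>malaria37-2</title><description>CCN</description></item>
</channel></rss>"""

seen_ids = set()
first_poll = new_items(SAMPLE_FEED, seen_ids)
second_poll = new_items(SAMPLE_FEED, seen_ids)
```

Each new item returned by a poll could then be handed to a conversion tool such as Open Babel; that hand-off is left out here.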

Creative Commons Attribution Share-Alike 2.5 License