CombiUgi and Closing the Open Science Loop
A few weeks ago I asked Rikesh to kick off the CombiUgi project by creating lists of commercially available Boc-protected amino acids, aldehydes, primary amines and isonitriles. He is now done, and a link to purchase each compound is provided, in addition to its SMILES code.
What we need to do now is generate the list of 90,000 Ugi products resulting from their combination (not including enantiomers and diastereomers and the free amines resulting after boc deprotection). This can probably be done fairly easily with VBA in Excel but if anyone wants to pitch in using their favorite software that would be appreciated.
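For anyone who would rather not use VBA, here is a minimal Python sketch of the enumeration step. It only pairs up the four reagent lists; it does not build the SMILES of the Ugi products themselves, which would need a cheminformatics toolkit. The short SMILES lists and the function name enumerate_ugi_inputs are illustrative placeholders, not the actual lists Rikesh compiled.

```python
from itertools import product

# Illustrative placeholder reagents (assumptions, not the compiled lists).
boc_amino_acids = [
    "CC(C)(C)OC(=O)NCC(=O)O",      # Boc-glycine
    "CC(NC(=O)OC(C)(C)C)C(=O)O",   # Boc-alanine
]
aldehydes = ["O=Cc1ccccc1", "CC=O"]                        # benzaldehyde, acetaldehyde
amines = ["NCc1ccccc1", "CCN"]                             # benzylamine, ethylamine
isonitriles = ["[C-]#[N+]C(C)(C)C", "[C-]#[N+]C1CCCCC1"]   # t-butyl, cyclohexyl isocyanide


def enumerate_ugi_inputs(acids, aldehydes, amines, isonitriles):
    """Return every (acid, aldehyde, amine, isonitrile) combination.

    Each tuple corresponds to one Ugi product, ignoring stereoisomers.
    """
    return list(product(acids, aldehydes, amines, isonitriles))


combos = enumerate_ugi_inputs(boc_amino_acids, aldehydes, amines, isonitriles)

# The number of products is the product of the four list lengths.
print(len(combos))
```

With the full reagent lists substituted in, the length of combos should match the expected number of Ugi products, and each tuple can be written out as one row of a spreadsheet.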
Although we have not been successful in cyclizing our Ugi products to diketopiperazines, from what we learned, I expect that we should be able to make any of these 90,000 compounds in less than a week (including shipping time).
By indexing these compounds in relevant search engines (I am working with Chemspider to make this happen) as UsefulChem molecules available upon request (and justification) we have an opportunity to close the loop on a practical Open Science project.
By the loop I mean a complete iteration from hypothesis to deciding which compounds to make to actually making them and getting testing results. These results will confirm or force a modification of the hypothesis and the cycle goes through another iteration hopefully closer to producing a useful outcome (a good drug lead compound for example).
I imagine that this loop operates in a lot of research groups. But doing the work under Open Science conditions lets it evolve in new ways. First of all, the direction of progress is determined by the collaborators who elect to participate in the process, not necessarily by scientific objectives.
An example of that is our recent shift from the testing of our compounds as anti-malarial agents to testing them as tumor inhibitors simply because Dan Zaharevitz from NCI contacted me and suggested that we submit our compounds.
Right after we started to submit our compounds, Dan left this message:
The folks at Indiana have done a lot of cool stuff that is well worth looking at. One thing they have running in a preliminary form is a service that predicts a compound's activity in cell lines in the screen. This compound is predicted to be inactive in the cell lines in the prediction. I actually don't think that is a bad result. We probably should put up a place to discuss screens and screening strategy, but essentially a prediction tool such as this summarizes what is known. A compound that is predicted to be inactive, but turns out to be active is much more likely to show you something new and interesting than a compound that is predicted to be active and is active.
So that's the last piece that closes the loop. This web service will make a prediction about the activity of the compounds generated by the CombiUgi algorithm and rank them. The flagged compounds will be identified, synthesized, and then tested via NCI's assays for tumor cell inhibition.
My group's core expertise is the synthetic component. As far as we are concerned, the other two processes are black boxes. And for scientists involved in the computation and testing, our synthesis operation is probably a black box. But by doing everything in the open, we hope to allow other researchers to propose other models and create derivative loops of their own.
We'd love to do the same for the anti-malarial assays, but we have not found an established system like NCI's that will do substrate screening routinely at no cost (except shipping, of course).
Is it becoming clearer why I think the scientific process can be automated in novel and useful ways with the progressive adoption of Open Science?