Monday, May 03, 2010

ChemSpider SyntheticPages

I recently mentioned the Reaction Attempts project, which aims to collect organic chemistry experiments - especially those that are "failed", in progress or somehow incomplete.

For reactions where the desired product has been obtained and fully characterized, ChemSpider SyntheticPages also offers a very convenient publication vehicle. As I mentioned previously there is a need for enabling the publication of single experiments, especially when these are unlikely to become part of a traditional article.

We are in the process of submitting suitable reactions from the UsefulChem project to CS|SP. This will require some re-formatting of procedures and characterization data as they currently appear in the lab notebook.

Here is an example of one of our Ugi reactions: SyntheticPage 406 (UCEXP176C)


A nice feature of these pages is the automatic rendering of 2D structures when you hover over chemical names.


Here are a few more reasons to use ChemSpider SyntheticPages:
* ChemSpider SyntheticPages takes you directly to a procedure. When you get a hit - you get a procedure.
* ChemSpider SyntheticPages provides information that may not generally be found elsewhere, such as frequently encountered problems, trouble-shooting tips, the number of times the reaction has been carried out, scale-variation etc.
* ChemSpider SyntheticPages is the only interactive chemistry database. Information is constantly updated and validated by comments from the user community (Peer Review in the Public Domain™).
* ChemSpider SyntheticPages can provide you with the most up-to-date method; we aim for 95% of submissions to be processed within 48 hours of submission.
* ChemSpider SyntheticPages is free of charge.
[Disclaimer: I am a member of the editorial group at CS|SP]


Friday, November 20, 2009

CAS curates strychnine m.p. - ChemInfo Class 9

What is going to distinguish chemistry databases as we move forward in this Web2.0 world?

If I was unsure of it when I started teaching Chemical Information Retrieval 2 months ago, I certainly got my answer yesterday afternoon. Cristian Dumitrescu from CAS contacted me to discuss the problems I had encountered when attempting to use SciFinder to find the melting point of strychnine. He had read my blog post and wanted to make sure he understood the problem. So I had a conference call with him and a CAS colleague and I explained that several m.p. values corresponded to strychnine salts instead of the free base. They agreed to rectify the situation.

Apparently Cristian stays on top of what is being said about CAS products from various sources, including the blogosphere. I think that what will distinguish chemistry databases as we move forward is precisely this type of proactivity and responsiveness.

There is a plethora of databases out there to search for chemical information. Most of them contain surprisingly large amounts of incorrect data. My students are in the process of demonstrating that with their assignment on finding 5 sources for 5 properties of a chemical of their choice. When they are done in 2 weeks I'll post about that, perhaps doing a top 10 worst data points.

CAS is an example of a commercial database. But the same principle applies to free databases as well.

Consider the glatiramer acetate problem I reported on previously. ChemSpider immediately removed the entry because a random polymer was being incorrectly represented as a physical mixture of amino acids. As far as I know no other free databases have corrected the problem, although contact information for people running various databases was provided by Michael Kuhn and Egon Willighagen on FriendFeed.

I spoke with Cristian about the problem and he said he would look into it. Upon doing a search for glatiramer acetate on SciFinder it appears that there is currently a problem. The text correctly explains that this is a polymer but the empirical formula looks like just a physical mixture of amino acids, with an extra H2O per unit that should not be there after amide formation. But this was minor compared to the problems I reported on previously - for example there were no incorrectly calculated molecular properties, although the images did not represent the structure of the polymer.

This has been a good week for curation. Yesterday Nick successfully completed the evaluation of the stereochemistry of nargenicin and submitted the corrected SMILES to ChemSpider. Tony Williams has already incorporated the fix and now a search for nargenicin on ChemSpider gives just one entry.

Tony has provided several such puzzles for my students and a few are close to resolving the structures. The main problem is that the structures were entered into ChemSpider with at least one undefined stereocenter. Finding the correct structure from the primary literature can be very challenging for structures of this complexity but it certainly puts the chemical information retrieval methods I am teaching my students to good use.

The class itself was short - and covered mainly just details of student assignments - since we won't have much time during the last class on December 3, 2009 for a workshop. Rajarshi Guha and Tony Williams will be my guest lecturers on that day.


Thursday, November 05, 2009

Sixth Cheminfo Retrieval class: What is the m.p. of strychnine?

It would seem to be a simple task to find the melting point of a well-known alkaloid like strychnine. Our quest in class to answer that question - and to look up other simple properties - using both freely available and commercial databases reveals how treacherous it can be. In the end we don't find an unambiguous answer but we uncover enough information for many applications.

The take home message is that chemists need to be constantly paranoid that their information - whether from their lab or the most prestigious journals - can easily be wrong. Strategies such as finding multiple sources and investigating the experimental details provided in the primary sources are demonstrated to diminish uncertainty. But this is often not easy or quick.

Here is a summary of the lecture:

This is the lecture from the sixth Chemical Information Retrieval class at Drexel University on October 29, 2009. It starts with a review of some of the new questions answered by students from the chemistry publishing FAQ, which covers patent information and accessing electronic journals at Drexel. Tony Williams submitted a puzzle to resolve conflicting structures in ChemSpider, which is too difficult to be a regular assignment. It requires re-analyzing spectroscopic data in papers where stereochemical assignments are determined. An example is paromomycin, which has three entries. The regular assignment for the week is then introduced and it involves obtaining 5 different sources each for 5 different properties for a molecule of the student's choosing. To demonstrate how to do the assignment strychnine is chosen as an example. Melting point information is obtained from ChemSpider (ultimately an MSDS sheet), Wikipedia, Wolfram Alpha and a JACS article via SciFinder. By investigating primary sources several errors are found in SciFinder, where the recorded melting points correspond to salts of the alkaloid. Difficulties in finding primary sources for the melting point from Wikipedia are highlighted. For LD50 information Wikipedia did not even provide proper units (mg instead of mg/kg and no animal or route specified). The importance of ChemSpider predicted values for density and boiling point is demonstrated as a corroborating tool. In the end the reported melting point range of strychnine from the JACS paper did not even overlap with the reference to which it was compared. The exercise is meant to highlight the importance of caution in obtaining values from all available sources. Even the seemingly simple question of determining the melting point of a well-known alkaloid cannot be answered definitively.
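The overlap check mentioned in the summary is easy to automate once values from several sources have been collected. The sketch below is only an illustration: the source names and temperature ranges are made-up placeholders, not strychnine data, and `ranges_overlap` and `non_overlapping_pairs` are hypothetical helper names.

```python
from itertools import combinations

def ranges_overlap(a, b):
    """Return True if the closed intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

def non_overlapping_pairs(reports):
    """List every pair of sources whose reported ranges do not overlap."""
    return [
        (name_a, name_b)
        for (name_a, range_a), (name_b, range_b) in combinations(reports.items(), 2)
        if not ranges_overlap(range_a, range_b)
    ]

# Placeholder melting point ranges in degrees C (NOT real measurements).
reports = {
    "source A": (100.0, 102.0),
    "source B": (101.5, 103.0),
    "source C": (104.0, 105.0),
}

conflicts = non_overlapping_pairs(reports)
print(conflicts)
```

A conflict flagged this way only says that two sources disagree; deciding which one is right still means going back to the experimental details in the primary literature.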


Wednesday, November 04, 2009

Glatiramer Acetate Cheminformatics Problem and Fifth ChemInfo Retrieval Class

It started out innocently enough. One of my students picked the multiple sclerosis drug glatiramer acetate for his project in my Chemical Information Retrieval class. This ultimately resulted in the removal of this substance from ChemSpider.

The problem is that this drug is a polymer but it is represented in many places as a simple mixture of acetic acid and 4 amino acids (L-Ala, L-Glu, L-Lys, and L-Tyr). See for example Wikipedia, PubChem and DrugBank.


The SMILES representation is entered as 5 molecules joined by periods:
CC(O)=O.C[C@H](N)C(O)=O.NCCCC[C@H](N)C(O)=O.N[C@@H](CCC(O)=O)C(O)=O.N[C@@H](CC1=CC=C(O)C=C1)C(O)=O
This is probably the source of all subsequent miscalculations - such as a molecular weight of 623.7 (it actually has an average MW one order of magnitude larger), molecular formula C25H45N5O13, Topological Polar Surface Area of 374, Rotatable Bond Count 13, a 3D structure that is nowhere near reality, etc.
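The 623.7 figure can be reproduced with a few lines of arithmetic. The sketch below splits the SMILES on periods and sums the molecular formulas of the five free components. The component formulas are typed in by hand rather than parsed from the SMILES, and the atomic weights are rounded standard values.

```python
# Dot-disconnected SMILES as listed above: each "." separates an independent molecule.
SMILES = (
    "CC(O)=O."
    "C[C@H](N)C(O)=O."
    "NCCCC[C@H](N)C(O)=O."
    "N[C@@H](CCC(O)=O)C(O)=O."
    "N[C@@H](CC1=CC=C(O)C=C1)C(O)=O"
)

# Molecular formulas of the free components, entered by hand (not parsed from the SMILES).
COMPONENT_FORMULAS = {
    "acetic acid": {"C": 2, "H": 4, "O": 2},
    "L-Ala": {"C": 3, "H": 7, "N": 1, "O": 2},
    "L-Lys": {"C": 6, "H": 14, "N": 2, "O": 2},
    "L-Glu": {"C": 5, "H": 9, "N": 1, "O": 4},
    "L-Tyr": {"C": 9, "H": 11, "N": 1, "O": 3},
}

# Rounded standard atomic weights.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def sum_formulas(formulas):
    """Add up element counts over several molecular formulas."""
    total = {}
    for formula in formulas:
        for element, count in formula.items():
            total[element] = total.get(element, 0) + count
    return total

def molecular_weight(formula):
    """Molecular weight of a formula given as {element: count}."""
    return sum(ATOMIC_WEIGHTS[element] * count for element, count in formula.items())

components = SMILES.split(".")
total = sum_formulas(COMPONENT_FORMULAS.values())
mixture_mw = molecular_weight(total)

print(len(components), "components")
print(total, round(mixture_mw, 1))
```

The result is simply the weight of one molecule of each free component added together. In a real chain every amide bond releases one H2O, so even the per-residue formula differs from what is listed.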

Glatiramer acetate is reported to bind to MHC molecules. If these molecular descriptors are used in any type of QSAR analysis this will just add noise to the models.

ChemSpider does not keep track of polymers, except perhaps for some well defined oligopeptides that can be represented by a single SMILES. Consequently it was removed from the database.

It is difficult to apply common cheminformatics tools to this substance. It might be tempting to try to place it in polypeptide/protein databases such as BioPD. But it does not have a well-defined length or composition. In fact it is a random co-polymer so it cannot even be represented by a repeating structure, such as one might do for polystyrene.

In order to generate meaningful molecular descriptors for QSAR applications I suppose one strategy would be to generate a collection of SMILES representing the average composition of the drug in terms of ratios of amino acids and molecular weights. Each structure would generate molecular descriptors and 3D structures that are far more realistic than those currently listed. Perhaps it would turn out that only some of these polymer structures interact with MHC molecules. (If this has already been done please forgive the oversight - I didn't research this thoroughly. By the end of the term we should know more from the student's report.)
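A rough sketch of that strategy follows, under several loudly stated assumptions. The molar fractions and chain lengths are hypothetical placeholders rather than measured values. The chains are taken to be linear with ordinary alpha-peptide bonds, and the acetate counterions are ignored. The function names are invented for illustration.

```python
import random

# Side chains of the four L-amino acids, written so that
# "N[C@@H](<side chain>)C(=O)" is one L-residue in a peptide backbone.
SIDE_CHAINS = {
    "A": "C",             # alanine
    "E": "CCC(=O)O",      # glutamic acid
    "K": "CCCCN",         # lysine
    "Y": "Cc1ccc(O)cc1",  # tyrosine
}

# Hypothetical molar fractions, for illustration only.
MOLAR_FRACTIONS = {"A": 0.40, "E": 0.15, "K": 0.35, "Y": 0.10}

def random_sequence(length, rng):
    """Draw a random one-letter sequence weighted by the molar fractions."""
    residues = list(MOLAR_FRACTIONS)
    weights = [MOLAR_FRACTIONS[r] for r in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))

def peptide_smiles(sequence):
    """Build a linear peptide SMILES (free N-terminus, free C-terminal acid)."""
    backbone = "".join(f"N[C@@H]({SIDE_CHAINS[r]})C(=O)" for r in sequence)
    return backbone + "O"

def sample_ensemble(n_chains, min_len, max_len, seed=0):
    """Return (sequence, SMILES) pairs for n_chains random chains."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_chains):
        sequence = random_sequence(rng.randint(min_len, max_len), rng)
        ensemble.append((sequence, peptide_smiles(sequence)))
    return ensemble

# Chain lengths of 40 to 100 residues are an arbitrary placeholder range.
ensemble = sample_ensemble(n_chains=5, min_len=40, max_len=100)
for sequence, smiles in ensemble:
    print(len(sequence), sequence[:20] + "...")
```

Each SMILES in such an ensemble could then be fed to a descriptor calculator, and the descriptors reported as distributions rather than as a single value.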

The chronological summary of the lecture is as follows:

The fifth Chemical Information Retrieval class on October 22, 2009 started out with covering the new 3D structure viewer introduced recently at PLoS ONE to provide ideas for students doing a multimedia project this term. The current student answers to the chemistry publishing FAQ are then discussed. The reason for removing glatiramer acetate from ChemSpider is explained and a few databases (Wikipedia, PubChem, DrugBank) are visited that still contain the incorrect SMILES, 3D structure and related properties. An overview of an Open Access site (OAD) suggested by Bill Hooker is provided to suggest additional questions for the FAQ. Examples of questions discussed include primary and secondary sources, peer review, article level metrics (a PLoS ONE article on malaria is used as an example), citation searching, Impact Factors and whether one should use one's real name in the blogosphere. Databases Scirus, Web of Science and PubMed are also reviewed.


Tuesday, October 20, 2009

Fourth Cheminfo Retrieval class: ChemSpider and Beilstein Databases

Peggy Dominy, our chemistry librarian at Drexel, was kind enough to teach my third class while I was at NERM. She demonstrated RefWorks - including how to copy and paste the proper formats to Wikispaces - and how to use our ILL (Inter-Library Loan) process.

I'm including a recording of the fourth class on Chemical Information Retrieval on Oct 15, 2009 at Drexel University. It starts with some tips on removing formatting from Wikispaces pages, the Drexel Cisco VPN client for accessing paid subscriptions off campus and how to link to a DOI. The first two assignments for the class are then described. The first involves summarizing each paragraph of an article and an option to use AcaWiki is demonstrated. The second involves filling in an FAQ for publishing in chemistry. FriendFeed is then presented as a resource to help answer questions followed by an extensive overview of available information on ChemSpider, covering SMILES, InChIs, InChIKeys, experimental and predicted properties, linked databases and contributed spectra. Finally a demonstration of Beilstein Crossfire/DiscoveryGate is presented with an emphasis on doing substructure searching.


Thursday, October 15, 2009

NERM 09 session on Chemistry on the Web

Last week, on October 9, 2009, I presented at the ACS NERM conference. Martin Walker hosted a session on Publishing and Promoting Chemistry in the Internet Age. All of the talks were quite interesting and fit perfectly with the topic:
* Martin Walker: Chemistry on the Internet
* Elizabeth Brown: The Chemist's Toolkit for Publishing and Promoting Your Work On the Internet
* Antony Williams: Navigating the Complex Web of Chemistry Using ChemSpider
* Jean-Claude Bradley: Leveraging Transparency and Crowdsourcing in Chemistry Using Open Notebook Science

My talk consisted of an overview of Open Notebook Science with some new content on solubility prediction algorithms written by Andrew Lang and a few examples of students taking a Chemical Information Retrieval class at Drexel University using research logs on a wiki to flesh out their projects.




Thursday, August 20, 2009

My talk at ACS FA09 on Social Networking Tools and Teaching Chemistry

Yesterday (August 19, 2009) I gave my last talk at the American Chemical Society meeting in Washington. I presented on Using social networking tools a la carte for organic chemistry education: Wikis, blogs, Second Life, and more for the Symposium on Using Social Networking Tools to Teach Chemistry, organized by Laura and Henry Pence:
12:05 PM Wikis in chemical education: The best of two worlds
Laura E. Pence
12:25 PM ChemPaths: Learning to meander — an online portal to ChemEd DL resources for intrinsically linked learning
Justin M. Shorb, John W. Moore
12:45 PM ChemEd DL WikiHyperGlossary
Robert E. Belford, J. W. Moore, Daniel Berleant, Michael Bauer, Jon L Holmes, Kyle E. Yancey
1:05 PM Social media: Immersion and its discontents
Elizabeth M. Dorland
1:35 PM Using social networking tools a la carte for organic chemistry education: Wikis, blogs, Second Life, and more
Jean-Claude Bradley, Andrew Lang
1:55 PM SNS, IM, and texting vs. traditional e-mail and voice messaging as a means of facilitating instructor-student contact: Trends and habits of student usage, and techniques to avoid electronic overload (or withdrawal)
Robert B. Gregory
2:15 PM Faculty development, collaborative inquiry, and Web 2.0
Joanne L. Stewart
2:35 PM Are netbooks the next big thing in the chemistry classroom?
Harry E. Pence
2:55 PM Managing laboratory research data using cloud computing as an organizational tool
Harry E. Pence, Jacqueline Bennett

It was a really entertaining symposium. Laura Pence talked about using Wikispaces (the same platform I use) for student projects and emphasized how helpful it is to compare wiki page versions to evaluate each student's contributions. Justin Shorb presented on his work to wikify a chemistry textbook. Despite a broken arm, Bob Belford did a great job in presenting his WikiHyperGlossary project. It marks up chemistry terms on web pages, similar to the approach taken by ChemMantis.

Liz Dorland provided a wonderful overview of how Second Life can be used from an educational standpoint, very much complementary to the content I had on my slides. There is just so much content and so many projects now in that virtual world that it is difficult to appreciate without actually going in and taking a tour, but sometimes a good talk can motivate people to give it a closer look.

Robert Gregory's talk was very funny and somewhat shocking: he gave out his cell phone number and asked his students to contact him 24/7 - including at 2:00 AM when he was sleeping. Joanne Stewart gave an overview of how inorganic teachers kept in contact using various social networking tools and how valuable it was for both collaboration and support.

Harry Pence gave two very humorous talks at the end. The most interesting point for me was his collaboration with Jacqueline Bennett, who used Google Spreadsheets to collect experimental results from her students. An example of that work recently appeared in Green Chemistry, 2009, 11, 166 - 168. This is especially relevant for our research, not only because we also use Google Spreadsheets to aggregate results, but also because her reaction involves finding the right solvent for mixing an aldehyde and amine and obtaining a pure imine as a precipitate, exactly the same approach we use for our preparation of Ugi products. Perhaps there is a future collaboration there.

All of the presentations were recorded and I will post a link when available.

Here is the summary of my talk and the recording:

Jean-Claude Bradley describes the use of social networking tools to teach undergraduate organic chemistry. Public free wikis can be used effectively to manage class information as well as serve as versatile platforms to process student assignments and provide rapid feedback. Examples of using Second Life to deliver quizzes, play games and offer students an environment to create projects involving 3D molecules, spectra and posters are detailed. The continuously evolving role of blogs, podcasting, screencasting and newer faster interactive platforms such as FriendFeed will be outlined. New technologies create the need for new skills to be taught to students - some relating to networking and some involving knowledge of the language to navigate the chemical webspace (such as SMILES and InChI).



Tuesday, August 18, 2009

Spectral Game talk at ACS Fall 09

Yesterday (August 17, 2009) I gave my talk on the Spectral Game at the Using Technology to Enhance Learning in Organic Chemistry symposium at the American Chemical Society meeting. I was not able to attend the entire symposium but luckily I did catch David Soulby's talk on using Google groups to distribute NMRs for labs that require many students to submit samples. I am a fan of using free and hosted services to simplify workflows of all types.

Also in attendance at the symposium were Liz Dorland and Bob Hanson. It was good to catch up with them. Bob shared a story of how he has been assigning his students tasks in his organic chemistry class which lead to updating Wikipedia. There is so much potential for using the educational infrastructure to create better scientific content for everyone.

My talk on the Spectral Game highlighted the role of openness in teaching and research to create new educational tools, especially for learning NMR. Tony Williams said a few words at the end about ChemSpider, RSC and some upcoming opportunities to publish synthesis articles on ChemSpider.


Thursday, January 29, 2009

The ChemSpider Journal and ChemMantis

The ChemSpider Journal of Chemistry is about to go live. This is not just another chemistry journal. Not only does it boast the option of an open peer-review in addition to Open Access, but it takes us tantalizingly closer to the promise of Web3.0: the semantic web. This is achieved by a sophisticated mark-up system generated by ChemMantis. The automatic identification of molecules is impressive enough. But it also marks up functional groups, reactions, spectral data and even biological entities.

For an example consider this article, which was actually a proposal that I wrote with Rajarshi Guha and Tony Williams. Simply by hovering over the marked-up text "Ugi reaction", it pulls up a brief summary from Wikipedia. What makes this semantic is that it already knows that this is a chemical reaction and not a molecule or a virus.

When you hover over the name of a molecule it knows to render it accordingly and provide appropriate links. Consider this example:

This makes the experience of reading a chemistry article a lot richer. But another payoff is coming in what machines will do when they are able to associate concepts instead of just text. Most importantly it does not require authors to do any extra work.

Tony has more information on his blog - and new submissions are welcome.


Monday, November 10, 2008

ChemSpider Talk at Drexel Nov 12 2008

Antony Williams is giving a talk on ChemSpider at Drexel University at 2:00 PM Wednesday November 12, 2008. (Disque 109, corner of 32nd and Chestnut streets, Philadelphia). Even if you attended his talk at Drexel a few months ago, come back to learn about the newest ChemSpider tools.

Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry

There is an increasing availability of free and open access resources for scientists to use on the internet. Coupled with the increasing availability of Open Source software tools we are in the middle of a revolution in data availability and tools to manipulate these data. However, freedom costs and in many cases the cost is quality. ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. This presentation will provide an overview of the issue of quality in many chemistry-related databases, approaches to cleaning up the data and how a curated platform can become the centralized hub for resourcing information about chemical entities. This includes experimental and predicted properties, analytical data, publications, suppliers and integrated databases. I will detail three efforts: 1) the curation of chemistry on Wikipedia; 2) an examination of structure integrity on the FDA Daily Med website, a web site of medication content and labeling as found in medication package inserts; 3) recognizing chemical names in documents and providing a platform for structure-based searching of Open Access chemistry literature.


Saturday, August 23, 2008

Tony Williams Drexel Visit

Tony Williams stopped by Drexel on August 21, 2008. After a nice chat with Martin Walker over lunch, Tony presented a demo of ChemSpider. The timing was not great because this was concurrent with the ACS meeting in Philadelphia. However, Tony recorded the session and it is available here as a Flash screencast.

Tony also took the opportunity to make an announcement of some new text mark-up features about to be made available on ChemSpider. For a brief video and description see his blog post.

I met Tony when I was a graduate student at the University of Ottawa and he was running the NMR facilities there. Even though he had not touched an NMR in 12 years he clearly has not lost his touch and got us out of a serious jam with the acquisition of carbon spectra on our Varian instrument. Just like old times.

Once you get to know Tony you'll appreciate why so many of us are willing to support and put our trust in ChemSpider.


Friday, May 23, 2008

NMR prediction on ChemSpider

As Tony recently mentioned, there is a new button on ChemSpider to predict H NMR spectra based on the nmrdb.org web service:


To give it a spin I am posting the experimental spectrum of Ugi product UC-150D underneath the predicted one.


This is going to be extremely helpful and yet another reason for using ChemSpider in active chemistry research. However, this tool does not replace the need for understanding how to interpret NMR spectra.

First, two of the predicted peaks - the phenanthrene H at 8.5 ppm and the benzylic H at 5.7 ppm - are off by almost half a ppm. Second, the algorithm does not take into account the diastereotopic nature of the methylene group centered at 4.8 ppm. This is predicted to be a singlet but appears, as expected, as a pair of doublets.

With this new tool there is a danger that students might think that they don't need to learn the finer details of NMR analysis since the predicted spectrum just pops up so conveniently. I hope people will report on what they find to be most and least reliable as they work on real problems.

The beauty of ChemSpider is that both the theoretical and experimental spectra can be stored in the same record. Yet another reason to continue to routinely upload our spectra.

Tag: InChIKey: PBZQTKRWYXTXIS-WLRTZDKTBU


Saturday, June 23, 2007

InChIMatic, ChemSpider and UsefulChem

Rich Apodaca wrote about using his InChIMatic service to track molecules in UsefulChem.

Because we use InChIs in blog posts and HTML pages generated automatically from the molecules blog, doing an InChI search in Google is a pretty good way to find molecules of interest to UsefulChem. However, Rich makes the valid point that these pages do not always point to the experiments where they are used.

I was aware of the limitations of using a blog to track molecules when I set it up. Because we were limiting ourselves to a few hundred molecules, the blog served its purpose much as I expected it would.

But now, as we move to the manipulation of tens of thousands (and soon to millions) of molecules, we need to transition to a true database.

I've been working with Tony Williams to use ChemSpider for this purpose. UsefulChem has been a supplier in ChemSpider for several weeks and most of our molecules from the molecules blog have been indexed. In the next few days the first 68,000 molecules from the CombiUgi project should be incorporated as well.

This effectively moves the indexing and searching burden to a free hosted service that is designed to handle it. This is the same logic that I used when choosing Wikispaces to act as our group laboratory notebook.

Let's take a look at an example of how this can work.

Click on the Search button of ChemSpider then hit "Advanced". Under "Search by Data Source" select UsefulChem. Scroll to the top of the page and select "Search by Structure" then "Draw". Select "Substructure" then draw a furan ring


You should get about 10 hits.


Click on the 5-methylfurfurylamine to see its record in ChemSpider.


This record can be curated or annotated. I'm hoping we can use this interface to annotate with links to spectra, references, etc. But for now just click on its InChI and you'll get a Google search finding that molecule on UsefulChem blogs, Chemical Blogspace and an experiment page (EXP086) where it was used.

In order for that to work well, we need the InChIs to be generated for every molecule in every experiment. We've been putting the InChIs in the Tags section of each experiment page and it is now the highest priority on our Experimental Format page to make sure that it gets done quickly.

Note that these InChIs could be scraped fairly easily from every UsefulChem experiment because of the standard format for specifying the experiment page.
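A minimal sketch of such scraping, assuming the text of an experiment page has already been downloaded into a string. The sample page text is a hypothetical excerpt built around the InChI tagged at the end of this post, and `extract_inchis` and `formula_layer` are illustrative names, not part of any existing tool.

```python
import re

# An InChI runs from "InChI=1/" (or "InChI=1S/") up to the next
# whitespace, quote or angle bracket.
INCHI_PATTERN = re.compile(r"InChI=1S?/[^\s\"'<>]+")

def extract_inchis(page_text):
    """Return the unique InChI strings in page_text, in order of appearance."""
    found = []
    for match in INCHI_PATTERN.findall(page_text):
        if match not in found:
            found.append(match)
    return found

def formula_layer(inchi):
    """Return the molecular formula layer (between the first and second '/')."""
    return inchi.split("/")[1]

# Hypothetical excerpt of a Tags section.
sample_page = """
Tags
InChI=1/C6H9NO/c1-5-2-3-6(4-7)8-5/h2-3H,4,7H2,1H3
5-methylfurfurylamine
"""

found = extract_inchis(sample_page)
print(found)
print(formula_layer(found[0]))
```

Pulling out the formula layer gives a quick sanity check that the identifier found on a page matches the molecule named next to it.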

The only issue left to really complete the process is an automated way to add new molecules to ChemSpider. Tony says that will be done soon.

Chemical Blogspace Tags

InChI=1/C6H9NO/c1-5-2-3-6(4-7)8-5/h2-3H,4,7H2,1H3
5-methylfurfurylamine


Creative Commons Attribution Share-Alike 2.5 License