Interactive Visualization of ONS Solubility Data
Rajarshi Guha and Andy Lang have been very busy during the past few weeks developing visualization interfaces for the Open Notebook Science Challenge solubility data.
Information in a database is only as useful as the tools to explore it. We are striving to create interfaces that enable synthetic organic chemists to find actionable information in as intuitive a manner as possible.
To search for specific solubility measurements for a given solute in a given solvent (or for all solvents for a given solute), we have been using Rajarshi's handy web interface, which provides drop-down menus of the available selections. The query results link directly back to the relevant laboratory notebook pages for further analysis. (For details of the code, see Rajarshi's blog.)
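The code behind Rajarshi's interface is described on his blog. What follows is not that code. It is a minimal Python sketch of the same kind of lookup, assuming each measurement is a record carrying a solute, a solvent, a solubility in molar, and a link to its notebook page. The `Measurement` layout, the `query` function and the placeholder records are all hypothetical. Passing `solvent=None` plays the role of asking for every solvent for a given solute.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measurement:
    # Hypothetical record layout; the actual database schema is not shown in the post.
    solute: str
    solvent: str
    solubility_molar: float
    notebook_page: str  # link back to the lab notebook page for the experiment


def query(records: List[Measurement], solute: str,
          solvent: Optional[str] = None) -> List[Measurement]:
    """Return measurements for one solute, optionally limited to one solvent.

    solvent=None plays the role of asking for "all solvents" for the solute.
    """
    return [r for r in records
            if r.solute == solute and (solvent is None or r.solvent == solvent)]


# Placeholder records for illustration only (not real measurements).
records = [
    Measurement("compound-1", "methanol", 1.2, "page-A"),
    Measurement("compound-1", "THF", 0.4, "page-B"),
    Measurement("compound-2", "methanol", 2.5, "page-C"),
]

for hit in query(records, "compound-1"):
    print(hit.solvent, hit.solubility_molar, hit.notebook_page)
```

Keeping the notebook page on every record is what makes it cheap to link each result back to the original experiment.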
We now have the ability to look for patterns in the dataset. Solubility measurements can now be plotted on a surface using Andy's service. First select a solvent, two molecular descriptors, a solubility cutoff value and the maximum point size from the drop-down menus.
Then explore the chemical space.
In this example the solvent is methanol, molecular weight is plotted on the x axis and ALOGP on the y axis. The size of each point reflects the solubility, averaged over all available valid measurements, up to the maximum point size selected above. Measurements judged invalid are marked as DONOTUSE in the SolubilitySum spreadsheet. That way researchers can investigate for themselves why a data point was rejected. For example, insufficient mixing time has been one cause of invalidation.
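The averaging step can be sketched in a few lines of Python. This is an illustration, not Andy's service: it assumes each row is a `(solute, solvent, solubility_molar, flag)` tuple, which is a hypothetical layout rather than the real SolubilitySum columns. The linear scaling in `point_size`, capped at the user's chosen maximum, is also an assumption about how "up to a maximum" might be implemented.

```python
from collections import defaultdict
from statistics import mean


def average_valid(rows):
    """Average solubility per (solute, solvent), skipping rows flagged DONOTUSE.

    Each row is assumed to be a (solute, solvent, solubility_molar, flag) tuple;
    this layout is hypothetical, not the actual SolubilitySum columns.
    """
    groups = defaultdict(list)
    for solute, solvent, solubility, flag in rows:
        if flag == "DONOTUSE":
            continue  # invalid rows stay in the sheet for inspection but are not averaged
        groups[(solute, solvent)].append(solubility)
    return {key: mean(values) for key, values in groups.items()}


def point_size(solubility, max_size, scale=1.0):
    """Hypothetical linear mapping from solubility to marker size, capped at max_size."""
    return min(solubility * scale, max_size)


# Placeholder rows for illustration only (not real measurements).
rows = [
    ("compound-1", "methanol", 1.0, ""),
    ("compound-1", "methanol", 3.0, ""),
    ("compound-1", "methanol", 9.0, "DONOTUSE"),
]
averages = average_valid(rows)
print(averages)  # {('compound-1', 'methanol'): 2.0}
```

Because flagged rows are skipped rather than deleted, a pair whose measurements are all flagged simply drops out of the plot while its rows remain available in the spreadsheet.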
As synthetic organic chemists, we are mainly interested in how our reagents and products will behave in a given solvent. The compounds in this chemical space are starting materials for Ugi reactions and are color-coded by functionality: red points are aldehydes and blue points are carboxylic acids. We selected 2M as the cutoff for point shape because it is a convenient concentration at which to mix reagents for the Ugi reaction, especially when considering automation (see the JoVE article). With this selection, disks are below 2M and diamonds are above.
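The shape and color rules just described can be written as a small Python function. This is a sketch, not the service's code. The 2 M cutoff and the red/blue scheme come from the text. Two choices are assumptions: a value of exactly 2 M is drawn as a diamond, and any other functionality gets a grey fallback color.

```python
# Cutoff and colour scheme taken from the post; the boundary handling at exactly
# 2 M and the grey fallback colour are assumptions of this sketch.
CUTOFF_MOLAR = 2.0
COLOR_BY_FUNCTIONALITY = {"aldehyde": "red", "carboxylic acid": "blue"}


def marker_style(solubility_molar, functionality, cutoff=CUTOFF_MOLAR):
    """Return (shape, color): disks below the cutoff, diamonds at or above it."""
    shape = "disk" if solubility_molar < cutoff else "diamond"
    color = COLOR_BY_FUNCTIONALITY.get(functionality, "grey")
    return shape, color


print(marker_style(0.5, "aldehyde"))         # ('disk', 'red')
print(marker_style(3.1, "carboxylic acid"))  # ('diamond', 'blue')
```

Keeping the cutoff as a parameter mirrors the drop-down menu: a different target concentration just changes which points become diamonds.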
Some interesting insights can be gleaned quickly from this plot. There are three low-solubility disks among a group of diamonds near the top of the plot. The diamonds in this region are mainly highly soluble aromatic aldehydes. By hovering the mouse over each point, we can see the details of each compound and its solubility. As shown, two of these disks are aromatic nitroaldehydes.
We might infer from this that methanol is not a good solvent choice for nitroaldehydes in general. At the very least, we can formulate this hypothesis and follow it up with additional measurements or re-investigate the outliers. Such a pattern is difficult to spot when measurements are stuck in tables, or worse, exist only in lab notebook pages.
One limitation of a 2D plot like this is that points overlap. We can control this to some extent by making the points smaller, but that does not eliminate the problem in all cases. One trick we can use is to introduce a third molecular descriptor to separate the points in 3D space.
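To show why a third descriptor helps, here is a small Python sketch that flags pairs of points whose markers would overlap. It treats two markers as overlapping when their centers are closer than a chosen radius, after min-max scaling each descriptor to [0, 1] so that molecular weight does not dominate ALOGP. The scaling, the radius test and the descriptor values are all illustrative assumptions, not part of Andy's service.

```python
from itertools import combinations
from math import dist


def normalize(columns):
    """Min-max scale each descriptor column to [0, 1] so units don't dominate."""
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid dividing by zero for a constant descriptor
        scaled.append([(v - lo) / span for v in col])
    return list(zip(*scaled))  # one coordinate tuple per compound


def overlapping_pairs(points, radius):
    """Index pairs whose marker centers lie closer than radius."""
    return [(i, j) for (i, p), (j, q) in combinations(enumerate(points), 2)
            if dist(p, q) < radius]


# Made-up descriptor values for three compounds, for illustration only.
mw = [100.0, 101.0, 200.0]
alogp = [1.0, 1.0, 3.0]
third = [0.0, 5.0, 2.5]  # hypothetical third descriptor

print(overlapping_pairs(normalize([mw, alogp]), radius=0.05))         # [(0, 1)]
print(overlapping_pairs(normalize([mw, alogp, third]), radius=0.05))  # []
```

Compounds 0 and 1 sit almost on top of each other in the 2D view, but they are far apart along the third axis, so they separate in 3D.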
Andy has recently done this in Second Life. Here is a picture we took yesterday with our friend Viv on Drexel island (SLURL). The balls we are sitting on are the solubility points: the larger the ball, the greater the solubility. Clicking on a ball opens a browser window to its measurements in the laboratory notebook.
Our eventual goal is to provide robust quantitative models to predict non-aqueous solubility. To see where we stand on that front see Rajarshi's recent report.
In the meantime, I think we can continue to provide intuitive tools that let non-theoretical organic chemists play with visualizations of solubility. Almost all organic reactions are performed in non-aqueous solvents, so solvent selection is a very important part of doing chemistry. This is especially true if one wants to engineer non-chromatographic product isolation.