Regression of 5D solubility space and distributed automation
I recently reported on the plotting of a solubility surface in 3D. Marshall Moritz has now extended his measurements of the solubility of 4-nitrobenzaldehyde to two more mixed-solvent systems (ONSC-EXP114), giving us four solvents plus temperature as variables. The results are stored in the SolSumMix spreadsheet.
Andrew Lang has performed a quadratic regression analysis of this space, and the model agrees fairly well with the experimental data points (see the "predicted solubility" column in the SolSumMix spreadsheet mentioned above).
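To make "quadratic regression" concrete, here is a minimal, stdlib-only Python sketch. It is not Andy's code: it assumes an ordinary least-squares fit of a full quadratic (intercept, linear, squared and cross terms) solved through the normal equations, and the helper names (`quadratic_features`, `fit_quadratic`, `predict`) and the synthetic data are hypothetical. One assumption is worth stating: because the four mole fractions sum to 1, only three of them should be used as inputs (plus temperature). Including all four alongside an intercept makes the columns linearly dependent and the normal equations singular.

```python
from itertools import combinations_with_replacement


def quadratic_features(x):
    """Expand an input vector into [1, x_i..., x_i * x_j for i <= j]."""
    feats = [1.0] + list(x)
    for i, j in combinations_with_replacement(range(len(x)), 2):
        feats.append(x[i] * x[j])
    return feats


def solve_linear(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_quadratic(xs, ys):
    """Least-squares coefficients from the normal equations (X^T X) w = X^T y."""
    rows = [quadratic_features(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(w, x):
    """Evaluate the fitted quadratic at input x."""
    return sum(wi * fi for wi, fi in zip(w, quadratic_features(x)))


# Hypothetical noise-free example with two inputs: (mole fraction, temperature in C).
def true_surface(f, t):
    return 1 + 2 * f - 3 * f**2 + 0.05 * t + 0.001 * t**2 + 0.02 * f * t


xs = [(f, t) for f in (0.0, 0.25, 0.5, 0.75, 1.0) for t in (-20, 0, 20, 40)]
ys = [true_surface(f, t) for f, t in xs]
w = fit_quadratic(xs, ys)
print(predict(w, (0.3, 10)), true_surface(0.3, 10))  # the two values should agree closely
```

With three mole fractions plus temperature the same code fits 15 coefficients, so at least that many well-spread measurements are needed. In practice `numpy.linalg.lstsq`, which is SVD-based and avoids forming X^T X, is better conditioned than the normal equations used here.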
Although we can't easily represent the entire 5D space intuitively, we can take 3D slices of the regression to assess the fit. For example, consider the plot of mole fraction % chloroform vs. acetonitrile, keeping the other solvents at zero concentration. We observe a nice saddle shape, similar to the plot we made earlier from the original data points.
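Whether a fitted slice is a saddle can also be read directly from its quadratic coefficients rather than only from a plot. The following Python sketch assumes the slice has already been written as a quadratic in the two plotted variables, here called u and v; the function names and example coefficients are hypothetical. Because the Hessian of a quadratic is constant, the sign of its determinant fixes the shape everywhere, and the stationary point tells you whether the saddle actually falls inside the plotted region.

```python
def classify_quadratic_slice(a, b, c):
    """Shape of f(u, v) = c0 + c1*u + c2*v + a*u**2 + b*u*v + c*v**2.

    The Hessian [[2a, b], [b, 2c]] is constant, so its determinant fixes the shape.
    """
    det = 4 * a * c - b * b
    if det < 0:
        return "saddle"
    if det > 0:
        return "minimum" if a > 0 else "maximum"
    return "degenerate"


def stationary_point(c1, c2, a, b, c):
    """Solve grad f = 0, i.e. 2a*u + b*v = -c1 and b*u + 2c*v = -c2 (Cramer's rule)."""
    det = 4 * a * c - b * b
    if det == 0:
        return None  # no unique stationary point
    u = (b * c2 - 2 * c * c1) / det
    v = (b * c1 - 2 * a * c2) / det
    return u, v


# Hypothetical slice: f = (u - 1)**2 - (v - 2)**2 expands to c1=-2, c2=4, a=1, b=0, c=-1.
print(classify_quadratic_slice(1, 0, -1))  # saddle
print(stationary_point(-2, 4, 1, 0, -1))
```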
Now consider a slice of mole fraction % toluene vs. THF, keeping the other two solvents at zero. For temperatures above about 0 °C we observe the expected rise in solubility with temperature. Below 0 °C, however, the curve reverses and the model predicts that solubility goes up slightly as the temperature drops. This is clearly not right; it simply means that we are missing key data points in that region. A quadratic fit introduces parabolic terms, and these produce this kind of inversion. It is very important to understand that these models will probably do fairly well for interpolating within our experimental range (about -25 °C to 40 °C) but will not be very helpful for extrapolating beyond it.
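The inversion follows directly from the algebra. Along a temperature slice the model has the form s(T) = s0 + s1*T + s2*T^2, so ds/dT = s1 + 2*s2*T changes sign at T* = -s1/(2*s2). Whenever T* falls inside the plotted range, the fitted curve must turn around there regardless of the chemistry. A small Python sketch of that check follows; the coefficients are hypothetical and not taken from SolSumMix.

```python
def turning_point(s1, s2):
    """Vertex T* = -s1 / (2*s2) of s(T) = s0 + s1*T + s2*T**2; None if the curve is linear."""
    if s2 == 0:
        return None
    return -s1 / (2 * s2)


def monotonic_in_range(s1, s2, t_low, t_high):
    """True if s(T) has no turning point strictly inside (t_low, t_high)."""
    t_star = turning_point(s1, s2)
    return t_star is None or not (t_low < t_star < t_high)


# Hypothetical slice coefficients: s2 > 0 means the parabola has a minimum at T*.
s1, s2 = 0.004, 0.0005
print(turning_point(s1, s2))                # about -4 C
print(monotonic_in_range(s1, s2, -25, 40))  # False: solubility turns back up below T*
print(monotonic_in_range(s1, s2, 0, 40))    # True
```

A flagged turning point does not say the chemistry is wrong, only that the model cannot be trusted there until measurements near and below T* are added.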
Since it is difficult to manually inspect every possible slice of this 5D space, Andy has created a service that recommends which points most need to be measured next to generate a better model. We already have a DoSol spreadsheet that tells ONS Challenge students which solubility measurement is most urgent to make next. This additional "bot" (not yet fully automated, but it will be soon) integrates nicely with the collection we already have.
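We don't know which selection rule Andy's service uses, so the following Python sketch swaps in one simple, commonly used alternative: a greedy maximin (space-filling) rule that suggests the candidate condition farthest from any existing measurement. All names, ranges, and points are hypothetical. Coordinates are scaled to [0, 1] so that temperature in °C does not dominate mole fractions, and each pick is treated as measured before the next one so a batch of suggestions spreads out.

```python
import math


def normalize(point, lows, highs):
    """Scale each coordinate to [0, 1] so temperature does not swamp mole fractions."""
    return [(p - lo) / (hi - lo) for p, lo, hi in zip(point, lows, highs)]


def recommend_next(candidates, measured, lows, highs, k=1):
    """Greedy maximin: repeatedly pick the candidate farthest from its nearest measured point.

    Requires at least one measured point. Each pick is added to the measured set
    before the next pick, so k suggestions spread out instead of clustering.
    """
    done = [normalize(m, lows, highs) for m in measured]
    pool = list(candidates)
    chosen = []
    for _ in range(min(k, len(pool))):
        best = max(
            pool,
            key=lambda c: min(math.dist(normalize(c, lows, highs), d) for d in done),
        )
        chosen.append(best)
        pool.remove(best)
        done.append(normalize(best, lows, highs))
    return chosen


# Hypothetical binary mixture: (mole fraction of one solvent, temperature in C).
lows, highs = (0.0, -25.0), (1.0, 40.0)
measured = [(0.0, 20), (0.5, 20), (1.0, 20), (0.5, 40)]
candidates = [(0.5, -20), (0.25, 20), (1.0, 35)]
print(recommend_next(candidates, measured, lows, highs, k=2))
```

Here the low-temperature candidate is ranked first, mirroring the gap discussed above. A maximin rule only looks at coverage; a model-aware alternative would rank candidates by the regression's prediction variance instead.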
We aim to show that such open, distributed mechanisms to request and execute measurements are a viable way to efficiently leverage crowdsourcing to automate parts of the scientific method. If this works for solubility, it can be applied to other problems.