Today MolPort is launching a new initiative to crowdsource solubility data for screening compounds. This blog post explains why.

About the initiative

One of the issues that screening compound buyers raise most frequently is compound solubility, and it has been discussed at compound management conferences. Scientists and compound managers estimate that 1-5% of the compounds they purchase for their screening projects or compound libraries cannot be dissolved. For large library acquisitions this is an annoying fact of life. You spend time carefully building your library: you perform diversity analysis and choose compounds to represent as broad a chemical space as possible. Then, when you procure the compounds, you get out-of-stock cancellations, QC failures and insoluble compounds. The chemical space you labored so carefully to cover in full ends up with blank spots. You do not pay for cancelled compounds, and you get a refund for QC-failed compounds, but there is usually no guarantee that a compound is soluble, so you end up paying for samples you cannot use. This makes insoluble compounds a real budget-waster.
Large compound libraries typically contain multiple similar compounds, so losing any single compound may not significantly affect screening results. However, as in-silico modeling algorithms improve and are augmented by machine learning, the typical size of a screening library is shrinking, and the loss of any data point matters more. Losing a few data points to compound insolubility may mean buying another set of compounds and waiting for their delivery and screening results.
We are working to make sourcing screening compounds easier and more efficient: reducing lead times, minimizing cancellations with more up-to-date inventory data, and reformatting samples. However, compound insolubility remains a common concern. Software tools that predict compound solubility are constantly improving, and we have seen their use make a real difference. Still, their predictions are not yet perfect. Why not simply measure solubility? It is a complex test, and running it on 7 million commercially available compounds would be expensive and time-consuming.
Most often, screening compounds are dissolved in DMSO, and a 10 mM concentration currently seems to be the industry standard. A precise solubility value is therefore not needed. When buying compounds for screening, all one needs to know is whether a 10 mM DMSO solution of each compound can be prepared.
Pharmaceutical companies, academic and non-profit screening institutes and other R&D centers have been buying screening compounds for over a decade. It is reasonable to assume that nearly every compound listed in suppliers’ catalogs has been purchased by someone who then attempted to dissolve it. Organizations’ information systems are full of data on what is soluble and what is not. Yet, apart from a few mentions in publications, this information is locked up and inaccessible. Organizations continue to waste their resources on compounds that someone else has already tried and failed to solubilize.
We at MolPort have discussed compound solubilization issues with our clients. What can we do to help scientists around the world avoid buying compounds that are not fit for purpose? One way is to create a data bank on the solubility of organic compounds. Several of our clients agreed to share their data with us. We already know which compounds they ordered, so confidentiality is not an issue. We combined data from multiple sources and incorporated this information into our website and downloadable data. You can read more about it on our documentation page. These data are available free of charge to anyone interested. Currently we have data on about 1/7 of the stock compounds listed in the MolPort database. The data may not be perfect, so use them as a guide. Internally at MolPort we do not say that a compound is insoluble; we say that solubilization problems were reported by a client. Please do not use these data as a training set for solubility prediction software; they are not meant for that. Whatever the label, when possible it may be safer to choose a soluble analogue of a compound with reported solubilization problems.

How to contribute

And now the important part. We know that not every organization will be able to share its data, but everyone can benefit from knowing compound solubility. We lack data for about 6 million compounds. Please contribute your data to this pre-competitive initiative. If you cannot share data, tell someone who can about this project. It does not matter if you have only a little data: you may have data on compounds that no one else has covered. Help us spread the message. The more organizations participate, the sooner we will cover all commercially available compounds.

Do you have questions or suggestions for improvement? Please use the comments section below this post or contact MolPort privately.

A data template for sharing solubility data for addition to the online MolPort database, together with more details about the database, is available at the MolPort Solubility Center.

We will use the hashtag #CompoundSolubility to share news about this project.