File: README.md

package info (click to toggle)
rdkit 202503.6-3
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 222,000 kB
  • sloc: cpp: 411,111; python: 78,482; ansic: 26,181; java: 8,285; javascript: 4,404; sql: 2,393; yacc: 1,626; lex: 1,267; cs: 1,090; makefile: 581; xml: 229; fortran: 183; sh: 121
file content (14 lines) | stat: -rw-r--r-- 893 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Description of data files in this folder

## Solubility dataset
- solubility.test.sdf (257 records)
- solubility.train.sdf (1025 records)

The two sdf files(hereby named "solubility dataset") are originated from the Huuskonen dataset. The Huuskonen dataset contains a training set of 884 compounds and a randomly chosen test set of 413 compounds.
- Reference: Huuskonen, J. (2000). Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology. Journal of Chemical Information and Computer Sciences, 40(3), 773–777. https://doi.org/10.1021/ci9901338

This solubility dataset is originally downloaded from
- http://cheminformatics.org/datasets/huuskonen/index.html

Although cheminformatics.org no longer exists, supplementary file from https://doi.org/10.1021/ci9901338 contains a list of all the structures and the corresponding data in PDF format.