biodeg-ml

Raw code for processing chemical biodegradability data from SMILES strings and RDKit parameters, used to predict ready biodegradability values with regression and GNN-based methods.

This is the cumulative product of research that began at IHPC in mid-2022. The notebook contains scripts and functions for:

  • Loading and saving .sdf and .csv databases containing molecular conformers and SMILES/RDKit parameters, respectively.
  • Automated extraction of the above-mentioned RDKit parameters.
  • Pre-processing for scikit-learn models.
  • Implementation of group-based cross-validation keyed on shared SMILES, so that multiple conformers of the same molecule stay in the same fold.
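
The group-based cross-validation idea can be illustrated with a minimal, dependency-free sketch. This is not the notebook's actual implementation: the function name `group_kfold_by_smiles`, the greedy fold-balancing rule, and the example SMILES rows are all assumptions made for illustration. It also assumes the SMILES strings are already canonical, so that identical molecules share an identical string. In practice, scikit-learn's `GroupKFold` offers the same kind of split when the SMILES column is passed as `groups`.

```python
from collections import defaultdict


def group_kfold_by_smiles(smiles, n_splits=3):
    """Yield (train_idx, test_idx) pairs where rows that share a SMILES
    string (e.g. several conformers of one molecule) never straddle
    the train/test boundary.

    Hypothetical helper for illustration; assumes canonical SMILES.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")

    # Collect the row indices belonging to each distinct SMILES string.
    groups = defaultdict(list)
    for idx, smi in enumerate(smiles):
        groups[smi].append(idx)

    if len(groups) < n_splits:
        raise ValueError("need at least as many distinct SMILES as folds")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(n_splits)]
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _, indices in ordered:
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(indices)

    # Each fold serves as the test set once; the rest form the training set.
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j in range(n_splits) if j != k for i in folds[j]
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    # Made-up rows: one entry per conformer, labelled by parent SMILES.
    conformer_smiles = ["CCO", "CCO", "CCO", "c1ccccc1", "c1ccccc1",
                        "CC(=O)O", "CCN"]
    for train_idx, test_idx in group_kfold_by_smiles(conformer_smiles):
        print("train:", train_idx, "test:", test_idx)
```

Splitting on the molecule rather than on the row matters because conformers of one molecule carry nearly identical descriptors and the same label; letting them appear on both sides of a split would likely inflate validation scores.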

References:

  1. Predicting ready biodegradability in the Japanese Ministry of International Trade and Industry test
  2. Chemprop