[UPDATE] README.md
jcorreia11 authored Aug 30, 2023
1 parent 5fe701b commit 15aef9f
Showing 1 changed file with 12 additions and 42 deletions.
54 changes: 12 additions & 42 deletions README.md
@@ -67,30 +67,6 @@ Also, you should install mol2vec and its dependencies:
pip install git+https://github.com/samoturk/mol2vec#egg=mol2vec
```

-<!---
-### Docker
-(IN PREPARATION - NOT FUNCTIONAL YET!)
-1. Install [docker](https://docs.docker.com/install/).
-2. Pull an existing image (X.XGb to download) from DockerHub:
-```bash
-docker pull XXX
-```
-or clone the repository and build it manually:
-```bash
-git clone https://github.com/BioSystemsUM/DeepMol.git
-docker build ...
-```
-3. Create a container:
-```bash
-docker run ...
-```
---->

### Manually


@@ -117,9 +93,6 @@ and data splitting. It also provides methods to deal with unbalanced datasets,
do unsupervised exploration of the data and compute feature importance as
shap values.

-The DeepMol framework is still under development, and it is currently at a
-pre-release version. New models and features will be added in the future.


### Load a dataset from a CSV

@@ -143,15 +116,15 @@ loader = CSVLoader(dataset_path='../../data/train_dataset.csv',

dataset = loader.create_dataset()

-# print shape of the dataset (molecules, X, y)
+# print the shape of the dataset (molecules, X, y)
dataset.get_shape()

((1000,), None, (1000,))
```

-### Load a dataset from a SDF
+### Load a dataset from an SDF

-If you want to load a dataset from a SDF file with 3D structures, it is only required to provide
+If you want to load a dataset from an SDF file with 3D structures, it is only required to provide
the path to the file. Optionally, it is also possible to provide a field with some ids,
the labels fields.

@@ -172,11 +145,11 @@ dataset.get_shape()

### Compound Standardization

-It is possible to standardize the loaded molecules using three option. Using
+It is possible to standardize the loaded molecules using three options. Using
a basic standardizer that only does sanitization (Kekulize, check valencies,
set aromaticity, conjugation and hybridization). A more complex standardizer can
be customized by choosing or not to perform specific tasks such as sanitization,
-remove isotope information, neutralize charges, remove stereochemistry and remove
+removing isotope information, neutralizing charges, removing stereochemistry and removing
smaller fragments. Another possibility is to use the ChEMBL Standardizer.

```python
@@ -210,15 +183,15 @@ Seq2Seq and transformer-based are in development and will be added soon.
```python
from deepmol.compound_featurization import MorganFingerprint

-# Compute morgan fingerprints for molecules in the previous loaded dataset
+# Compute morgan fingerprints for molecules in the previously loaded dataset
MorganFingerprint(radius=2, size=1024).featurize(dataset, inplace=True)
# view the computed features (dataset.X)
dataset.X
```


```python
-#print shape of the dataset to see difference in the X shape
+#print shape of the dataset to see the difference in the X shape
dataset.get_shape()

((1000,), (1000, 1024), (1000,))
@@ -237,7 +210,7 @@ from deepmol.feature_selection import LowVarianceFS
# Feature Selection to remove features with low variance across molecules
LowVarianceFS(0.15).select_features(dataset, inplace=True)

-# print shape of the dataset to see difference in the X shape (fewer features)
+# print shape of the dataset to see the difference in the X shape (fewer features)
dataset.get_shape()

((1000,), (1000, 35), (1000,))
@@ -285,7 +258,7 @@ test_dataset.get_shape()

### Build, train and evaluate a model

-It is possible use pre-built models from Scikit-Learn and DeepChem or build new
+It is possible to use pre-built models from Scikit-Learn and DeepChem or build new
ones using keras layers. Wrappers for Scikit-Learn, Keras and DeepChem were
implemented allowing evaluation of the models under a common workspace.

@@ -533,7 +506,7 @@ shap_calc.feature_explanation_plot(1)
#### Draw relevant features

It is possible to plot the ON bits (or some of them) in a molecule for MACCS Keys,
-Morgan and RDK Fingerprints. IT is also possible to draw those bits on the
+Morgan and RDK Fingerprints. It is also possible to draw those bits on the
respective molecule. This can be allied with the Shap Values calculation to
highlight the zone of the molecule that most contributed to a certain prediction,
for instance, the substructure in the molecule that most contributed to its
@@ -576,7 +549,7 @@ train_dataset = SMOTEENN().sample(train_dataset)

DeepMol provides a pipeline to perform almost all the steps above in a sequence without
having to worry about the details of each step. The pipeline can be used to perform
-a prediction pipeline (last step is a data predictor) or a data transformation pipeline
+a prediction pipeline (the last step is a data predictor) or a data transformation pipeline
(all steps are data transformers). Transformers must implement the _fit and _transform
methods and predictors must implement the _fit and _predict methods.
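
The contract described above (transformers expose `_fit` and `_transform`, predictors expose `_fit` and `_predict`) can be illustrated with a minimal, standard-library-only sketch. This is not DeepMol's implementation: `Transformer`, `Predictor`, `MinMaxScaler`, `ThresholdPredictor`, `SketchPipeline` and the public `fit`/`transform`/`predict` methods are hypothetical names chosen for illustration, and plain lists of floats stand in for a dataset.

```python
from typing import List, Sequence, Union


class Transformer:
    """Hypothetical transformer step: must implement _fit and _transform."""

    def _fit(self, data: List[float]) -> "Transformer":
        return self

    def _transform(self, data: List[float]) -> List[float]:
        raise NotImplementedError


class Predictor:
    """Hypothetical predictor step: must implement _fit and _predict."""

    def _fit(self, data: List[float], labels: List[int]) -> "Predictor":
        raise NotImplementedError

    def _predict(self, data: List[float]) -> List[int]:
        raise NotImplementedError


class MinMaxScaler(Transformer):
    """Rescales values to [0, 1] using the range seen during fitting."""

    def _fit(self, data):
        self.low, self.high = min(data), max(data)
        return self

    def _transform(self, data):
        span = (self.high - self.low) or 1.0  # avoid division by zero
        return [(x - self.low) / span for x in data]


class ThresholdPredictor(Predictor):
    """Predicts 1 above the midpoint between the two class means."""

    def _fit(self, data, labels):
        pos = [x for x, y in zip(data, labels) if y == 1]
        neg = [x for x, y in zip(data, labels) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def _predict(self, data):
        return [1 if x > self.threshold else 0 for x in data]


class SketchPipeline:
    """Runs the steps in order; a trailing Predictor makes it a prediction pipeline."""

    def __init__(self, steps: Sequence[Union[Transformer, Predictor]]):
        self.steps = list(steps)

    @property
    def is_prediction_pipeline(self) -> bool:
        return isinstance(self.steps[-1], Predictor)

    def fit(self, data, labels):
        for step in self.steps:
            if isinstance(step, Predictor):
                step._fit(data, labels)
            else:
                # Fit the transformer, then pass its output to the next step.
                data = step._fit(data)._transform(data)
        return self

    def transform(self, data):
        for step in self.steps:
            if isinstance(step, Transformer):
                data = step._transform(data)
        return data

    def predict(self, data):
        if not self.is_prediction_pipeline:
            raise TypeError("the last step is not a predictor")
        return self.steps[-1]._predict(self.transform(data))


# Usage: a prediction pipeline (scaler followed by a predictor).
pipe = SketchPipeline([MinMaxScaler(), ThresholdPredictor()])
pipe.fit([0.0, 2.0, 8.0, 10.0], [0, 0, 1, 1])
print(pipe.predict([1.0, 9.0]))

# Usage: a data transformation pipeline (all steps are transformers).
scaler_only = SketchPipeline([MinMaxScaler()]).fit([0.0, 10.0], [0, 1])
print(scaler_only.transform([5.0]))
```

Checking only the type of the last step keeps the distinction between the two pipeline kinds in one place, which mirrors the rule stated in the paragraph above.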

Expand Down Expand Up @@ -722,10 +695,7 @@ Baptista, Delora, Correia, João, Pereira, Bruno and Rocha, Miguel. "Evaluating

J. Capela, J. Correia, V. Pereira and M. Rocha, "Development of Deep Learning approaches to predict relationships between chemical structures and sweetness," 2022 International Joint Conference on Neural Networks (IJCNN), 2022, pp. 1-8, doi: 10.1109/IJCNN55064.2022.9891992. https://ieeexplore.ieee.org/abstract/document/9891992

-<!---
-DeepSweet (POSTER) ...
---->

## Licensing

-DeepMol is under [BSD-2-Clause License](https://raw.githubusercontent.com/BioSystemsUM/DeepMol/master/LICENSE).
+DeepMol is under [BSD-2-Clause License](https://raw.githubusercontent.com/BioSystemsUM/DeepMol/master/LICENSE).
