Image Perturbation Autoencoder (IMPA) is a computer vision model for style transfer on images of cells subjected to various perturbations. IMPA learns a structured perturbation space and uses it to condition the latent space of an autoencoder, allowing the model to transform a given cell image into its predicted appearance under a specified perturbation.
The perturbation space is composed of:
- Perturbation embeddings: Representations capturing compound-specific physicochemical properties, such as drug characteristics.
- Trainable embeddings: Learned representations that are optimized in tandem with the model during training.
Beyond style transfer, IMPA is also effective for batch correction. By training the model to harmonize cell images from different experimental sources, it can standardize data into a unified batch for downstream analysis.
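The conditioning idea can be illustrated with a toy sketch. All dimensions, weights, and function names below are illustrative placeholders (this is not IMPA's actual architecture): an encoder maps a cell image to a latent vector, and a decoder reconstructs an image from that latent concatenated with a perturbation embedding, so swapping the embedding changes the predicted "style" of the output.

```python
import numpy as np

# Toy sketch of latent-space conditioning (illustrative, not IMPA's real API).
rng = np.random.default_rng(0)

def encode(image, W_enc):
    """Toy encoder: flatten the image and project it to a latent vector."""
    return np.tanh(W_enc @ image.ravel())

def decode(latent, pert_embedding, W_dec):
    """Toy decoder: condition on the perturbation by concatenating its embedding."""
    conditioned = np.concatenate([latent, pert_embedding])
    return (W_dec @ conditioned).reshape(8, 8)

latent_dim, pert_dim, img_side = 16, 4, 8
W_enc = rng.normal(size=(latent_dim, img_side * img_side))
W_dec = rng.normal(size=(img_side * img_side, latent_dim + pert_dim))

cell_image = rng.normal(size=(img_side, img_side))
drug_embedding = rng.normal(size=pert_dim)  # e.g. physicochemical drug features

z = encode(cell_image, W_enc)
styled = decode(z, drug_embedding, W_dec)
print(styled.shape)  # (8, 8)
```

Decoding the same latent `z` with a different perturbation embedding yields a different predicted image, which is the essence of the style-transfer setup described above.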
To run the model, clone this repository and create the environment via:

```shell
conda env create -f environment.yml
```

Then navigate to the repository and install the Python package:

```shell
pip install -e .
```
All files related to the model are stored in the `IMPA` folder:

- `utils.py`: contains helper functions.
- `solver.py`: contains the `Solver` class implementing the model setup, data loading, and training loop.
- `model.py`: implements the neural network modules and the initialization function.
- `main.py`: calls the `Solver` class and implements training supported by `seml` and `sacred`.
- `checkpoint.py`: implements the utility class for saving and loading checkpoints.
- `eval/eval.py`: contains the evaluation script used during training by the `Solver` class.
- `data/data_loader.py`: implements `torch` dataset and data loader wrappers around the image data.
We trained the models using the `seml` framework. Configurations can be found in the `training_config` folder. IMPA can be trained both with and without the support of `seml`, via two independent main files:

- `main.py`: train with `seml` on the `slurm` scheduling system.
- `main_not_seml.py`: train without `seml` on the `slurm` scheduling system via sbatch files.
Scripts to run the code without `seml` can be found in the `scripts` folder. In a terminal, enter:

```shell
sbatch training_config.yaml
```

and the script will be submitted automatically. The logs of the run will be saved in the `training_config/logs` folder.
For other scheduling systems, minor modifications to the `main.py` file may be required so that it accepts custom configuration files. For training with `seml`, we refer the user to the official documentation of the package.
To train the model with the provided YAML files, adapt the `.yaml` files to your experimental setup (i.e., add path strings referencing the directories used).
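As a rough illustration of the kind of edit required, a path entry in a `seml`-style configuration could look like the fragment below. The key names are hypothetical placeholders and must be matched against the actual files in `training_config`; only the pattern of substituting your own absolute paths is the point.

```yaml
# Illustrative only: key names are placeholders, not IMPA's real schema.
fixed:
  data_path: /absolute/path/to/image/dataset   # directory containing the cell images
  output_dir: /absolute/path/to/model/outputs  # where checkpoints and logs are written
```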
Datasets are available at:
- BBBC021 https://bbbc.broadinstitute.org/BBBC021
- BBBC025 https://bbbc.broadinstitute.org/BBBC025
- RxRx1 https://www.kaggle.com/c/recursion-cellular-image-classification/overview/resources
Model checkpoints and pre-processed data are made available here.