IMPA

Image Perturbation Autoencoder (IMPA)

Image Perturbation Autoencoder (IMPA) is a computer vision model designed for style transfer on images of cells subjected to various perturbations. By learning a structured perturbation space, IMPA enables style transfer by conditioning the latent space of an autoencoder. This approach allows the model to transform a given cell image into its predicted appearance under a specified perturbation.

The perturbation space is composed of:

Perturbation embeddings: Representations capturing compound-specific physicochemical properties, such as drug characteristics.
Trainable embeddings: Learned representations that are optimized in tandem with the model during training.
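
The idea of conditioning a latent on a perturbation can be illustrated with a minimal PyTorch sketch. This is not the architecture used in this repository: the class name `PerturbationConditioner`, the choice to concatenate a fixed and a trainable embedding, and the additive shift of the latent are all assumptions made for illustration only.

```python
import torch
from torch import nn


class PerturbationConditioner(nn.Module):
    """Hypothetical module: shifts an autoencoder latent by a perturbation code."""

    def __init__(self, fixed_embeddings: torch.Tensor, trainable_dim: int, latent_dim: int):
        super().__init__()
        n_perturbations, fixed_dim = fixed_embeddings.shape
        # Fixed descriptors (e.g. precomputed drug features) are stored as a buffer,
        # so the optimizer never updates them.
        self.register_buffer("fixed", fixed_embeddings)
        # Trainable embeddings are learned jointly with the rest of the model.
        self.trainable = nn.Embedding(n_perturbations, trainable_dim)
        # Project the concatenated code to the size of the latent (assumed design).
        self.project = nn.Linear(fixed_dim + trainable_dim, latent_dim)

    def forward(self, latent: torch.Tensor, perturbation_id: torch.Tensor) -> torch.Tensor:
        code = torch.cat(
            [self.fixed[perturbation_id], self.trainable(perturbation_id)], dim=-1
        )
        return latent + self.project(code)


# Toy usage: 5 perturbations, 8 fixed features, 4 trainable features, 16-d latent.
torch.manual_seed(0)
conditioner = PerturbationConditioner(torch.randn(5, 8), trainable_dim=4, latent_dim=16)
latent = torch.randn(3, 16)          # stands in for the encoder output of 3 images
ids = torch.tensor([0, 2, 4])        # target perturbation for each image
out = conditioner(latent, ids)
```

In a full model, the conditioned latent `out` would be passed to a decoder to produce the image under the target perturbation.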

Beyond style transfer, IMPA is also effective for batch correction. By training the model to harmonize cell images from different experimental sources, it can standardize data into a unified batch for downstream analysis.

Install repository

To run the model, clone this repository and create the environment via:

conda env create -f environment.yml

Then navigate to the repository root and install the Python package in editable mode:

pip install -e .

Codebase description

All files related to the model are stored in the IMPA folder.

utils.py: contains helper functions.
solver.py: contains the Solver class implementing the model setup, data loading and training loop.
model.py: implements the neural network modules and initialization function.
main.py: calls the Solver class and implements training supported by seml and sacred.
checkpoint.py: implements the util class for handling saving and loading checkpoints.
eval/eval.py: contains the evaluation script used during training by the Solver class.
data/data_loader.py: implements torch dataset and data loader wrappers around the image data.
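
As a rough illustration of what a torch dataset wrapper around image data can look like, here is a minimal sketch. It does not reproduce the contents of data/data_loader.py: the class name `CellImageDataset`, its arguments, and the in-memory random tensors are hypothetical placeholders.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CellImageDataset(Dataset):
    """Hypothetical dataset pairing each cell image with a perturbation index."""

    def __init__(self, images: torch.Tensor, perturbation_ids: torch.Tensor):
        if len(images) != len(perturbation_ids):
            raise ValueError("images and perturbation_ids must have the same length")
        self.images = images
        self.perturbation_ids = perturbation_ids

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int):
        return self.images[index], self.perturbation_ids[index]


# Random tensors stand in for real microscopy crops (10 images, 3 channels, 32x32).
dataset = CellImageDataset(torch.rand(10, 3, 32, 32), torch.randint(0, 5, (10,)))
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch_images, batch_ids = next(iter(loader))
```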

Train the models

We trained the models using the seml framework. Configurations can be found in the training_config folder. IMPA can be trained both with and without the support of seml. This is possible via two independent main files:

main.py: train with seml on the Slurm scheduling system.
main_not_seml.py: train without seml on the Slurm scheduling system via sbatch files.

Scripts to run the code without seml can be found in the scripts folder. In a terminal, enter:

sbatch training_config.yaml

The script is then submitted to the scheduler, and the logs of the run are saved in the training_config/logs folder.

For other scheduling systems, the user may need to make minor modifications to the main.py file so that it accepts custom configuration files. For training with seml, we refer the user to the official page of the package.

To train the model with the provided yaml files, first adapt them to your experimental setup, i.e. set the path strings to the directories you use.
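
Before submitting a job, it can help to check that every path in an adapted configuration exists. The sketch below assumes the configuration has already been loaded into a nested dictionary (for example with a YAML parser). The function name `find_missing_paths` and the keys in the example (`image_path`, `checkpoint_dir`, `batch_size`) are hypothetical and do not reflect the actual schema of the files in training_config.

```python
from pathlib import Path


def find_missing_paths(config: dict, suffixes=("_path", "_dir")) -> list:
    """Return dotted keys whose values look like paths but do not exist on disk."""
    missing = []

    def walk(node: dict, prefix: str) -> None:
        for key, value in node.items():
            dotted = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                # Recurse into nested sections of the configuration.
                walk(value, dotted)
            elif isinstance(value, str) and key.endswith(suffixes):
                if not Path(value).exists():
                    missing.append(dotted)

    walk(config, "")
    return missing


# Hypothetical configuration; the key names are placeholders.
example_config = {
    "dataset": {"image_path": "/definitely/not/a/real/dir"},
    "logging": {"checkpoint_dir": "."},
    "training": {"batch_size": 32},
}
missing = find_missing_paths(example_config)
```

Here only string values whose key ends in `_path` or `_dir` are checked; other entries such as `batch_size` are ignored.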

Dataset and checkpoints

Datasets are available at:

BBBC021 https://bbbc.broadinstitute.org/BBBC021
BBBC025 https://bbbc.broadinstitute.org/BBBC025
RxRx1 https://www.kaggle.com/c/recursion-cellular-image-classification/overview/resources

Model checkpoints and pre-processed data are made available here.
