
theislab/IMPA


IMPA

Image Perturbation Autoencoder (IMPA)

Image Perturbation Autoencoder (IMPA) is a computer vision model for style transfer on images of cells subjected to perturbations. IMPA conditions the latent space of an autoencoder on a learned, structured perturbation space. This lets the model transform a given cell image into its predicted appearance under a specified perturbation.
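To make "conditioning the latent space" concrete, here is a minimal sketch of one common conditioning mechanism, adaptive instance normalization (AdaIN). It assumes that conditioning works through per-channel affine modulation of latent features. The function names `adain` and `style_to_affine` are hypothetical and not taken from model.py, and plain Python lists stand in for tensors. Consult model.py for the modules IMPA actually uses.

```python
import math
from typing import List, Tuple

# Hypothetical representation: one flattened list of activations per channel.
FeatureMap = List[List[float]]


def adain(features: FeatureMap, gamma: List[float], beta: List[float],
          eps: float = 1e-5) -> FeatureMap:
    """Normalize each channel to zero mean and unit variance, then rescale it
    with a condition-dependent gamma and shift it with beta."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    out = []
    for channel, g, b in zip(features, gamma, beta):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        std = math.sqrt(var + eps)
        out.append([g * (x - mean) / std + b for x in channel])
    return out


def style_to_affine(style: List[float], weight_gamma: List[List[float]],
                    weight_beta: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Map a condition (perturbation) vector to per-channel gamma and beta with
    two linear maps. In a real model these weights would be learned."""
    # The "1.0 +" makes a zero style vector leave the channel scale unchanged.
    gamma = [1.0 + sum(w * s for w, s in zip(row, style)) for row in weight_gamma]
    beta = [sum(w * s for w, s in zip(row, style)) for row in weight_beta]
    return gamma, beta
```

After `adain`, each channel has mean `beta[c]` and (up to `eps`) standard deviation `gamma[c]`. Swapping the condition vector therefore changes the "style" of the decoded image while the normalized content stays the same.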

The perturbation space is composed of:

  • Perturbation embeddings: Representations capturing compound-specific physicochemical properties, such as drug characteristics.
  • Trainable embeddings: Learned representations that are optimized in tandem with the model during training.
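The two components listed above can be pictured with the following sketch. It assumes that each perturbation's representation is the concatenation of a frozen compound embedding and a trainable embedding. The concatenation scheme, the class `PerturbationSpace` and its methods are hypothetical illustrations, not the repository's implementation.

```python
import random
from typing import Dict, List


class PerturbationSpace:
    """Hypothetical sketch: a frozen compound embedding concatenated with a
    trainable embedding for each perturbation."""

    def __init__(self, compound_embeddings: Dict[str, List[float]],
                 trainable_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Frozen part: precomputed descriptors (e.g. physicochemical features).
        self.fixed = {name: list(vec) for name, vec in compound_embeddings.items()}
        # Trainable part: small random initialization, updated during training.
        self.trainable = {name: [rng.gauss(0.0, 0.1) for _ in range(trainable_dim)]
                          for name in compound_embeddings}

    def embed(self, name: str) -> List[float]:
        """Return the full representation of one perturbation."""
        return self.fixed[name] + self.trainable[name]

    def update(self, name: str, grad: List[float], lr: float = 0.01) -> None:
        """Plain gradient step on the trainable part only; the frozen part is untouched."""
        if len(grad) != len(self.trainable[name]):
            raise ValueError("gradient must match the trainable embedding size")
        self.trainable[name] = [w - lr * g for w, g in zip(self.trainable[name], grad)]
```

Keeping the frozen part fixed means that compounds with similar descriptors start out close in the perturbation space, while the trainable part can absorb effects that the descriptors do not capture.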

Beyond style transfer, IMPA is also effective for batch correction. When trained to harmonize cell images from different experimental sources, the model can map the data into a single unified batch for downstream analysis.
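Batch correction through style transfer amounts to translating every image into the style of one reference batch. The sketch below shows that bookkeeping. The `transfer` callable stands in for a trained model's translation step and, like `correct_to_reference` itself, is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")


def correct_to_reference(images: Sequence[Image], batches: Sequence[str], reference: str,
                         transfer: Callable[[Image, str, str], Image]) -> List[Image]:
    """Translate every image that is not already in `reference` into the
    reference batch's style; images from the reference batch pass through."""
    if len(images) != len(batches):
        raise ValueError("need one batch label per image")
    corrected = []
    for image, batch in zip(images, batches):
        if batch == reference:
            corrected.append(image)
        else:
            # transfer(image, source_batch, target_batch) -> translated image
            corrected.append(transfer(image, batch, reference))
    return corrected
```

With a toy `transfer` that removes a per-batch offset, two batches that differ only by that offset collapse onto the same scale.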

Install repository

To run the model, clone this repository, navigate to its root directory and create the conda environment via:

conda env create -f environment.yml

Activate the new environment and, from the repository root, install the Python package in editable mode:

pip install -e .
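To check that the editable install worked, you can test whether the package can be found on the Python path. This sketch assumes the package is importable under the name of the IMPA folder; `is_importable` is a hypothetical helper built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package name matches the IMPA folder.
    print("IMPA importable:", is_importable("IMPA"))
```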

Codebase description

All files related to the model are stored in the IMPA folder.

  • utils.py: contains helper functions.
  • solver.py: contains the Solver class, which implements model setup, data loading and the training loop.
  • model.py: implements the neural network modules and the initialization function.
  • main.py: uses the Solver class to run training managed by seml and sacred.
  • checkpoint.py: implements the utility class for saving and loading checkpoints.
  • eval/eval.py: contains the evaluation script that the Solver class runs during training.
  • data/data_loader.py: implements PyTorch Dataset and DataLoader wrappers around the image data.
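The checkpoint utility described above can be illustrated with a minimal sketch: one file per training step plus a helper that reloads the most recent one. The class `CheckpointManager`, its methods and the `ckpt_<step>.pkl` naming scheme are hypothetical and do not mirror checkpoint.py. A PyTorch project would typically store module `state_dict()`s with `torch.save` instead of `pickle`.

```python
import pickle
from pathlib import Path
from typing import Any, Dict, Optional


class CheckpointManager:
    """Hypothetical sketch: saves one checkpoint file per step and reloads the latest."""

    def __init__(self, directory: str):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, step: int) -> Path:
        # Zero-padded step numbers keep the files in sorted order on disk.
        return self.directory / f"ckpt_{step:06d}.pkl"

    def save(self, step: int, state: Dict[str, Any]) -> Path:
        path = self._path(step)
        with path.open("wb") as f:
            pickle.dump({"step": step, "state": state}, f)
        return path

    def latest_step(self) -> Optional[int]:
        steps = [int(p.stem.split("_")[1]) for p in self.directory.glob("ckpt_*.pkl")]
        return max(steps) if steps else None

    def load_latest(self) -> Optional[Dict[str, Any]]:
        step = self.latest_step()
        if step is None:
            return None
        with self._path(step).open("rb") as f:
            return pickle.load(f)
```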

Train the models

We trained the models with the seml framework; the configurations are in the training_config folder. IMPA can also be trained without seml. Each mode has its own entry point:

  • main.py: trains with seml on the SLURM scheduling system.
  • main_not_seml.py: trains without seml, with jobs submitted to SLURM via sbatch scripts.

Scripts for running the code without seml are in the scripts folder. To submit one, enter the following in a terminal, replacing <script_name> with the script you want to run:

sbatch scripts/<script_name>

The job is then submitted to SLURM, and its logs are saved in the training_config/logs folder.
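For orientation, an sbatch file is a shell script whose `#SBATCH` header lines set the job options. The hypothetical helper below generates one that writes its logs to training_config/logs. The job name, time limit, GPU count and `command` are placeholders; the scripts in the scripts folder remain the reference for how the repository launches training.

```python
def make_sbatch_script(job_name: str, command: str,
                       log_dir: str = "training_config/logs",
                       time: str = "24:00:00", gpus: int = 1) -> str:
    """Return the text of a SLURM batch script that runs `command` once."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # %x expands to the job name and %j to the job ID.
        f"#SBATCH --output={log_dir}/%x_%j.out",
        f"#SBATCH --error={log_dir}/%x_%j.err",
        f"#SBATCH --time={time}",
    ]
    if gpus > 0:
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines.append(command)
    return "\n".join(lines) + "\n"
```

Write the returned text to a file and pass that file to `sbatch`. SLURM does not create the log directory itself, so it must exist before the job starts.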

Other scheduling systems may require minor modifications to main.py so that it accepts custom configuration files. For training with seml, refer to the official seml documentation.
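One way to let an entry point accept a custom configuration file is a `--config` command-line argument. The sketch below is a hypothetical illustration: the flag name and the functions are not the repository's, JSON is parsed with the standard library, and YAML parsing assumes PyYAML is installed.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def load_config(path: str) -> Dict[str, Any]:
    """Load a .json or .yaml/.yml configuration file into a dict."""
    p = Path(path)
    text = p.read_text()
    if p.suffix in {".yaml", ".yml"}:
        import yaml  # PyYAML (assumed installed); imported lazily so JSON works without it
        return yaml.safe_load(text)
    return json.loads(text)


def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    parser = argparse.ArgumentParser(description="Train from a config file (sketch)")
    parser.add_argument("--config", required=True, help="path to a .json/.yaml config file")
    args = parser.parse_args(argv)
    config = load_config(args.config)
    # Hypothetical: hand the parsed dict to the training code here.
    return config
```

In a script, call `main()` from an `if __name__ == "__main__":` block so that `argparse` reads the arguments from the command line.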

Before training with the provided .yaml files, adapt them to your experimental setup (e.g., set the path strings that point to your data and output directories).
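A quick pre-flight check catches unadapted paths before a job is queued. This hypothetical helper takes the parsed configuration as a dict. The key names in the usage below (`image_path`, `experiment_directory`) are invented for illustration; use the path keys that actually appear in the provided .yaml files.

```python
from pathlib import Path
from typing import Any, Dict, List


def missing_paths(config: Dict[str, Any], path_keys: List[str]) -> List[str]:
    """Return the keys in `path_keys` whose values are unset or point to non-existent paths."""
    missing = []
    for key in path_keys:
        value = config.get(key)
        if not value or not Path(value).exists():
            missing.append(key)
    return missing
```

For example, `missing_paths(config, ["image_path", "experiment_directory"])` returns an empty list once both entries point to existing locations.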

Dataset and checkpoints

Datasets are available at:

Model checkpoints and pre-processed data are made available here.