This project is based on the article "Data-modeling the interplay between single cell shape, single cell protein expression, and tissue state". The study combines spatial multiplexed single-cell imaging and machine learning to explore the relationships between cell shape and protein expression within human tissues. The results highlight a bi-directional link between cell shape and protein expression across various cell types and disease states.
First, let's import the necessary modules and process the data. We start by reading cellData.csv and extracting shape features (plus additional features, depending on the mode argument) for every sample:
import pandas as pd
###
import utils
from ProcessData import CellsDataSetTNBC
tnbc_df = pd.read_csv(r'cellData.csv')
types = pd.read_csv(r'MIBI_TNBC_idx_cell_to_type.csv')
### for this example, we will only analyze patient 1.
tnbc_df = tnbc_df[tnbc_df['SampleID'].isin([1])]
### TNBC columns to drop (noise columns)
cols_to_drop = ['cellSize','C','Na','Si','P','Ca','Fe','Background','B7H3','OX40','CD163', 'CSF-1R',
                'Ta','Au','tumorYN','tumorCluster','Group','immuneCluster','immuneGroup']
tnbc_neighbors_data = CellsDataSetTNBC(data_path = r'.',
                                       cells_data_df = tnbc_df,
                                       types_present_in_csv = True,
                                       cols_to_drop = cols_to_drop,
                                       types_data_df = types,
                                       meta_data_df = None,
                                       mode = 'neighbors_morph')
print()
print('###'*15)
print()
## view an example of a cell and its microenvironment:
tnbc_neighbors_data.view_neighbors(1, 500)
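To make the phrase "shape features" concrete, here is a minimal, dependency-free sketch that computes three of the descriptors named later in this README (`area`, `extent`, `equivalent_diameter_area`) from a binary cell mask. The helper `basic_shape_features` and the list-of-rows mask format are hypothetical and used only for illustration. The definitions follow the usual region-property conventions (extent is area divided by bounding-box area, equivalent diameter is the diameter of a circle with the same area). How `CellsDataSetTNBC` computes its features is not shown here, so treat this as an approximation rather than the project's implementation.

```python
import math


def basic_shape_features(mask):
    """Compute a few shape descriptors for one cell.

    `mask` is a list of rows holding 0/1 values (hypothetical input format).
    """
    # Collect the (row, col) coordinates of all foreground pixels.
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")

    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    # Area of the tightest axis-aligned bounding box around the cell.
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)

    return {
        "area": area,
        "extent": area / bbox_area,
        # Diameter of a circle whose area equals the cell's area.
        "equivalent_diameter_area": math.sqrt(4 * area / math.pi),
    }


# A tiny toy mask, not real imaging data.
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
features = basic_shape_features(mask)
print(features)
```

Descriptors such as eccentricity or solidity need ellipse fitting and convex hulls, which is why a library routine is the more practical choice for the full feature set.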
Next, we'll train and evaluate a model using the processed data.
import utils
from torch.utils.data import DataLoader
from models import SimpleLinearNet
import copy
import torch
from tqdm import tqdm
from sklearn.metrics import r2_score
## data generator
tnbc_neighbors_loader = DataLoader(tnbc_neighbors_data, batch_size = 64, shuffle = True)
## load to disk for faster training
tnbc_dataset, tnbc_batches, tnbc_patients_ids = utils.buildDataSet(tnbc_neighbors_loader)
## train-test split
tnbc_train_data, tnbc_train_patients_id, tnbc_test_data, tnbc_test_patients_id = utils.train_test_split(
    tnbc_dataset, tnbc_patients_ids)
### model HP :
device = 'cuda' if torch.cuda.is_available() else 'cpu'
EPOCHS = 100 # number of epochs; this demo runs for 100, adjust as needed
tnbc_lr = 4e-3 # learning rate, adjust as needed
## load models:
## the null model receives 12 fewer input features than the full model
tnbc_model_full = SimpleLinearNet(in_features = tnbc_train_data[0]['x'].shape[1], out_features = tnbc_train_data[0]['y'].shape[1]).to(device)
tnbc_model_null = SimpleLinearNet(in_features = tnbc_train_data[0]['x'].shape[1] - 12, out_features = tnbc_train_data[0]['y'].shape[1]).to(device)
### params for each model
tnbc_criterion = torch.nn.MSELoss()
tnbc_optimizer_full = torch.optim.Adam(params = tnbc_model_full.parameters(), lr = tnbc_lr)
tnbc_optimizer_null = torch.optim.Adam(params = tnbc_model_null.parameters(), lr = tnbc_lr)
### track loss
tnbc_train_loss = {'null' : [], 'full' : []}
tnbc_val_loss = {'null' : [], 'full' : []}
best_loss_full = 100
best_loss_null = 100
#### Training loop!
tnbc_model_full = utils.train_eval(mode = 'full')
tnbc_model_null = utils.train_eval(mode = 'null')
## save best models:
tnbc_models = {'full' : tnbc_model_full, 'null' : tnbc_model_null}
test_trues_b, test_preds_b = utils.getPreds(tnbc_models, tnbc_test_data, device, mode = 'null')
test_trues_bm, test_preds_bm = utils.getPreds(tnbc_models, tnbc_test_data,device, mode = 'full')
df = utils.buildBoxPlotR2()
utils.plot(df)
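The comparison above rests on scoring the null and full models with R² on held-out data. The sketch below shows that calculation for each target separately, in plain Python. The helper `r2`, the target names `protein_a` and `protein_b`, and all numbers are hypothetical toy values, not outputs of this project. It computes the same quantity as `sklearn.metrics.r2_score` for a single target, but it is not the code behind `utils.buildBoxPlotR2`.

```python
def r2(y_true, y_pred):
    """Coefficient of determination for one target: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the true values are constant")
    return 1 - ss_res / ss_tot


# Toy placeholder values, not results from the study.
trues = {
    "protein_a": [1.0, 2.0, 3.0, 4.0],
    "protein_b": [0.0, 1.0, 0.0, 1.0],
}
preds_null = {
    "protein_a": [2.5, 2.5, 2.5, 2.5],  # predicts the mean everywhere
    "protein_b": [0.5, 0.5, 0.5, 0.5],
}
preds_full = {
    "protein_a": [1.0, 2.0, 3.0, 5.0],
    "protein_b": [0.0, 1.0, 0.0, 0.5],
}

# Per-target R^2 for each model, then the gain of the full model over the null.
r2_null = {name: r2(trues[name], preds_null[name]) for name in trues}
r2_full = {name: r2(trues[name], preds_full[name]) for name in trues}
r2_gain = {name: r2_full[name] - r2_null[name] for name in trues}

for name in trues:
    print(f"{name}: null={r2_null[name]:.2f} full={r2_full[name]:.2f} gain={r2_gain[name]:.2f}")
```

A model that always predicts the mean scores an R² of 0, so a positive gain means the extra inputs of the full model explain variance that the null model cannot.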
Next, we'll look at the feature importance.
importance_df = utils.feature_importance(tnbc_model_full.to(device), tnbc_train_data[0]['x'].to(device), num_target_features = 36)
utils.plot_importance(subset = 'Cell State')
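The method used by `utils.feature_importance` is not described here. As a generic point of comparison, the sketch below shows permutation importance: shuffle one input column at a time and measure how much the error grows. This is a stand-in technique and may differ from what the repository does. The linear toy model, the `weights`, and the helper names are all hypothetical.

```python
import random


def predict(weights, row):
    """Prediction of a bias-free linear model for one sample."""
    return sum(w * x for w, x in zip(weights, row))


def mse(weights, X, y):
    """Mean squared error of the linear model on (X, y)."""
    return sum((predict(weights, row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(weights, X, y, n_repeats=5, seed=0):
    """Average increase in MSE after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mse(weights, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j shuffled.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(weights, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy setup: the target depends on feature 0 only.
weights = [3.0, 0.0]
X = [[float(i), float(i % 3)] for i in range(8)]
y = [predict(weights, row) for row in X]

importance = permutation_importance(weights, X, y)
print(importance)
```

Feature 1 has zero weight, so shuffling it leaves every prediction unchanged and its importance is exactly zero, while shuffling feature 0 raises the error.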
Finally, we'll look at the improvement matrix.
### read the saved csv of the cells and shape features:
data = pd.read_csv('cells_plus_shape.csv')
utils.heatmap_create(data,
                     shape_fts = ['area', 'eccentricity', 'major_axis_length',
                                  'minor_axis_length', 'perimeter',
                                  'equivalent_diameter_area', 'convex_area',
                                  'extent', 'feret_diameter_max', 'orientation',
                                  'perimeter_crofton', 'solidity', 'cell type'],
                     proteins = ['dsDNA', 'Vimentin', 'SMA', 'FoxP3', 'Lag3', 'CD4',
                                 'CD16', 'CD56', 'PD1', 'CD31', 'PD-L1', 'EGFR', 'Ki67',
                                 'CD209', 'CD11c', 'CD138', 'CD68', 'CD8', 'CD3', 'IDO',
                                 'Keratin17', 'CD63', 'CD45RO', 'CD20', 'p53', 'Beta catenin',
                                 'HLA-DR', 'CD11b', 'CD45', 'H3K9ac', 'Pan-Keratin', 'H3K27me3',
                                 'phospho-S6', 'MPO', 'Keratin6', 'HLA_Class_1'])
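How `utils.heatmap_create` derives its values is not shown in this README. One plausible reading of "improvement matrix", stated here purely as an assumption, is a grid of cell types by proteins where each entry is the R² of the full model minus the R² of the null model. The sketch below builds such a grid from precomputed scores. The function `improvement_matrix`, the labels `type_a`/`type_b` and `protein_1`/`protein_2`, and every number are hypothetical placeholders.

```python
def improvement_matrix(r2_null, r2_full):
    """Return {cell_type: {protein: r2_full - r2_null}} from nested score dicts."""
    matrix = {}
    for cell_type, null_scores in r2_null.items():
        full_scores = r2_full[cell_type]
        matrix[cell_type] = {
            protein: full_scores[protein] - null_scores[protein]
            for protein in null_scores
        }
    return matrix


# Toy placeholder scores, not results from the study.
r2_null = {
    "type_a": {"protein_1": 0.25, "protein_2": 0.5},
    "type_b": {"protein_1": 0.5, "protein_2": 0.25},
}
r2_full = {
    "type_a": {"protein_1": 0.5, "protein_2": 0.5},
    "type_b": {"protein_1": 0.75, "protein_2": 0.75},
}

matrix = improvement_matrix(r2_null, r2_full)

# Print one row per cell type, one column per protein.
for cell_type, row in matrix.items():
    cells = "  ".join(f"{protein}={gain:+.2f}" for protein, gain in row.items())
    print(f"{cell_type}: {cells}")
```

A nested dictionary like this converts directly to a table (for example with `pandas.DataFrame.from_dict(matrix, orient="index")`) if you want to plot it as a heatmap.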
For more detailed examples and explanations, please refer to the Shape2Exp_Demo.ipynb notebook included in this repository.
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature-branch`).
- Open a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.