This repo provides the source code for our KDD 2022 paper "Learning Backward Compatible Embeddings" by Weihua Hu, Rajas Bansal, Kaidi Cao, Nikhil Rao, Karthik Subbian, and Jure Leskovec.
If you make use of the code/experiments in your work, please cite our paper (BibTeX below).
@inproceedings{hu2022learning,
title={Learning Backward Compatible Embeddings},
author={Hu, Weihua and Bansal, Rajas and Cao, Kaidi and Rao, Nikhil and Subbian, Karthik and Leskovec, Jure},
booktitle={Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
year={2022}
}
Problem formulation.
The embedding team trains an embedding model to produce entity embeddings, which consumer teams then use to train their own consumer models for downstream (unintended) tasks. When the embedding model is retrained or updated to a new version, the embeddings change, and every consumer model would ordinarily have to be retrained as well.
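As a minimal, self-contained sketch of this setup (all class and variable names here are hypothetical, not part of this repo), a consumer model is trained on top of a fixed version of the embeddings:

```python
# Hypothetical sketch of the versioned-embedding setup described above;
# none of these classes exist in this repo.
import torch
import torch.nn as nn

class EmbeddingModelV1(nn.Module):
    """Embedding team's model: maps entity IDs to d-dimensional embeddings."""
    def __init__(self, num_entities: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(num_entities, dim)

    def forward(self, entity_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(entity_ids)

class ConsumerModel(nn.Module):
    """Consumer team's model: trained on top of fixed v1 embeddings."""
    def __init__(self, dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.head = nn.Linear(dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.head(embeddings)

embedder_v1 = EmbeddingModelV1(num_entities=1000)
consumer = ConsumerModel()
logits = consumer(embedder_v1(torch.tensor([0, 1, 2])))
# If the embedding team later replaces embedder_v1 with a retrained v2,
# `consumer` silently breaks unless the v2 embeddings are made compatible.
```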
Overview of our framework.
We train a new embedding model together with a lightweight backward transformation that maps the new embeddings into the old embedding space. Existing consumer models can keep consuming the transformed new embeddings without retraining, while the intended task benefits from the improved embeddings.
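As a toy illustration of this idea (again with hypothetical names, not the repo's actual implementation), the new embeddings pass through a learned backward transformation into the old embedding space, so the old consumer model runs unchanged:

```python
# Toy sketch of backward-compatible serving; names are hypothetical.
import torch
import torch.nn as nn

dim_old, dim_new = 64, 128

embedder_v2 = nn.Embedding(1000, dim_new)         # new embedding model
backward_transform = nn.Linear(dim_new, dim_old)  # learned alongside v2
consumer_v1_head = nn.Linear(dim_old, 2)          # old consumer model, frozen

entity_ids = torch.tensor([0, 1, 2])
new_emb = embedder_v2(entity_ids)
# Old consumers consume the transformed embeddings without retraining:
old_compatible_emb = backward_transform(new_emb)
logits = consumer_v1_head(old_compatible_emb)
```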
We used the following Python packages for core development. We tested on Python 3.8.
pytorch 1.10.1+cu102
torch-geometric 2.0.3
tqdm 4.59.0
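Assuming these packages are installed, a quick sanity check of the environment:

```python
# Print installed versions to verify they match the tested configuration.
import torch
import torch_geometric
import tqdm

print("pytorch:", torch.__version__)                    # expected 1.10.1+cu102
print("torch-geometric:", torch_geometric.__version__)  # expected 2.0.3
print("tqdm:", tqdm.__version__)                        # expected 4.59.0
```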
Then, install our recsys package as follows.
pip install -e .
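If the editable install succeeded, the package should be importable (assuming it exposes a top-level recsys module):

```python
# Smoke test for the editable install.
import recsys
print(recsys.__file__)  # should point into this repo's source tree
```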
cd dataset_preprocessing/amazon
and follow the instructions here. This will download the raw data and create a files/ directory that stores all the pre-processed data.
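To confirm that preprocessing succeeded, you can list the generated directory (the exact file names depend on the preprocessing scripts):

```python
# Quick check that preprocessing produced output; run from
# dataset_preprocessing/amazon, and adjust the path if files/
# was created elsewhere.
from pathlib import Path

files_dir = Path("files")
for path in sorted(files_dir.rglob("*")):
    print(path)
```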
Generate all the scripts for the intended task by running:
cd embevolve/intended_task
python run_intended.py
Run all the scripts. This will train the embedding models and save the embeddings, as well as their intended-task performance, under embevolve/intended_task/checkpoint/.
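If you want to peek at the saved artifacts, a sketch like the following may help; the *.pt extension and checkpoint layout are assumptions rather than guarantees of the scripts:

```python
# Hypothetical way to inspect the saved checkpoints; the actual file
# names and formats depend on the training scripts and configuration.
from pathlib import Path
import torch

ckpt_dir = Path("embevolve/intended_task/checkpoint")
for ckpt_path in sorted(ckpt_dir.rglob("*.pt")):
    state = torch.load(ckpt_path, map_location="cpu")
    print(ckpt_path, type(state))
```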
Generate all the scripts for the unintended tasks by running:
cd embevolve/unintended_task
python run_unintended.py
Run all the scripts. This will train the consumer models and save them under embevolve/unintended_task/checkpoint/.
Furthermore, the saved consumer models are used to make predictions, and the results are saved under embevolve/unintended_task/result/.
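To browse what was produced, a similar sketch can be used (the file layout under result/ is an assumption, not guaranteed by the scripts):

```python
# Hypothetical way to browse the saved prediction results.
from pathlib import Path

result_dir = Path("embevolve/unintended_task/result")
for path in sorted(p for p in result_dir.rglob("*") if p.is_file()):
    print(path.relative_to(result_dir))
```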
Once everything above has been run, you can generate the tables and figures in our paper using our Jupyter notebook at embevolve/analyze_all_results.ipynb.