See our paper here.
Our code is based on a fork of the SEER replication package. To read about their files, please consult SEER_README.md.
real_data_gen/
- This folder contains files related to testing SEER on new data.
- To execute our scripts, first run `setup.sh` to unzip the data files and download the model weights. Note that this requires Python 3.6.9. Because the script installs requirements, consider activating a virtual environment before running it.
- To reproduce our results of using SEER with the new data, edit `gen_output.slurm` to include the absolute path of this folder (e.g., the output of `pwd` run inside `real_data_gen/`). Then submit it with `sbatch gen_output.slurm`. On our department's servers, this job takes approximately 3 hours.
- To create the tables used in the paper, execute `python real_data_gen/project_similarity_analysis.py` and run the cells in the Jupyter notebook `latex.ipynb`.
- To recreate attention analysis results:
  - To recreate the attention matrices for Phase 2 unseen data, navigate to the `attention_analysis` directory with `cd attention_analysis`, then run `attention_analysis.py` with `python attention_analysis.py --model JointEmbedder --dataset TestOracleInferencePhase2 --gpu_id 0 --fold_number 1 --reload_from 29`. Our results are in `attention_analysis/attention_weights_images/` and `attention_analysis/attention_weights_matrices/`, which is also where your results should be generated.
  - To recreate the attention matrices for New Data, navigate to the `attention_analysis` directory with `cd attention_analysis`, then run `attention_analysis_phase3.py` with `python attention_analysis_phase3.py --model JointEmbedder --dataset TestOracleInferencePhase2 --gpu_id 0 --fold_number 1 --reload_from 29`. Our results are in `attention_analysis/phase3_no_try_except/attention_weights_images_phase3/` and `attention_analysis/attention_weights_matrices_phase3/`, which is also where your results should be generated. (Note: while running, the script prints to the terminal the methods for which it could not create an attention matrix mapping. This output is intentional, so that unavailable mappings can be tracked.)
  - To recreate the attention analysis results for Phase 2 unseen data, navigate to the `attention_analysis` directory with `cd attention_analysis`, then compile `main.java` with `javac main.java` and run it with `java main`. Our results are in `attention_analysis/phase2_main_analysis_results.txt`; your results should be generated in `attention_analysis/phase2_main_analysis.txt`. (Note: while running, the program prints to the terminal the methods for which it could not create an attention matrix mapping. This output is intentional, so that unavailable mappings can be tracked.)
  - To recreate the attention analysis results for New Data, navigate to the `attention_analysis` directory with `cd attention_analysis`, then compile `phase3_analysis_main.java` with `javac phase3_analysis_main.java` and run it with `java phase3_analysis_main`. Our results are in `attention_analysis/phase3_main_analysis_results.txt`; your results should be generated in `attention_analysis/phase3_main_analysis.txt`.
- Attention analysis result files that were too large to store in Git were saved here, and Google Drive links to them are also provided in the respective folders. Running the scripts above regenerates these files.
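Before running `setup.sh`, a quick preflight check can catch the most common problem: a mismatched interpreter. The sketch below is hypothetical and not part of the repository. `version_matches`, `in_virtualenv`, and `run_setup` are illustrative names, the virtual-environment test only recognizes `venv`-style environments, and it assumes `setup.sh` is invoked with `bash` from inside `real_data_gen/`. It checks the interpreter running the sketch, which is only meaningful if `setup.sh` uses that same `python`.

```python
import subprocess
import sys

# Version required by the README.
REQUIRED_VERSION = (3, 6, 9)


def version_matches(version_info=None):
    """Return True if the given (or the running) interpreter is exactly 3.6.9."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == REQUIRED_VERSION


def in_virtualenv():
    """Best-effort venv check: inside a venv, sys.prefix differs from sys.base_prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def run_setup(script="setup.sh"):
    """Hypothetical wrapper: run setup.sh with bash once the preflight checks pass."""
    if not version_matches():
        raise RuntimeError(
            "Expected Python %s, found %s"
            % (".".join(map(str, REQUIRED_VERSION)), sys.version.split()[0])
        )
    if not in_virtualenv():
        # setup.sh installs requirements, so warn rather than silently proceed.
        print("Warning: no virtual environment detected; consider activating one.")
    subprocess.run(["bash", script], check=True)
```

Call `run_setup()` from inside `real_data_gen/` after activating your environment; it raises before touching anything if the interpreter is not 3.6.9.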
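If you prefer to submit and track the Slurm job from a script, the following minimal sketch wraps `sbatch` and extracts the job ID from its standard `Submitted batch job <id>` message. The helper names are hypothetical, and the sketch assumes `gen_output.slurm` has already been edited as described above and that you run it from the folder containing that file.

```python
import re
import subprocess

# sbatch's default stdout on success looks like "Submitted batch job 12345".
_JOB_ID_RE = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch output, or return None if absent."""
    match = _JOB_ID_RE.search(sbatch_stdout)
    return int(match.group(1)) if match else None


def submit(slurm_file="gen_output.slurm"):
    """Hypothetical helper: submit the job file with sbatch and return its job ID."""
    result = subprocess.run(
        ["sbatch", slurm_file],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text-mode output; also works on Python 3.6
        check=True,
    )
    job_id = parse_job_id(result.stdout)
    if job_id is None:
        raise RuntimeError("Unexpected sbatch output: %r" % result.stdout)
    return job_id
```

With the returned ID, Slurm's `squeue -j <id>` shows whether the roughly 3-hour job is still queued or running.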
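The table-generation step can be scripted in the same way. This sketch is an assumption rather than a script in the repository: it runs `project_similarity_analysis.py` from the repository root and executes the notebook headlessly with `jupyter nbconvert --execute` instead of running the cells by hand. The notebook path `real_data_gen/latex.ipynb` is a guess; adjust it to wherever `latex.ipynb` actually lives.

```python
import subprocess


def build_table_commands(python="python", notebook="real_data_gen/latex.ipynb"):
    """Return the commands that regenerate the paper's tables (run from the repo root).

    The notebook path is a hypothetical default, not confirmed by the README.
    """
    return [
        # Script path exactly as given in the README.
        [python, "real_data_gen/project_similarity_analysis.py"],
        # Execute every notebook cell without opening Jupyter; nbconvert writes
        # the executed copy alongside the original as <name>.nbconvert.ipynb.
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", notebook],
    ]


def run_tables(python="python", notebook="real_data_gen/latex.ipynb"):
    """Run each command in order, stopping at the first failure."""
    for cmd in build_table_commands(python, notebook):
        subprocess.run(cmd, check=True)
```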
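All four attention-analysis steps run from the same `attention_analysis` directory, so they can be chained. The sketch below is hypothetical and not shipped with the repository: it reproduces the commands listed above verbatim and runs them in that order, stopping at the first failure. It assumes the `workdir` parameter points at `attention_analysis/` relative to where you launch it, and that `python` on your `PATH` is the Python 3.6.9 environment set up earlier.

```python
import subprocess

# Arguments shared by both attention-matrix scripts, copied from the README.
COMMON_ARGS = [
    "--model", "JointEmbedder",
    "--dataset", "TestOracleInferencePhase2",
    "--gpu_id", "0",
    "--fold_number", "1",
    "--reload_from", "29",
]


def build_steps(python="python"):
    """Return the README's attention-analysis commands, in the order listed."""
    return [
        [python, "attention_analysis.py"] + COMMON_ARGS,         # Phase 2 matrices
        [python, "attention_analysis_phase3.py"] + COMMON_ARGS,  # New Data matrices
        ["javac", "main.java"],                                  # Phase 2 analysis
        ["java", "main"],
        ["javac", "phase3_analysis_main.java"],                  # New Data analysis
        ["java", "phase3_analysis_main"],
    ]


def run_all(workdir="attention_analysis", python="python"):
    """Hypothetical runner: execute each step inside workdir, stopping on failure."""
    for cmd in build_steps(python):
        print("$ " + " ".join(cmd))
        subprocess.run(cmd, cwd=workdir, check=True)
```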
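To check whether the Java analysis reproduced our results, you can diff each regenerated file against its published counterpart. This is a minimal sketch using Python's standard `difflib`. `compare_results` and `diff_lines` are hypothetical helpers, and the sketch assumes the analysis output is deterministic, so that an exact line-by-line match is the expected outcome.

```python
import difflib
import os

# (our published results, your regenerated output), file names from the README.
RESULT_PAIRS = [
    ("phase2_main_analysis_results.txt", "phase2_main_analysis.txt"),
    ("phase3_main_analysis_results.txt", "phase3_main_analysis.txt"),
]


def diff_lines(expected, actual, expected_name="ours", actual_name="yours"):
    """Return the unified diff of two lists of lines; an empty list means they match."""
    return list(
        difflib.unified_diff(
            expected, actual, fromfile=expected_name, tofile=actual_name, lineterm=""
        )
    )


def compare_results(workdir="attention_analysis"):
    """Print, for each pair, whether your output matches our published file."""
    for ours, yours in RESULT_PAIRS:
        ours_path = os.path.join(workdir, ours)
        yours_path = os.path.join(workdir, yours)
        if not os.path.exists(yours_path):
            print("missing: " + yours_path)
            continue
        with open(ours_path) as f_ours, open(yours_path) as f_yours:
            diff = diff_lines(
                f_ours.read().splitlines(),
                f_yours.read().splitlines(),
                ours_path,
                yours_path,
            )
        print(("match: " if not diff else "differs: ") + yours_path)
        for line in diff:
            print(line)
```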