gersteinlab/step-back-profiling

STEP-BACK PROFILING: Distilling User History for Personalized Scientific Writing

This repository contains the code and dataset for the paper "STEP-BACK PROFILING: Distilling User History for Personalized Scientific Writing".

Dataset Generation

The dataset generation process involves the following steps:

  1. Download the raw data file, s2orc_4000.json.

  2. Get the sampled author list and paper list in JSON format:

    • dataset/data_construction.ipynb

  3. Extract the authors' research interests:

    • dataset/s2orc-rq.ipynb

  4. Extract research questions from the papers:

    • dataset/research_question_extraction.ipynb
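
Before running the notebooks, it can help to confirm that the download from step 1 is intact. The following is a minimal sketch, not part of the repository: it makes no assumption about the schema of s2orc_4000.json and only reports the top-level type, the number of entries, and a few key names. The helper name `summarize_json` and the path `dataset/s2orc_4000.json` are assumptions chosen for illustration.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return a small, schema-agnostic summary of a JSON file."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["entries"] = len(data)
    if isinstance(data, dict):
        # Show a few top-level keys without assuming what they mean.
        summary["sample_keys"] = list(data)[:5]
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        # For a list of records, show the first record's field names.
        summary["sample_keys"] = list(data[0])[:5]
    return summary


if __name__ == "__main__":
    # Assumed location: adjust to wherever s2orc_4000.json was saved.
    raw_path = Path("dataset/s2orc_4000.json")
    if raw_path.exists():
        print(summarize_json(raw_path))
    else:
        print(f"{raw_path} not found; download it first (step 1).")
```

If the file fails to parse, `json.load` raises `json.JSONDecodeError`, which usually points to an incomplete download.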

PSW Results Generation

To generate results for each task, follow these steps:

  1. Get the user profile:

    • psw_result/author_profiling_cot.ipynb

  2. Generate a title for a single author:

    • psw_result/single_agent_title_generation.ipynb

  3. Generate results for multiple authors and evaluate each task:

    • psw_result/task1_solving.ipynb
    • psw_result/task2_solving.ipynb
    • psw_result/task3_solving.ipynb
    • psw_result/task4_solving.ipynb
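
The steps above can also be run non-interactively. The sketch below is one possible way to do that, assuming Jupyter and nbconvert are installed and the script is launched from the repository root. It is not shipped with the repository, and the names `PSW_NOTEBOOKS`, `build_command`, and `run_pipeline` are hypothetical. The notebooks may also need credentials or paths configured inside them before they run unattended, so the default is a dry run that only prints the commands.

```python
import subprocess

# Notebook order taken from the steps listed above.
PSW_NOTEBOOKS = [
    "psw_result/author_profiling_cot.ipynb",
    "psw_result/single_agent_title_generation.ipynb",
    "psw_result/task1_solving.ipynb",
    "psw_result/task2_solving.ipynb",
    "psw_result/task3_solving.ipynb",
    "psw_result/task4_solving.ipynb",
]


def build_command(notebook):
    """Build a `jupyter nbconvert` command that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_pipeline(notebooks, dry_run=True):
    """Print each command in order; execute it too when dry_run is False."""
    commands = [build_command(nb) for nb in notebooks]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing notebook.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(PSW_NOTEBOOKS, dry_run=True)
```

Calling `run_pipeline(PSW_NOTEBOOKS, dry_run=False)` executes the notebooks for real. Stopping at the first failure keeps later task notebooks from running against a missing or stale user profile.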

LaMP Results Generation

The lamp_result/ directory contains the following notebooks:

  • lamp_result/cot_generation.ipynb
  • lamp_result/final_output_generation.ipynb
  • lamp_result/user_profile_generation.ipynb

These notebooks generate the user profiles and final outputs for the LaMP dataset.

Citation

If you find this work useful, please cite our paper:

@misc{tang2024stepback,
      title={Step-Back Profiling: Distilling User History for Personalized Scientific Writing}, 
      author={Xiangru Tang and Xingyao Zhang and Yanjun Shao and Jie Wu and Yilun Zhao and Arman Cohan and Ming Gong and Dongmei Zhang and Mark Gerstein},
      year={2024},
      eprint={2406.14275},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
}

License

This project is licensed under the MIT License. See the LICENSE file for more details.
