This repository benchmarks three language models (GPT-3.5 as text-davinci-003, BERT, and GPT-2) on a text summarization task, using 115 sentences from the test dataset of the CNN-DailyMail News dataset on Kaggle (https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail).
Before running the scripts, put your API key in openai_key.txt so that the OpenAI models can be used in the experiments. To get a key, you must have an OpenAI account; then follow these steps:
- Navigate to https://platform.openai.com/
- Click on your avatar in the top right-hand corner of the dashboard.
- Select View API Keys.
- Click Create new secret key.
Best practices for API key safety are available on the OpenAI website: https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety
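As a minimal sketch of how a script might load the key from openai_key.txt, the helper below reads the file and strips surrounding whitespace. The function name `read_api_key` is hypothetical and is not taken from this repository's scripts; the commented usage assumes the pinned openai==0.27.4 client, which accepts the key through the module-level `openai.api_key` attribute.

```python
from pathlib import Path


def read_api_key(path="openai_key.txt"):
    """Return the API key stored in `path`, stripped of surrounding whitespace."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your OpenAI secret key into it")
    return key


# Assumed usage with the pinned openai==0.27.4 client (not run here):
#   import openai
#   openai.api_key = read_api_key()
```

Keeping the key in a separate file, and excluding that file from version control, avoids hard-coding the secret in the scripts.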
Currently, only the BERT, GPT-2, and text-davinci-003 (fine-tuned GPT-3.5) models are evaluated.
From the root of the project, generate the summaries by running the following in your terminal:
python experiments/summarization_task.py
In this repository, the generated summaries are provided in results/generated_summary.txt. To compute the average BLEU, ROUGE, and BLEURT scores over the 115 summarized sentences, run:
python experiments/benchmark.py
This creates a table, model_summarization_score.csv, in the results folder.
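The averaging step can be sketched as follows. This is an illustration under assumptions, not the code in experiments/benchmark.py: the function names, the input layout (model name mapped to metric name mapped to a list of per-summary scores), and the CSV column order are all hypothetical, and the numbers are toy values rather than results from this benchmark.

```python
import csv
from statistics import mean


def average_scores(per_summary_scores):
    """Map each model to the mean of each of its per-summary metric scores."""
    return {
        model: {metric: mean(values) for metric, values in metrics.items()}
        for model, metrics in per_summary_scores.items()
    }


def write_score_table(averages, path):
    """Write one row per model and one column per metric."""
    metrics = sorted({m for scores in averages.values() for m in scores})
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["model"] + metrics)
        for model, scores in averages.items():
            writer.writerow([model] + [scores.get(m, "") for m in metrics])


# Toy values for illustration only; these are NOT results from this benchmark.
toy_scores = {
    "model_a": {"BLEU": [0.25, 0.75], "ROUGE": [0.5, 0.5]},
    "model_b": {"BLEU": [0.0, 1.0], "ROUGE": [0.25, 0.75]},
}
toy_averages = average_scores(toy_scores)
```

Calling `write_score_table(toy_averages, "scores.csv")` would then produce a small table with a `model` column followed by one column per metric.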
Requirements: Python 3.11.2 and the following packages:
- bert-extractive-summarizer==0.10.1
- openai==0.27.4
- rouge==1.0.1
- tensorflow==2.12.0
- tensorflow-datasets==4.9.2
- torch==2.0.0
- transformers==4.28.1
You also have to install the BLEURT metric from https://github.com/google-research/bleurt#readme, following the instructions given there:
pip install --upgrade pip # ensures that pip is current
git clone https://github.com/google-research/bleurt.git
cd bleurt
pip install .
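To check that the installed packages match the pinned versions listed above, a small script along these lines may help. It is a sketch, not part of this repository: `find_mismatches` is a hypothetical helper, and it relies only on the standard library's `importlib.metadata`. Note that some builds report a local version suffix (for example a CUDA tag on torch), which this exact comparison would flag as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the requirements list in this README.
PINNED = {
    "bert-extractive-summarizer": "0.10.1",
    "openai": "0.27.4",
    "rouge": "1.0.1",
    "tensorflow": "2.12.0",
    "tensorflow-datasets": "4.9.2",
    "torch": "2.0.0",
    "transformers": "4.28.1",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```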