News Summarization

Business Problem: Readers often lack the time to read entire articles, and headlines and subheadings alone do not give a complete picture of the content. News organizations such as the Associated Press, Bloomberg, and Reuters are actively working to automate stories in areas such as finance and sports, but producing a summary for every published piece is hard. News apps may therefore benefit from built-in tools that summarize stories for their users.

Project Goal

The project's goal is to generate brief, coherent summaries of news stories using different deep learning and NLP techniques: a T5 Transformer and an Encoder & Decoder model with BiLSTM layers.

The Encoder & Decoder model using BiLSTM and a Keras embedding layer reached an accuracy of 46%. The summaries generated by the pre-trained T5 Transformer, however, were more precise.

Project Structure

Code

evaluation/: Contains the evaluation scripts and data.
- evaluation.ipynb: Jupyter notebook for evaluating the model's performance using cosine similarity and other metrics.
- GPT_Similarity_Scored_Data.csv: Data file containing similarity scores evaluated by GPT.
- predictions_with_cosine_similarity.csv: Data file containing predictions with cosine similarity scores.
- predictions.csv: Data file containing the model's predictions.
LSTM_model.ipynb: Jupyter notebook for training and evaluating the LSTM model.
transformers-summarization-t5.ipynb: Jupyter notebook for fine-tuning the T5 transformer model for the summarization task.

Data

data/: Directory to store datasets used for training and evaluation.

Model

t5_fine_tuned_model.pth: Fine-tuned T5 model checkpoint.
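
A minimal sketch of how the checkpoint might be used for inference with the Hugging Face transformers library. Several things here are assumptions rather than facts from this repository: that the base architecture is "t5-small", that the .pth file stores a state_dict (not a whole pickled model), and the helper names build_t5_input and summarize. The "summarize: " prefix is the task prefix used by the original T5 checkpoints.

```python
def build_t5_input(article, prefix="summarize: "):
    """Collapse whitespace and prepend the T5 summarization task prefix."""
    return prefix + " ".join(article.split())


def summarize(article, model_name="t5-small", checkpoint_path=None,
              max_new_tokens=60):
    """Generate a summary; model_name and checkpoint format are assumptions."""
    # Heavy dependencies are imported lazily so build_t5_input works alone.
    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    if checkpoint_path is not None:
        # Assumption: the file holds a state_dict matching model_name.
        state = torch.load(checkpoint_path, map_location="cpu")
        model.load_state_dict(state)
    model.eval()

    inputs = tokenizer(build_t5_input(article), return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                    num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling summarize(text, checkpoint_path="t5_fine_tuned_model.pth") would then load the fine-tuned weights, provided the assumptions above hold. If the checkpoint was saved from a different T5 size, model_name has to match it.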

Usage

Training the Model

Fine-Tuning T5 Transformer: Use the transformers-summarization-t5.ipynb notebook to fine-tune the T5 model on the news summary dataset.
Training LSTM Model: Use the LSTM_model.ipynb notebook to train the LSTM model for summarization. This notebook includes data preprocessing, model training, and validation steps.
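
The preprocessing step for an encoder-decoder model typically turns text into fixed-length integer sequences before they reach an embedding layer. The sketch below illustrates that idea in plain Python. It is not taken from LSTM_model.ipynb: the special tokens, the whitespace tokenizer, and the function names build_vocab and encode are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical special tokens; the notebook may use different ones.
PAD, UNK, START, END = "<pad>", "<unk>", "<start>", "<end>"


def build_vocab(texts, max_size=10000):
    """Map the most frequent lowercase words to integer ids."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {PAD: 0, UNK: 1, START: 2, END: 3}
    for word, _ in counts.most_common(max_size - len(vocab)):
        vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab, max_len, add_markers=False):
    """Convert text to exactly max_len ids, truncating or padding as needed."""
    tokens = text.lower().split()
    if add_markers:
        # Decoder targets are wrapped in start/end markers.
        tokens = [START] + tokens[: max_len - 2] + [END]
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))
```

Articles would be encoded without markers for the encoder input, and summaries with add_markers=True for the decoder, so the model can learn where a summary starts and stops.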

Evaluating the Model

Evaluation with Cosine Similarity: Use the evaluation/evaluation.ipynb notebook to evaluate the model's performance. This notebook calculates cosine similarity between reference and generated summaries and plots the distribution of similarity scores.
Saving Predictions: The evaluation notebook saves the predictions and similarity scores in CSV files (predictions_with_cosine_similarity.csv and GPT_Similarity_Scored_Data.csv) for further analysis.
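
To make the cosine similarity evaluation concrete, here is a small standard-library sketch that scores a generated summary against its reference. It assumes simple term-count vectors; the notebook may well use a different representation (for example TF-IDF or sentence embeddings), and the function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(reference, generated):
    """Cosine similarity between the term-count vectors of two texts."""
    ref_counts = Counter(tokenize(reference))
    gen_counts = Counter(tokenize(generated))
    shared = ref_counts.keys() & gen_counts.keys()
    dot = sum(ref_counts[t] * gen_counts[t] for t in shared)
    norm = (math.sqrt(sum(c * c for c in ref_counts.values()))
            * math.sqrt(sum(c * c for c in gen_counts.values())))
    # An empty text has no direction, so treat its similarity as 0.
    return dot / norm if norm else 0.0


def score_pairs(pairs):
    """Score every (reference, generated) pair in a list."""
    return [cosine_similarity(ref, gen) for ref, gen in pairs]
```

The list returned by score_pairs is what one would plot as a histogram or write to a CSV column next to each prediction.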

Results

The T5 Transformer model produced more precise summaries than the Encoder & Decoder model using BiLSTM and a Keras embedding layer, which reached an accuracy of 46%.
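
The README does not say how the 46% accuracy was computed. One common reading for sequence-to-sequence models is token-level accuracy: the share of target positions where the predicted token id equals the reference id, with padding ignored. The sketch below shows that interpretation only; the function name and the pad_id value are assumptions, and the project's actual metric may differ.

```python
def masked_token_accuracy(reference_ids, predicted_ids, pad_id=0):
    """Fraction of non-padding reference positions predicted correctly."""
    correct = 0
    total = 0
    for ref_seq, pred_seq in zip(reference_ids, predicted_ids):
        for ref_tok, pred_tok in zip(ref_seq, pred_seq):
            if ref_tok == pad_id:
                continue  # padding positions do not count toward the score
            total += 1
            correct += int(ref_tok == pred_tok)
    return correct / total if total else 0.0
```

A token-level score of this kind rewards exact word matches, so it can understate the quality of a summary that paraphrases the reference, which is one reason similarity-based evaluation is useful alongside it.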

Conclusion

This project demonstrates the effectiveness of advanced NLP models such as the T5 Transformer for news summarization. The fine-tuned T5 model outperformed the LSTM-based Encoder & Decoder model in generating coherent and concise summaries of news articles.

Repository: Abhinaykotla/news-summarization-T5-Transformer
