Automated Radiology Report Generation from Chest X-Rays

Introduction

In modern healthcare, radiology plays an essential role in diagnosing and managing numerous medical conditions. Chest X-rays are among the most widely used diagnostic tools for detecting abnormalities such as pneumonia, hernia, and cardiomegaly.

Project Motivation:
This project automates the generation of preliminary radiology reports from chest X-ray images by combining computer vision models with a large language model. The system is meant to aid radiologists by:

- Enhancing productivity
- Reducing delays
- Minimizing errors due to workload fatigue

The report covers:

- An overview of the dataset structure and features
- Detailed methodology and preprocessing steps
- Model design, training, evaluation, and optimization techniques
- Performance metrics and analysis
- Potential further improvements

Dataset Description

The project uses a subset of the MIMIC-CXR dataset, comprising:

- 15,000 chest X-ray images (originally in DICOM format, converted to PNG)
- Associated radiology reports in XML format

Key Dataset Features:

- Image File Path: Location/link of the corresponding chest X-ray image.
- Findings: Textual descriptions of abnormalities or observations.
- Impression: A concise summary of the primary conclusions.

Pathology Labels (14 Total):

- Atelectasis
- Cardiomegaly
- Consolidation
- Edema
- Enlarged Cardiomediastinum
- Fracture
- Lung Lesion
- Lung Opacity
- Pleural Effusion
- Pleural Other
- Pneumonia
- Pneumothorax
- Support Devices
- No Finding

Methodology

The project is structured into several key stages:

1. Data Collection and Preprocessing

a. Data Extraction

DICOM to PNG Conversion:
A custom script converts the original DICOM images to PNG format, reducing file size while preserving image quality for efficient loading and processing.
CSV Creation:
A dedicated script extracts the following fields:
- image_ID: Unique identifier for each image.
- image_path: Consolidated file paths to each PNG image.
- findings and impressions: Parsed from XML reports.
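
The CSV-creation step could look roughly like the sketch below. The XML layout (a `<report>` root with `<findings>` and `<impression>` children) and the helper names `report_to_row` and `rows_to_csv` are assumptions made for illustration; the project's actual report schema and script are not shown in this document.

```python
import csv
import io
import xml.etree.ElementTree as ET

COLUMNS = ["image_ID", "image_path", "findings", "impressions"]


def report_to_row(image_id, image_path, xml_text):
    """Parse one report (hypothetical schema) into a flat CSV row."""
    root = ET.fromstring(xml_text)
    # findtext returns the default when the element is missing.
    findings = (root.findtext("findings", default="") or "").strip()
    impression = (root.findtext("impression", default="") or "").strip()
    return {
        "image_ID": image_id,
        "image_path": image_path,
        "findings": findings,
        "impressions": impression,
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up report used only to exercise the helpers.
sample_xml = """<report>
  <findings> The lungs are clear. </findings>
  <impression>No acute cardiopulmonary process.</impression>
</report>"""

row = report_to_row("img_0001", "images/img_0001.png", sample_xml)
csv_text = rows_to_csv([row])
```

Keeping the parsing and the serialization in separate functions makes it easy to swap in the real schema without touching the CSV logic.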

b. Data Pre-processing

Text Cleaning:
- Expanding abbreviations (e.g., "lat" → "lateral")
- Removing special characters
- Fixing spacing around punctuation
Filtering and Label Mapping:
Invalid or missing entries are removed, and findings are mapped to a list of specific disease labels.
Image Augmentation:
Applied techniques include:
- Resizing to (224, 224)
- Random rotations and flips
- Noise addition
- Normalization
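
As a minimal sketch of the text-cleaning steps listed above, the function below expands abbreviations, removes special characters, and fixes spacing around punctuation. The abbreviation table and the set of characters that are kept are assumptions; only the "lat" → "lateral" mapping comes from this document.

```python
import re

# Hypothetical abbreviation table; the project's real list is not shown here.
ABBREVIATIONS = {
    "lat": "lateral",
    "pa": "posteroanterior",
    "ap": "anteroposterior",
}


def clean_report_text(text):
    """Expand abbreviations, drop special characters, normalize spacing."""

    def expand(match):
        return ABBREVIATIONS[match.group(0).lower()]

    # Expand whole-word abbreviations, case-insensitively.
    pattern = r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
    text = re.sub(pattern, expand, text, flags=re.IGNORECASE)
    # Keep letters, digits, whitespace, and basic punctuation only.
    text = re.sub(r"[^A-Za-z0-9\s.,;:()/-]", " ", text)
    # Remove whitespace that precedes punctuation.
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    # Insert a space after punctuation that is directly followed by a letter.
    text = re.sub(r"([.,;:])(?=[A-Za-z])", r"\1 ", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_report_text(
    "PA and lat views obtained .No  pleural effusion*** ,lungs clear."
)
# cleaned == "posteroanterior and lateral views obtained. No pleural effusion, lungs clear."
```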

c. Dataset Split

A custom function get_dataloaders creates PyTorch DataLoader objects for training and validation with parameters:

- Batch Size: Default is 8.
- Train Split: 85% training, 15% validation.
- Num Workers: Default is 4 for faster loading.
- Collate Function: Custom function to merge samples, particularly for variable-length inputs like text.
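
A function with these parameters might be sketched as follows. This is not the repository's `get_dataloaders`; the toy dataset, the dictionary keys, the padding value of 0, and the fixed seed are all assumptions chosen so the snippet runs on its own.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset, random_split


class ToyReportDataset(Dataset):
    """Stand-in dataset: random 'images' and variable-length token ids."""

    def __init__(self, n_samples=20):
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(n_samples, 3, 224, 224, generator=generator)
        self.tokens = [
            torch.randint(1, 100, (5 + i % 7,), generator=generator)
            for i in range(n_samples)
        ]

    def __len__(self):
        return len(self.tokens)

    def __getitem__(self, index):
        return {"image": self.images[index], "tokens": self.tokens[index]}


def collate_fn(batch):
    """Stack images and right-pad token sequences to the batch maximum."""
    images = torch.stack([item["image"] for item in batch])
    lengths = torch.tensor([len(item["tokens"]) for item in batch])
    tokens = pad_sequence(
        [item["tokens"] for item in batch], batch_first=True, padding_value=0
    )
    # Mask is 1 for real tokens and 0 for padding.
    positions = torch.arange(tokens.size(1))[None, :]
    attention_mask = (positions < lengths[:, None]).long()
    return {"image": images, "tokens": tokens, "attention_mask": attention_mask}


def get_dataloaders(dataset, batch_size=8, train_split=0.85, num_workers=4, seed=42):
    """Split the dataset 85/15 by default and wrap both parts in DataLoaders."""
    n_train = round(len(dataset) * train_split)
    n_val = len(dataset) - n_train
    train_set, val_set = random_split(
        dataset, [n_train, n_val], generator=torch.Generator().manual_seed(seed)
    )
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True,
                              num_workers=num_workers, collate_fn=collate_fn)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False,
                            num_workers=num_workers, collate_fn=collate_fn)
    return train_loader, val_loader


# num_workers=0 keeps the demo single-process.
train_loader, val_loader = get_dataloaders(ToyReportDataset(), num_workers=0)
batch = next(iter(train_loader))
```

The custom collate function is needed because report token sequences differ in length, so the default stacking behaviour would fail on them.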

2. Extracting Labels Using CheXbert

CheXbert is a transformer-based model fine-tuned for medical text classification using the BERT architecture. It extracts multi-label classifications from chest X-ray radiology reports.

Process:

Text Processing:
- Extract "Findings" and "Impressions" from reports.
- Tokenize and format text for CheXbert.
- Generate high-dimensional contextual embeddings.
Label Extraction:
- A classification layer predicts probabilities for each clinical condition.
- Probabilities are thresholded at 0.5 to produce binary labels.
Dataset Preparation:
The binary labels are integrated into a CSV file to enrich the dataset for multi-label classification.
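
The thresholding and CSV steps described above can be illustrated with a small sketch. The probabilities are made-up values rather than real CheXbert output, and both the helper name `probabilities_to_labels` and the choice to treat a value exactly at 0.5 as positive are assumptions.

```python
import csv
import io

CONDITIONS = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Enlarged Cardiomediastinum", "Fracture", "Lung Lesion", "Lung Opacity",
    "Pleural Effusion", "Pleural Other", "Pneumonia", "Pneumothorax",
    "Support Devices", "No Finding",
]


def probabilities_to_labels(probabilities, threshold=0.5):
    """Map one probability per condition to a 0/1 label."""
    if len(probabilities) != len(CONDITIONS):
        raise ValueError("expected one probability per condition")
    # Assumption: a probability exactly at the threshold counts as positive.
    return {name: int(p >= threshold) for name, p in zip(CONDITIONS, probabilities)}


# Made-up probabilities for one report, not real model output.
example_probs = [0.10, 0.91, 0.05, 0.62, 0.30, 0.02, 0.01, 0.55,
                 0.48, 0.03, 0.07, 0.04, 0.80, 0.01]
labels = probabilities_to_labels(example_probs)

# Write the labels next to the image identifier, one column per condition.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["image_ID"] + CONDITIONS)
writer.writeheader()
writer.writerow({"image_ID": "img_0001", **labels})
labels_csv = buffer.getvalue()
```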

3. ChexNet for Structural Findings Extraction

ChexNet (based on DenseNet-121) is fine-tuned for multi-label classification of chest X-rays, focusing on structural abnormalities.

Key Points:

Base Model: DenseNet-121 with pre-trained ImageNet weights.
Layer Freezing:
Initial layers are frozen; only the last two dense blocks and the classifier head are fine-tuned.
Custom Classifier:
- Input: 1024 features from DenseNet-121.
- Hidden Layer: 512 units with ReLU activation.
- Dropout: 0.3 for regularization.
- Output: 14 sigmoid-activated nodes for multi-label classification.
Training Procedure:
- Loss Function: Custom Weighted Binary Cross-Entropy Loss (WeightedBCELoss)
- Optimizer: Adam with differential learning rates.
- Scheduler: ReduceLROnPlateau.
- Metric: Micro-averaged F1 score; the fine-tuned model reached 0.70.
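
The classifier head and training setup listed above might be sketched as follows. The weighting scheme inside `WeightedBCELoss` (a per-label weight on positive targets), the learning rates, the scheduler settings, and the `backbone_tail` stand-in are assumptions; the real DenseNet-121 backbone is replaced by a single linear layer so the snippet runs without downloading weights.

```python
import torch
from torch import nn


class ClassifierHead(nn.Module):
    """1024 -> 512 -> 14 head with ReLU, dropout, and sigmoid outputs."""

    def __init__(self, in_features=1024, hidden=512, num_labels=14, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, num_labels),
            nn.Sigmoid(),
        )

    def forward(self, features):
        return self.net(features)


class WeightedBCELoss(nn.Module):
    """BCE on probabilities with a per-label weight on positive targets."""

    def __init__(self, pos_weight, eps=1e-7):
        super().__init__()
        self.register_buffer("pos_weight", pos_weight)
        self.eps = eps

    def forward(self, probs, targets):
        # Clamp to avoid log(0).
        probs = probs.clamp(self.eps, 1 - self.eps)
        loss = -(self.pos_weight * targets * torch.log(probs)
                 + (1 - targets) * torch.log(1 - probs))
        return loss.mean()


torch.manual_seed(0)
backbone_tail = nn.Linear(1024, 1024)  # stand-in for the unfrozen dense blocks
head = ClassifierHead()

# Differential learning rates: smaller for the backbone, larger for the head.
optimizer = torch.optim.Adam([
    {"params": backbone_tail.parameters(), "lr": 1e-5},  # assumed value
    {"params": head.parameters(), "lr": 1e-3},           # assumed value
])
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)
criterion = WeightedBCELoss(pos_weight=torch.full((14,), 2.0))  # assumed weights

# One training step on random data.
features = torch.randn(4, 1024)
targets = torch.randint(0, 2, (4, 14)).float()
probs = head(backbone_tail(features))
loss = criterion(probs, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step(loss.item())
```

In practice the positive weights would normally be derived from label frequencies in the training set, since several of the 14 findings are rare.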

4. Model Architectures

Two model architectures were evaluated for generating medical reports:

Model 1: BioVilt + Alignment + BioGPT

Components:
- BioVilt:
  - Uses a ResNet backbone (ResNet-50/ResNet-18) for feature extraction.
  - Produces a 512-dimensional global embedding.
- Alignment Module:
  - Bridges image embeddings with textual representations.
- BioGPT:
  - A GPT-2-style language model pre-trained on biomedical literature (approx. 347M parameters).
Configuration:
- BioVilt:
  - Backbone: ResNet-50
  - Output: 512-dimensional embedding.
- Alignment Module:
  - Text encoder: Microsoft BioGPT.
  - Projection layers map image embeddings to BioGPT’s 768-dimensional space.
  - Loss Function: Contrastive Loss.
- BioGPT (PEFT via LoRA):
  - Rank: 16
  - Alpha: 32
  - Dropout: 0.1
- Generation Parameters:
  - max_length: 150 tokens
  - temperature: 0.8
  - top_k: 50
  - top_p: 0.85
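
To give a feel for what the LoRA settings above imply, the following worked arithmetic counts the trainable parameters a rank-16 adapter adds to a single weight matrix. The 768 x 768 shape is an assumption based on the embedding width quoted in this section; which BioGPT modules are actually adapted is not stated here.

```python
def lora_summary(d_in, d_out, rank=16, alpha=32):
    """Parameter counts for one LoRA-adapted weight matrix."""
    full_params = d_in * d_out              # frozen original weight W
    adapter_params = rank * (d_in + d_out)  # trainable factors A (rank x d_in) and B (d_out x rank)
    return {
        "full_params": full_params,
        "adapter_params": adapter_params,
        "scaling": alpha / rank,            # the update B @ A is multiplied by alpha / rank
        "trainable_fraction": adapter_params / full_params,
    }


summary = lora_summary(768, 768)
# summary["adapter_params"] == 24576, summary["scaling"] == 2.0
```

Under these assumptions the adapter trains about 4% (1/24) of the parameters of the matrix it wraps, which is why LoRA helps when compute is limited.
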
Integration and Flow:
- Image Preprocessing: Resize and augment PNG images.
- Image Encoding: BioVilt extracts image features.
- Alignment: Projects image embeddings to align with BioGPT's text embeddings.
- Report Generation: The aligned embeddings are fed into BioGPT to generate the final report.
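
The alignment step can be illustrated with a small projection module and a contrastive objective. The document only says "Contrastive Loss", so the symmetric InfoNCE form, the temperature of 0.07, and the two-layer projection used below are assumptions, as are the class and function names.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ImageProjection(nn.Module):
    """Map 512-d image embeddings into the 768-d text embedding space."""

    def __init__(self, image_dim=512, text_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(image_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, image_embeddings):
        return self.proj(image_embeddings)


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matching image/report pairs share a row index."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Cosine similarity between every image and every report in the batch.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2


torch.manual_seed(0)
projector = ImageProjection()
image_features = torch.randn(4, 512)  # stand-in for image encoder output
text_features = torch.randn(4, 768)   # stand-in for pooled report embeddings
projected = projector(image_features)
loss = contrastive_loss(projected, text_features)
```

The loss pulls each projected image embedding toward the embedding of its own report and pushes it away from the other reports in the batch.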

Model 2: BioVilt + ChexNet + Alignment + BioGPT

Components:
- BioVilt:
  - ResNet-50 based image encoder.
- ChexNet:
  - Multi-label classifier (DenseNet-121) for structural findings.
- Alignment Module:
  - Integrates image and label embeddings with text embeddings.
- BioGPT:
  - Fine-tuned for biomedical report generation.
Configuration:
- BioVilt:
  - Backbone: ResNet-50
  - Output: 512-dimensional embedding.
- ChexNet:
  - Backbone: DenseNet-121
  - Output: Multi-label predictions for 14 clinical findings.
- Alignment Module:
  - Text encoder: Microsoft BioGPT.
  - Projection layers map image embeddings to 768 dimensions; text from the ground-truth reports is projected separately.
  - Loss Function: Contrastive Loss.
- BioGPT (PEFT via LoRA):
  - Rank: 16
  - Alpha: 32
  - Dropout: 0.1
- Generation Parameters:
  - max_length: 150 tokens
  - temperature: 0.8
  - top_k: 50
  - top_p: 0.85
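
The generation parameters above control how the next token is sampled. The sketch below is a simplified pure-Python illustration of temperature scaling followed by top-k and top-p (nucleus) filtering; it is not the code path used by any particular library, and the toy logits are made up.

```python
import math
import random


def filter_distribution(logits, temperature=0.8, top_k=50, top_p=0.85):
    """Return {token_id: probability} after temperature, top-k, and top-p."""
    # A temperature below 1 sharpens the distribution before the softmax.
    scaled = [value / temperature for value in logits]
    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    ranked = ranked[:top_k]
    # Softmax over the surviving tokens (shifted by the max for stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, prob in zip(ranked, probs):
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {token_id: prob / cumulative for token_id, prob in kept}


toy_logits = [2.0, 1.0, 0.5, 0.0, -1.0]  # made-up scores for a 5-token vocabulary
distribution = filter_distribution(toy_logits, top_k=3)

# Sample the next token from the filtered distribution.
rng = random.Random(0)
next_token = rng.choices(list(distribution), weights=list(distribution.values()))[0]
```

With these toy logits only the two most likely tokens survive, because together they already cover more than 85% of the probability mass left after the top-k cut.
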
Integration and Flow:
- Image Preprocessing: Resize and augment PNG images.
- Image Encoding: BioVilt extracts image features.
- ChexNet Classification: Identifies structural findings and generates binary labels.
- Alignment: Combines image embeddings with label information and projects them to align with BioGPT’s text embeddings.
- Concatenation: The image embeddings and prompt text embeddings (with a <SEP> token separator) are concatenated.
- Report Generation: The concatenated embeddings are fed into BioGPT to generate the final report.
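
One way the alignment and concatenation steps could fit together is sketched below. How the label information is combined with the image embedding is not specified in this document, so adding a projected label vector to the projected image embedding is an assumption, as are the `PrefixBuilder` name, the toy vocabulary, and the `<SEP>` token id. A plain embedding table stands in for the language model's own.

```python
import torch
from torch import nn

HIDDEN = 768  # embedding width quoted in this section
VOCAB = 100   # toy vocabulary size
SEP_ID = 1    # hypothetical id of the <SEP> token


class PrefixBuilder(nn.Module):
    """Build the embedding sequence [visual, <SEP>, prompt tokens]."""

    def __init__(self, image_dim=512, num_labels=14, hidden=HIDDEN, vocab=VOCAB):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, hidden)
        self.label_proj = nn.Linear(num_labels, hidden)
        # Stand-in for the language model's token embedding table.
        self.token_embedding = nn.Embedding(vocab, hidden)

    def forward(self, image_emb, labels, prompt_ids):
        # One "visual token" per image: projected image features plus labels.
        visual = self.image_proj(image_emb) + self.label_proj(labels)
        visual = visual.unsqueeze(1)                   # (B, 1, H)
        sep_ids = torch.full(
            (prompt_ids.size(0), 1), SEP_ID,
            dtype=torch.long, device=prompt_ids.device,
        )
        sep_emb = self.token_embedding(sep_ids)        # (B, 1, H)
        prompt_emb = self.token_embedding(prompt_ids)  # (B, T, H)
        return torch.cat([visual, sep_emb, prompt_emb], dim=1)


torch.manual_seed(0)
builder = PrefixBuilder()
image_emb = torch.randn(2, 512)                  # stand-in for image encoder output
labels = torch.randint(0, 2, (2, 14)).float()    # stand-in for binary finding labels
prompt_ids = torch.randint(2, VOCAB, (2, 5))     # stand-in for tokenized prompt text
inputs_embeds = builder(image_emb, labels, prompt_ids)  # shape (2, 7, 768)
```

A sequence like this would typically be handed to the language model as input embeddings rather than token ids, so that the visual token sits in the same space as the text tokens.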

Results

The two models are compared using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) as the primary evaluation metric. ROUGE measures the overlap between generated and reference text and is reported as recall, precision, and F1-score.

ROUGE-L (Longest Common Subsequence):
This metric is based on the longest common subsequence between the generated and reference texts. It gives credit for content that appears in the correct order even when the matching words are not contiguous.
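
As a minimal illustration of the metric, ROUGE-L can be computed from the longest common subsequence (LCS) of the two token lists. Lowercasing, whitespace tokenization, and the plain F1 combination used here are simplifications; an evaluation library may tokenize and weight differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(candidate, reference):
    """Return ROUGE-L precision, recall, and F1 for two strings."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = lcs / len(cand_tokens)  # share of generated tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_l(
    candidate="No evidence of acute cardiopulmonary disease",
    reference="No acute cardiopulmonary process",
)
# LCS is "no acute cardiopulmonary" (3 tokens): precision 0.5, recall 0.75, F1 0.6
```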

[Figure: ROUGE-L plots. Panels: Model 1 (BioVilt + Alignment + BioGPT); Model 2 (BioVilt + ChexNet + Alignment + BioGPT)]

Challenges Faced

Limited Computation Power:
Resource constraints affected training and model size selection.
Model Complexity:
Smaller models failed to capture detailed findings, so larger models were needed to improve accuracy.
Error Propagation:
Errors made by the clinical-findings extraction model propagate downstream and can degrade the quality of the final report.

Deployment

The model is deployed using Streamlit on an AWS EC2 instance for real-time inference.

References

CheXbert: CheXbert GitHub Repository
ChexNet: CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning (arXiv)
BioVilt: Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing (arXiv)
BioGPT: BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
PEFT Techniques (LoRA): LoRA: Low-Rank Adaptation of Large Language Models (arXiv)

This project combines computer vision and natural language processing to assist radiologists by generating detailed preliminary reports from chest X-ray images.

Feel free to explore the repository for code, experiments, and further documentation.
