
Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation


Overview

We introduce a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models (LLMs) can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. Our model combines an image encoder with a fine-tuned LLM based on the Vicuna-7B architecture, enabling it to generate different sections of a radiology report with notable accuracy.

[Figure: model architecture overview]

Contents

  • Install
  • Model Weights
  • Quick Start
  • Data Preparation
  • Acknowledgments
  • Citation

Install

Please refer to the Libra repository for full code and environment details, as this project is compatible with its setup. A brief outline:

  • Create and activate a new conda environment (e.g., libra).
  • Install the required dependencies (e.g., pip install -e .).

git clone https://github.com/X-iZhang/Libra.git
cd Libra

conda create -n libra python=3.10 -y
conda activate libra
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
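
After installation, a quick sanity check (assuming the package installs under the name libra, as the commands below suggest):

python -c "import libra; print('libra imported successfully')"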

Model Weights

Version                  Base LLM    Vision Encoder    Checkpoint
Libra-v0.5-impressions   Vicuna-7B   CLIP              libra-v0.5-impressions
Libra-v0.5-findings      Vicuna-7B   CLIP              libra-v0.5-findings
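
If you prefer to fetch a checkpoint ahead of time rather than letting the loader download it, here is a minimal sketch using the Hugging Face Hub client; it assumes the checkpoints are hosted under the repo IDs used in the inference commands below, which also accept the repo ID directly.

from huggingface_hub import snapshot_download

# Download the impression-section checkpoint and print its local path.
local_dir = snapshot_download(repo_id="X-iZhang/libra-v0.5-impressions")
print(local_dir)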

Quick Start

CLI Inference

We support running inference using the CLI. To use our model, run:

python -m libra.serve.cli \
    --model-path X-iZhang/libra-v0.5-impressions  \
    --conv-mode libra_v0 \
    --image-file "./path/to/chest_x_ray.jpg"
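
The same command works for the findings model; only the checkpoint changes:

python -m libra.serve.cli \
    --model-path X-iZhang/libra-v0.5-findings \
    --conv-mode libra_v0 \
    --image-file "./path/to/chest_x_ray.jpg"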

Script Inference

After installing this repository, you can use the libra_eval function in libra/eval/run_libra.py to run a model trained by us or by yourself, either on a local machine or in Google Colab.

from libra.eval import libra_eval

model_path = "X-iZhang/libra-v0.5-impressions"  # Or "X-iZhang/libra-v0.5-findings"

# Define the path to the chest X-ray image.
image_files = "./path/to/chest_x_ray.jpg"

# Define the prompt to guide the model's response.
prompt = "Provide a detailed description of the impression in the radiology image."
# Or: "Provide a detailed description of the findings in the radiology image."

# Specify the conversational mode, matching the PROMPT_VERSION used during training.
conv_mode = "libra_v0"

# Call the libra_eval function.
libra_eval(
    model_path=model_path,
    image_file=image_files,
    query=prompt,
    temperature=0.9,
    top_p=0.8,
    conv_mode=conv_mode,
    max_new_tokens=512
)
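
To process several studies in one go, you can loop over image paths. This is a minimal sketch that assumes libra_eval returns the generated report as a string; check libra/eval/run_libra.py to confirm the return value.

# Sketch: batch inference over multiple studies. Assumes libra_eval
# returns the generated report text.
for path in ["./path/to/study_1.jpg", "./path/to/study_2.jpg"]:
    report = libra_eval(
        model_path=model_path,
        image_file=path,
        query=prompt,
        temperature=0.9,
        top_p=0.8,
        conv_mode=conv_mode,
        max_new_tokens=512,
    )
    print(path, report)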

Data Preparation

We use the dataset officially provided by the shared task. For information on data structure, preprocessing, and additional script usage, please refer to the instructions in Libra. For the exact formats used in training and evaluation, see Custom_Data.md; an illustrative record sketch follows below.
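
For orientation only: Libra follows a LLaVA-style pipeline, where training records are typically JSON objects shaped like the Python sketch below. The field names and report text here are assumptions for illustration; Custom_Data.md is authoritative.

# Illustrative LLaVA-style training record (field names are assumptions;
# see Custom_Data.md for the authoritative format).
sample = {
    "id": "example_0001",
    "image": "./path/to/chest_x_ray.jpg",
    "conversations": [
        {"from": "human",
         "value": "<image>\nProvide a detailed description of the findings in the radiology image."},
        {"from": "gpt",
         "value": "The lungs are clear. There is no pleural effusion or pneumothorax."},
    ],
}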

Acknowledgments 🙏

We extend our gratitude to the BioNLP 2024 RRG24 Shared Task organisers for providing the baseline pipeline ViLMedic and curating these challenging and exciting tasks.

Also, we sincerely thank the following projects for their contributions:

  • LLaVA: A Large Language and Vision Assistant, laying the groundwork for multimodal understanding.
  • FastChat: An Open Platform for Training, Serving, and Evaluating Large Language Model based Chatbots.
  • LLaMA: Open and efficient foundation language models that inspired our core language processing capabilities.

Citation ✒️

If you find our paper useful in your research and applications, please cite using this BibTeX:

@inproceedings{Zhang_2024,
   title={Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation},
   url={http://dx.doi.org/10.18653/v1/2024.bionlp-1.54},
   DOI={10.18653/v1/2024.bionlp-1.54},
   booktitle={Proceedings of the 23rd Workshop on Biomedical Natural Language Processing},
   publisher={Association for Computational Linguistics},
   author={Zhang, Xi and Meng, Zaiqiao and Lever, Jake and Ho, Edmond S.L.},
   year={2024},
   pages={624--634}
}
