Code for our paper *Driving Like Humans: Leveraging Vision Large Language Models for Road Anomaly Detection*.

This repository contains the codebase for road anomaly detection using Vision Large Language Models (VLLMs). It includes dataset converters, data loaders, and training and evaluation scripts, targeting the detection of road anomalies in a human-like driving context.
- DataSetConverters/: Tools for dataset conversion (to Florence-2 and PaliGemma formats).
- dataloaders/: Dataset loading scripts.
- samples/: Sample inputs and outputs.
- scripts/: Helper scripts.
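The converters in `DataSetConverters/` target, among others, Florence-2's region format, which encodes a bounding box as four `<loc_{k}>` tokens on a 0-999 coordinate grid. As an illustrative sketch only (this is not the repository's actual converter; the function name and 1000-bin grid are assumptions), a pixel-space box can be quantized like this:

```python
def box_to_florence2(label, box, img_w, img_h, bins=1000):
    """Quantize an (x1, y1, x2, y2) pixel box into Florence-2 <loc_*> tokens.

    Florence-2 expresses a detection as 'label<loc_x1><loc_y1><loc_x2><loc_y2>',
    with each coordinate scaled onto a 0..bins-1 grid.
    """
    def quantize(value, size):
        # Scale to the bin grid and clamp to the valid token range.
        return min(bins - 1, max(0, int(value / size * bins)))

    x1, y1, x2, y2 = box
    tokens = [quantize(x1, img_w), quantize(y1, img_h),
              quantize(x2, img_w), quantize(y2, img_h)]
    return label + "".join(f"<loc_{t}>" for t in tokens)
```

For example, a pothole box at pixels (64, 32, 128, 96) in a 640x480 frame becomes `pothole<loc_100><loc_66><loc_200><loc_200>`.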
1. Clone the repository:

   ```shell
   git clone https://github.com/abdkhanstd/RAVLLM.git
   cd RAVLLM
   ```

2. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

3. Start training:

   ```shell
   python TrainFlorence2OD.OK.py
   ```
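At inference time, a Florence-2 detection model answers the object-detection prompt with strings of labels followed by `<loc_*>` tokens. Decoding these back to pixel boxes can be sketched as follows (a hedged example: the helper name is hypothetical and the repository's own post-processing may differ):

```python
import re

# One detection: a label followed by exactly four <loc_*> tokens.
DETECTION = re.compile(r"([^<]+)((?:<loc_\d+>){4})")
LOC_TOKEN = re.compile(r"<loc_(\d+)>")

def parse_florence2_od(text, img_w, img_h, bins=1000):
    """Parse 'label<loc_x1><loc_y1><loc_x2><loc_y2>' runs into pixel boxes."""
    results = []
    for label, tokens in DETECTION.findall(text):
        x1, y1, x2, y2 = (int(t) for t in LOC_TOKEN.findall(tokens))
        # Rescale from the 0..bins-1 grid back to pixel coordinates.
        results.append((label.strip(),
                        (x1 / bins * img_w, y1 / bins * img_h,
                         x2 / bins * img_w, y2 / bins * img_h)))
    return results
```

This is the inverse of the quantization used when converting annotations, so small rounding error (up to one grid cell) is expected on round trips.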
The model weights and related datasets can be accessed from:
Please cite this work if it aids your research:
```bibtex
@inproceedings{Shafiq2024,
  title={Driving Like Humans: Leveraging Vision Large Language Models for Road Anomaly Detection},
  author={Sidra Shafiq and Hassan Moatasam Awan and Abdullah Aman Khan and Waqas Amin},
  booktitle={2024 3rd International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE)},
  year={2024}
}
```
This project is licensed under the MIT License.