This repository collects research on the hallucination problem of Multimodal Large Language Models (MLLMs), including papers and their code/datasets.
✈️
The main aspects covered are Surveys, Benchmarks, Hallucination Mitigation methods, and some interesting papers that are not directly related to the current topic. Since some papers are relatively new and it is not yet certain whether they have been accepted by a specific conference, they are currently marked according to the acceptance status that Google Scholar reports for each article.
Besides, we have extracted the benchmark name or the core solution category of each paper so that you can read in a targeted manner; once enough papers have accumulated, we plan to re-summarize them into a more reasonable classification. 🎆
If you find interesting papers that are not included, please feel free to contact me. We will continue to update this repository! ☀️
Surveys

| Number | Title | Venue | Paper | Repo | Citation |
| --- | --- | --- | --- | --- | --- |
| 1 | A Survey of Hallucination in “Large” Foundation Models | arxiv(23.09) | | ➖ | ⭐ |
| 2 | A Survey on Hallucination in Large Vision-Language Models | arxiv(24.02) | | ➖ | ➖ |
Benchmarks
Here are some works that evaluate the hallucination performance of MLLMs, including several popular benchmarks. Most of these works also fine-tune models on their benchmark dataset, which can reduce the likelihood of hallucination without sacrificing performance on other benchmarks, and some papers design clever ways to construct such datasets.
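To make the evaluation setting concrete: several of the benchmarks below (e.g., POPE, NOPE, Ciem) poll the model with yes/no questions about object presence and check how often it answers "yes" for objects that are not in the image. The following is a minimal, unofficial sketch of how such a polling-style benchmark is typically scored; the function name and data format are illustrative assumptions, not taken from any of the papers.

```python
# A minimal, unofficial sketch of how a POPE-style yes/no polling benchmark
# is scored. Data format and function name are illustrative assumptions only.
from typing import Dict, List

def score_yes_no(answers: List[str], labels: List[str]) -> Dict[str, float]:
    """Score parsed 'yes'/'no' model answers against ground-truth labels.

    'yes' is the positive class: a false positive means the model claimed
    an object is present when it is not, i.e., an object hallucination.
    """
    assert len(answers) == len(labels) and len(labels) > 0
    tp = sum(a == "yes" and l == "yes" for a, l in zip(answers, labels))
    fp = sum(a == "yes" and l == "no" for a, l in zip(answers, labels))
    fn = sum(a == "no" and l == "yes" for a, l in zip(answers, labels))
    tn = sum(a == "no" and l == "no" for a, l in zip(answers, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # On a balanced question set, a high yes-ratio signals a bias toward
        # answering "yes", i.e., a tendency to hallucinate objects.
        "yes_ratio": (tp + fp) / len(labels),
    }

# Hypothetical usage: answers parsed from model outputs, labels from the benchmark.
print(score_yes_no(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))
```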
| Number | Title | Venue | Paper | Repo | Citation | Benchmark Name |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Evaluating Object Hallucination in Large Vision-Language Models | EMNLP(2023) | | ➖ | 🔥 | POPE |
| 2 | MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | arxiv(23.06) | | | 🔥 | MME (comprehensive) |
| 3 | MMBench: Is Your Multi-modal Model an All-around Player? | arxiv(23.07) | | ➖ | 🔥 | MMBench (comprehensive) |
| 4 | Evaluation and Analysis of Hallucination in Large Vision-Language Models | arxiv(23.08) | | | 🔷 | HaELM |
| 5 | Aligning Large Multimodal Models with Factually Augmented RLHF | arxiv(23.09) | | | 🔷 | MMHAL-BENCH |
| 6 | HALLUSIONBENCH: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models | arxiv(23.10) | | | ➖ | HALLUSIONBENCH |
| 7 | Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models | arxiv(23.10) | | ➖ | ➖ | NOPE |
| 8 | HALLE-SWITCH: CONTROLLING OBJECT HALLUCINATION IN LARGE VISION LANGUAGE MODELS | arxiv(23.10) | | | ➖ | CCEval |
| 9 | Ferret: Refer and ground anything anywhere at any granularity | arxiv(23.10) | | | 🔷 | Ferret-Bench (considers the refer-and-ground capability) |
| 10 | Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges | arxiv(23.11) | | | 🔷 | Bingo |
| 11 | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | arxiv(23.11) | | | ➖ | AMBER |
| 12 | Faithscore: Evaluating hallucinations in large vision-language models | arxiv(23.11) | | | ➖ | Faithscore (metric) |
| 13 | Mitigating Hallucination in Visual Language Models with Visual Supervision | arxiv(23.11) | | ➖ | ➖ | RAHBench |
| 14 | Mitigating Open-Vocabulary Caption Hallucinations | arxiv(23.12) | | | ➖ | OpenCHAIR |
| 15 | RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | arxiv(23.12) | | | ➖ | MHumanEval |
| 16 | Ciem: Contrastive instruction evaluation method for better instruction tuning | NeurIPS(2023) Workshop | | ➖ | ➖ | Ciem (and CIT for mitigation) |
| 17 | Mitigating hallucination in large multimodal models via robust instruction tuning | ICLR(2024) | | ➖ | 🔷 | GAVIE |
| 18 | Detecting and Preventing Hallucinations in Large Vision Language Models | AAAI(2024) | | | 🔷 | M-HalDetect |
| 19 | Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | MMM(2024) | | | ➖ | FGHE/FOHE (an upgraded version of POPE) |
| 20 | Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models | AAAI-ReLM Workshop(2024) | | ➖ | ➖ | MSG-MCQ |
| 21 | Eyes wide shut? exploring the visual shortcomings of multimodal llms | arxiv(24.01) | | ➖ | ➖ | MMVP |
| 22 | Visual Hallucinations of Multi-modal Large Language Models | arxiv(24.02) | | | ➖ | two benchmarks generated by VHTest |
| 23 | Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | arxiv(24.02) | | ➖ | ➖ | Hal-Eval (a new category: Event Hallucination) |
| 24 | GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | arxiv(24.02) | | | ➖ | GenCeption (no need for high-quality annotations) |
| 25 | How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts | arxiv(24.02) | | ➖ | ➖ | MAD-Bench (a new category: Visual Confusion) |
| 26 | Unified Hallucination Detection for Multimodal Large Language Models | arxiv(24.02) | | | ➖ | MHaluBench |
| 27 | The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs | arxiv(24.02) | | | ➖ | CorrelationQA |
| 28 | Definition, Quantification, and Prescriptive Remediations | arxiv(24.03) | | ➖ | ➖ | VHILT |
| 29 | EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models | arxiv(24.03) | | | ➖ | EgoThink |
Hallucination Mitigation methods
Here are the labels that indicate the core idea of each paper, corresponding to mitigation methods from different angles; you can read the surveys mentioned earlier to better understand these categories:
data.: data improvement (most benchmarks) | vis.: vision enhancement | align.: multimodal alignment | dec.: decoding optimization | post.: post-processing | other.: other kinds
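As a concrete illustration of the dec. (decoding optimization) category, here is a minimal sketch of the contrastive-decoding idea used by methods such as VCD (entry 14 below): next-token logits conditioned on the original image are contrasted with logits conditioned on a distorted copy of the image, which down-weights tokens driven by language priors rather than visual evidence. The function and variable names are illustrative assumptions, not the papers' actual APIs.

```python
# A minimal sketch of the contrastive-decoding idea behind the "dec." label,
# e.g., VCD (entry 14 below). All names here are illustrative assumptions,
# not the papers' actual APIs, and real implementations add further
# constraints such as adaptive plausibility filtering.
import numpy as np

def contrastive_next_token(logits_clean: np.ndarray,
                           logits_distorted: np.ndarray,
                           alpha: float = 1.0) -> int:
    """Greedily pick the next token from contrasted next-token logits.

    logits_clean: logits from the MLLM conditioned on the original image.
    logits_distorted: logits from the same prompt but a distorted (e.g.,
        heavily noised) image, which amplifies language-prior behavior.
    The contrast down-weights tokens that do not rely on visual evidence.
    """
    contrasted = (1.0 + alpha) * logits_clean - alpha * logits_distorted
    return int(np.argmax(contrasted))

# Hypothetical usage with random logits standing in for two forward passes.
rng = np.random.default_rng(0)
vocab_size = 32000
token_id = contrastive_next_token(rng.normal(size=vocab_size),
                                  rng.normal(size=vocab_size))
print(token_id)
```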
| Number | Title | Venue | Paper | Repo | Citation | Core |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | VCoder: Versatile Vision Encoders for Multimodal Large Language Models | CVPR(2024) | | | ➖ | vis. |
| 2 | Ferret: Refer and ground anything anywhere at any granularity | arxiv(23.10) | | | 🔷 | vis. |
| 3 | Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | arxiv(23.10) | | ➖ | ➖ | vis. |
| 4 | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | arxiv(23.11) | | | 🔷 | vis. |
| 5 | Mitigating Hallucination in Visual Language Models with Visual Supervision | arxiv(23.11) | | ➖ | ➖ | vis. (with SAM -> in-context) |
| 6 | LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | arxiv(23.11) | | | ➖ | vis. |
| 7 | DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | arxiv(24.02) | | | ➖ | vis. |
| 8 | LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | arxiv(24.03) | | | ➖ | vis. |
| 9 | Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models | arxiv(23.08) | | | 🔷 | vis. align. |
| 10 | GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | arxiv(24.02) | | | ➖ | vis. align. |
| 11 | Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training | arxiv(23.08) | | ➖ | 🔷 | align. |
| 12 | Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | arxiv(23.12) | | | ➖ | align. |
| 13 | OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation | CVPR(2024) | | | ➖ | dec. |
| 14 | Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding (VCD) | arxiv(23.11) | | | ➖ | dec. |
| 15 | Seeing is believing: mitigating hallucination in large vision-language models via clip-guided decoding | arxiv(24.02) | | ➖ | ➖ | dec. |
| 16 | IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding | arxiv(24.02) | | ➖ | ➖ | dec. |
| 17 | HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | arxiv(24.03) | | | ➖ | dec. |
| 18 | Woodpecker: Hallucination Correction for Multimodal Large Language Models | arxiv(23.10) | | | 🔷 | post. |
| 19 | Analyzing and mitigating object hallucination in large vision-language models (LURE) | arxiv(23.10) | | | 🔷 | post. |
| 20 | TEMPORAL INSIGHT ENHANCEMENT: MITIGATING TEMPORAL HALLUCINATION IN MULTIMODAL LARGE LANGUAGE MODELS | arxiv(24.01) | | ➖ | ➖ | post. (Correct with Tools) |
| 21 | VIGC: Visual Instruction Generation and Correction | arxiv(23.08) | | | ➖ | other. (Iterative Generation) |
| 22 | Can We Edit Multimodal Large Language Models? | EMNLP(2023) | | | ➖ | other. (Model Editing) |
| 23 | HALO: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models | arxiv(23.08) | | | ➖ | other. (Knowledge Injection and Teacher-Student Approaches) |
| 24 | VOLCANO: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision | arxiv(23.11) | | | ➖ | other. (Self-Feedback as Visual Cues -> in-context) |
| 25 | Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization (HA-DPO) | arxiv(23.11) | | | ➖ | other. (trained to favor the non-hallucinating response as a preference selection task) |
| 26 | SILKIE: Preference Distillation for Large Visual Language Models | | | | | |