# Awesome MLLM Hallucination

This repository collects research on the hallucination problem of Multimodal Large Language Models (MLLMs), including papers and their code/datasets.

✈️ The main aspects covered are Surveys, Benchmarks, Hallucination Mitigation methods, and some interesting papers that are not directly related to the topic. Since some papers are quite recent and their conference acceptance status cannot always be confirmed, venues are currently marked according to the acceptance information that Google Scholar reports.

In addition, we have extracted the benchmark name or the category of each paper's core solution so you can read in a targeted manner; once enough papers have accumulated, we plan to re-summarize them into a more reasonable classification. 🎆

If you find some interesting papers not included, please feel free to contact me. We will continue to update this repository! ☀️

🔷 citation >= 20   |   ⭐ citation >= 50   |   🔥 citation >= 100

## Contents

- Papers
  - Surveys
  - Benchmarks
  - Hallucination Mitigation Methods
  - Others

## Papers

### Surveys

| Number | Title | Venue | Paper | Repo | Citation |
| --- | --- | --- | --- | --- | --- |
| 1 | A Survey of Hallucination in “Large” Foundation Models | arxiv(23.09) | arXiv |  |  |
| 2 | A Survey on Hallucination in Large Vision-Language Models | arxiv(24.02) | arXiv |  |  |

### Benchmarks

Here are some works that evaluate the hallucination performance of MLLMs, including some popular benchmarks. Most of these works also produce fine-tuned models using their benchmark datasets, which can reduce the likelihood of hallucination without sacrificing performance on other benchmarks, and some papers design clever ways to construct such datasets. A minimal sketch of how a POPE-style yes/no benchmark is typically scored is shown below the table.

| Number | Title | Venue | Paper | Repo | Citation | Benchmark Name |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Evaluating Object Hallucination in Large Vision-Language Models | EMNLP(2023) | arXiv |  | 🔥 | POPE |
| 2 | MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | arxiv(23.06) | arXiv | GitHub Page | 🔥 | MME (comprehensive) |
| 3 | MMBench: Is Your Multi-modal Model an All-around Player? | arxiv(23.07) | arXiv |  | 🔥 | MMBench (comprehensive) |
| 4 | Evaluation and Analysis of Hallucination in Large Vision-Language Models | arxiv(23.08) | arXiv | GitHub Page | 🔷 | HaELM |
| 5 | Aligning Large Multimodal Models with Factually Augmented RLHF | arxiv(23.09) | arXiv | GitHub Page | 🔷 | MMHAL-BENCH |
| 6 | HALLUSIONBENCH: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models | arxiv(23.10) | arXiv | Google Drive |  | HALLUSIONBENCH |
| 7 | Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models | arxiv(23.10) | arXiv |  |  | NOPE |
| 8 | HALLE-SWITCH: CONTROLLING OBJECT HALLUCINATION IN LARGE VISION LANGUAGE MODELS | arxiv(23.10) | arXiv | GitHub Page |  | CCEval |
| 9 | Ferret: Refer and ground anything anywhere at any granularity | arxiv(23.10) | arXiv | GitHub Page | 🔷 | Ferret-Bench (considers the refer-and-ground capability) |
| 10 | Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges | arxiv(23.11) | arXiv | GitHub Page | 🔷 | Bingo |
| 11 | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | arxiv(23.11) | arXiv | GitHub Page |  | AMBER |
| 12 | Faithscore: Evaluating hallucinations in large vision-language models | arxiv(23.11) | arXiv | GitHub Page |  | Faithscore (metric) |
| 13 | Mitigating Hallucination in Visual Language Models with Visual Supervision | arxiv(23.11) | arXiv |  |  | RAHBench |
| 14 | Mitigating Open-Vocabulary Caption Hallucinations | arxiv(23.12) | arXiv | GitHub Page |  | OpenCHAIR |
| 15 | RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | arxiv(23.12) | arXiv | GitHub Page |  | MHumanEval |
| 16 | Ciem: Contrastive instruction evaluation method for better instruction tuning | NeurIPS(2023) Workshop | arXiv |  |  | Ciem (and CIT for mitigation) |
| 17 | Mitigating hallucination in large multimodal models via robust instruction tuning | ICLR(2024) | arXiv |  | 🔷 | GAVIE |
| 18 | Detecting and Preventing Hallucinations in Large Vision Language Models | AAAI(2024) | arXiv | GitHub Page | 🔷 | M-HalDetect |
| 19 | Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | MMM(2024) | arXiv | GitHub Page |  | FGHE/FOHE (an upgraded version of POPE) |
| 20 | Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models | AAAI-ReLM Workshop(2024) | arXiv |  |  | MSG-MCQ |
| 21 | Eyes wide shut? Exploring the visual shortcomings of multimodal LLMs | arxiv(24.01) | arXiv |  |  | MMVP |
| 22 | Visual Hallucinations of Multi-modal Large Language Models | arxiv(24.02) | arXiv | GitHub Page |  | two benchmarks generated by VHTest |
| 23 | Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | arxiv(24.02) | arXiv |  |  | Hal-Eval (a new category: Event Hallucination) |
| 24 | GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | arxiv(24.02) | arXiv | GitHub Page |  | GenCeption (no need for high-quality annotations) |
| 25 | How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts | arxiv(24.02) | arXiv |  |  | MAD-Bench (a new category: Visual Confusion) |
| 26 | Unified Hallucination Detection for Multimodal Large Language Models | arxiv(24.02) | arXiv | GitHub Page |  | MHaluBench |
| 27 | The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs | arxiv(24.02) | arXiv | GitHub Page |  | CorrelationQA |
| 28 | Definition, Quantification, and Prescriptive Remediations | arxiv(24.03) | arXiv |  |  | VHILT |
| 29 | EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models | arxiv(24.03) | arXiv | GitHub Page |  | EgoThink |
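As mentioned above, here is a minimal sketch of how a POPE-style yes/no benchmark (entry 1) is typically scored. This is not any benchmark's official evaluation code: `model_answer(image, question)` is a hypothetical wrapper around the MLLM under test, and the items are assumed to be `(image, question, label)` triples with "yes"/"no" ground truth. Reported metrics usually include accuracy, precision, recall, F1, and the "yes" ratio, which exposes a model's bias toward affirmative answers.

```python
# Hypothetical POPE-style scoring sketch (not any benchmark's official code).

def normalize(answer: str) -> str:
    """Map a free-form model answer to a binary yes/no label."""
    return "yes" if "yes" in answer.lower() else "no"

def pope_scores(items, model_answer):
    """items: iterable of (image, question, label) with label in {"yes", "no"}."""
    tp = fp = tn = fn = 0
    for image, question, label in items:
        pred = normalize(model_answer(image, question))  # query the MLLM under test
        if label == "yes":
            tp += pred == "yes"
            fn += pred == "no"
        else:
            fp += pred == "yes"
            tn += pred == "no"
    total = max(tp + fp + tn + fn, 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / total,  # bias toward answering "yes"
    }
```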

### Hallucination Mitigation Methods

Here are some labels that represent the core point of each paper, corresponding to mitigation methods from different angles; you can read the surveys mentioned earlier to further understand these categories (a minimal decoding-optimization sketch is shown below the table):
data.: data improvement (most benchmarks)   |   vis.: vision enhancement   |   align.: multimodal alignment   |   dec.: decoding optimization   |   post.: post-process   |   other.: other kinds

| Number | Title | Venue | Paper | Repo | Citation | Core |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | VCoder: Versatile Vision Encoders for Multimodal Large Language Models | CVPR(2024) | arXiv | GitHub Page |  | vis. |
| 2 | Ferret: Refer and ground anything anywhere at any granularity | arxiv(23.10) | arXiv | GitHub Page | 🔷 | vis. |
| 3 | Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | arxiv(23.10) | arXiv |  |  | vis. |
| 4 | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | arxiv(23.11) | arXiv | GitHub Page | 🔷 | vis. |
| 5 | Mitigating Hallucination in Visual Language Models with Visual Supervision | arxiv(23.11) | arXiv |  |  | vis. (with SAM -> in-context) |
| 6 | LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | arxiv(23.11) | arXiv | GitHub Page |  | vis. |
| 7 | DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | arxiv(24.02) | arXiv | GitHub Page |  | vis. |
| 8 | LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | arxiv(24.03) | arXiv | GitHub Page |  | vis. |
| 9 | Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models | arxiv(23.08) | arXiv | GitHub Page | 🔷 | vis. align. |
| 10 | GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | arxiv(24.02) | arXiv | GitHub Page |  | vis. align. |
| 11 | Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training | arxiv(23.08) | arXiv |  | 🔷 | align. |
| 12 | Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | arxiv(23.12) | arXiv | GitHub Page |  | align. |
| 13 | OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation | CVPR(2024) | arXiv | GitHub Page |  | dec. |
| 14 | Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding (VCD) | arxiv(23.11) | arXiv | GitHub Page |  | dec. |
| 15 | Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding | arxiv(24.02) | arXiv |  |  | dec. |
| 16 | IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding | arxiv(24.02) | arXiv |  |  | dec. |
| 17 | HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | arxiv(24.03) | arXiv | GitHub Page |  | dec. |
| 18 | Woodpecker: Hallucination Correction for Multimodal Large Language Models | arxiv(23.10) | arXiv | GitHub Page | 🔷 | post. |
| 19 | Analyzing and mitigating object hallucination in large vision-language models (LURE) | arxiv(23.10) | arXiv | GitHub Page | 🔷 | post. |
| 20 | TEMPORAL INSIGHT ENHANCEMENT: MITIGATING TEMPORAL HALLUCINATION IN MULTIMODAL LARGE LANGUAGE MODELS | arxiv(24.01) | arXiv |  |  | post. (correct with tools) |
| 21 | VIGC: Visual Instruction Generation and Correction | arxiv(23.08) | arXiv | GitHub Page |  | other. (iterative generation) |
| 22 | Can We Edit Multimodal Large Language Models? | EMNLP(2023) | arXiv | GitHub Page |  | other. (model editing) |
| 23 | HALO: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models | arxiv(23.08) | arXiv | GitHub Page |  | other. (knowledge injection and teacher-student approaches) |
| 24 | VOLCANO: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision | arxiv(23.11) | arXiv | GitHub Page |  | other. (self-feedback as visual cues -> in-context) |
| 25 | Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization (HA-DPO) | arxiv(23.11) | arXiv | GitHub Page |  | other. (trained to favor the non-hallucinating response as a preference selection task) |
| 26 | SILKIE: Preference Distillation for Large Visual Language Models | arxiv(23.12) | arXiv | GitHub Page |  | other. (preference distillation) |
| 27 | Mitigating Open-Vocabulary Caption Hallucinations (MOCHa) | arxiv(23.12) | arXiv | GitHub Page |  | other. (multi-objective RL) |
| 28 | Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | arxiv(24.02) | arXiv | GitHub Page |  | other. (selective EOS supervision; data filtering) |
| 29 | Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models | arxiv(24.02) | arXiv | GitHub Page |  | other. (logical closed loops [answer verification]) |
| 30 | EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models | arxiv(24.02) | arXiv |  |  | other. (unlearning) |
| 31 | Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | arxiv(24.02) | arXiv |  |  | other. (CoT) |
| 32 | All in a Single Image: Large Multimodal Models are In-Image Learners | arxiv(24.02) | arXiv | GitHub Page |  | other. (in-image learning mechanism) |
| 33 | Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance (MARINE) | arxiv(24.02) | arXiv |  |  | other. (classifier-free guidance) |
| 34 | SKIP \N: A SIMPLE METHOD TO REDUCE HALLUCINATION IN LARGE VISION-LANGUAGE MODELS | arxiv(24.02) | arXiv | GitHub Page |  | other. (suppress the misleading '\n' sign) |
| 35 | Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective | arxiv(24.03) | arXiv |  |  | other. (inconsistency for number hallucination) |
| 36 | Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation | arxiv(24.04) | arXiv | GitHub Page |  | other. (CAG) |
| 37 | Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining | WACV(2024) | arXiv | GitHub Page |  | data. |
| 38 | Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning | arxiv(24.04) | arXiv |  |  | data. |
| 39 | TextSquare: Scaling up Text-Centric Visual Instruction Tuning | arxiv(24.04) | arXiv |  |  | data. |
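To make the `dec.` (decoding optimization) label more concrete, here is a minimal sketch in the spirit of visual contrastive decoding (entry 14, VCD). The `logits_fn` and `distort` callables are hypothetical placeholders for your model's next-token logits and an image-distortion step; the published method additionally applies an adaptive plausibility constraint and samples from the adjusted distribution, so treat this as an illustration of the core idea rather than a faithful reimplementation.

```python
import numpy as np

def contrastive_next_token(logits_fn, distort, image, prefix, alpha=1.0):
    """One greedy step of a VCD-style contrastive decoding sketch.

    logits_fn(image, prefix) -> next-token logits (hypothetical model wrapper)
    distort(image)           -> a degraded copy of the image (e.g. added noise)
    """
    clean = np.asarray(logits_fn(image, prefix))            # grounded branch
    noisy = np.asarray(logits_fn(distort(image), prefix))   # language-prior branch
    # Boost tokens whose likelihood depends on actually seeing the image and
    # penalize tokens the model would predict even from a degraded image.
    contrastive = (1.0 + alpha) * clean - alpha * noisy
    return int(np.argmax(contrastive))
```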

### Others

Here are some papers that are not directly related to MLLM hallucination but may offer unexpected inspiration.

| Number | Title | Venue | Paper | Repo | Citation |
| --- | --- | --- | --- | --- | --- |
| 1 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | ICML(2022) | arXiv | GitHub Page | 🔥 |
| 2 | Locating and Editing Factual Associations in GPT | NeurIPS(2022) | arXiv | GitHub Page | 🔥 |
| 3 | Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | COLING(2022) | arXiv | GitHub Page |  |
| 4 | Hallucination improves the performance of unsupervised visual representation learning | ICCV(2023) | arXiv |  |  |
| 5 | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | NeurIPS(2023) | arXiv |  | 🔥 |
| 6 | A Survey on Multimodal Large Language Models | arxiv(23.06) | arXiv | GitHub Page | 🔥 |
| 7 | Recognize Anything: A Strong Image Tagging Model | arxiv(23.06) | arXiv | GitHub Page |  |
| 8 | RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback | arxiv(23.09) | arXiv |  | 🔥 |
| 9 | Cognitive Mirage: A Review of Hallucinations in Large Language Models | arxiv(23.09) | arXiv | GitHub Page | 🔷 |
| 10 | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | arxiv(23.09) | arXiv |  | 🔥 |
| 11 | Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | arxiv(23.11) | arXiv | GitHub Page | 🔷 |
| 12 | Polos: Multimodal Metric Learning from Human Feedback for Image Captioning | CVPR(2024) | arXiv | GitHub Page |  |
| 13 | Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting Corrections | arxiv(24.02) | arXiv | GitHub Page |  |