Commit

[update] title and author
yizhilll committed Sep 22, 2024
1 parent aef0d45 commit d33e23a
Showing 5 changed files with 52 additions and 58 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.DS_Store
6 changes: 6 additions & 0 deletions README.md
@@ -1,2 +1,8 @@
# OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.

```shell
git config http.postBuffer 524288000 # increase Git's HTTP post buffer to ~500 MB so large pushes over HTTPS don't fail
# the push target must be given explicitly since gh-pages is an orphan branch
git push origin gh-pages
```
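
For context, a minimal sketch of how an orphan `gh-pages` branch like this one is typically created and published in the first place. It assumes the site files (`index.html`, `static/`) are already in the working tree; the commit message and staged file list are illustrative, not taken from this repository.

```shell
# create a branch that shares no history with the main branch
git checkout --orphan gh-pages

# clear the index inherited from the previous branch, then stage only the site files
git rm -rf --cached .
git add index.html static/
git commit -m "publish site"

# raise Git's HTTP post buffer and push the new branch explicitly
git config http.postBuffer 524288000
git push origin gh-pages
```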
103 changes: 45 additions & 58 deletions index.html
@@ -1,17 +1,17 @@
<!DOCTYPE html>
<html lang="en">
<head>
<title>MMMU</title>
<title>OmniBench</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script src="https://kit.fontawesome.com/f8ddf9854a.js" crossorigin="anonymous"></script>
<meta charset="utf-8">
<meta name="description"
content="A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI">
<meta name="keywords" content="MMMU, LMM, LMM Evaluation, Vision Language Model, Large Language Model, Large Multimodal Model, artificial intelligence, AI, AGI, artificial general intelligence">
content="Towards The Future of Universal Omni-Language Models">
<meta name="keywords" content="OmniBench, MLLM, MLLM Evaluation, Vision Language Model, Large Language Model, Large Multimodal Model, artificial intelligence, AI">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title> MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI</title>
<title> OmniBench: Towards The Future of Universal Omni-Language Models</title>

<link rel="icon" href="./static/images/mmmu_icon2.png">
<link rel="icon" href="./static/images/logo.png">

<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

@@ -45,20 +45,11 @@
More Research
</a>
<div class="navbar-dropdown">
<a class="navbar-item" href="https://huggingface.co/datasets/MMMU/MMMU_Pro">
<b>MMMU-Pro</b> <span style="font-size:18px; display: inline; margin-left: 5px;">🔥</span>
<a class="navbar-item" href="https://github.com/multimodal-art-projection/MAP-NEO">
MAP-NEO
</a>
<a class="navbar-item" href="https://tiger-ai-lab.github.io/MAmmoTH/">
MAmmoTH
</a>
<a class="navbar-item" href="https://osu-nlp-group.github.io/TableLlama/">
TableLlama
</a>
<a class="navbar-item" href="https://osu-nlp-group.github.io/MagicBrush/">
MagicBrush
</a>
<a class="navbar-item" href="https://osu-nlp-group.github.io/Mind2Web/">
Mind2Web
<a class="navbar-item" href="https://github.com/yizhilll/MERT">
MERT
</a>
</div>
</div>
@@ -72,68 +63,67 @@
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title is-bold">
<img src="static/images/mmmu_icon2.png" style="width:1em;vertical-align: middle" alt="Logo"/>
<span class="mmmu" style="vertical-align: middle">MMMU</span>
<img src="static/images/logo.png" style="width:1em;vertical-align: middle" alt="Logo"/>
<span class="omnibench" style="vertical-align: middle">OmniBench</span>
</h1>
<h2 class="subtitle is-3 publication-subtitle">
A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Towards The Future of Universal Omni-Language Models
</h2>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://xiangyue9607.github.io/" style="text-decoration: none; color: inherit;">Xiang Yue*<sup style="color:#6fbf73;">†,1</sup></a>,
<a href="https://yizhilll.github.io/" style="text-decoration: none; color: inherit;">Yizhi Li*<sup style="color:#6fbf73;">1,2</sup></a>,
</span>
<span class="author-block">
<a href="https://yuanshengni.github.io/" style="text-decoration: none; color: inherit;">Yuansheng Ni*<sup style="color:#ffac33;">2</sup></a>,
<a href="https://scholar.google.com/citations?user=qyTrq4kAAAAJ&hl" style="text-decoration: none; color: inherit;">Ge Zhang*<sup style="color:#6fbf73;">†1,3</sup></a>,
</span>
<span class="author-block">
<a href="https://drogozhang.github.io/" style="text-decoration: none; color: inherit;">Kai Zhang*<sup style="color:#ed4b82;">3</sup></a>,
<a href="https://nicolaus625.github.io/" style="text-decoration: none; color: inherit;">Yinghao Ma*<sup style="color:#6fbf73;">1,4</sup></a>,
</span>
<span class="author-block">Tianyu Zheng*<sup style="color:#007bff;">4</sup>,</span><br>
<span class="author-block">Ruoqi Liu<sup style="color:#ed4b82;">3</sup>,</span>
<span class="author-block">Ge Zhang<sup style="color:#ffac33;">2</sup>,</span>
<span class="author-block">Samuel Stevens<sup style="color:#ed4b82;">3</sup>,</span>
<span class="author-block">Dongfu Jiang<sup style="color:#ffac33;">2</sup>,</span>
<span class="author-block">Weiming Ren<sup style="color:#ffac33;">2</sup>,</span>
<span class="author-block">Yuxuan Sun<sup style="color:#007bff;">4</sup>,</span>
<span class="author-block">Cong Wei<sup style="color:#ffac33;">2</sup>,</span>
<span class="author-block">Botao Yu<sup style="color:#ed4b82;">3</sup>,</span>
<span class="author-block">Ruibin Yuan<sup style="color:#ffac33;">5</sup>,</span>
<span class="author-block">Renliang Sun<sup style="color:#ffac33;">2</sup>,</span>
<span class="author-block">Ming Yin<sup style="color:#9b51e0;">7</sup>,</span>
<span class="author-block">Boyuan Zheng<sup style="color:#ed4b82;">3</sup>,</span>
<span class="author-block">Zhenzhu Yang<sup style="color:#007bff;">4</sup>,</span>
<span class="author-block">Yibo Liu<sup style="color:#ed4b82;">6</sup>,</span>
<span class="author-block">Wenhao Huang<sup style="color:#007bff;">4</sup>,</span><br>
<span class="author-block">Ruibin Yuan<sup style="color:#6fbf73;">1,5</sup>,</span>
<span class="author-block">Kang Zhu<sup style="color:#6fbf73;">1,3</sup>,</span><br>
<span class="author-block">Hangyu Guo<sup style="color:#6fbf73;">1</sup>,</span>
<span class="author-block">Yiming Liang<sup style="color:#6fbf73;">1</sup>,</span>
<span class="author-block">Jiaheng Liu<sup style="color:#6fbf73;">1</sup>,</span>
<span class="author-block">Jian Yang<sup style="color:#6fbf73;">1</sup>,</span>
<span class="author-block">Siwei Wu<sup style="color:#6fbf73;">1,2</sup>,</span><br>
<span class="author-block">Xingwei Qu<sup style="color:#6fbf73;">1,2</sup>,</span>
<span class="author-block">Jinjie Shi<sup style="color:#6fbf73;">4</sup>,</span>
<span class="author-block">Xinyue Zhang<sup style="color:#6fbf73;">1</sup>,</span>
<span class="author-block">Zhenzhu Yang<sup style="color:#6fbf73;">1</sup>,</span>
<span class="author-block">Xiangzhou Wang<sup style="color:#6fbf73;">1</sup>,</span><br>
<span class="author-block">Zhaoxiang Zhang<sup style="color:#ed4b82;">6</sup>,</span>
<span class="author-block">Zachary Liu<sup style="color:#9b51e0;">7</sup>,</span>
<span class="author-block">
<a href="https://web.cse.ohio-state.edu/~sun.397/" style="text-decoration: none; color: inherit;">Huan Sun*<sup style="color:#ed4b82;">3</sup></a>,
<a href="https://www.eecs.qmul.ac.uk/~emmanouilb/" style="text-decoration: none; color: inherit;">Emmanouil Benetos<sup style="color:#007bff;">4</sup></a>,
</span>
<span class="author-block">
<a href="https://ysu1989.github.io/" style="text-decoration: none; color: inherit;">Yu Su*<sup style="color:#ed4b82;">,3</sup></a>,
<a href="https://scholar.google.com/citations?user=OdE3MsQAAAAJ&hl" style="text-decoration: none; color: inherit;">Wenhao Huang<sup style="color:#6fbf73;">1,3</sup></a>,
</span>
<span class="author-block">
<a href="https://wenhuchen.github.io/" style="text-decoration: none; color: inherit;">Wenhu Chen*<sup style="color:#ffac33;">†,2</sup></a>
<a href="https://chenghualin.wordpress.com/" style="text-decoration: none; color: inherit;">Chenghua Lin<sup style="color:#b433ff;">†,1,2</sup></a>,
</span>

</div>

<br>

<div class="is-size-5 publication-authors">
<span class="author-block"><sup style="color:#6fbf73;">1</sup>IN.AI Research,</span>
<span class="author-block"><sup style="color:#ffac33;">2</sup>University of Waterloo,</span>
<span class="author-block"><sup style="color:#ed4b82;">3</sup>The Ohio State University,</span>
<span class="author-block"><sup style="color:#007bff;">4</sup>Independent,</span><br>
<span class="author-block"><sup style="color:#ffac33;">5</sup>Carnegie Mellon University,</span>
<span class="author-block"><sup style="color:#ed4b82;">6</sup>University of Victoria,</span>
<span class="author-block"><sup style="color:#9b51e0;">7</sup>Princeton University</span>
<span class="author-block"><sup style="color:#6fbf73;">1</sup>m-a-p.ai,</span>
<span class="author-block"><sup style="color:#b433ff;">2</sup>University of Manchester,</span>
<span class="author-block"><sup style="color:#ed4b82;">3</sup>01.ai,</span>
<span class="author-block"><sup style="color:#007bff;">4</sup>Queen Mary University of London,</span><br>
<span class="author-block"><sup style="color:#ffac33;">5</sup>Hongkong University of Science and Technology,</span>
<span class="author-block"><sup style="color:#ed4b82;">6</sup>Nanjing University,</span>
<span class="author-block"><sup style="color:#9b51e0;">7</sup>Dartmouth College</span>
</div>

<br>
<div class="is-size-5 publication-authors">
<span class="author-block">*Core Contributors</span><br>
<span class="author-block">†Corresponding to:</span>
<span class="author-block"><a href="mailto:xiangyue.work@gmail.com">xiangyue.work@gmail.com</a>,</span>
<span class="author-block"><a href="mailto:su.809@osu.edu">su.809@osu.edu</a>,</span>
<span class="author-block"><a href="mailto:wenhuchen@uwaterloo.ca">wenhuchen@uwaterloo.ca</a></span>
<span class="author-block"><a href="mailto:yizhi.li@hotmail.com">yizhi.li@hotmail.com</a>,</span>
<span class="author-block"><a href="mailto:gezhang@umich.edu">gezhang@umich.edu</a>,</span>
<span class="author-block"><a href="mailto:c.lin@manchester.ac.uk">c.lin@manchester.ac.uk</a></span>
</div>

<div class="column has-text-centered">
@@ -226,16 +216,13 @@ <h2 class="subtitle is-3 publication-subtitle">
<h2 class="title is-3">🔔News</h2>
<div class="content has-text-justified">
<p>
<b>🚀[2024-01-31]: We added Human Expert performance on the <a href="#leaderboard">Leaderboard</a>!🌟</b>
</p>
<p>
<b>🔥[2023-12-04]: Our evaluation server for the test set is now available on <a href="https://eval.ai/web/challenges/challenge-page/2179/overview"><b>EvalAI</b></a>. We welcome all submissions and look forward to your participation! 😆</b>
<b>🔥[2024-09-22]: We release OmniBench, a new benchmark for text, image, and audio large language models!</b>
</p>
</div>
<h2 class="title is-3">Introduction</h2>
<div class="content has-text-justified">
<p>
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes <b>11.5K</b> meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span <b>30</b> subjects and <b>183</b> subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. Our evaluation of 14 open-source LMMs and the proprietary GPT-4V(ision) highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V only achieves a 56% accuracy, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.
Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities. However, the capacity of these models to concurrently process and reason about multiple modalities remains inadequately explored, partly due to the lack of comprehensive modality-wise benchmarks. We introduce OmniBench, a novel benchmark designed to rigorously evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define models capable of such tri-modal processing as omni-language models (OLMs). OmniBench is distinguished by high-quality human annotations, ensuring that accurate responses require integrated understanding and reasoning across all three modalities. Our main findings reveal that: i) open-source OLMs exhibit critical limitations in instruction-following and reasoning capabilities within tri-modal contexts; and ii) these baseline models perform poorly (below 50% accuracy) even when provided with alternative textual representations of images and audio. These results suggest that the ability to construct a consistent context from text, image, and audio is often overlooked in existing MLLM training paradigms. We advocate for future research to focus on developing more robust tri-modal integration techniques and training strategies to enhance OLM performance across diverse modalities.
</p>
</div>
</div>
Binary file added static/images/logo.png
Binary file removed static/images/mmmu_icon2.png
Binary file not shown.
