LMD0311/HERMES

HERMES: A Unified Self-Driving World Model for Simultaneous
3D Scene Understanding and Generation

Xin Zhou1*, Dingkang Liang1*†, Sifan Tu1, Xiwu Chen3, Yikang Ding2†, Dingyuan Zhang1, Feiyang Tan3,
Hengshuang Zhao4, Xiang Bai1

1 Huazhong University of Science & Technology, 2 MEGVII Technology,
3 Mach Drive, 4 The University of Hong Kong

(*) Equal contribution. (†) Project leader.


Check out our Awesome World Model list for the latest world models!

📣 News

  • [2025.01.24] Released the demo. Check it out and give it a star 🌟!

  • [2025.01.24] Released the paper.

Abstract

Driving World Models (DWMs) have become essential for autonomous driving by enabling future scene prediction. However, existing DWMs are limited to scene generation and fail to incorporate scene understanding, which involves interpreting and reasoning about the driving environment. In this paper, we present a unified Driving World Model named HERMES¹. Through a unified framework, we seamlessly integrate scene understanding and future scene evolution (generation) in driving scenarios. Specifically, HERMES leverages a Bird’s-Eye View (BEV) representation to consolidate multi-view spatial information while preserving geometric relationships and interactions. Additionally, we introduce world queries, which incorporate world knowledge into BEV features via causal attention in the Large Language Model (LLM), enabling contextual enrichment for both understanding and generation tasks. We conduct comprehensive studies on the nuScenes and OmniDrive-nuScenes datasets to validate the effectiveness of our method. HERMES achieves state-of-the-art performance, reducing generation error by 32.4% and improving understanding metrics such as CIDEr by 8.0%.

  1. In Greek mythology, Hermes serves as the messenger of the gods. Similarly, this paper proposes a simple yet effective framework that unifies understanding and generation as a driving world model, facilitating knowledge transfer across tasks. The logo is inspired by Hermes’ shoes.
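
The role of causal attention in the abstract can be illustrated with a toy sketch. This is not the HERMES implementation: it assumes, purely for illustration, that the world queries are appended after the flattened BEV tokens in one sequence, and it uses a single attention head with no learned projections. The names `causal_attention`, `bev_tokens`, and `world_queries` are hypothetical.

```python
import math

def causal_attention(tokens):
    """Single-head self-attention with a causal mask.

    tokens: list of equal-length float vectors. For brevity, queries, keys,
    and values are the tokens themselves (no learned projections).
    """
    dim = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        # Causal mask: position i may only attend to positions 0..i.
        visible = tokens[: i + 1]
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return outputs

# Hypothetical toy inputs: 3 flattened BEV tokens followed by 2 world queries.
bev_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
world_queries = [[0.5, 0.5], [0.2, -0.3]]

# Assumed ordering: BEV tokens first, world queries last.
sequence = bev_tokens + world_queries
outputs = causal_attention(sequence)

# Outputs at the world-query positions have attended to every BEV token.
enriched_queries = outputs[len(bev_tokens):]
```

Under this assumed ordering, each world query aggregates information from all BEV tokens, while the BEV token outputs are unaffected by the queries that follow them.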

Overview

Demo

Example 1
Example 2
Example 3

Main Results

Getting Started

Coming soon.

To Do

  • Release demo.
  • Release checkpoints.
  • Release training code.

Acknowledgement

This project is based on BEVFormer v2 (paper, code), InternVL (paper, code), UniPAD (paper, code), and OmniDrive (paper, code). Thanks for their wonderful work.

Citation

If you find this repository useful in your research, please consider giving a star ⭐ and a citation.

@article{zhou2025hermes,
  title={HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation},
  author={Zhou, Xin and Liang, Dingkang and Tu, Sifan and Chen, Xiwu and Ding, Yikang and Zhang, Dingyuan and Tan, Feiyang and Zhao, Hengshuang and Bai, Xiang},
  journal={arXiv preprint arXiv:2501.14729},
  year={2025}
}
