I am a Master's student at HUST (Huazhong University of Science and Technology), supervised by Prof. Changxin Gao and Prof. Nong Sang.
🔭 Research-wise, I mainly focus on:
- Multi-modal Large Language Models
- Video Understanding, more specifically, Weakly-supervised Temporal Action Localization (WSTAL) & Weakly-supervised Video Anomaly Detection (WSVAD).
😄 I am open to:
- An internship/job/PhD offer in computer vision/multimodal LLM research and engineering.
📫 Contact me by:
- Email: zhanghuaxin@hust.edu.cn
💬 News:
- 2024-07-01: We released the code and model of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM". [project page]
- 2024-06-10: We released the code and model of "Arcana: Improving Multi-modal Large Language Model through Boosting Vision Capabilities". [project page]
- 2024-01-29: I started my internship at Baidu VIS, doing research on Multi-modal Large Language Models (MLLMs).
- 2023-12-09: One paper on point-supervised temporal action localization was accepted to AAAI 2024.