CITATION.cff
cff-version: 1.2.0
title: >-
  HEAD: HEtero-Assists Distillation for Heterogeneous
  Object Detectors
message: Please cite this project using these metadata.
type: software
authors:
  - given-names: Luting
    family-names: Wang
    email: wangluting@buaa.edu.cn
    orcid: 'https://orcid.org/0000-0001-8317-226X'
    affiliation: Beihang University
  - given-names: Xiaojie
    family-names: Li
    email: lixiaojie@senseauto.com
    affiliation: SenseTime
  - given-names: Yue
    family-names: Liao
    email: liaoyue.ai@gmail.com
    affiliation: Beihang University
  - given-names: Zeren
    family-names: Jiang
    email: zeren.jiang99@gmail.com
    affiliation: ETH Zurich
  - given-names: Jianlong
    family-names: Wu
    email: jlwu1992@sdu.edu.cn
    affiliation: Shandong University
  - given-names: Fei
    family-names: Wang
    email: wangfei91@mail.ustc.edu.cn
    affiliation: University of Science and Technology of China
  - given-names: Chen
    family-names: Qian
    email: qianchen@sensetime.com
    affiliation: SenseTime
  - given-names: Si
    family-names: Liu
    email: liusi@buaa.edu.cn
    affiliation: Beihang University
identifiers:
  - type: doi
    value: 10.48550/arXiv.2207.05345
    description: arXiv
repository-code: 'https://github.com/LutingWang/HEAD'
abstract: >-
  Conventional knowledge distillation (KD) methods
  for object detection mainly concentrate on
  homogeneous teacher-student detectors. However, the
  design of a lightweight detector for deployment
  often differs significantly from that of a
  high-capacity detector. We therefore investigate KD
  among heterogeneous teacher-student pairs to widen
  its applicability. We observe that the core
  difficulty in heterogeneous KD (hetero-KD) is the
  significant semantic gap between the backbone
  features of heterogeneous detectors, which stems
  from their different optimization manners.
  Conventional homogeneous KD (homo-KD) methods
  suffer from this gap and struggle to achieve
  satisfactory performance when applied directly to
  hetero-KD. In this paper, we propose the
  HEtero-Assists Distillation (HEAD) framework, which
  leverages heterogeneous detection heads as
  assistants to guide the optimization of the student
  detector and reduce this gap. In HEAD, the
  assistant is an additional detection head,
  homogeneous in architecture to the teacher head,
  attached to the student backbone. Hetero-KD is thus
  transformed into homo-KD, allowing efficient
  knowledge transfer from the teacher to the student.
  Moreover, we extend HEAD into a Teacher-Free HEAD
  (TF-HEAD) framework for cases where a well-trained
  teacher detector is unavailable. Our method
  achieves significant improvement over current
  detection KD methods. For example, on the MS-COCO
  dataset, TF-HEAD helps an R18 RetinaNet achieve
  33.9 mAP (+2.2), while HEAD further pushes the
  limit to 36.2 mAP (+4.5).
keywords:
  - Knowledge Distillation
  - Object Detection
  - Heterogeneous
license: Apache-2.0