MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation
Despite the remarkable progress of recent large models in embodied Artificial Intelligence (E-AI), their application in robotics is hampered by their excessive parameter sizes and computational demands. For the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of knowledge distillation for obtaining lightweight student models by proposing a Meta-Ability Guided Interactive Chain-of-Distillation (MAGIC) method. Specifically, a Meta-Ability Knowledge Distillation (MAKD) framework is proposed for decoupling and refining the agents' meta-abilities. A Meta-Knowledge Randomization Weighting (MKRW) module and a Meta-Knowledge Transferable Determination (MKTD) module are incorporated to adjust aggregation weights at the meta-ability and sample levels, respectively. Moving beyond traditional one-step unidirectional distillation, an Interactive Chain-of-Distillation (ICoD) strategy is proposed that allows students to give feedback to teachers, forming a new multi-step teacher-student co-evolution pipeline. Remarkably, on the R2R test-unseen public leaderboard, our smallest model, MAGIC-S, with only 5% of the teacher's size, outperforms all previous methods trained with the same data. Additionally, our largest model, MAGIC-L, surpasses the previous SoTA by 5.84% in SPL and 3.18% in SR. Furthermore, a new dataset was collected and annotated from our living environments, where MAGIC-S demonstrated superior performance and real-time efficiency.
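As a rough illustration of the meta-ability-level weighting idea, the sketch below shows per-meta-ability distillation losses aggregated with randomized weights in the spirit of MAKD and MKRW. It is not the released implementation: the meta-ability names, the KL objective, and the Dirichlet sampling are assumptions made for the sketch.

```python
# Illustrative sketch only: per-meta-ability distillation with randomized
# aggregation weights. Meta-ability names, the KL objective, and the Dirichlet
# sampling are assumptions, not MAGIC's released code.
import torch
import torch.nn.functional as F

def meta_ability_distill_loss(student_logits, teacher_logits, weights, tau=2.0):
    """Aggregate temperature-scaled KL losses over meta-ability heads."""
    total = 0.0
    for ability, s_logits in student_logits.items():
        kl = F.kl_div(
            F.log_softmax(s_logits / tau, dim=-1),
            F.softmax(teacher_logits[ability] / tau, dim=-1),
            reduction="batchmean",
        ) * (tau ** 2)
        total = total + weights[ability] * kl
    return total

# Randomized per-ability weights so no single meta-ability dominates a step.
abilities = ["landmark_grounding", "path_planning", "instruction_understanding"]
w = torch.distributions.Dirichlet(torch.ones(len(abilities))).sample()
weights = dict(zip(abilities, w.tolist()))

# Toy usage with random logits over 6 candidate actions and a batch of 4.
student = {a: torch.randn(4, 6) for a in abilities}
teacher = {a: torch.randn(4, 6) for a in abilities}
loss = meta_ability_distill_loss(student, teacher, weights)
```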
- Install MatterPort3D Simulator: Start by installing the MatterPort3D simulator from the official repository.
- Install Python Dependencies: Run the following command to install the necessary Python packages. Make sure to match the versions in `requirements.txt` to avoid compatibility issues, particularly when loading pre-trained weights for fine-tuning.

  ```bash
  pip install -r requirements.txt
  ```
- Download Resources:
  - Datasets and Features: Links will be updated soon.
  - Pre-trained Weights: Links will be updated soon.
  - METER Pre-training (Optional): If you wish to pre-train the model using METER, download `meter_clip16_224_roberta_pretrain.ckpt` from here.
  - EnvEdit Weights (Optional): Available here.
  - RoBERTa Tokenizer: If direct access to Hugging Face models is restricted, manually download `roberta-base` from Hugging Face and store it locally under `datasets/pretrained/roberta` (a loading sketch is given below this list).
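For the offline-tokenizer case above, a minimal sketch using the standard Hugging Face `transformers` API (the exact call inside this code base may differ):

```python
from transformers import RobertaTokenizer

# One-time export on a machine that can reach Hugging Face:
RobertaTokenizer.from_pretrained("roberta-base").save_pretrained("datasets/pretrained/roberta")

# Later, offline loading from the local copy:
tokenizer = RobertaTokenizer.from_pretrained("datasets/pretrained/roberta")
print(tokenizer.tokenize("Walk past the sofa and stop at the stairs."))
```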
Ensure your `datasets` directory follows this structure:

```
datasets
├── R2R
│   ├── annotations
│   │   ├── pretrain_map
│   │   └── RxR
│   ├── connectivity
│   ├── features
│   ├── speaker
│   ├── navigator
│   ├── pretrain
│   ├── test
│   └── id_paths.json
├── RxR
│   ├── navigator
│   ├── pretrain
│   └── test
├── EnvEdit
└── pretrained
    ├── METER
    └── roberta
```
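Before launching pre-training, a quick existence check over the expected paths (taken from the tree above; extend the list to cover the features and weights you actually download) can catch missing files early:

```python
# Report any expected datasets/ sub-directories or files that are missing.
from pathlib import Path

expected = [
    "datasets/R2R/annotations",
    "datasets/R2R/connectivity",
    "datasets/R2R/features",
    "datasets/R2R/id_paths.json",
    "datasets/RxR",
    "datasets/EnvEdit",
    "datasets/pretrained/METER",
    "datasets/pretrained/roberta",
]
missing = [p for p in expected if not Path(p).exists()]
print("All expected paths are present." if not missing else f"Missing: {missing}")
```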
To pre-train the model, navigate to the pre-training source directory and execute the provided shell script. Replace `r2r` with the desired dataset name as needed.

```bash
cd pretrain_src
bash run_r2r_magic.sh
```
Please refer to GOAT's repository for confounder feature extraction.
To fine-tune the model, use the commands below:

```bash
cd map_nav_src
bash scripts/run_r2r.sh
```
For model validation, execute the following:

```bash
cd map_nav_src
bash scripts/run_r2r_valid.sh
```
Since the paper is still under review, the model files have been omitted. We plan to release the training and validation code gradually.
Thank you for your understanding and support!