Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation
Junyu Gao, Xuan Yao, Changsheng Xu
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences.
- Install the Matterport3D simulator: follow the instructions here. We use the latest version, the same as used in DUET.
```bash
export PYTHONPATH=Matterport3DSimulator/build:$PYTHONPATH
```
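If you prefer to set the path from inside a script rather than the shell, the export above can be mirrored in Python. The checkout location below is an assumption; adjust it to wherever you built the simulator.

```python
import os
import sys

# Hypothetical location of your Matterport3DSimulator checkout; adjust as needed.
sim_build = os.path.join(os.getcwd(), "Matterport3DSimulator", "build")

# Equivalent of `export PYTHONPATH=Matterport3DSimulator/build:$PYTHONPATH`,
# done per-process instead of in the shell.
if sim_build not in sys.path:
    sys.path.insert(0, sim_build)
```

After this, `import MatterSim` should resolve against the simulator's build directory.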
- Install requirements:
```bash
conda create --name fsvln python=3.8.5
conda activate fsvln
```
Required packages are listed in `requirements.txt`. You can install them by running:
```bash
pip install -r requirements.txt
```
- Please download the data from Dropbox, including the processed annotations, features, and pretrained models for the REVERIE and R2R datasets. Before running the code, put the data in the `datasets` directory.
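As a quick sanity check before launching anything, you can verify that the expected dataset folders exist. The subdirectory names below are an assumption about the downloaded archive's layout; adjust them to match what you actually unpacked.

```python
import os

# Assumed subdirectories under `datasets`; edit to match your download.
EXPECTED = ["REVERIE", "R2R", "pretrained"]

def missing_dirs(root="datasets", expected=EXPECTED):
    """Return the expected dataset subdirectories that are not present under `root`."""
    return [name for name in expected if not os.path.isdir(os.path.join(root, name))]

# Example: report anything missing before running the code.
print(missing_dirs())
```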
- Please download the pretrained LXMERT model by running:
```bash
mkdir -p datasets/pretrained
wget https://nlp.cs.unc.edu/data/model_LXRT.pth -P datasets/pretrained
```
Combine behavior cloning and auxiliary proxy tasks in pretraining:
```bash
cd pretrain_src
bash run_reverie.sh
```
Use the pseudo-interactive demonstrator to fine-tune the model:
```bash
cd map_nav_src
bash scripts/run_reverie.sh
```
Use the pseudo-interactive demonstrator to equip the model with our FSTTA:
```bash
cd map_nav_src
bash scripts/run_reverie_tta.sh
```
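For intuition only, the fast-slow test-time adaptation loop can be caricatured as a fast phase that adapts per step and a slow phase that periodically consolidates the fast weights into a stable anchor. This is a minimal conceptual sketch with made-up hyperparameters, not the authors' actual FSTTA implementation (which operates on gradient decomposition inside the navigation model).

```python
import numpy as np

def fast_slow_tta_sketch(theta, grads, fast_lr=0.01, slow_momentum=0.9, period=4):
    """Conceptual fast-slow update loop over a stream of test-time gradients.

    Fast updates are applied at every step; every `period` steps the slow
    (anchor) parameters absorb the fast ones via a momentum average, damping
    noisy online updates. All names and values here are illustrative.
    """
    slow = theta.copy()
    fast = theta.copy()
    for t, g in enumerate(grads, start=1):
        fast -= fast_lr * g              # fast phase: per-step adaptation
        if t % period == 0:              # slow phase: periodic consolidation
            slow = slow_momentum * slow + (1 - slow_momentum) * fast
            fast = slow.copy()           # restart the fast phase from the anchor
    return slow
```

The point of the two timescales is that the slow parameters drift far less than the raw per-step updates, which keeps online adaptation from destabilizing the navigator.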
Our implementation is partially based on VLN-DUET, HM3DAutoVLN, and VLN-BEVBert. Thanks to the authors for sharing their code.
- Reverie: Remote embodied visual referring expression in real indoor environments
- Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
If you find this project useful in your research, please consider citing:
```bibtex
@inproceedings{Gao2024Fast,
  title={Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation},
  author={Gao, Junyu and Yao, Xuan and Xu, Changsheng},
  booktitle={Proceedings of the 41st International Conference on Machine Learning},
  year={2024}
}
```