Jun Zhu*, Zihao Du*, Haotian Xu, Fengbo Lan, Zilong Zheng, Bo Ma, Shengjie Wang, Tao Zhang
We develop a VLM-driven method called Navigation-to-Gaze (Navi2Gaze) for efficient navigation and object gazing based on task descriptions. The method uses the VLM to automatically score candidate poses and select the best one. In evaluations on multiple photorealistic simulation benchmarks, Navi2Gaze significantly outperforms existing approaches and precisely determines the optimal orientation relative to target objects.
To get started on your own machine, clone this repository:
git clone https://github.com/zhujun3753/Navi2Gaze.git
Install requirements:
conda create -n navi2gaze python=3.9
conda activate navi2gaze
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
conda install habitat-sim -c conda-forge -c aihabitat
pip install -r requirements.txt
# install SEEM
pip install git+https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git@package
# install SAM
pip install git+https://github.com/facebookresearch/segment-anything.git
# install Semantic-SAM
pip install git+https://github.com/UX-Decoder/Semantic-SAM.git@package
# install Deformable Convolution for Semantic-SAM
cd ops && sh make.sh && cd ..
cd thirdparty/octree_map && sh run.sh && cd -
# common error fix:
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
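After the install commands above finish, a quick check that the packages can be found may save a failed run later. The following is a minimal sketch, not part of the repository: the function name `find_missing` is hypothetical, and the import names in `EXPECTED_MODULES` are assumptions inferred from the package names (only `seem` is confirmed by the site-packages path mentioned in this README), so adjust them if your environment differs.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
# Only "seem" is confirmed by this README; the others are inferred.
EXPECTED_MODULES = [
    "torch",
    "torchvision",
    "torchaudio",
    "detectron2",
    "habitat_sim",
    "seem",
    "segment_anything",
    "semantic_sam",
]


def find_missing(modules):
    """Return the names in `modules` that Python cannot locate."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

Run it inside the activated navi2gaze environment. The check only locates the modules without importing them, so it will not catch CUDA or build problems that appear at import time.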
Download clip-vit-base-patch32 into openai/clip-vit-base-patch32
If necessary, add `from mpi4py import MPI` to /home/xxx/anaconda3/envs/navi2gaze/lib/python3.9/site-packages/seem/utils/distributed.py
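If you prefer not to edit that file by hand, the change can be scripted. This is a minimal sketch under stated assumptions: the helper name `ensure_import` is hypothetical, the README does not say where in the file the import should go (the sketch puts it at the top, after any single-line `from __future__` imports), and you must pass the path of distributed.py in your own environment.

```python
from pathlib import Path

IMPORT_LINE = "from mpi4py import MPI"


def ensure_import(path, import_line=IMPORT_LINE):
    """Add `import_line` to the file at `path` unless it is already present.

    Returns True if the file was modified, False if nothing changed.
    """
    file_path = Path(path)
    lines = file_path.read_text(encoding="utf-8").splitlines()

    # Do nothing if the exact import line already exists.
    if any(line.strip() == import_line for line in lines):
        return False

    # Assumption: place the import at the top, but after any single-line
    # `from __future__` imports, which Python requires to come first.
    insert_at = 0
    for index, line in enumerate(lines):
        if line.startswith("from __future__"):
            insert_at = index + 1

    lines.insert(insert_at, import_line)
    file_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

Calling the helper twice is safe: the second call finds the line and leaves the file untouched. Consider keeping a backup copy of the original file before patching anything under site-packages.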
To test object goal navigation and spatial goal navigation tasks with our method, you need to set up an OpenAI API account as follows:
- Sign up for an OpenAI account, log in, and add at least one payment method to the account.
- Create an OpenAI API key and copy it.
- Open your shell startup file, add a new line `export OPENAI_KEY=<your copied key>`, and save the file.
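Once the variable is exported, you can confirm from Python that it is visible before launching a run. The sketch below is illustrative only: the helper name `load_openai_key` is hypothetical and is not the repository's own loading code; the only detail taken from this README is the variable name `OPENAI_KEY`.

```python
import os


def load_openai_key(env=None, var_name="OPENAI_KEY"):
    """Return the API key stored in `var_name`, or raise a clear error."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell, then open a new terminal."
        )
    return key


if __name__ == "__main__":
    try:
        key = load_openai_key()
        # Report only the length so the key itself is never printed.
        print(f"OPENAI_KEY found ({len(key)} characters).")
    except RuntimeError as error:
        print(error)
```

If the check reports that the variable is missing right after you edited the file, open a new terminal so the updated environment is picked up.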
# set instruction in `parsed_results = self.parse_object_goal_instruction("sit on the sofa")`
sh run.sh
If you find the dataset or code useful, please cite:
title={Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing},
author={Jun Zhu and Zihao Du and Haotian Xu and Fengbo Lan and Zilong Zheng and Bo Ma and Shengjie Wang and Tao Zhang},
MIT License