Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing

Jun Zhu*, Zihao Du*, Haotian Xu, Fengbo Lan, Zilong Zheng, Bo Ma, Shengjie Wang, Tao Zhang

We develop a vision-language model (VLM)-driven method called Navigation-to-Gaze (Navi2Gaze) for efficient navigation and object gazing based on task descriptions. The method uses the VLM to automatically score numerous candidate poses and select the best one. In evaluations on multiple photorealistic simulation benchmarks, Navi2Gaze significantly outperforms existing approaches and precisely determines the optimal orientation relative to target objects.

Approach
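The overview above says the VLM scores many candidate poses and the best one is kept. The sketch below illustrates only that generic "score every candidate, keep the argmax" structure. It is not the authors' implementation. The Pose type, the circular candidate sampling, and the facing_score heuristic (a stand-in for an actual VLM query) are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical 2D robot pose: position (x, y) in metres, heading yaw in radians."""
    x: float
    y: float
    yaw: float


def sample_candidates(target_xy: Tuple[float, float], radius: float,
                      n_positions: int = 8, n_headings: int = 4) -> List[Pose]:
    """Sample positions on a circle around the target, each with several headings."""
    tx, ty = target_xy
    poses = []
    for i in range(n_positions):
        angle = 2 * math.pi * i / n_positions
        x, y = tx + radius * math.cos(angle), ty + radius * math.sin(angle)
        for j in range(n_headings):
            poses.append(Pose(x, y, 2 * math.pi * j / n_headings))
    return poses


def facing_score(pose: Pose, target_xy: Tuple[float, float]) -> float:
    """Toy scorer standing in for a VLM: 1.0 when the pose looks straight at the target."""
    tx, ty = target_xy
    bearing = math.atan2(ty - pose.y, tx - pose.x)
    return math.cos(pose.yaw - bearing)


def select_best_pose(candidates: Sequence[Pose],
                     score_fn: Callable[[Pose], float]) -> Tuple[Pose, float]:
    """Score every candidate and return the highest-scoring (pose, score) pair."""
    if not candidates:
        raise ValueError("no candidate poses to score")
    scored = [(pose, score_fn(pose)) for pose in candidates]
    return max(scored, key=lambda pair: pair[1])
```

In a real system, score_fn would wrap a VLM call (for example, rendering the view from each pose and asking the model how well it satisfies the task); keeping the scorer as a plain callable makes that swap straightforward.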

Quick Start

Dependencies installation

To get started on your own machine, clone this repository and enter it:

git clone https://github.com/zhujun3753/Navi2Gaze.git
cd Navi2Gaze

Install requirements:

conda create -n navi2gaze python=3.9
conda activate navi2gaze

conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
conda install habitat-sim -c conda-forge -c aihabitat
pip install -r requirements.txt

# install SEEM
pip install git+https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git@package

# install SAM
pip install git+https://github.com/facebookresearch/segment-anything.git

# install Semantic-SAM
pip install git+https://github.com/UX-Decoder/Semantic-SAM.git@package

# install Deformable Convolution for Semantic-SAM
cd ops && sh make.sh && cd ..

cd thirdparty/octree_map && sh run.sh && cd -

# Common error fix: if detectron2-related errors occur, reinstall detectron2 from this fork
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
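After the installation steps above, a quick sanity check can confirm that the main packages are importable. This is a minimal sketch; the module names are assumptions based on the packages installed above (SEEM's import name is taken from the site-packages/seem path mentioned in the "Fix bug" step). Semantic-SAM is omitted because its import name is not stated here.

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level import names for the packages installed above.
EXPECTED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "detectron2", "habitat_sim", "segment_anything", "seem",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found (without importing them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```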

Download `clip-vit-base-patch32`

Download the clip-vit-base-patch32 model into the local directory openai/clip-vit-base-patch32.
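One possible way to do this, assuming the weights come from the Hugging Face Hub repository of the same name, is huggingface_hub's snapshot_download. The helper names and the list of expected files (config.json plus pytorch_model.bin or model.safetensors) are assumptions for illustration, not something this repository defines.

```python
from pathlib import Path

CLIP_DIR = Path("openai/clip-vit-base-patch32")
# Assumed weight file names for a Hugging Face checkpoint directory.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def download_clip(local_dir: Path = CLIP_DIR) -> None:
    """Download the checkpoint (requires `pip install huggingface_hub`)."""
    # Imported lazily so the check below works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id="openai/clip-vit-base-patch32", local_dir=str(local_dir))


def clip_dir_looks_complete(local_dir: Path = CLIP_DIR) -> bool:
    """Heuristic check: a config file plus at least one weight file are present."""
    local_dir = Path(local_dir)
    has_config = (local_dir / "config.json").is_file()
    has_weights = any((local_dir / name).is_file() for name in WEIGHT_FILES)
    return has_config and has_weights
```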

Fix bug

If necessary, add `from mpi4py import MPI` to the top of /home/xxx/anaconda3/envs/navi2gaze/lib/python3.9/site-packages/seem/utils/distributed.py (adjust the path to match your own Anaconda installation).
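Rather than editing the file by hand, the patch can be applied with a small script. This is a hedged sketch: the helper names are hypothetical, and it assumes the installed SEEM package is importable as `seem` and contains utils/distributed.py, as the path above suggests. The import is placed after any `from __future__` lines, and the patch is skipped if the line is already present.

```python
import importlib.util
from pathlib import Path

IMPORT_LINE = "from mpi4py import MPI"


def locate_seem_distributed() -> Path:
    """Find seem/utils/distributed.py in the active environment without importing seem."""
    spec = importlib.util.find_spec("seem")
    if spec is None or not spec.submodule_search_locations:
        raise FileNotFoundError("the 'seem' package was not found in this environment")
    return Path(list(spec.submodule_search_locations)[0]) / "utils" / "distributed.py"


def add_mpi_import(path: Path) -> bool:
    """Insert IMPORT_LINE once; return True if the file was modified."""
    path = Path(path)
    text = path.read_text()
    if any(line.strip() == IMPORT_LINE for line in text.splitlines()):
        return False  # already patched
    lines = text.splitlines(keepends=True)
    # `from __future__` imports must stay first, so insert after the last one.
    insert_at = 0
    for i, line in enumerate(lines):
        if line.startswith("from __future__ import"):
            insert_at = i + 1
    if insert_at and not lines[insert_at - 1].endswith("\n"):
        lines[insert_at - 1] += "\n"
    lines.insert(insert_at, IMPORT_LINE + "\n")
    path.write_text("".join(lines))
    return True
```

Usage, with the navi2gaze environment active: `add_mpi_import(locate_seem_distributed())`.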

Setup OpenAI

To test object-goal and spatial-goal navigation tasks with our method, you need to set up an OpenAI API account as follows:

Sign up for an OpenAI account, log in, and add at least one payment method.
Create an OpenAI API key and copy it.
Open your ~/.bashrc file, add the line `export OPENAI_KEY=<your copied key>`, save the file, and run `source ~/.bashrc` (or open a new terminal) so the variable takes effect.
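To confirm the key is visible to Python before running the demo, a check like the following can help. It is a minimal sketch: the helper name is hypothetical, and it only reads the OPENAI_KEY variable named in the steps above.

```python
import os


def get_openai_key(var_name: str = "OPENAI_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; add `export {var_name}=<your key>` to ~/.bashrc "
            "and run `source ~/.bashrc`."
        )
    return key
```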

Run Demo

# Set the task instruction in `parsed_results = self.parse_object_goal_instruction("sit on the sofa")` before running
sh run.sh
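If you switch instructions often, the string literal in that call can be rewritten programmatically. This is a hypothetical helper, not part of the repository. It assumes the instruction is a single double-quoted string passed directly to `parse_object_goal_instruction`, and it operates on source text so you can apply it to whichever file contains that line.

```python
import json
import re

# Matches parse_object_goal_instruction("...") with one double-quoted argument.
_CALL = re.compile(r'(parse_object_goal_instruction\()"(?:[^"\\]|\\.)*"(\))')


def set_instruction(source: str, instruction: str) -> str:
    """Return `source` with the instruction literal replaced by `instruction`."""
    literal = json.dumps(instruction)  # double-quoted and escaped; also a valid Python literal
    new_source, count = _CALL.subn(lambda m: m.group(1) + literal + m.group(2), source)
    if count == 0:
        raise ValueError("parse_object_goal_instruction(...) call not found")
    return new_source
```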

Citation

If you find the dataset or code useful, please cite:

@misc{zhu2024navi2gazeleveragingfoundationmodels,
      title={Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing}, 
      author={Jun Zhu and Zihao Du and Haotian Xu and Fengbo Lan and Zilong Zheng and Bo Ma and Shengjie Wang and Tao Zhang},
      year={2024},
      eprint={2407.09053},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2407.09053}, 
}

License

MIT License
