V-APT: Action Recognition Based on Image to Video Prompt-Tuning

Paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10660097

1. Introduction

We propose V-APT, which adds a lightweight LSTM block that models temporal relationships before average temporal pooling. The approach also includes a deep visual and text prompt interaction block, in which text prompts guide the visual prompts so that vision and language stay mutually coordinated.
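The idea of placing an LSTM before average temporal pooling can be illustrated with a minimal sketch. This is not the authors' implementation, which has not been released. The names `TinyLSTM` and `temporal_average_pool` are hypothetical, the weights are random rather than trained, and the toy per-frame vectors stand in for whatever features the image encoder would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


class TinyLSTM:
    """Hypothetical single-layer LSTM over per-frame feature vectors."""

    GATES = ("input", "forget", "cell", "output")

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One weight matrix per gate, acting on the concatenation [x_t ; h_{t-1}].
        # Random values are an assumption; a real model would learn them.
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * dim for gate in self.GATES}

    def forward(self, frames):
        """Return one hidden state per frame, each of length `dim`."""
        h = [0.0] * self.dim
        c = [0.0] * self.dim
        hidden_states = []
        for x in frames:
            z = list(x) + h
            pre = {gate: affine(self.W[gate], self.b[gate], z) for gate in self.GATES}
            i = [sigmoid(v) for v in pre["input"]]
            f = [sigmoid(v) for v in pre["forget"]]
            g = [math.tanh(v) for v in pre["cell"]]
            o = [sigmoid(v) for v in pre["output"]]
            # Standard LSTM update: gated cell state, then gated hidden state.
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            hidden_states.append(h)
        return hidden_states


def temporal_average_pool(frames):
    """Average a list of equal-length vectors over the time axis."""
    return [sum(column) / len(frames) for column in zip(*frames)]


# Toy clip: 4 frames, each a 3-dimensional feature vector (made-up values).
frames = [[1.0, 0.0, 0.5], [0.2, 0.9, -0.3], [-0.7, 0.4, 0.8], [0.1, -0.6, 0.3]]
lstm = TinyLSTM(dim=3)
video_feature = temporal_average_pool(lstm.forward(frames))
```

Plain average pooling gives the same result for any ordering of the frames. Passing the frames through a recurrent block first makes the pooled video feature depend on their order, which is the kind of temporal information the introduction refers to.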

2. Method

3. Experiments

Dataset

The specific experimental code and data will be released after the paper is accepted. Please stay tuned.

Acknowledgment

This research is supported by the Tianjin University of Technology Postgraduate Scientific Research Innovation Project (No. YJ2390). We thank Tianjin University of Technology for its funding and the Technical College for the Deaf for providing resources.
