PaliGemma Android HF

This repository runs inference with the PaliGemma vision-language model from an Android app through the Hugging Face Gradio Client API. It covers tasks such as zero-shot object detection, image captioning, and visual question answering.
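PaliGemma selects the task from a short text prefix in the prompt that accompanies the image. The sketch below builds the prefixes documented for the PaliGemma mix checkpoints (`caption`, `answer`, `detect`). The class and method names are hypothetical, and this is not necessarily how the app in this repository assembles its prompts.

```java
import java.util.List;

/** Builds PaliGemma task prompts. Class and method names are illustrative only. */
final class PaliGemmaPrompts {
    private PaliGemmaPrompts() {}

    /** Image captioning prompt, e.g. "caption en". */
    static String caption(String languageCode) {
        return "caption " + languageCode;
    }

    /** Visual question answering prompt, e.g. "answer en what is on the table?". */
    static String answer(String languageCode, String question) {
        return "answer " + languageCode + " " + question.trim();
    }

    /** Zero-shot detection prompt; several class names are joined with " ; ". */
    static String detect(List<String> labels) {
        return "detect " + String.join(" ; ", labels);
    }
}
```

For example, `PaliGemmaPrompts.detect(List.of("cat", "dog"))` yields `detect cat ; dog`, which would be sent to the model together with the image.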

Pipeline:

Demo Outputs:

Visual question-answering, zero-shot object detection, image captioning
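For detection, PaliGemma returns each object as four `<locNNNN>` tokens followed by a label. The tokens encode `y_min`, `x_min`, `y_max`, `x_max` as bins in the range 0 to 1023, relative to the image size. A minimal sketch of turning that text into pixel boxes that an Android overlay could draw is shown here. The `PaliGemmaDetections` and `Box` names are hypothetical and are not taken from this repository's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses PaliGemma detection text into pixel-space boxes (illustrative helper). */
final class PaliGemmaDetections {
    private PaliGemmaDetections() {}

    /** One detected object; edges are in pixels of the original image. */
    static final class Box {
        final String label;
        final float left, top, right, bottom;

        Box(String label, float left, float top, float right, float bottom) {
            this.label = label;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    // Four location tokens (y_min, x_min, y_max, x_max) followed by the label text.
    private static final Pattern DETECTION = Pattern.compile(
            "<loc(\\d{4})><loc(\\d{4})><loc(\\d{4})><loc(\\d{4})>\\s*([^;<]+)");

    static List<Box> parse(String modelOutput, int imageWidth, int imageHeight) {
        List<Box> boxes = new ArrayList<>();
        Matcher m = DETECTION.matcher(modelOutput);
        while (m.find()) {
            // Each token is a bin in [0, 1023]; divide by 1024 and scale to pixels.
            float top = Integer.parseInt(m.group(1)) / 1024f * imageHeight;
            float left = Integer.parseInt(m.group(2)) / 1024f * imageWidth;
            float bottom = Integer.parseInt(m.group(3)) / 1024f * imageHeight;
            float right = Integer.parseInt(m.group(4)) / 1024f * imageWidth;
            boxes.add(new Box(m.group(5).trim(), left, top, right, bottom));
        }
        return boxes;
    }
}
```

Calling `PaliGemmaDetections.parse(output, bitmapWidth, bitmapHeight)` with the dimensions of the displayed image gives coordinates that can be passed directly to a drawing routine.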

Referring Expression Segmentation

Model used: Florence-2

Resources:

Citation

If you find this project useful for your work, please cite it using the following BibTeX entry:

@misc{PaliGemma-Android-HF,
  author       = {Nitin Tiwari and Sagar Malhotra and Savio Rodrigues},
  title        = {PaliGemma on Android using Hugging Face API},
  year         = {2024},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/NSTiwari/PaliGemma-Android-HF}},
}

Acknowledgment

This project was developed during Google's ML Developer Programs AI Sprint. Thanks to the MLDP team for providing Google Cloud credits to support this project.
