This repository demonstrates how to run inference with the PaliGemma Vision Language Model on Android using the Hugging Face Gradio Client API, for tasks such as zero-shot object detection, image captioning, and visual question-answering.
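
Under the hood, this pattern boils down to sending the image and a task prompt to a Gradio Space over HTTP and reading back the model's response. The snippet below is a minimal Kotlin sketch of such a call using OkHttp; the Space URL, the `/call/predict` endpoint path, and the payload fields are assumptions that depend on how the Space's API is defined, not code taken from this repository.

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONArray
import org.json.JSONObject
import java.util.concurrent.TimeUnit

// Hypothetical Space URL and endpoint path; replace with the Space you actually call.
private const val SPACE_URL = "https://<your-paligemma-space>.hf.space"
private const val PREDICT_ENDPOINT = "$SPACE_URL/call/predict"

private val client = OkHttpClient.Builder()
    .readTimeout(120, TimeUnit.SECONDS) // VLM inference can take a while
    .build()

/**
 * Sends a prompt and a base64-encoded image to the Space and returns the raw response body.
 * The {"data": [...]} payload shape follows Gradio's HTTP API convention, but the exact
 * inputs expected depend on how the Space's predict function is defined.
 * Call this off the main thread (e.g. from a coroutine on Dispatchers.IO).
 */
fun queryPaliGemma(base64Image: String, prompt: String): String {
    val payload = JSONObject()
        .put("data", JSONArray(listOf(base64Image, prompt)))
        .toString()
        .toRequestBody("application/json".toMediaType())

    val request = Request.Builder()
        .url(PREDICT_ENDPOINT)
        .post(payload)
        .build()

    client.newCall(request).execute().use { response ->
        check(response.isSuccessful) { "HTTP ${response.code}" }
        // Newer /call/<api_name> endpoints return an event_id that must be polled or
        // streamed for the final result; older Spaces return the prediction directly.
        return response.body?.string().orEmpty()
    }
}
```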
- Visual question-answering, zero-shot object detection, and image captioning
- Referring expression segmentation (model used: Florence-2)
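
These tasks map to different PaliGemma prompt prefixes ("caption", "detect", "answer", "segment"). The Kotlin helper below is an illustrative sketch of building such prompt strings; it is not taken from this repository, and the exact prefix strings and separators (e.g. " ; " between detection classes) should be checked against the PaliGemma documentation.

```kotlin
// Illustrative prompt builder; prefixes follow the PaliGemma model card conventions.
sealed class PaliGemmaTask {
    abstract fun toPrompt(): String

    data class Caption(val language: String = "en") : PaliGemmaTask() {
        override fun toPrompt() = "caption $language"
    }

    data class Detect(val objects: List<String>) : PaliGemmaTask() {
        // Multiple classes are assumed to be separated by " ; " for zero-shot detection.
        override fun toPrompt() = "detect " + objects.joinToString(" ; ")
    }

    data class Answer(val question: String, val language: String = "en") : PaliGemmaTask() {
        override fun toPrompt() = "answer $language $question"
    }

    data class Segment(val target: String) : PaliGemmaTask() {
        override fun toPrompt() = "segment $target"
    }
}

fun main() {
    println(PaliGemmaTask.Caption().toPrompt())                        // caption en
    println(PaliGemmaTask.Detect(listOf("car", "person")).toPrompt())  // detect car ; person
    println(PaliGemmaTask.Answer("how many people are there?").toPrompt())
}
```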
- Colab notebooks for PaliGemma
- Official Gemma Cookbook
- Medium blog with a step-by-step implementation guide
- Big Vision HF 🤗 Spaces
If you find this project useful for your work, please cite it using the following BibTeX entry:
@misc{paligemma-android-hf,
  author       = {Nitin Tiwari and Sagar Malhotra and Savio Rodrigues},
  title        = {PaliGemma on Android using Hugging Face API},
  year         = {2024},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/NSTiwari/PaliGemma-Android-HF}},
}
This project was developed during Google's ML Developer Programs AI Sprint. Thanks to the MLDP team for providing Google Cloud credits to support this project.