PaliGemma2 LitServe

PaliGemma 2 is a vision-language model that combines the SigLIP vision encoder with the Gemma 2 language model. It takes an image and a text prompt as input and generates text output in multiple languages. This project shows how to build a self-hosted, private API that serves a PaliGemma 2 model with LitServe, an easy-to-use, flexible serving engine for AI models built on FastAPI.

Project Structure

The project is structured as follows:

server.py: The file containing the main code for the web server.
client.py: The file containing the code for client-side requests.
LICENSE: The license file for the project.
README.md: The README file that contains information about the project.
assets: The folder containing screenshots of the working application.
images: The folder containing images for testing purposes.
.env.example: The example file for environment variables.
.gitignore: The file containing the list of files and directories to be ignored by Git.

Tech Stack

Python (for the programming language)
PyTorch (for the deep learning framework)
Hugging Face Transformers Library (for the model)
LitServe (for the serving engine)

Getting Started

To get started with this project, follow the steps below:

Run the server: python server.py
Once the server starts successfully, Uvicorn reports that it is running on port 8000.
Open a new terminal window.
Run the client: python client.py

The client prints the model's response to the request. The model generates captions, in the selected language, for the images provided in the images folder.
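As a rough illustration of what a client request could look like, the sketch below packages an image and a language code as JSON and posts it to the server. It assumes LitServe's default /predict route on port 8000. The field names "image" and "language" and the helper names build_payload and send_request are hypothetical, so check them against the actual client.py and server.py before relying on them.

```python
import base64
import json
import urllib.request

# Assumption: LitServe's default route and port; adjust if server.py overrides them.
SERVER_URL = "http://127.0.0.1:8000/predict"


def build_payload(image_bytes: bytes, language: str) -> dict:
    """Package an image and a target language as a JSON-serializable dict.

    The field names "image" and "language" are hypothetical; they must match
    whatever server.py reads when it decodes the request.
    """
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "language": language,
    }


def send_request(payload: dict, url: str = SERVER_URL) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, you would read an image from the images folder in binary mode, pass its bytes and a language code such as "en" to build_payload, and hand the result to send_request. Base64 is used here only because raw bytes cannot travel inside JSON; the real client may send the image differently.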

Note: You need a Hugging Face access token to run the application. To get one, sign up on the Hugging Face website and create a new token from the settings page. Then create a .env file in the project's root directory and set the token there as the ACCESS_TOKEN variable. Check the .env.example file for reference.
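The token lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual loading code, which may well use a dedicated dotenv package instead. The helper names load_env_file and get_access_token are hypothetical; only the ACCESS_TOKEN variable name comes from this README.

```python
import os


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Quotes around a value are stripped. A missing file
    yields an empty dict.
    """
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    return values


def get_access_token(path: str = ".env") -> str:
    """Return ACCESS_TOKEN from the process environment, else from the .env file."""
    token = os.environ.get("ACCESS_TOKEN") or load_env_file(path).get("ACCESS_TOKEN")
    if not token:
        raise RuntimeError("ACCESS_TOKEN is not set; see .env.example")
    return token
```

Checking the process environment first lets a token exported in the shell take precedence over the file, which is convenient when the .env file is absent, for example in a container.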

Usage

The project can be used to serve the PaliGemma 2 family of models using LitServe. You provide an image and select a language, and the model generates a caption in that language. Potential use cases include semantic tagging, visual question answering, and more.
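To make the "select a language" step concrete, here is a small sketch of turning a language code into a captioning prompt. PaliGemma models are prompted with short task prefixes, and captioning uses the form "caption {lang}", for example "caption en". The function name build_caption_prompt and the set of allowed codes are assumptions for illustration only; they are not taken from server.py.

```python
# Assumption: a small illustrative set of language codes, not the project's list.
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "hi"}


def build_caption_prompt(language: str = "en") -> str:
    """Build a PaliGemma-style captioning prompt for the given language code."""
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    # Task prefix followed by the language code, e.g. "caption en".
    return f"caption {code}"
```

Validating the code up front keeps malformed input from reaching the model, where it would likely produce an unhelpful caption rather than a clear error.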

Contributing

Contributions are welcome! If you would like to contribute to this project, please raise an issue to discuss the changes you want to make. Once the changes are approved, you can create a pull request.

License

This project is licensed under the Apache-2.0 License.

Contact

If you have any questions or suggestions about the project, feel free to contact me on my GitHub profile.

Happy coding! 🚀

Repository: sitamgithub-MSIT/paligemma2-litserve