Add Background Compilation for OWLv2 Vision Model #1026
Conversation
Hello @PawelPeczek-Roboflow. In addition to local testing, this was deployed to the inference-internal staging and production environments, where it solved the long-running-request problem we had with instant models: the 30-second nginx timeout on the inference-internal side previously caused errors, and those same requests now complete in roughly 3 seconds. This is a smarter approach than the previous COMPILE_OWLV2_MODEL behavior. Instead of skipping compilation and always serving slower inferences, the first request returns the non-compiled instance while compilation starts in the background; once compilation finishes, the instance is replaced, so subsequent inferences are faster. Slack threads for context: results, discussion thread
Description
Added a singleton manager that handles OWLv2's vision model compilation in the background, avoiding the initial two-minute blocking compilation while keeping the performance benefits of compiled models.
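The pattern described above can be sketched as follows. This is an illustrative minimal version, not the actual implementation from the PR: the class name, method names, and the `load_fn`/`compile_fn` parameters are all hypothetical, standing in for the real model-loading and `torch.compile` calls.

```python
import threading


class BackgroundCompilationManager:
    """Singleton sketch: serve the eager (uncompiled) model immediately,
    compile in a background thread, then atomically swap in the compiled
    instance so later calls hit the fast path."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._model = None
                cls._instance._compiling = False
        return cls._instance

    def get_model(self, load_fn, compile_fn):
        with self._lock:
            if self._model is None:
                # First call: load the eager model so the request is not
                # blocked, and kick off compilation in the background.
                self._model = load_fn()
                if not self._compiling:
                    self._compiling = True
                    threading.Thread(
                        target=self._compile, args=(compile_fn,), daemon=True
                    ).start()
            return self._model

    def _compile(self, compile_fn):
        # Compilation (e.g. torch.compile) runs outside the lock so it
        # does not block concurrent get_model() calls.
        compiled = compile_fn(self._model)
        with self._lock:
            # Swap in the compiled instance; subsequent calls return it.
            self._model = compiled
            self._compiling = False
```

With this shape, the first `get_model()` call returns the slow eager model right away, and calls made after compilation finishes transparently receive the compiled one.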
Changes
Example Usage
Benefits
Potential Risks
Type of change
How has this change been tested? Please provide a test case or an example of how you tested the change.
Added tests to verify:
Any specific deployment considerations