Implement lazy loading for traceable models #1105
base: main
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
__all__ = ["_AliasableLazyModule"]

class _AliasableLazyModule(_LazyModule):
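For orientation, a minimal sketch of what an alias-aware lazy module along these lines could look like; the class name and base class come from the diff above, but the constructor signature and alias-resolution logic are assumptions, not the PR's actual implementation:

from transformers.utils.import_utils import _LazyModule

class _AliasableLazyModule(_LazyModule):
    # Sketch only: `aliases` maps a public alias to the (submodule, attribute)
    # it resolves to, e.g.
    # {"TraceableLlavaForConditionalGeneration": ("llava", "LlavaForConditionalGeneration")}
    def __init__(self, name, module_file, import_structure, aliases, **kwargs):
        super().__init__(name, module_file, import_structure, **kwargs)
        self._aliases = aliases

    def __getattr__(self, name):
        if name in self._aliases:
            submodule, attr = self._aliases[name]
            # _LazyModule._get_module imports the submodule on first access
            return getattr(self._get_module(submodule), attr)
        return super().__getattr__(name)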
Doesn't the registry take care of lazy loading?
Not if we want to maintain the API of from llmcompressor.transformers.tracing import TraceableX.
I see, can we do something like

# examples script
from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(...)
oneshot(model=model, ...)

# In the backend, map `LlavaForConditionalGeneration` to your
# `llmcompressor.transformers.tracing` model using the registry.
# So, e.g., `LlavaForConditionalGeneration` maps to `TraceableLlavaForConditionalGeneration`

This way the user can just use transformers instead of our llmcompressor model.
> model = LlavaForConditionalGeneration.from_pretrained(...)

- I'm not really sure what this API is pointing to, since as written there's no distinction between code which loads the normal vs. the traceable definitions.
- As we spoke about before, we'd need to change upstream code. These changes almost certainly wouldn't be accepted by the transformers team, as it's outside the responsibilities of the transformers library.
- What is traceable for LLM Compressor is not what is traceable for other users, because LLM Compressor contains specialized code to make tracing easier.
> Doesn't the registry take care of lazy loading?

We could implement lazy loading using a registry, but this would involve having to add registry code into the traceable definition:

@TracingRegistry.register(name="TraceableLlavaForConditionalGeneration")
class LlavaForConditionalGeneration:
    ...

And imho this makes for a clunkier top-level interface:

from llmcompressor.transformers.tracing import TracingRegistry

model = TracingRegistry.load_from_registry("LlavaForConditionalGeneration").from_pretrained("path")
We don't need to push code to HF upstream; we just use what is in HF right now.

In llm-compressor, we just add a mapping that points each HF multimodal model to your tracing model. Very simply, something like:

{
    "HF_MODEL": "YOUR_TRACEABLE_MODEL",
    "LlavaForConditionalGeneration": "TraceableLlavaForConditionalGeneration",
}

But using the registry.

Here we can just use HF code in the example UX, but the backend would use your traceable model.
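A hypothetical sketch of the backend lookup this suggests; these names are illustrative only and not actual llm-compressor code:

# Hypothetical mapping from stock HF classes to their traceable counterparts
TRACEABLE_NAMES = {
    "LlavaForConditionalGeneration": "TraceableLlavaForConditionalGeneration",
}

def resolve_traceable_class(model):
    # Swap the user's stock HF model class for its traceable counterpart, if any
    import llmcompressor.transformers.tracing as tracing
    alias = TRACEABLE_NAMES.get(type(model).__name__)
    return getattr(tracing, alias) if alias else type(model)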
> In llm-compressor, we just add a mapping that points each HF multimodal model to your tracing model.

> Here we can just use HF code in the example UX, but the backend would use your traceable model.

I understand what a registry dictionary is. I don't understand how it is possible to implement a registry in LLM Compressor while only using the HF library at the top level.
If you're referring to dynamically replacing the model definition within oneshot, I consider that to be an antipattern which makes it unclear to the user what model they're really loading. It also opens up unintended consequences from loading a model definition twice, such as when the user modifies the model config prior to oneshot.
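For example, a hypothetical flow showing the double-load pitfall (the silent reload here is the suggested behavior, not current behavior):

model = LlavaForConditionalGeneration.from_pretrained("path")
model.config.image_token_index = 42  # hypothetical user tweak to the loaded config
oneshot(model=model, ...)  # if the backend silently reloads the model definition,
                           # e.g. as TraceableLlavaForConditionalGeneration, the tweak is lost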
Also, traceable definitions are not needed for recipes which do not use the sequential pipeline. Because the traceable definitions include things like processing error checks, they're useful to keep in most cases, since they allow the user to better debug their data loading.
Purpose
- Implement lazy loading for traceable model definitions
Changes
- Add _AliasableLazyModule, which extends _LazyModule to allow aliases
- Replace llmcompressor.transformers.tracing with an instance of _AliasableLazyModule which lazily loads submodules as they are needed (see the sketch below)
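A minimal sketch of that replacement, assuming the PR follows transformers' usual sys.modules pattern; the file path, import structure, and alias table below are assumptions:

# llmcompressor/transformers/tracing/__init__.py -- sketch, not the actual file
import sys

_import_structure = {"llava": ["LlavaForConditionalGeneration"]}
_aliases = {"TraceableLlavaForConditionalGeneration": ("llava", "LlavaForConditionalGeneration")}

# Replacing the module object keeps `from llmcompressor.transformers.tracing
# import TraceableLlavaForConditionalGeneration` working, without importing
# any submodule until the attribute is first accessed.
sys.modules[__name__] = _AliasableLazyModule(
    __name__, globals()["__file__"], _import_structure, _aliases
)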
Testing
- tests/llmcompressor/transformers/tracing/test_init.py
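The test file's contents aren't shown in this thread; a sketch of the kind of check it might perform, with module and class names assumed:

import sys

def test_tracing_is_lazy():
    # Importing the package should not eagerly import model submodules
    import llmcompressor.transformers.tracing as tracing
    assert "llmcompressor.transformers.tracing.llava" not in sys.modules

    # Accessing an aliased attribute should trigger the real import
    _ = tracing.TraceableLlavaForConditionalGeneration
    assert "llmcompressor.transformers.tracing.llava" in sys.modules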