
Implement lazy loading for traceable models #1105

Open
wants to merge 7 commits into base: main

Conversation

@kylesayrs (Collaborator) commented Jan 28, 2025

Purpose

  • Some models import libraries that are not part of the base installation. Users who want to use one traceable model should not be forced to install libraries required only by another traceable model
  • Model definitions are large, so it's better to load them only when needed

Changes

  • Implemented _AliasableLazyModule which extends _LazyModule to allow aliases
  • Dynamically replace llmcompressor.transformers.tracing with an instance of _AliasableLazyModule which lazily loads submodules as they are needed
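The mechanism described above can be sketched in a minimal, self-contained form. Note this is an importlib-based stand-in, not the PR's actual `_AliasableLazyModule` (which subclasses transformers' `_LazyModule`); all names below are illustrative:

```python
import importlib
import types


class _AliasableLazyModule(types.ModuleType):
    """Module whose alias attributes import their backing submodule on first access."""

    def __init__(self, name, aliases):
        super().__init__(name)
        # alias name -> (module path, attribute name); nothing is imported yet
        self.__dict__["_aliases"] = dict(aliases)

    def __getattr__(self, name):
        aliases = self.__dict__["_aliases"]
        if name in aliases:
            module_path, attr_name = aliases[name]
            # the backing library is imported only here, on first attribute access
            value = getattr(importlib.import_module(module_path), attr_name)
            setattr(self, name, value)  # cache so later lookups skip __getattr__
            return value
        raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")


# The PR then swaps the real module for the lazy one, roughly:
#   sys.modules["llmcompressor.transformers.tracing"] = _AliasableLazyModule(...)
tracing = _AliasableLazyModule(
    "tracing_demo",
    {"TraceableOrderedDict": ("collections", "OrderedDict")},
)
print(tracing.TraceableOrderedDict)  # triggers the import of collections
```

Because nothing is imported until an alias is touched, a user who never requests a given traceable model never pays for (or needs installed) that model's dependencies.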

Testing

  • Added passing tests in tests/llmcompressor/transformers/tracing/test_init.py

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs kylesayrs self-assigned this Jan 28, 2025
@kylesayrs kylesayrs added the ready When a PR is ready for review label Jan 28, 2025
@kylesayrs kylesayrs marked this pull request as ready for review January 28, 2025 17:37
__all__ = ["_AliasableLazyModule"]


class _AliasableLazyModule(_LazyModule):
Collaborator

Doesn't the registry take care of lazy loading?

Collaborator Author

Not if we want to maintain the API of from llmcompressor.transformers.tracing import TraceableX

Collaborator

I see. Can we do something like:

# examples script
from transformers import LlavaForConditionalGeneration
model = LlavaForConditionalGeneration.from_pretrained(...)

oneshot(model=model, ...)

# in the backend, map `LlavaForConditionalGeneration` to your class in `llmcompressor.transformers.tracing` using the registry.
# So, e.g., `LlavaForConditionalGeneration` maps to `TraceableLlavaForConditionalGeneration`

This way the user can just use transformers instead of our llmcompressor model.

Collaborator Author

model = LlavaForConditionalGeneration.from_pretrained(...)

  1. I'm not really sure what this API is pointing to; as written, there's no distinction between the code which loads the normal definitions and the code which loads the traceable ones.
  2. As we discussed before, we'd need to change upstream code. These changes almost certainly wouldn't be accepted by the transformers team, as they're outside the responsibilities of the transformers library.
  3. What is traceable for LLM Compressor is not what is traceable for other users, because LLM Compressor contains specialized code that makes tracing easier.

Collaborator Author

Doesn't the registry take care of lazy loading?

We could implement lazy loading using a registry, but that would require adding registry code into each traceable definition:

@TracingRegistry.register(name="TraceableLlavaForConditionalGeneration")
class LlavaForConditionalGeneration:
    ...

And IMHO this makes for a clunkier top-level interface:

from llmcompressor.transformers.tracing import TracingRegistry

model = TracingRegistry.load_from_registry("LlavaForConditionalGeneration").from_pretrained("path")

@horheynm (Collaborator) commented Jan 28, 2025

We don't need to push code upstream to HF; we just use what's in HF right now.
In llm-compressor, we just add a mapping that tells us which traceable model each multimodal model points to.

Very simply, something like:

{
    "HFMODEL": "YOUR_TRACEABLE_MODEL",
    "LlavaForConditionalGeneration": "TraceableLlavaForConditionalGeneration",
}

But using the registry.

Here we can just use HF code in the example UX, but the backend would use your traceable model.
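The mapping above can be sketched as a small lookup helper. This is a hypothetical illustration of the proposal, not merged code; the fallback behavior is an assumption:

```python
# Hypothetical mapping from an HF class name to its traceable counterpart.
TRACEABLE_CLASS_MAP = {
    "LlavaForConditionalGeneration": "TraceableLlavaForConditionalGeneration",
}


def resolve_traceable_name(hf_class_name: str) -> str:
    # Fall back to the HF name when no traceable variant is registered.
    return TRACEABLE_CLASS_MAP.get(hf_class_name, hf_class_name)
```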

@kylesayrs (Collaborator, Author) commented Jan 29, 2025

In llm-compressor, we just add a mapping that tells us which traceable model each multimodal model points to.
Here we can just use HF code in the example UX, but the backend would use your traceable model.

I understand what a registry dictionary is. I don't understand how it is possible to implement a registry in LLM Compressor while only using the HF library at the top level.

Collaborator Author

If you're referring to dynamically replacing the model definition within oneshot, I consider that an antipattern which makes it unclear to the user what model they're really loading. It also opens up unintended consequences from loading a model definition twice, such as when the user modifies the model config prior to oneshot.

Collaborator Author

Also, traceable definitions are only needed for recipes which use the sequential pipeline. However, because the traceable definitions include extras like data-processing error checks, they're useful to keep in most cases so the user can better debug their dataloading.
