Implement lazy loading for traceable models #1105
base: main
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
__all__ = ["_AliasableLazyModule"]

class _AliasableLazyModule(_LazyModule):
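For orientation, a minimal sketch of what an alias-aware lazy module along these lines could look like; the class name and base class come from the diff above, but the constructor signature and alias-resolution logic are assumptions, not the PR's actual implementation:

from transformers.utils.import_utils import _LazyModule

class _AliasableLazyModule(_LazyModule):
    # Sketch only: `aliases` maps a public alias to the (submodule, attribute)
    # it resolves to, e.g.
    # {"TraceableLlavaForConditionalGeneration": ("llava", "LlavaForConditionalGeneration")}
    def __init__(self, name, module_file, import_structure, aliases, **kwargs):
        super().__init__(name, module_file, import_structure, **kwargs)
        self._aliases = aliases

    def __getattr__(self, name):
        if name in self._aliases:
            submodule, attr = self._aliases[name]
            # _LazyModule._get_module imports the submodule on first access
            return getattr(self._get_module(submodule), attr)
        return super().__getattr__(name)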
Doesn't the registry take care of lazy loading?
Not if we want to maintain the API of from llmcompressor.transformers.tracing import TraceableX.
I see, can we do something like

# examples script
from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(...)
oneshot(model=model, ...)

# In the backend, map `LlavaForConditionalGeneration` to your
# `llmcompressor.transformers.tracing` model using the registry.
# So, e.g., `LlavaForConditionalGeneration` maps to `TraceableLlavaForConditionalGeneration`

This way the user can just use transformers instead of our llmcompressor model.
> model = LlavaForConditionalGeneration.from_pretrained(...)

- I'm not really sure what this API is pointing to, since as written there's no distinction between code which loads the normal vs. the traceable definitions.
- As we spoke about before, we'd need to change upstream code. These changes almost certainly wouldn't be accepted by the transformers team, as it's outside the responsibilities of the transformers library.
- What is traceable for LLM Compressor is not what is traceable for other users, because LLM Compressor contains specialized code to make tracing easier.
> Doesn't the registry take care of lazy loading?

We could implement lazy loading using a registry, but this would involve having to add registry code into the traceable definition:

@TracingRegistry.register(name="TraceableLlavaForConditionalGeneration")
class LlavaForConditionalGeneration:
    ...

And imho this makes for a clunkier top-level interface:

from llmcompressor.transformers.tracing import TracingRegistry

model = TracingRegistry.load_from_registry("LlavaForConditionalGeneration").from_pretrained("path")
We don't need to push code to HF upstream; we just use what is in HF right now.

In llm-compressor, we just add a mapping that points each HF multimodal model to your tracing model. Very simply, something like:

{
    "HF_MODEL": "YOUR_TRACEABLE_MODEL",
    "LlavaForConditionalGeneration": "TraceableLlavaForConditionalGeneration",
}

But using the registry.

Here we can just use HF code in the example UX, but the backend would use your traceable model.
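A hypothetical sketch of the backend lookup this suggests; these names are illustrative only and not actual llm-compressor code:

# Hypothetical mapping from stock HF classes to their traceable counterparts
TRACEABLE_NAMES = {
    "LlavaForConditionalGeneration": "TraceableLlavaForConditionalGeneration",
}

def resolve_traceable_class(model):
    # Swap the user's stock HF model class for its traceable counterpart, if any
    import llmcompressor.transformers.tracing as tracing
    alias = TRACEABLE_NAMES.get(type(model).__name__)
    return getattr(tracing, alias) if alias else type(model)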
> In llm-compressor, we just add a mapping that points each HF multimodal model to your tracing model.

> Here we can just use HF code in the example UX, but the backend would use your traceable model.

I understand what a registry dictionary is. I don't understand how it is possible to implement a registry in LLM Compressor while only using the HF library at the top level.
If you're referring to dynamically replacing the model definition within oneshot, I consider that to be an antipattern which makes it unclear to the user what model they're really loading. It also opens up unintended consequences from loading a model definition twice, such as when the user modifies the model config prior to oneshot.
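For example, a hypothetical flow showing the double-load pitfall (the silent reload here is the suggested behavior, not current behavior):

model = LlavaForConditionalGeneration.from_pretrained("path")
model.config.image_token_index = 42  # hypothetical user tweak to the loaded config
oneshot(model=model, ...)  # if the backend silently reloads the model definition,
                           # e.g. as TraceableLlavaForConditionalGeneration, the tweak is lost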
Also, traceable definitions are not needed for recipes which do not use the sequential pipeline. Because the traceable definitions include things like processing error checks, they're useful to keep in most cases, since they allow the user to better debug their data loading.
Purpose
- Implement lazy loading for traceable model definitions
Changes
- Add _AliasableLazyModule, which extends _LazyModule to allow aliases
- Replace llmcompressor.transformers.tracing with an instance of _AliasableLazyModule which lazily loads submodules as they are needed (see the sketch below)
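A minimal sketch of that replacement, assuming the PR follows transformers' usual sys.modules pattern; the file path, import structure, and alias table below are assumptions:

# llmcompressor/transformers/tracing/__init__.py -- sketch, not the actual file
import sys

_import_structure = {"llava": ["LlavaForConditionalGeneration"]}
_aliases = {"TraceableLlavaForConditionalGeneration": ("llava", "LlavaForConditionalGeneration")}

# Replacing the module object keeps `from llmcompressor.transformers.tracing
# import TraceableLlavaForConditionalGeneration` working, without importing
# any submodule until the attribute is first accessed.
sys.modules[__name__] = _AliasableLazyModule(
    __name__, globals()["__file__"], _import_structure, _aliases
)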
Testing
- tests/llmcompressor/transformers/tracing/test_init.py
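The test file's contents aren't shown in this thread; a sketch of the kind of check it might perform, with module and class names assumed:

import sys

def test_tracing_is_lazy():
    # Importing the package should not eagerly import model submodules
    import llmcompressor.transformers.tracing as tracing
    assert "llmcompressor.transformers.tracing.llava" not in sys.modules

    # Accessing an aliased attribute should trigger the real import
    _ = tracing.TraceableLlavaForConditionalGeneration
    assert "llmcompressor.transformers.tracing.llava" in sys.modules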