Converter for Llama based Masked Diffusion Models (Based on Dream) #263

Open · wants to merge 10 commits into main
Conversation

@nitsanluke (Contributor) commented on May 13, 2025

✨ Description

This PR provides converters for diffusion models based on Llama (and Dream). It complements the masked-diffusion training PR #238 and must be merged after it.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. DiffusionLlama converter: exports our masked-diffusion LLM to a Hugging Face model (see the loading sketch after this list).
  2. Dream converter: lets us test against Dream models.
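
Once exported, the checkpoint should load like any Hugging Face model that ships custom modeling code (the diff below shows the export mixin writing modeling/configuration files alongside the weights). A minimal sketch; the export path is hypothetical and the exact files the converter emits may differ:

    from transformers import AutoConfig, AutoModel, AutoTokenizer

    # Hypothetical directory written by the DiffusionLlama converter.
    export_dir = "exports/diffusion_llama_hf"

    # Custom modeling/configuration files shipped with the weights require
    # trust_remote_code=True, as with the upstream Dream checkpoints.
    config = AutoConfig.from_pretrained(export_dir, trust_remote_code=True)
    model = AutoModel.from_pretrained(export_dir, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(export_dir)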

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed.
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • 🐋 I have updated the Docker configuration or dependencies, if applicable.
  • 🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes.
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

  • 📊 I have run benchmarks where applicable to evaluate the performance impact.
  • ✅ The benchmarks show no performance regression.
  • 🚀 The benchmarks indicate a potential performance improvement.
  • ⚠️ The benchmarks indicate a potential performance degradation.
  • 📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


🗒️ Additional Notes

tests/test_checkpoint.py ...............s.Fssss.

(pytest short output: each dot is a pass, s a skip, F a failure; one checkpoint test currently fails.)

@nitsanluke changed the title from "WIP: Converter for Llama based Masked Diffusion Models (Based on Dream)" to "Converter for Llama based Masked Diffusion Models (Based on Dream)" on May 14, 2025
@nitsanluke marked this pull request as ready for review on May 14, 2025 at 17:57
@@ -133,6 +133,7 @@ class CustomModelingExportMixin:
     modeling_file: typing.ClassVar[str]
     configuration_file: typing.ClassVar[str]
     configuration_cls: typing.ClassVar[type[PretrainedConfig]]
+    generation_utils_file: typing.ClassVar[typing.Optional[str]] = None
A Collaborator commented on the added line:
str|None
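
That is, prefer the PEP 604 union syntax over typing.Optional for the new class variable:

    generation_utils_file: typing.ClassVar[str | None] = None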

@@ -121,7 +121,7 @@ def test_convert_distributed_to_fast_llm():


 @pytest.mark.depends(on=["test_convert_distributed_to_fast_llm"])
-def test_convert_fast_llm_to_huggingface():
+def test_convert_fast_llm_to_huggingface():
A Collaborator commented:
Please run the formatting hook.
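
(Assuming the repository's formatting hook is managed by pre-commit, which is not confirmed in this thread, that amounts to running pre-commit run --all-files before pushing.)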
