Support of generate #261
base: main
Conversation
Next steps are to create tests and add docs.
There are 3 open questions:
As a side note, to support downstream evals during training, we do not necessarily need …
Let's keep the existing format and not use a new rule.
The …
I don't really see the point of …
I just added future parity with … However, I meant a different thing here: Fast-LLM's … That said, @tscholak is okay with using tuples as keys in …
We still need to support different global batch sizes for the same … For example, consider a training scenario with: …
This configuration uses … However, when performing evaluation during training, we cannot use the same … Additionally, …
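To make the constraint concrete, here is a minimal sketch with hypothetical field names (not Fast-LLM's actual config schema): even for the same model, training and in-training evaluation typically want different batch settings.

```python
# Hypothetical field names, for illustration only.
train_batch = dict(global_batch_size=512, micro_batch_size=16, sequential_micro_batches=4)
eval_batch = dict(global_batch_size=32, micro_batch_size=16)  # smaller, no gradient accumulation
```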
Yeah, you're right: …
The tuple and dot formats are basically the same; one is just the parsed form and the other the unparsed form. I don't think there is much use for the unparsed version outside of a runnable, because dicts are better in-code.
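To illustrate the two forms (hypothetical helper, not Fast-LLM's actual parsing code): the dotted "unparsed" key is convenient on a command line, while the tuple/nested "parsed" form is what code actually works with.

```python
# Hypothetical example: apply a dotted "unparsed" key as a nested dict update.
def set_by_dotted_key(config: dict, dotted_key: str, value) -> None:
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value

config = {}
set_by_dotted_key(config, "batch.micro_batch_size", 16)
assert config == {"batch": {"micro_batch_size": 16}}

# The "parsed" equivalent is just the tuple key ("batch", "micro_batch_size").
parsed_key = tuple("batch.micro_batch_size".split("."))
```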
Again, … Note that sequential micro-batches are irrelevant for inference; they're just separate batches for all practical purposes.
Thanks for the details.
Still, if you pass a …
OK, I'll remove the batch-config-related parameters from the constructor.
I needed to downgrade …
Does this relate to #249?
Should be a separate issue, as … However, this branch is from the May 12 main and it builds docs on GitHub.
tests/test_gpt_generate.py (Outdated)

@pytest.fixture(scope="module")
def model_and_tokenizer():
    model = "HuggingFaceTB/SmolLM2-135M-Instruct"
How long does the test take to run? (Including the download time)
Around a minute with the download, and around 30 seconds if already downloaded.
That's too long even by slow-test standards. Can we find a smaller model for testing, or make a mock one?
Looks good, only need to address the documentation and slow test issues.
Concerning the doc, I pushed a tentative fix in #271, but we won't know if it works until we merge the PR.
Concerning the tests, I suggest you add one with a dummy checkpoint (ex. the one from test_checkpoint), and keep the existing one with a skip mark. These longer tests do make sense, and I'd like us to find a way to bring them back without disrupting existing workflows.
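One way to keep the long test without running it by default, as a sketch assuming plain pytest (the test name here is hypothetical; `model_and_tokenizer` is the fixture quoted above):

```python
import pytest

# A fast companion test can use a dummy checkpoint (e.g. the one from test_checkpoint).
# The existing full-model test is kept, but skipped by default because it downloads
# SmolLM2-135M-Instruct and takes ~30-60 s even when cached.
@pytest.mark.skip(reason="Slow: downloads and runs a real model; run manually when needed.")
def test_generate_smollm2(model_and_tokenizer):
    ...
```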
@@ -0,0 +1,77 @@
---
This won't be published because of #249. I think the problem is missing variables in Fast-LLM/.github/workflows/docs.yaml, line 59 in 3ac976b:
FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE FLASH_ATTENTION_FORCE_BUILD=TRUE pip install --no-build-isolation -e ".[CORE,OPTIONAL,DEV,DOCS]"
(like those in Fast-LLM/.github/workflows/docs.yaml, line 34 in 3ac976b:
FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE FLASH_ATTENTION_FORCE_BUILD=TRUE MAMBA_SKIP_CUDA_BUILD=TRUE MAMBA_FORCE_BUILD=TRUE CAUSAL_CONV1D_FORCE_BUILD=TRUE CAUSAL_CONV1D_SKIP_CUDA_BUILD=TRUE pip install --no-build-isolation -e ".[CORE,OPTIONAL,DEV,DOCS]")
LGTM!
We can postpone finding a smaller model for the test to another time. The current test should be kept but not run by default.
✨ Description
Adds support for `generate` and extends support for `forward`, without handling `cache`, `past_key_values`, `labels`, `attention` output, or `inputs_embeds`. `position_ids` are ignored and reconstructed from the attention mask. Currently, only data-parallel generation is supported.
Closes #217
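For reference, a common way to rebuild position IDs from an attention mask (the pattern used by many Hugging Face models; an illustrative sketch, not necessarily the exact code in this PR):

```python
import torch

# Left-padded batch: 0 marks padding, 1 marks real tokens.
attention_mask = torch.tensor([[0, 0, 1, 1, 1]])

# Cumulative sum gives 0-based positions for real tokens;
# padded positions get a dummy value (they are masked out anyway).
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)
# position_ids -> tensor([[1, 1, 0, 1, 2]])
```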
🔍 Type of change
Select all that apply:
📝 Changes
List the key changes introduced in this PR:
- Extended the `HuggingfacePreTrainedModel` interface to achieve feature parity with `FastLLMModel.from_pretrained`, and modified the `__init__` method to optionally accept `runner` and `config`.
- Extended `forward` to support `generate`, aligning behavior with Hugging Face (see the sketch below).
- Added `generate`.
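For context, the Hugging Face `generate` semantics the wrapper aligns with look like this. This is a plain `transformers` example using the same small model referenced in the tests, not Fast-LLM's wrapper API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Decoder-only generation expects left padding and a pad token.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
inputs = tokenizer(["Hello, my name is", "The capital of France is"],
                   return_tensors="pt", padding=True)

# Greedy, batched generation; position IDs are derived from the attention mask.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```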
Make sure the following tasks are completed before submitting the PR:
General
Testing