Update model builders #2282
base: main
Conversation
Commits:
- …ora_gemma functions
- …lora_gemma2 functions
- …a_llama2, llama2_classifier and lora_llama2_classifier functions
@ebsmothers for now, I have only updated `torchtune/modules/transformer.py` (line 375 in 779569e), and it passes all the tests. If this PR looks good, I will:
Just one comment about making sure RoPE is instantiated correctly, but the rest looks good.
torchtune/models/t5/_encoder.py (outdated):

```python
self.layers = (
    layers if isinstance(layers, nn.ModuleList) else nn.ModuleList(layers)
)
```
In this case it's not a big deal since T5 is not yet widely used in the repo, but FYI we do have to be careful about deprecating `num_layers`, as it can break people. The proper thing to do is continue supporting it for one release and mark it as deprecated (similar to what we have here for functions/classes).
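For illustration, here is a minimal sketch of what that deprecation window could look like; the class name, signature, and warning text are assumptions, not the actual T5 encoder API:

```python
import copy
import warnings
from typing import List, Optional, Union

import torch.nn as nn


class EncoderSketch(nn.Module):
    # Hypothetical stand-in for the T5 encoder; only the layers handling is shown.
    def __init__(
        self,
        layers: Union[nn.Module, List[nn.Module], nn.ModuleList],
        num_layers: Optional[int] = None,
    ):
        super().__init__()
        if num_layers is not None:
            # Old calling convention: a single layer template plus a count.
            warnings.warn(
                "num_layers will be deprecated in an upcoming release; "
                "pass an nn.ModuleList of layers instead.",
                FutureWarning,
            )
            layers = nn.ModuleList(
                copy.deepcopy(layers) for _ in range(num_layers)
            )
        self.layers = (
            layers if isinstance(layers, nn.ModuleList) else nn.ModuleList(layers)
        )
```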
Thanks for pointing it out. I will re-introduce support for `num_layers` and add the deprecated decorator with the message "`num_layers` argument will be deprecated in an upcoming release."
@ebsmothers Do you want me to add the deprecated decorator to the `TransformerDecoder` class as well?
@Ankur-singh actually I wouldn't worry about adding the decorator to the class. I think we need a separate utility that logs only when the to-be-deprecated argument is passed; otherwise everyone will see the deprecation warning for `num_layers` even if they aren't using it.
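Something along these lines (a hypothetical sketch, not an existing torchtune utility) would warn only the callers who actually pass the deprecated argument:

```python
import functools
import warnings


def warn_deprecated_kwarg(name: str, msg: str):
    """Decorator sketch: emit a deprecation warning only if `name` is passed."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Only keyword calls are detected in this sketch.
            if kwargs.get(name) is not None:
                warnings.warn(msg, FutureWarning)
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Usage sketch: only calls that pass num_layers trigger the warning.
@warn_deprecated_kwarg(
    "num_layers", "num_layers will be deprecated in an upcoming release"
)
def t5_encoder_builder(layers=None, num_layers=None):
    ...
```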
```python
[0.3383, 0.3150],
[0.3727, 0.2892],
[0.3996, 0.2653],
[0.4958, 0.4845],
```
I'm a bit surprised that the expected values need to change here. I would have thought that uniform initialization iterating over `model.parameters()` shouldn't be affected by whether we use `_get_clones` or not.
After making all the changes, I ran `pytest tests`:

- with `_get_clones`: success
- without `_get_clones`: failed

Hence, I thought the difference was because of `_get_clones`. I ran the following script to test the hypothesis:
```python
import torch

from torchtune.modules.transformer import _get_clones
from torchtune.modules.peft import LoRALinear


def main():
    loras_loop = [None] * 4
    for i in range(4):
        loras_loop[i] = LoRALinear(in_dim=16, out_dim=16, rank=4, alpha=1.0)

    loras_cloned = _get_clones(
        LoRALinear(in_dim=16, out_dim=16, rank=4, alpha=1.0), 4
    )

    loop_max_diff = torch.max(
        torch.abs(loras_loop[0].lora_a.weight - loras_loop[3].lora_a.weight)
    )
    cloned_max_diff = torch.max(
        torch.abs(loras_cloned[0].lora_a.weight - loras_cloned[3].lora_a.weight)
    )
    print(f"Max diff between layers using for-loop: {loop_max_diff}")
    print(f"Max diff between layers using _get_clones: {cloned_max_diff}")

    input = torch.randn(1, 16)
    output1 = input.clone()
    for layer in loras_loop:
        output1 = layer(output1)

    output2 = input.clone()
    for layer in loras_cloned:
        output2 = layer(output2)

    cloned_max_diff = torch.max(torch.abs(output1 - output2))
    print(f"Max diff between outputs from two approach: {cloned_max_diff}")


if __name__ == "__main__":
    main()

# ----
# Max diff between layers using for-loop: 0.4660979211330414
# Max diff between layers using _get_clones: 0.0
# Max diff between outputs from two approach: 0.3515825569629669
```
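For context, this matches what `_get_clones` does: it deep-copies one already-initialized module n times, so all clones start with identical weights, while the for-loop freshly initializes each layer and consumes more RNG state. A sketch of the helper (the actual torchtune version may differ in detail):

```python
import copy

import torch.nn as nn


def _get_clones(module: nn.Module, n: int) -> nn.ModuleList:
    # Every clone is a deep copy of the same initialized module, so their
    # parameters are bitwise-identical and only one initialization draws
    # from the global RNG.
    return nn.ModuleList([copy.deepcopy(module) for _ in range(n)])
```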
Yeah I suspect this is due to something with the random seed. Let me take a closer look to confirm but otherwise I think updating the expected values is fine
Also dug into this one a bit more: the changes to T5 change the order in which modules are registered, so when we do

```python
for param in model.parameters():
    param.data.uniform_(0, 1)
```

we again wind up with a different state for the RNG. So again, it's fine to change the expected values here.
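A small sketch (assumed names, with plain `nn.Linear` layers) of why registration order matters for this init loop: `model.parameters()` yields parameters in registration order, so reordering modules interleaves the `uniform_` calls differently with the RNG stream:

```python
import torch
import torch.nn as nn


def init_uniform(model: nn.Module) -> None:
    # Same loop as the test: consumes RNG state in registration order.
    for param in model.parameters():
        param.data.uniform_(0, 1)


# Two models with identical submodules, registered in opposite orders.
torch.manual_seed(0)
m1 = nn.ModuleDict(
    {"proj": nn.Linear(2, 4, bias=False), "gate": nn.Linear(4, 2, bias=False)}
)
init_uniform(m1)

torch.manual_seed(0)
m2 = nn.ModuleDict(
    {"gate": nn.Linear(4, 2, bias=False), "proj": nn.Linear(2, 4, bias=False)}
)
init_uniform(m2)

# The "same" parameter now holds different values, because its uniform_
# call landed at a different point in the RNG stream.
print(torch.equal(m1["proj"].weight, m2["proj"].weight))  # almost surely False
```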
Thanks for making these changes! I left a few comments (mainly in response to some of @RdoubleA's points about RoPE -- TLDR is don't worry too much for this PR provided everything works). Otherwise no major concerns though
@ebsmothers & @RdoubleA I have made the requested changes. The GPU test fails with the error message:

```
FAILED tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_generate_results - AssertionError:
assert 'Country maior Connection Kohćutsójcustomulas Sometimes Security' in 'INFO torchtune.utils._logging:_utils.py:28 Running InferenceRecipe with resolved config: ...'
====== 1 failed, 761 passed, 7 skipped, 14 warnings in 2548.47s (0:42:28) ======
```

The recipe ran `torchtune.models.llama2.llama2` on CPU with seed 123, temperature 0.6, top_k 300, and max_new_tokens 10, and the logged generation is now:

```
Pietroместkap щotimes rivers cache НиtringindexPathNAME
```
I'm assuming it's because of initialization (refer to #2282 (comment)). I had to update the expected output tensors for the T5 encoder as well. I think we will have to take a closer look sometime in the future.
Hi @Ankur-singh thanks for your patience, I wanted to sanity-check the failing test to make sure it's expected. Anyways, the TLDR for this PR is that you are not breaking anything; you can safely just update the expected value for this test. Separately, we can think about whether it's clearer to call …
Hi @ebsmothers thanks for clarifying. I'm a bit confused: after initializing the model, we load the checkpoint weights, so the initialization differences should be overwritten anyway.
I also tried out this small script to see if calling a random operation before setting the seed affects the outcome (looks like it does not):

```python
import torch

# Case 1: no random operations before setting the seed
torch.manual_seed(42)
x1 = torch.randn(3)
print("Case 1:", x1)

_ = torch.randn(5)  # consumes RNG state
_ = torch.randn(5)

# Case 2: random operations consumed RNG state before setting the seed
torch.manual_seed(42)
x2 = torch.randn(3)
print("Case 2:", x2)

print(f"Case 1 and Case 2 are equal: {torch.allclose(x1, x2)}")

# -----
# Output:
# Case 1: tensor([0.3367, 0.1288, 0.2345])
# Case 2: tensor([0.3367, 0.1288, 0.2345])
# Case 1 and Case 2 are equal: True
```
# Case 1 and Case 2 are equal: True TLDR, as long as we are loading the model weight from same checkpoint and calling PS: replacing the text here
"Pietroместkap щotimes rivers cache НиtringindexPathNAME" gets a green light. Tested it by running pytest tests/recipes/dev/test_generate_v2.py --ignore tests/torchtune/modules/_export --with-integration --durations=20 -vv
|
Context
What is the purpose of this PR?
Please link to any issues this PR addresses: #2270
Changelog
What are the changes made in this PR?
- Update each `component_builder` to pass an `nn.ModuleList` to `TransformerDecoder` instead of (`layer` + `num_layers`); see the sketch below.
- The same change for the `T5` model.
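A rough sketch of the builder change, with `nn.Linear` standing in for a real transformer layer and `TinyDecoder` as a hypothetical stand-in for `TransformerDecoder` (real signatures differ):

```python
import copy

import torch.nn as nn


class TinyDecoder(nn.Module):
    # Hypothetical stand-in for TransformerDecoder.
    def __init__(self, layers: nn.ModuleList):
        super().__init__()
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def build_old(embed_dim: int, num_layers: int) -> TinyDecoder:
    # Old pattern: one template layer, cloned num_layers times via deepcopy,
    # so every layer starts from identical weights.
    layer = nn.Linear(embed_dim, embed_dim)
    return TinyDecoder(
        nn.ModuleList(copy.deepcopy(layer) for _ in range(num_layers))
    )


def build_new(embed_dim: int, num_layers: int) -> TinyDecoder:
    # New pattern: the builder constructs each layer itself, so layers are
    # independently initialized and can differ (e.g., per-layer RoPE caches).
    layers = nn.ModuleList(
        nn.Linear(embed_dim, embed_dim) for _ in range(num_layers)
    )
    return TinyDecoder(layers)
```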
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these, just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- Run pre-commit hooks and linters (make sure you've first installed them via `pre-commit install`)
- Run unit tests via `pytest tests`
- Run recipe/integration tests via `pytest tests -m integration_test`
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.