
Commit

mup set to false by default
gordicaleksa committed Jul 1, 2024
1 parent 14fdd84 commit 8d3f9b9
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion train_gpt2.py
@@ -592,7 +592,7 @@ def print0(*args, **kwargs):
parser.add_argument("--weight_decay", type=float, default=0.0, help="weight decay")
parser.add_argument("--grad_clip", type=float, default=1.0, help="maximum gradient magnitude")
# mup - maximum update parametrization
-parser.add_argument("--use_mup", type=int, default=1, help="should we use maximum update parametrization")
+parser.add_argument("--use_mup", type=int, default=0, help="should we use maximum update parametrization")
parser.add_argument("--mup_width_mult", type=float, default=1.0, help="width multiplier - ratio of width to base model width")
parser.add_argument("--mup_base_attn_mult", type=float, default=1.0, help="base attention multiplier")
# evaluation
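
The effect of this change can be sketched in isolation. The snippet below is a minimal, standalone illustration and not the actual train_gpt2.py: it rebuilds only the three mup arguments shown in the hunk, and the `build_parser` helper is a hypothetical name introduced for the example. It assumes nothing beyond what the diff shows, namely that `--use_mup` is an integer flag whose default is now 0.

```python
import argparse


def build_parser():
    # Hypothetical helper: reproduces only the mup arguments from the hunk above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_mup", type=int, default=0,
                        help="should we use maximum update parametrization")
    parser.add_argument("--mup_width_mult", type=float, default=1.0,
                        help="width multiplier - ratio of width to base model width")
    parser.add_argument("--mup_base_attn_mult", type=float, default=1.0,
                        help="base attention multiplier")
    return parser


# No flags given: with the new default, mup is off.
default_args = build_parser().parse_args([])

# Explicit opt-in: mup now has to be requested on the command line.
opt_in_args = build_parser().parse_args(["--use_mup", "1"])

print(default_args.use_mup, opt_in_args.use_mup)
```

Under these assumptions, a run that previously relied on the old default would now need to pass `--use_mup 1` explicitly to keep mup enabled.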
