A bug in switch_gate #199
Comments
That is a good point. I think you are right. Can you please open a pull request on this? Thanks. BTW, I am also wondering if the capacity calculation in |
Hi, guys! The following is my code:
import torch
from fmoe.gates import SwitchGate

device = torch.device("cuda:0")
sg = SwitchGate(d_model=64, num_expert=5, world_size=2)
sg = sg.to(device)
inp = torch.rand(128, 64)  # (batch_size, d_model)
inp = inp.to(device)
idx, val = sg(inp)
print(idx, idx.shape)
print(val, val.shape)
Parameter |
@Peg-Wu As you are not using torch distributed, |
Thank you for your reply! If I want to use DDP for acceleration, |
Thank you very much! |
Describe the bug
In fmoe/gates/switch_gate.py, line 45 reads:
capacity = math.ceil(cap_rate * inp.shape[0])
Should it instead be the following?
capacity = math.ceil(cap_rate * inp.shape[0] / self.num_expert)
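To make the difference between the two formulas concrete, here is a minimal standalone sketch of the arithmetic. It is not the fmoe implementation: the helper names `capacity_current` and `capacity_proposed` are hypothetical, the token and expert counts are borrowed from the snippet posted in this thread, and `cap_rate = 1.2` is an assumed value chosen only for illustration.

```python
import math

def capacity_current(cap_rate, num_tokens):
    # Formula as quoted from line 45: scales with the whole batch.
    return math.ceil(cap_rate * num_tokens)

def capacity_proposed(cap_rate, num_tokens, num_expert):
    # Formula proposed in this issue: the batch is first split evenly across experts.
    return math.ceil(cap_rate * num_tokens / num_expert)

# Illustrative values: 128 tokens and 5 experts come from the snippet in this thread;
# cap_rate = 1.2 is an assumption.
num_tokens, num_expert, cap_rate = 128, 5, 1.2

current = capacity_current(cap_rate, num_tokens)
proposed = capacity_proposed(cap_rate, num_tokens, num_expert)

print(current)   # 154
print(proposed)  # 31
```

With these numbers the quoted formula gives 154, which is more than the 128 tokens in the batch. If the capacity is meant as a per-expert limit (an assumption here), that limit could never be reached for any `cap_rate` of at least 1, whereas the proposed formula gives 31 tokens per expert.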