A bug in switch_gate #199

Heihaierr · 2024-03-07T11:25:28Z

Describe the bug
In fmoe/gates/switch_gate.py line 45:
capacity = math.ceil(cap_rate * inp.shape[0])

Should this instead be:
capacity = math.ceil(cap_rate * inp.shape[0] / self.num_expert)
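To make the difference concrete, here is a minimal sketch comparing the two formulas in plain Python. The helper names `capacity_current` and `capacity_proposed` are hypothetical and not part of fmoe. The values 128 tokens and 5 experts are borrowed from the reproduction later in this thread, and `cap_rate = 1.2` is an assumed value for illustration.

```python
import math


def capacity_current(cap_rate: float, num_tokens: int) -> int:
    # Formula as quoted from switch_gate.py line 45: scales with the whole batch.
    return math.ceil(cap_rate * num_tokens)


def capacity_proposed(cap_rate: float, num_tokens: int, num_expert: int) -> int:
    # Formula proposed in this issue: the batch is split across experts first.
    return math.ceil(cap_rate * num_tokens / num_expert)


# Assumed illustration values, not taken from the library defaults.
cap_rate, num_tokens, num_expert = 1.2, 128, 5

print(capacity_current(cap_rate, num_tokens))               # 154
print(capacity_proposed(cap_rate, num_tokens, num_expert))  # 31
```

With the quoted formula the limit (154) exceeds the total number of tokens (128), so if it is applied per expert it can never drop a token. The proposed formula gives 31, slightly above an even share of 128 / 5 = 25.6 tokens per expert.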


laekov · 2024-03-11T07:31:39Z

That is a good point. I think you are right. Can you please open a pull request on this? Thanks.

BTW, I am also wondering if the capacity calculation in GShardGate is wrong. @zms1999

Peg-Wu · 2024-04-06T04:26:22Z

Hi, everyone!
Thanks for your fantastic work.
I ran into a problem when using the SwitchGate class. Could you take a look at it for me?

The following is my code:

import torch
from fmoe.gates import SwitchGate

device = torch.device("cuda:0")

# world_size=2 is the setting that crashes; world_size=1 runs fine
sg = SwitchGate(d_model=64, num_expert=5, world_size=2)
sg = sg.to(device)

# Named inp to avoid shadowing the built-in input()
inp = torch.rand(128, 64)  # (batch_size, d_model)
inp = inp.to(device)

idx, val = sg(inp)
print(idx, idx.shape)
print(val, val.shape)

The world_size parameter can only be set to 1; any other value causes the error "Segmentation fault (core dumped)".

laekov · 2024-04-07T10:25:15Z

@Peg-Wu As you are not using torch distributed, world_size has to be 1.

Peg-Wu · 2024-04-08T04:50:12Z

Thank you for your reply!

If I want to use DDP for acceleration, how should I modify my code? Can I use PyTorch's official DDP for parallelism?

laekov · 2024-04-08T07:44:09Z

@Peg-Wu Refer to this test

Peg-Wu · 2024-04-08T07:54:11Z

Thank you very much!
