import torch.distributed as dist


class Buffer:
    """
    The core expert-parallel (EP) communication buffers for Mixture-of-Experts (MoE) models, supporting:
        - high-throughput intranode all-to-all (dispatch and combine, using NVLink)
        - high-throughput internode all-to-all (dispatch and combine, using RDMA without AR)
        - low-latency all-to-all (dispatch and combine, using RDMA, AR supported)

    Attributes:
        num_sms: the number of SMs used by the high-throughput kernels.
        rank: the local rank number.
        group_size: the number of ranks in the group.
        group: the communication group.
        num_nvl_bytes: the buffer size for intranode NVLink communication.
        num_rdma_bytes: the buffer size for internode (and for intranode low-latency mode) RDMA communication.
        runtime: the C++ runtime.
    """

    num_sms: int = 20

    def __init__(self, group: dist.ProcessGroup,
                 num_nvl_bytes: int = 0, num_rdma_bytes: int = 0,
                 low_latency_mode: bool = False, num_qps_per_rank: int = 1) -> None:
        """
        Initialize the communication buffer.

        Arguments:
            group: the communication group.
            num_nvl_bytes: the buffer size for intranode NVLink communication.
            num_rdma_bytes: the buffer size for internode (and for intranode low-latency mode) RDMA communication.
            low_latency_mode: whether to enable low-latency mode.
            num_qps_per_rank: the number of RDMA queue pairs (QPs) per rank; low-latency mode requires this to
                equal the number of local experts.
        """
        # Initialize the CPP runtime
        print("=============================================", flush=True)
        self.rank = group.rank()
        self.group_size = group.size()
        self.group = group
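
For context, here is a minimal construction sketch based only on the constructor signature above. The process-group setup is plain torch.distributed usage rather than anything buffer-specific, and the byte sizes and the num_local_experts name are placeholder assumptions for illustration, not recommended values.

import torch
import torch.distributed as dist

# Assumed launch via torchrun: initialize a process group and pick the local GPU.
dist.init_process_group(backend='nccl')
dist.barrier()
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
group = dist.new_group(list(range(dist.get_world_size())))

# High-throughput intranode traffic goes through the NVLink buffer; internode
# (and low-latency) traffic goes through the RDMA buffer. Sizes below are
# placeholder examples, not tuned recommendations.
buffer = Buffer(group,
                num_nvl_bytes=256 * 1024 * 1024,  # assumed example size
                num_rdma_bytes=0)                 # 0: no internode / low-latency traffic here

# Low-latency mode sketch (hypothetical values): per the docstring above,
# num_qps_per_rank must equal the number of local experts.
# num_local_experts = 8  # assumption for illustration
# ll_buffer = Buffer(group, num_rdma_bytes=1 << 30,
#                    low_latency_mode=True, num_qps_per_rank=num_local_experts)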