
[Research] Custom Deepseek Routing #1070

Draft: wants to merge 3 commits into main
Conversation

@kylesayrs (Collaborator) commented Jan 14, 2025

Purpose

  • Provide researchers the ability to calibrate the DeepseekV2 model as if it were in MoE training mode, meaning the model performs a forward pass with all experts instead of just the top-k.
    • Using the moe_top_k_activation option, the user can choose between using all experts to compute outputs (matching train-time activations) or just the top-k (matching inference-time activations); see the sketch and patched forward below.
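For context, the train-time path duplicates each token once per routed expert and then loops over every expert module, so each expert's forward executes during calibration. A minimal sketch of that duplication, with hypothetical shapes (3 tokens, hidden size 4, top-2 routing; the gate assignment values are made up for illustration):

import torch

hidden_states = torch.randn(3, 4)  # 3 tokens, hidden size 4
num_experts_per_tok = 2

# Duplicate each token once per routed expert: row 2*t + j is copy j of token t
dense = hidden_states.repeat_interleave(num_experts_per_tok, dim=0)
assert dense.shape == (6, 4)

# Hypothetical gate output: copy j of token t is routed to expert
# flat_topk_idx[2*t + j]; masking with `flat_topk_idx == i` selects expert i's rows
flat_topk_idx = torch.tensor([0, 1, 1, 2, 0, 2])
assert dense[flat_topk_idx == 0].shape == (2, 4)

The patched DeepseekV2MoE.forward: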
class DeepseekV2MoE(nn.Module):
    def forward(self, hidden_states):
        ...
        # TRACING: give the option to calibrate with only the top-k experts,
        # as at inference time; otherwise use the train-time all-expert path
        if not self.config.moe_top_k_activation:  # originally: if self.training:
            hidden_states = hidden_states.repeat_interleave(
                self.num_experts_per_tok, dim=0
            )
            y = torch.empty_like(hidden_states)
            for i, expert in enumerate(self.experts):
                # every expert runs on the duplicated tokens routed to it;
                # experts with no routed tokens still execute (on an empty slice)
                y[flat_topk_idx == i] = expert(hidden_states[flat_topk_idx == i])
            y = (y.view(*topk_weight.shape, -1) * topk_weight.unsqueeze(-1)).sum(dim=1)
            y = y.to(hidden_states.dtype).view(*orig_shape)
            y = AddAuxiliaryLoss.apply(y, aux_loss)
        else:
            # sparse top-k path used at inference time
            y = self.moe_infer(hidden_states, topk_idx, topk_weight).view(*orig_shape)
        if self.config.n_shared_experts is not None:
            y = y + self.shared_experts(identity)
        return y
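As a usage sketch, calibration-time routing could then be toggled through the config flag. This assumes the patched DeepseekV2MoE reads moe_top_k_activation from the shared model config; the model id and loading path are illustrative, and the exact wiring inside llm-compressor may differ:

from transformers import AutoModelForCausalLM

# Illustrative only: the patched DeepseekV2MoE in this PR lives in
# llm-compressor's model definitions, not the stock remote code
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Lite", trust_remote_code=True
)

# Match train-time activations: every expert module is invoked during
# calibration, so calibration hooks see inputs for all experts
model.config.moe_top_k_activation = False

# Match inference-time activations: only the top-k experts run per token
# model.config.moe_top_k_activation = True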

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@markurtz markurtz marked this pull request as ready for review January 14, 2025 20:19
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@dsikka dsikka marked this pull request as draft January 23, 2025 12:21