
Add Hidden Size for DeepSpeed integration #23

Open
infosechoudini opened this issue Oct 5, 2023 · 2 comments

@infosechoudini

Utilizing DeepSpeed requires `model.hidden_size` to be available so that `auto` values in the ZeRO optimization config (e.g. `zero_optimization.reduce_bucket_size`) can be resolved. I'm guessing that `config.decoder_embed_dim` is the hidden size.

So we'd just need to add the following to the model's `__init__`:

    def __init__(self, config: RetNetConfig, embed_tokens: nn.Embedding = None):
        super().__init__(config)
        self.config = config

        self.dropout_module = torch.nn.Dropout(config.dropout)

        self.embed_dim = config.decoder_embed_dim
        self.embed_scale = 1.0 if config.no_scale_embedding else math.sqrt(self.embed_dim)

        ## NEW CODE FOR DEEPSPEED: expose hidden_size so DeepSpeed can
        ## resolve "auto" values such as reduce_bucket_size
        self.hidden_size = config.decoder_embed_dim
        ## END NEW CODE FOR DEEPSPEED

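For reference, this is the kind of ZeRO config the attribute enables. A minimal sketch, assuming the Hugging Face Trainer integration, which (per its documentation) replaces `auto` for `reduce_bucket_size` with a value derived from the model config's `hidden_size`; the resolution step below is a hypothetical mirror of that behavior, not DeepSpeed's actual code:

```python
# Sketch: a ZeRO stage-2 config using "auto" values, and a hypothetical
# resolution step mimicking what the HF/DeepSpeed integration does with
# config.hidden_size -- which is why the attribute must exist.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "reduce_bucket_size": "auto",
        "allgather_bucket_size": "auto",
    }
}

hidden_size = 2048  # would come from model config.hidden_size

resolved = dict(ds_config["zero_optimization"])
if resolved["reduce_bucket_size"] == "auto":
    # HF's integration documents this as hidden_size * hidden_size
    resolved["reduce_bucket_size"] = hidden_size * hidden_size
```

Without `hidden_size` on the config, the `auto` values cannot be filled in and the integration fails.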

@infosechoudini infosechoudini changed the title Add Hidden Size Add Hidden Size for DeepSpeed integration Oct 5, 2023
@infosechoudini
Author

Correction: it needs to be added to configuration.json and configuration.py instead.
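A minimal sketch of what the configuration.py change could look like: alias the existing `decoder_embed_dim` under the `hidden_size` name that DeepSpeed looks up. `RetNetConfig` here is a stripped-down stand-in for the real class, not its actual definition:

```python
# Hypothetical configuration.py change: expose decoder_embed_dim as
# hidden_size via a read-only property, so DeepSpeed's "auto" resolution
# (which reads config.hidden_size) finds the embedding width.
class RetNetConfig:
    def __init__(self, decoder_embed_dim: int = 768, dropout: float = 0.0):
        self.decoder_embed_dim = decoder_embed_dim
        self.dropout = dropout

    @property
    def hidden_size(self) -> int:
        # Alias: both names refer to the same underlying value.
        return self.decoder_embed_dim
```

If `RetNetConfig` subclasses `transformers.PretrainedConfig`, the same aliasing can alternatively be done with its `attribute_map` class attribute (e.g. mapping `"hidden_size"` to `"decoder_embed_dim"`), which also covers serialization to configuration.json.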

@syncdoth
Owner

syncdoth commented Oct 8, 2023

Could you please make it a PR? I'm not using DeepSpeed at the moment, so I can't confirm where the changes must occur.
