
Adding the new feature of FPDT #441

Open · YJHMITWEB wants to merge 8 commits into base: main

Conversation

@YJHMITWEB commented Aug 29, 2024

FPDT only works with this version of DeepSpeed.

@delock commented Aug 30, 2024

Hi @YJHMITWEB , is FPDT referring to this paper? https://ui.adsabs.harvard.edu/abs/2023JARS...17b6510H/abstract

@tohtana commented Aug 30, 2024

@YJHMITWEB Do we need changes in gpt2-merge.txt / gpt2-vocab.json? I'm not sure if we should check them in.

@@ -349,9 +349,12 @@ def _warmup_jit_function():
        dtype = torch.float32

    # Warmup fused bias+gelu
    seq_length = args.seq_length
    if args.ds_sequence_parallel_fpdt:
        seq_length = 8192

@tohtana commented Aug 30, 2024

Can you define this as another variable like "FPDT_SEQ_LEN" and add a comment describing why we have this setting?

@YJHMITWEB (Author) replied

This is fixed by setting it to ds_sequence_parallel_fpdt_chunk_size when FPDT is enabled.
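
(A minimal sketch of how the warmup sequence length could be chosen under this fix. The ds_sequence_parallel_fpdt flag and ds_sequence_parallel_fpdt_chunk_size argument come from this PR's discussion; the helper function, its signature, and the example values are illustrative assumptions, not the PR's actual code.)

```python
from types import SimpleNamespace

def _select_warmup_seq_length(args):
    """Illustrative helper: pick the sequence length used for JIT warmup."""
    seq_length = args.seq_length
    if getattr(args, "ds_sequence_parallel_fpdt", False):
        # With FPDT enabled, long sequences are processed chunk by chunk, so
        # warming up at the FPDT chunk size matches the shapes seen during
        # training better than a hard-coded 8192 or the full sequence length.
        seq_length = args.ds_sequence_parallel_fpdt_chunk_size
    return seq_length

# Hypothetical argument values for demonstration only.
args = SimpleNamespace(seq_length=4096,
                       ds_sequence_parallel_fpdt=True,
                       ds_sequence_parallel_fpdt_chunk_size=8192)
print(_select_warmup_seq_length(args))  # -> 8192
```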

@@ -32,7 +35,9 @@ def forward(self, max_seq_len, offset=0):
        emb = torch.cat((freqs, freqs), dim=-1)
        # emb [seq_length, .., dim]
        from einops import rearrange
        return rearrange(emb, 'n d -> n 1 1 d')
        base = rearrange(emb, 'n d -> n 1 1 d')

@inkcherry commented Aug 30, 2024

Will this change the output when using --use-rotary-position-embeddings with a Llama-style model?
FYI https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples_deepspeed/pretrain_llama2_distributed.sh

@YJHMITWEB (Author) replied

We have tested both GPT and Llama models; this works well with both.
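
(For context on the hunk above: it keeps the rearranged rotary tensor in a local variable base instead of returning it directly, presumably so the PR can post-process it before returning. The sketch below is an illustrative reconstruction under that assumption, not the PR's actual code; the class skeleton follows a typical Megatron-style RotaryEmbedding.)

```python
import torch
from einops import rearrange

class RotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, theta=10000):
        super().__init__()
        inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer('inv_freq', inv_freq)

    def forward(self, max_seq_len, offset=0):
        seq = torch.arange(max_seq_len, device=self.inv_freq.device).float() + offset
        freqs = torch.einsum('i,j->ij', seq, self.inv_freq)
        # First half covers even components, second half odd components.
        emb = torch.cat((freqs, freqs), dim=-1)
        # emb [seq_length, ..., dim]; keep the rearranged tensor in `base`
        # (as in the hunk above) so it can be post-processed, e.g. sliced to
        # the chunk an FPDT rank is currently working on, before returning.
        base = rearrange(emb, 'n d -> n 1 1 d')
        return base

# Example: embeddings for a 16-token sequence with head dimension 8.
rope = RotaryEmbedding(dim=8)
print(rope(16).shape)  # torch.Size([16, 1, 1, 8])
```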

@samadejacobs commented

> Hi @YJHMITWEB , is FPDT referring to this paper? https://ui.adsabs.harvard.edu/abs/2023JARS...17b6510H/abstract

@delock, no, FPDT refers to this paper, aka Ulysses-Offload

@YJHMITWEB (Author) commented

> Hi @YJHMITWEB , is FPDT referring to this paper? https://ui.adsabs.harvard.edu/abs/2023JARS...17b6510H/abstract
>
> @delock, no, FPDT refers to this paper, aka Ulysses-Offload

Thanks @samadejacobs for pointing this out.

@YJHMITWEB (Author) commented

@microsoft-github-policy-service agree
