Deepspeed-Domino #929
base: master
Conversation
First thing: rename the folder Deepspeed-Domino to DeepSpeed-Domino.
Fixed.
Are we using any function in this file? If not, delete it.
Removed.
Again, please remove all ._DS_Store and other irrelevant files.
Removed.
return buffer_tensor


class DistributedDataParallel(torch.nn.Module):
Is this different from PyTorch DDP? If so, do we really need the differing parts?
It is different from PyTorch DDP.
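For comparison, here is a minimal sketch (not Domino's code) of wrapping a model with PyTorch's native DDP, which already buckets gradients and overlaps the all-reduce with the backward pass; it would help to document what the custom class adds beyond this:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as TorchDDP

# Assumes torch.distributed.init_process_group(...) has already been called
# and a CUDA device has been assigned to this rank.
model = torch.nn.Linear(1024, 1024).cuda()
ddp_model = TorchDDP(model, device_ids=[torch.cuda.current_device()])
# Native DDP overlaps gradient all-reduce with backward via gradient buckets.
```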
PyTorch already supports native fp32/fp16 dtype conversion; do we really need these?
We can use a native function to replace this one.
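A minimal sketch of the native casts that could replace the custom helper (the tensor and layer here are illustrative, not from this PR):

```python
import torch

t = torch.randn(4, 4)            # fp32 by default
t_half = t.to(torch.float16)     # native fp32 -> fp16 cast
t_back = t_half.float()          # native fp16 -> fp32 cast

# Whole modules can be converted in place as well:
layer = torch.nn.Linear(8, 8).half()
```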
linear_layer.bias.zero_()
return linear_layer


def param_is_not_shared(param):
Are we supporting the not-shared param group?
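For reference, a sketch of how this helper is typically written in Megatron-style code (this is an assumption about the implementation, not copied from this PR): parameters tied across ranks, such as shared embeddings, carry a `shared` attribute, and everything else counts as not shared.

```python
def param_is_not_shared(param):
    # Tied parameters are expected to be tagged with param.shared = True;
    # absence of the attribute means the parameter is not shared.
    return not hasattr(param, 'shared') or not param.shared
```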
_MODEL_PARALLEL_RNG_TRACKER_NAME = 'model-parallel-rng'


def _set_cuda_rng_state(new_state, device=-1):
Are we using the CUDA RNG? I remember it cannot be used together with CUDA graphs, though it works when CUDA graphs are not enabled.
We are using it. It cannot be used together with CUDA graphs.
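For context, a minimal sketch of saving and restoring the per-device CUDA RNG state with the public PyTorch API; these host-side state reads/writes are presumably what conflicts with CUDA graph capture, as noted above:

```python
import torch

device = torch.cuda.current_device()
state = torch.cuda.get_rng_state(device)  # snapshot the per-device RNG state
_ = torch.rand(1, device=device)          # advances the RNG
torch.cuda.set_rng_state(state, device)   # restore; the next draw repeats
```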
return get_attr_wrapped_model(model, 'config', allow_none=False)


def param_is_not_shared(param):
Same question as above: do we support this "param not shared" feature?
return averaged_losses


def _kernel_make_viewless_tensor(inp, requires_grad):
I am not sure, but I remember we discussed this before: making tensors viewless slowed end-to-end time, so we disabled it? @zhangsmallshark, can you confirm this?
I have commented out the places where we call the viewless functions. I will remove them.
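For reference, a sketch of what this kernel typically does in Megatron-style code (an assumption about the implementation, not copied from this PR): it allocates a fresh tensor with no view relationship to anything, then re-points its storage at the input. The extra allocation per call is the likely source of the end-to-end slowdown mentioned above.

```python
import torch

def _kernel_make_viewless_tensor(inp, requires_grad):
    # Allocate a placeholder with ._base == None, then re-home its storage
    # so it shares memory with `inp` while carrying no view metadata.
    out = torch.empty((1,), dtype=inp.dtype, device=inp.device,
                      requires_grad=requires_grad)
    out.data = inp.data
    return out
```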
# export NCCL_SOCKET_NTHREADS=4
# export NCCL_NSOCKS_PERTHREAD=8

# cd /work/guanhua/domino
Please clean up more thoroughly.
Fixed.
Thanks @zhangsmallshark and @shenzheyu for the great work.
I added a few high-level comments; we need to get both the loss and the iteration time fixed. Thanks!
@zhangsmallshark, regarding the fix-loss commit da0c63b: maybe I missed something, but I don't see any real code change related to fwd/bwd/step. The only changes in this commit add timers and comment out some printout values. I don't see how the loss is fixed in this commit.
Hello team, Deepspeed-Domino contains all files related to the Domino project.