Deepspeed-Domino #929
base: master
Conversation
First thing: rename the folder Deepspeed-Domino to DeepSpeed-Domino.
Fixed.
Are we using any function in this file? If not, delete it.
Removed.
Again, please remove all ._DS_Store and other irrelevant files.
Removed.
return buffer_tensor


class DistributedDataParallel(torch.nn.Module):
Is this different from PyTorch DDP? If so, do we really need the differing parts?
It is different from PyTorch DDP.
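For comparison, here is a minimal sketch (not Domino's code) of wrapping a model with PyTorch's native DDP, which already buckets gradients and overlaps the all-reduce with the backward pass; it would help to document what the custom class adds beyond this:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as TorchDDP

# Assumes torch.distributed.init_process_group(...) has already been called
# and a CUDA device has been assigned to this rank.
model = torch.nn.Linear(1024, 1024).cuda()
ddp_model = TorchDDP(model, device_ids=[torch.cuda.current_device()])
# Native DDP overlaps gradient all-reduce with backward via gradient buckets.
```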
PyTorch already supports native fp32/fp16 dtype conversion; do we really need these?
We can use a native function to replace this one.
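A minimal sketch of the native casts that could replace the custom helper (the tensor and layer here are illustrative, not from this PR):

```python
import torch

t = torch.randn(4, 4)            # fp32 by default
t_half = t.to(torch.float16)     # native fp32 -> fp16 cast
t_back = t_half.float()          # native fp16 -> fp32 cast

# Whole modules can be converted in place as well:
layer = torch.nn.Linear(8, 8).half()
```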
linear_layer.bias.zero_()
return linear_layer


def param_is_not_shared(param):
Are we supporting the not-shared param group?
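For reference, a sketch of how this helper is typically written in Megatron-style code (this is an assumption about the implementation, not copied from this PR): parameters tied across ranks, such as shared embeddings, carry a `shared` attribute, and everything else counts as not shared.

```python
def param_is_not_shared(param):
    # Tied parameters are expected to be tagged with param.shared = True;
    # absence of the attribute means the parameter is not shared.
    return not hasattr(param, 'shared') or not param.shared
```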
_MODEL_PARALLEL_RNG_TRACKER_NAME = 'model-parallel-rng'


def _set_cuda_rng_state(new_state, device=-1):
Are we using the CUDA RNG? I remember it cannot be used together with CUDA graphs, though it works when CUDA graphs are not enabled.
We are using it. It cannot be used together with CUDA graphs.
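For context, a minimal sketch of saving and restoring the per-device CUDA RNG state with the public PyTorch API; these host-side state reads/writes are presumably what conflicts with CUDA graph capture, as noted above:

```python
import torch

device = torch.cuda.current_device()
state = torch.cuda.get_rng_state(device)  # snapshot the per-device RNG state
_ = torch.rand(1, device=device)          # advances the RNG
torch.cuda.set_rng_state(state, device)   # restore; the next draw repeats
```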
return get_attr_wrapped_model(model, 'config', allow_none=False)


def param_is_not_shared(param):
Same question as above: do we support this "param not shared" feature?
return averaged_losses


def _kernel_make_viewless_tensor(inp, requires_grad):
I am not sure, but I remember we discussed this before: making tensors viewless slowed end-to-end time, so we disabled it? @zhangsmallshark, can you confirm this?
I have commented out the places where we call the viewless functions. I will remove them.
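For reference, a sketch of what this kernel typically does in Megatron-style code (an assumption about the implementation, not copied from this PR): it allocates a fresh tensor with no view relationship to anything, then re-points its storage at the input. The extra allocation per call is the likely source of the end-to-end slowdown mentioned above.

```python
import torch

def _kernel_make_viewless_tensor(inp, requires_grad):
    # Allocate a placeholder with ._base == None, then re-home its storage
    # so it shares memory with `inp` while carrying no view metadata.
    out = torch.empty((1,), dtype=inp.dtype, device=inp.device,
                      requires_grad=requires_grad)
    out.data = inp.data
    return out
```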
# export NCCL_SOCKET_NTHREADS=4
# export NCCL_NSOCKS_PERTHREAD=8

# cd /work/guanhua/domino
Please clean up more thoroughly.
Fixed.
Thanks @zhangsmallshark and @shenzheyu for the great work.
I added a few high-level comments; we need to get both the loss and the iteration time fixed. Thanks!
@zhangsmallshark, regarding the fix-loss commit da0c63b: maybe I missed something, but I don't see any real code change related to fwd/bwd/step. The only changes in this commit add timers and comment out some printout values. I don't see how the loss is fixed in this commit.
Hello team, Deepspeed-Domino contains all files related to the Domino project.