Add to_device() to VarBuilder to load weights to a specific GPU #1388

Open · wants to merge 1 commit into main
Conversation

@mokeyish (Contributor) commented Nov 30, 2023

This makes it easier to load transformer blocks onto different GPUs:

fn load(vb: VarBuilder) {
    // ...
    let blocks: Vec<_> = (0..cfg.n_layers)
        .map(|i| {
            // Place consecutive layers on the same GPU, moving to the next
            // device every `per_device_layers` layers.
            let dev_ordinal = i / per_device_layers;
            log::debug!("load block {i} into GPU{dev_ordinal}");

            // Re-target the VarBuilder at the chosen CUDA device before
            // loading this block's weights.
            let vb = vb.to_device(&Device::new_cuda(dev_ordinal).unwrap());

            Block::load(
                vb.pp(&format!("model.layers.{i}")),
                cache,
                &cfg,
                comm.clone(),
            )
            .unwrap()
        })
        .collect();
    // ...
}

@mokeyish (Contributor, Author) commented Dec 2, 2023

@LaurentMazare Hi, two checks failed because the credentials could not be obtained.

But what do you think of this feature? Or are there any other alternatives?

@LaurentMazare (Collaborator) commented
I'm not really sure about this; I'm a bit afraid it makes the VarBuilder harder to reason about. For example, currently when using a VarBuilder backed by a VarMap, the tensors stored in the map are on the device specified by the VarBuilder and can be returned on subsequent calls with the same name. With this change, if to_device was called on the VarBuilder, it's a bit unclear what should happen: the current version would return the old tensors even if they are not on the same device, so we should at least check for this.
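To make the concern concrete, here is a minimal sketch (not part of the PR) assuming the proposed to_device method; the VarMap/VarBuilder usage follows candle-nn's existing API as I understand it:

// Sketch of the caching ambiguity with a VarMap-backed VarBuilder.
// `to_device` is the API proposed in this PR; its exact behavior is the open question.
use candle_core::{DType, Device, Result};
use candle_nn::{VarBuilder, VarMap};

fn sketch() -> Result<()> {
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, DType::F32, &Device::Cpu);

    // First call: "w" is created on the CPU and cached in the VarMap.
    let w_cpu = vb.get((4, 4), "w")?;

    // Proposed API: re-target the builder at GPU 0.
    let vb_gpu = vb.to_device(&Device::new_cuda(0)?);

    // Same name again: does this return the cached CPU tensor, move it to the
    // GPU, or fail? This is the behavior that would need to be specified.
    let w_again = vb_gpu.get((4, 4), "w")?;
    println!("{:?} vs {:?}", w_cpu.device(), w_again.device());
    Ok(())
}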

@mokeyish (Contributor, Author) commented Dec 4, 2023

For some models, a single GPU does not have enough memory to hold the entire model, so the model has to be split by layer across several GPU devices.

Because candle has no equivalent of torch's model.to(device), putting some of a transformer's layers on a specific GPU is rather cumbersome. With this PR, we can write:

let vb4layer = vb.pp(&format!("transformer.layers.{i}")).to_device(device)?;
let block = BlockLayer::load(vb4layer)?;

Without this PR, we can only load the weights into the layer first and then write a lot of code to move them to the target device one by one, roughly as sketched below.
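For illustration, a rough sketch of that manual workaround; the Block struct and its fields are hypothetical, and only Tensor::to_device is existing candle API:

// Per-tensor workaround when VarBuilder has no to_device.
// There is no module-level `.to(device)`, so every field is moved by hand.
use candle_core::{Device, Result, Tensor};

struct Block {
    attn_q: Tensor,
    attn_k: Tensor,
    attn_v: Tensor,
    mlp_up: Tensor,
    mlp_down: Tensor,
}

fn move_block_to(block: &Block, device: &Device) -> Result<Block> {
    Ok(Block {
        attn_q: block.attn_q.to_device(device)?,
        attn_k: block.attn_k.to_device(device)?,
        attn_v: block.attn_v.to_device(device)?,
        mlp_up: block.mlp_up.to_device(device)?,
        mlp_down: block.mlp_down.to_device(device)?,
        // ...and so on for every remaining tensor in a real block.
    })
}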

@LaurentMazare (Collaborator) commented
I agree that the use case you mentioned is not well covered at the moment. What I was saying is that the changes proposed by this PR have some drawbacks that we have to be a bit more careful about; I would have to think a bit more about whether there is a good way around this.
