Support loading BF16 models in Cuda on T4 #1581
base: main
Conversation
I'm personally not a big fan of all these hoops.
`supports_native_bf16` as a function could be interesting (so users could build logic like the one presented in this PR), but it probably doesn't need to launch an actual kernel.
I fail to see the need for the extra copy-pasted kernel. Maybe we could just relax the `__CUDA_ARCH__ >= 800` guard instead of creating a new function?
If we implement `supports_native_bf16`, we need to make it correct on other devices too (can be done in a follow-up).
```rust
        let elem_count = 1;
        let cfg = LaunchConfig::for_num_elems(elem_count as u32);
        let data = unsafe { self.alloc::<u8>(elem_count) }.w()?;
        let func = self.get_or_load_func("support_native_bf16", kernels::CAST)?;
        let v: u8 = 0;
        let params = (&data, v, elem_count);
        unsafe { func.launch(cfg, params) }.w()?;

        let ret = self.dtoh_sync_copy(&data).w()?;
        match ret.first() {
            None => Ok(false),
            Some(r) => Ok(*r == 1u8),
        }
    }
```
Could we not launch an actual kernel?
We could for instance define `support_native_bf16` only on those targets and just check for the existence of the function.
Suggested change:

```diff
-        let elem_count = 1;
-        let cfg = LaunchConfig::for_num_elems(elem_count as u32);
-        let data = unsafe { self.alloc::<u8>(elem_count) }.w()?;
-        let func = self.get_or_load_func("support_native_bf16", kernels::CAST)?;
-        let v: u8 = 0;
-        let params = (&data, v, elem_count);
-        unsafe { func.launch(cfg, params) }.w()?;
-        let ret = self.dtoh_sync_copy(&data).w()?;
-        match ret.first() {
-            None => Ok(false),
-            Some(r) => Ok(*r == 1u8),
-        }
-    }
+        self.get_or_load_func("support_native_bf16", kernels::CAST).is_ok()
+    }
```
```diff
@@ -166,6 +166,13 @@ impl Device {
     pub fn is_cuda(&self) -> bool {
         matches!(self, Self::Cuda(_))
     }
+
+    pub fn support_native_bf16(&self) -> Result<bool> {
+        match self {
+            Self::Cpu => Ok(true),
```
Most CPUs actually do not support bf16.
```rust
        match self {
            Self::Cpu => Ok(true),
            Self::Cuda(c) => c.support_native_bf16(),
            Self::Metal(_) => Ok(true),
```
Only some Metal versions support bf16.
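
A minimal sketch of a more conservative match for the non-CUDA arms, following the two comments above; the defaults and the suggested follow-up probes are assumptions on my part, not candle's actual capability API:

```rust
// Sketch only: default the non-CUDA devices to false, per the review
// comments above. A follow-up could refine this with a real runtime probe
// (e.g. avx512bf16 detection on x86-64, or a Metal GPU-family query).
match self {
    Self::Cpu => Ok(false),
    Self::Cuda(c) => c.support_native_bf16(),
    Self::Metal(_) => Ok(false),
}
```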
```diff
@@ -172,8 +172,10 @@ fn main() -> Result<()> {
     let start = std::time::Instant::now();
     let dtype = if args.use_f32 {
         DType::F32
-    } else {
+    } else if device.support_native_bf16()? {
```
I think we should just switch that arg into an actual enum, e.g. `--dtype float16`.
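
A rough sketch of what that could look like with clap's `ValueEnum`; the `WhichDType` name, its variants, and the optional fallback behaviour are illustrative, not code from this PR:

```rust
use clap::{Parser, ValueEnum};

// Illustrative only: a --dtype enum argument as suggested above.
#[derive(Clone, Copy, Debug, ValueEnum)]
enum WhichDType {
    Float32,
    Float16,
    Bfloat16,
}

#[derive(Parser, Debug)]
struct Args {
    /// Weight dtype, e.g. `--dtype float16`; when omitted, the example could
    /// fall back to a device-based default instead of branching on use_f32.
    #[arg(long, value_enum)]
    dtype: Option<WhichDType>,
}
```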
```cuda
  } \
} \

LOCAL_CAST_OP(__nv_bfloat16, float, cast_bf16_f32)
```
If I'm not mistaken this is undefined for some low enough CUDA versions. Also, I fail to see the difference between this version and the one in the `>= 800` path.
NVIDIA T4 GPUs do not support BF16 floating-point types natively. This adds a method to cast BF16 to F32 or F16 so that Mistral can be loaded on a T4.
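
As a rough usage sketch of that intent: pick BF16 only when the device reports native support, otherwise fall back to F16. `support_native_bf16` is the method added in this PR; the F16 fallback, file list, and loading call are my own illustration, not code from the PR.

```rust
use candle_core::{DType, Device, Result};
use candle_nn::VarBuilder;

fn main() -> Result<()> {
    let device = Device::cuda_if_available(0)?;
    // Only use BF16 when the device handles it natively (e.g. not on a T4);
    // otherwise fall back to F16 so a BF16 checkpoint can still be loaded.
    let dtype = if device.support_native_bf16()? {
        DType::BF16
    } else {
        DType::F16
    };
    // Hypothetical file list; tensors are converted to `dtype` as they are used.
    let files = vec![std::path::PathBuf::from("model.safetensors")];
    let vb = unsafe { VarBuilder::from_mmaped_safetensors(&files, dtype, &device)? };
    let _ = vb;
    Ok(())
}
```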