Integrate with Mamba #2

Open
jeromeku opened this issue Jan 13, 2024 · 12 comments

Comments

@jeromeku

@proger

Awesome work! Always appreciate the wonderful contributions of OSS advancing the frontiers of research.

I know you've done a number of experiments comparing various scan implementations in your other repo nanokitchen -- would it make sense to integrate accelerated-scan as an alternative backend to Mamba? Would be happy to work on this if you think it makes sense.

@proger
Owner

proger commented Jan 13, 2024

@jeromeku thank you for the kind words! Glad you checked out nanokitchen as well.

It would indeed be possible to use Accelerated Scan for Mamba as is, but it would work best for experimental purposes: the Mamba kernel is already designed to achieve the best performance for that architecture.

Concretely, the Mamba kernel fuses cub::BlockScan with the SSM state expansion operations: the A matrix expands every gate dimension (a gate is called delta in Mamba, and those deltas are expected to be stored in log space) into a 16-dimensional SSM state, B correspondingly expands every input token dimension to match the gate expansion, and C collapses every SSM state back. Accelerated Scan would have to accept the expanded SSM states as inputs and waste precious memory bandwidth.
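For reference, a rough, unfused PyTorch-style sketch of the recurrence being described (shapes and names here are illustrative, not the actual kernel API):

```python
import torch

def selective_scan_reference(x, log_delta, A, B, C):
    # x:         (batch, length, dim)   input tokens
    # log_delta: (batch, length, dim)   gates (deltas), stored in log space
    # A:         (dim, state)           state = 16 in Mamba
    # B, C:      (batch, length, state)
    batch, length, dim = x.shape
    delta = log_delta.exp()
    h = x.new_zeros(batch, dim, A.shape[1])              # expanded SSM state
    ys = []
    for t in range(length):
        a_bar = torch.exp(delta[:, t, :, None] * A)      # A expands each gate dim into 16 states
        b_x = delta[:, t, :, None] * B[:, t, None, :] * x[:, t, :, None]  # B expands each token dim
        h = a_bar * h + b_x                              # the scanned first-order recurrence
        ys.append(torch.einsum("bdn,bn->bd", h, C[:, t]))  # C collapses the state back
    return torch.stack(ys, dim=1)                        # (batch, length, dim)
```

The fused kernel performs the h update as a block scan without ever materializing the (batch, length, dim, state) tensor; driving Accelerated Scan with it would mean materializing exactly that expanded tensor.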

Mamba's review hints that the memory footprint of that kernel could still be improved; a good direction would be to understand why that is the case. Reference: https://openreview.net/forum?id=AL1fq05o7H&noteId=T6WJZb30sz

@proger
Owner

proger commented Jan 13, 2024

I found that @srush has done this exact fusion of the SSM bits into the Triton forward kernel here: srush/annotated-mamba#1 (comment)

@srush

srush commented Jan 13, 2024

Yeah, thanks! Your repo was super helpful for that; we couldn't figure out how to do the two-value scan.

Unfortunately I'm now stuck on the backward pass, which needs to run the scan right-to-left. I see that you do it by loading values in reverse order, but for the backward we would need to reverse the tensor in local memory (or repeat a lot of computation).

Any ideas? I think I might try making an LxL matrix and doing a dot? It seems like overkill, but I'm stuck for other methods.
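For a plain sum, the LxL idea would look something like the sketch below; for the actual recurrence the matrix entries become cumulative products of the gates rather than ones, which is where the overkill comes in.

```python
import torch

# Right-to-left inclusive scan via an L x L matrix, shown for a plain sum:
# out[t] = sum of x[t:], i.e. the same as x.flip(0).cumsum(0).flip(0).
L = 8
x = torch.randn(L)
mask = torch.triu(torch.ones(L, L))   # row t keeps columns s >= t
rev_cumsum = mask @ x
assert torch.allclose(rev_cumsum, x.flip(0).cumsum(0).flip(0))
```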

@jeromeku
Author

@proger @srush
will take a closer look and report back...

@proger
Owner

proger commented Jan 14, 2024

There's some discussion about making a reverse tl.associative_scan in triton-lang/triton#2930

@srush

srush commented Jan 14, 2024

Yes, that issue is from me as well.

The trick of reading the memory in reverse in your codebase is nice. The problem is that for Mamba, the backward scan has to run over an intermediate tensor that is too large to store, so you either need to reverse it in memory or have a reverse associative scan.

@jeromeku
Author

@proger

Any luck integrating a reverse option into the Triton backend?

Trying to get up to speed with MLIR :)

@srush

srush commented Jan 19, 2024

I sent them a PR for a flip function at the Triton level, which should be okay: triton-lang/triton#2954. It would be interesting to do something more low-level, though.
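Assuming the flip ends up callable as something like tl.flip, a right-to-left scan inside a single block would look roughly like this (a sketch with a two-value combine, not the actual kernel):

```python
import triton
import triton.language as tl

@triton.jit
def combine(a_l, x_l, a_r, x_r):
    # combine op for the first-order linear recurrence h = a * h_prev + x
    return a_r * a_l, a_r * x_l + x_r

@triton.jit
def reverse_scan_kernel(gates_ptr, tokens_ptr, out_ptr, T: tl.constexpr):
    offs = tl.arange(0, T)
    a = tl.load(gates_ptr + offs)
    x = tl.load(tokens_ptr + offs)
    # flip in registers, scan left-to-right, then flip the result back
    a, x = tl.flip(a, 0), tl.flip(x, 0)            # assumes the flip from the PR above
    a, x = tl.associative_scan((a, x), 0, combine)
    tl.store(out_ptr + offs, tl.flip(x, 0))
```

The appeal of doing the flip at the Triton level is that the values stay in registers and never have to be written back to global memory in reverse order.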

@jeromeku
Author

@srush

Thanks -- saw that PR and agree that a more low-level approach would be a worthwhile exercise. It always helps to understand how things work under the hood. MLIR is a bit of a beast.

FYI, this series of tutorials is a great intro to MLIR. Also, NVIDIA's cutlass library has abstractions similar to triton's (i.e., the GEMM hierarchy), though triton is clearly more extensible to a wider variety of problems and backends.

@Jokeren

Jokeren commented Jan 19, 2024

It might be something similar to a convert_layout from a distributed layout to a distributed layout in most cases. Feel free to take a look at the relevant code.

@Jokeren

Jokeren commented Jan 19, 2024

I do think the current python solution is more elegant though.
