Add GPU Pallas flash attention for inference. In collaboration with tohaowu. #1292

jwyang-google · 2025-02-20T23:27:44Z

Reduces prefill time from 123 ms to 77 ms (about 1.6x faster) for llama70b on 8 H100 GPUs.
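For readers unfamiliar with the technique, here is a minimal sketch of the idea behind flash attention: keys and values are visited one block at a time while a running maximum and running sum keep the softmax exact. It is plain Python for a single query vector, not the Pallas kernel added in this PR; the function names and `block_size` are hypothetical and chosen only for illustration.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d)) over all keys at once."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, computed one key/value block at a time (online softmax)."""
    d = len(q)
    running_max = -math.inf          # largest score seen so far
    running_sum = 0.0                # softmax denominator, relative to running_max
    acc = [0.0] * len(values[0])     # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        # On the first block this is exp(-inf) == 0.0, which discards nothing real.
        scale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * scale + sum(weights)
        acc = [a * scale + sum(w * v[j] for w, v in zip(weights, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.1, 0.2], [1.0, -1.0], [0.3, 0.9], [2.0, 0.0], [-0.5, 0.4]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-1.0, 3.0]]
out_naive = naive_attention(q, keys, values)
out_block = blockwise_attention(q, keys, values, block_size=2)
```

Because only one block of scores exists at any time, the full score matrix never has to be materialized. A GPU kernel can take advantage of this by keeping each block in fast on-chip memory.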



Add GPU Pallas flash attention for inference. In collaboration with tohaowu.

9a32622


jwyang-google requested review from gobbleturk, khatwanimohit, bvandermoon, vipannalla and RissyRan as code owners February 20, 2025 23:27

tohaowu approved these changes Feb 20, 2025
