
Resolve Speculative Decode RTE #823

Open · wants to merge 1 commit into base: habana_main
Conversation

@tannervoas742 tannervoas742 commented Feb 13, 2025

  • Speculative decoding fails when batch size exceeds 1 due to incorrect handling of mixed speculative and non-speculative sequences in the batch.
  • This PR corrects batch expansion ordering and accounts for padding sequences.

Below is the line where speculative and non-speculative sequences are combined. The order is non-speculative first, followed by speculative.

```python
target_seq_group_metadata_list = non_spec_seqs + spec_expanded_seqs
```

Below is the line where the batch is padded with dummy sequences. These must also be accounted for.

```python
batch_size_padded = self.bucketing_ctx.get_padded_batch_size(
    real_batch_size, False)
batch_size_padding = batch_size_padded - real_batch_size
if batch_size_padding > 0:
    encoder_seq_lens.extend(encoder_seq_lens[0]
                            for _ in range(batch_size_padding))
```
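The two snippets above imply a fixed batch layout: non-speculative sequences first, then the expanded speculative sequences, then padding dummies at the end. A minimal sketch of that layout and of splitting results back out, assuming this ordering (function and variable names here are illustrative, not the actual vLLM internals):

```python
# Illustrative sketch of the batch layout this PR relies on.
# Names are hypothetical; only the ordering mirrors the quoted code.

def build_target_batch(non_spec_seqs, spec_expanded_seqs, padded_batch_size):
    """Combine sequences in the expected order, then pad to the bucket size."""
    # Non-speculative sequences come first, then expanded speculative ones.
    batch = non_spec_seqs + spec_expanded_seqs
    # Pad to the bucketed batch size by repeating the first entry as a dummy,
    # mirroring how encoder_seq_lens is extended above.
    padding = padded_batch_size - len(batch)
    if padding > 0:
        batch = batch + [batch[0]] * padding
    return batch

def split_outputs(outputs, num_non_spec, num_spec):
    """Split scorer outputs back into non-speculative and speculative parts."""
    non_spec_out = outputs[:num_non_spec]
    # Speculative results start right after the non-speculative block;
    # anything beyond num_non_spec + num_spec is padding and is dropped.
    spec_out = outputs[num_non_spec:num_non_spec + num_spec]
    return non_spec_out, spec_out
```

The bug described in this PR corresponds to indexing the combined batch without accounting for both the non-speculative prefix and the trailing padding rows.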

Below is the error that is encountered without this fix.
[error screenshot not preserved in this text capture]

With this fix, significantly higher throughput can be achieved and accuracy is unaffected (verified via BERT F1 accuracy; tested with llama-3.1-8B using n-gram speculative decoding).


Signed-off-by: Voas, Tanner <tanner.voas@intel.com>