Allow user to specify length of prompt #13
base: stable
Conversation
example_xla.py (Outdated)

```diff
@@ -96,9 +97,11 @@ def main(
         ckpt_dir, tokenizer_path, rank, world_size, max_seq_len, max_batch_size, dim, n_layers, n_heads
     )

-    prompts = [
+    prompts = [generator.tokenizer.decode(list(range(prompt_len)))]
+    print(prompts)
```
is this for debug only?
Yeah, I'll remove the print statement once all other comments are settled.
```diff
@@ -159,11 +163,12 @@ def mp_main(
     dim: int = 4096,
     n_layers: int = 32,
     n_heads: int = 32,
+    prompt_len: int = 6,
```
Do we know why the default is 6? Also, I believe that prompt_len has something to do with max_batch_size: https://github.com/pytorch-tpu/llama/blob/stable/llama/generation.py#L57.
So I'm not sure how this could work with bs=1... On the other hand, I'm not sure if this is even the right solution.
I set the default to 6 because I wanted the default to be the same number of input tokens as our previous prompt "I believe the meaning of life is", only I miscounted (that has 7 words), and there's not a 1-1 mapping between words and tokens necessarily, so it's still wrong regardless. We can change the default if the current one is not right - any suggestions on alternatives?
If I'm reading the code right, `max_batch_size` is related to the total number of prompts, not the number of tokens in each prompt. The goal of `prompt_len` is to allow the user to specify a variable number of input tokens in a single prompt. (It is reasonable to point out that this does not currently support multiple prompts.) I'm not sure I understand your comments about `bs = 1` or about whether this is the right solution. (Whether what is the right solution?) Could you please clarify?
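To make the distinction concrete, here is a minimal sketch of my own (a toy tokenizer stands in for the repo's `generator.tokenizer` so it runs on its own): `max_batch_size` caps how many prompts go into a batch, while `prompt_len` sets how many tokens each prompt contains.

```python
# Toy illustration only -- not code from this PR.

class ToyTokenizer:
    def decode(self, token_ids):
        # Pretend every token id decodes to the word "the".
        return " ".join("the" for _ in token_ids)

tokenizer = ToyTokenizer()

prompt_len = 6        # tokens inside the single prompt (the knob this PR adds)
max_batch_size = 1    # cap on the number of prompts in one batch

# One prompt built from prompt_len token ids; the batch still has size 1.
prompts = [tokenizer.decode(list(range(prompt_len)))]

assert len(prompts) <= max_batch_size  # the batch-size limit applies to the list length
print(len(prompts), prompts[0])        # -> 1 the the the the the the
```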
Okay, now I get it. But from the discussion in gchat, do you still need this approach?
If you mean our discussion about `max_seq_len` and `max_gen_len`, this is something quite separate, right? This PR allows the user to modify the length of the input prompt. `max_seq_len` controls the size allocated for the output (and in our repo, the total number of tokens generated), and `max_gen_len` controls the number of tokens displayed. I think we still need this for the user to modify input - please let me know if you had some other discussion in mind.
- Made `max_gen_len` an exposed parameter
- Set the default for `max_seq_len` to 2048 from 512
- Changed `total_len` to be set to the max of `max_seq_len` and `max_gen_len+max_prompt_size`
LGTM. Please update the user guide and the README in this branch.
- To avoid decoding->encoding errors, `prompts` is now set to be `prompt_len` many copies of the fixed 8th token ("the")
- Removed a print debugging statement
LGTM
@AlexWertheim do you want to just merge this PR, or do you want to put it on hold for now?

I'm fine either way. I think it'd be best to merge the changes into the stable branch, especially with Liyang's improvements now merged on top of mine. I think @miladm wanted to give a review though, so I was waiting on his feedback to merge.
Thanks. Can we add a few words on the details of our measurement methodology before and after this change?
```diff
 def mp_main(
     mp: bool,
     tokenizer_path: str,
     temperature: float = 0.8,
     top_p: float = 0.95,
-    max_seq_len: int = 512,
+    max_seq_len: int = 2048,
```
Can we add a comment to define `max_seq_len`, `prompt_len`, and `max_gen_len` to clarify for the user in plain English?
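For reference, one way those plain-English comments could read (the wording below is my own suggestion, not text from the repo; the defaults match the ones discussed in this PR):

```python
def mp_main(
    mp: bool,
    tokenizer_path: str,
    temperature: float = 0.8,
    top_p: float = 0.95,
    # max_seq_len: size, in tokens, of the buffer allocated for the output
    # sequence; in this repo it also bounds the total number of tokens generated.
    max_seq_len: int = 2048,
    # max_gen_len: number of generated tokens that are decoded and displayed.
    max_gen_len: int = 256,
    # prompt_len: number of input tokens in the synthetic prompt fed to the model.
    prompt_len: int = 6,
):
    ...
```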
Turn `temperature` and `top_p` into tensors
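A rough sketch of what that change might look like (my guess at its shape, not the PR's actual diff): keeping `temperature` and `top_p` as device tensors means they enter the compiled graph as inputs rather than as baked-in Python constants, which is the usual way in PyTorch/XLA to avoid recompiling when such values change.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# As plain Python floats these would be traced into the graph as constants;
# as tensors on the XLA device they are ordinary graph inputs.
temperature = torch.tensor(0.8, device=device)
top_p = torch.tensor(0.95, device=device)
```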
Updated description:
I've added an argument called `prompt-len` which the user can specify via the command line. The prompt is now set to be the sentence composed of `prompt-len` many copies of token 8 ("the"). I've also made the following modifications:

- `max_gen_len` is now an exposed parameter with default `256`.
- `max_seq_len` is still an exposed parameter, now with default `2048` (previously was `512`).
- `total_len` is now calculated via the same formula as in the original LLaMA repo, i.e. `total_len = min(params.max_seq_len, max_gen_len + max_prompt_size)`, where `max_prompt_size` is the size of the largest prompt in `prompts`.
- `total_len-1` many tokens are now generated. Unless you set `max_seq_len` to be less than `max_gen_len+prompt_len+1`, this should just be `max_gen_len+prompt_len` many tokens generated, since there is a beginning-of-sentence token added to the prompt in the decoding->encoding process.

cc @JackCaoG @miladm @Liyang90
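A quick worked example of that bookkeeping under the defaults in this PR (the `+1` for the beginning-of-sentence token is my reading of the description above):

```python
max_seq_len = 2048
max_gen_len = 256
prompt_len = 6

max_prompt_size = prompt_len + 1  # prompt tokens plus the BOS token added on encoding
total_len = min(max_seq_len, max_gen_len + max_prompt_size)  # min(2048, 263) = 263
tokens_generated = total_len - 1                             # 262

# 262 == max_gen_len + prompt_len, as described above, because max_seq_len
# is not the smaller term here.
assert tokens_generated == max_gen_len + prompt_len
print(total_len, tokens_generated)  # 263 262
```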
Update:
Optimization for long prompts is also included.