Add code for evaluating pass @ k to inference_and_check #64

erictang000 · 2025-02-04T22:33:06Z

Fixes a minor bug in perform_check and adds code for computing the pass @ k metric when n > 1 samples are generated per question.

For example, if we run the following command with an existing saved results file DeepSeek-R1-Distill-Qwen-7B_aime_train_None_False_0_-1.json containing n=128 samples per question:

python inference_and_check.py --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --task aime --split train --max_tokens 32768 --inference --n 128 --temperatures 0.6 --tp 1 --check

We now get the following output:

Temperature: [0.6]
Loaded 30 existing results.
Found 3840 responses requiring reject sampling...
Processing Reject Sampling: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3840/3840 [00:02<00:00, 1432.24it/s]
Final reject-sampling accuracy: 2052/3840
Actual accuracy: 0.534375
Final pass @ k:
k: 128, pass @ k: 90.0
k: 64, pass @ k: 84.999
k: 32, pass @ k: 82.379
k: 16, pass @ k: 80.524
k: 8, pass @ k: 78.576
k: 4, pass @ k: 74.26
k: 2, pass @ k: 65.496
k: 1, pass @ k: 53.438
Temperature 0.6 acc: 27/30 (0.9)
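For readers unfamiliar with the metric, here is a minimal sketch of how pass @ k is commonly estimated from n samples per question. It uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k), where c is the number of correct samples for a question. This is an assumption about the approach, not a copy of the code in this PR; the names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one question from n samples, c of them correct.

    Hypothetical helper: the probability that at least one of k samples
    drawn without replacement from the n generated samples is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over all questions, reported as a percentage."""
    scores = [pass_at_k(n, c, k) for c in correct_counts]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data (not from this PR): 3 questions, 4 samples each.
    counts = [4, 1, 0]
    for k in (4, 2, 1):
        print(f"k: {k}, pass @ k: {round(mean_pass_at_k(counts, 4, k), 3)}")
```

Under this estimator, pass @ 1 equals the mean per-sample accuracy and pass @ n equals the fraction of questions with at least one correct sample, which matches the log above (53.438 vs. 0.534375, and 90.0 vs. 27/30).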

erictang000 added 30 commits January 28, 2025 19:19

add rayllm batch path

bf56116

fix typo

fe0d518

temp make mmlu pro smaller

94e9009

update vllm version, rayllm config, add repartition

88ad60f

Add submodule repo

efc6a01

move evalworkload code to outside module

1fadec9

Update submodule to latest commit

cf6a9fd

remove [:4000] for mmlupro

ed759e7

updates to inference_and_save path

6429201

fix small issues

7d2f39e

disable n > 1 for inference and save rayllm path

91a7d5a

Merge branch 'rayllm' of https://github.com/erictang000/SkyThought into rayllm

ea7694a

inference_and_save works for n = 1 use_rayllm

f69406a

Remove submodule

461a43c

remove submodule and rename to pipeline

f47dcfa

remove unnecessary model_id

7a0ee7b

remove .gitmodules

ec1b192

rename main to pipeline

224bbab

add support for n > 1

d419fc3

remove old code

a92dfe4

fix unflatten logic for n > 1

bd27b8d

merge

972e6e5

finish merge stuff

00bfc5b

split small datasets

dd615ae

address some comments (add response object and add separate inference + ray functions)

e02d0d9

small comment use_ray

c37cf6d

resolve some more comments

4b15c25

reduce workload

9ef28f9

changes

50e79a0

fix ProcessPoolExecutor + response ray sigsev bug

cc7dd8b

erictang000 added 4 commits February 4, 2025 12:34

Merge branch 'main' of https://github.com/erictang000/SkyThought into HEAD

1b6843e

add comment

2045247

add pass @ k and minor fix to check code

a1be6c7

formatting

d52799e

erictang000 marked this pull request as draft February 4, 2025 22:33

erictang000 closed this Feb 4, 2025
