reference experiment results #2

Merged (1 commit, Aug 23, 2024)
@@ -0,0 +1,6 @@
comment: 1x A100 SXM4 80GB
experiment: vllm_llama_3_70b_instruct_awq
experiment_hash: exp_hash_v1:7aa490
run_id: vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb
slug: 1x_a100_sxm4_80gb
timestamp: 2024-08-22_12-16-19
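
The file above is the run's metadata. A minimal sketch of loading it with pyyaml (which the setup log below installs); the `meta.yaml` file name is an assumption, since only the file contents appear in this diff:

```python
import yaml

# Load the run metadata shown above. The path is hypothetical, inferred
# from run_id -- only the YAML body is part of this PR.
with open("results/vllm_llama_3_70b_instruct_awq/"
          "2024-08-22_12-16-19_1x_a100_sxm4_80gb/meta.yaml") as f:
    meta = yaml.safe_load(f)

print(meta["experiment"])       # vllm_llama_3_70b_instruct_awq
print(meta["experiment_hash"])  # exp_hash_v1:7aa490
```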
@@ -0,0 +1,8 @@
Count to 1000, skip unpopular numbers: 5fa4c4a18a1534b96c2eb2c5a30f63da0237b338aebf745d27d3d73dbc8dedfa2aed7070799440ac37e8610f9dd4926371f77a98e79c50a2c8b5b583cbf7c86e
Describe justice system in UK vs USA in 2000-5000 words: 83c0ec6b7f37d53b798093724f72a40195572be308b65471e8d2aae18379ef79655233858eb842ebf73967b058c38685fbea9543a3d1b3b4f41684b5fd95eede
Describe schooling system in UK vs USA in 2000-5000 words: f5d13dd9ee6b6b0540bd3e4adf6baec37ff5d4dc3e1158344f5ab2c690880de0ac1263d3f2691d6b904271298ba0b023adf541ba2f7fb1add50ba27f7a67d3a1
Explain me some random problem for me in 2000-5000 words: 143fc78fb373d10e8b27bdc3bcd5a5a9b5154c8a9dfeb72102d610a87cf47d5cfeb7a4be0136bf0ba275e3fa46e8b6cfcbeb63af6c45714abcd2875bb7bd577c
Tell me entire history of USA: 210fa7578650d083ad35cae251f8ef272bdc61c35daa08eb27852b3ddc59262718300971b1ac9725c9ac08f63240a1a13845d6c853d2e08520567288d54b5518
Write a ballad. Pick a random theme.: 21c8744c38338c8e8c4a9f0efc580b9040d51837573924ef731180e7cc2fb21cb96968c901803abad6df1b4f035096ec0fc75339144f133c754a8303a3f378e3
Write an epic story about a dragon and a knight: 81ff9b82399502e2d3b0fd8f625d3c3f6141c4c179488a247c0c0cc3ccd77828f0920c3d8c03621dfe426e401f58820a6094db5f3786ab7f12bfb13d6224ef94
Write an essay about being a Senior developer.: 0921d5c3b2e04616dbb655e6ba4648911b9461a4ecdb0d435ebf190d903a92c20cf1343d98de65b6e9690f5e6b1c8f3bfc58e720168fa54dc0e293f0f595505c
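
Each value above is a 128-character hex digest, consistent with SHA-512 over the generated text. A hedged sketch of producing and comparing such digests (the experiment's own hashing code is not part of this diff; UTF-8 encoding is an assumption):

```python
import hashlib

def output_digest(text: str) -> str:
    # SHA-512 of the response text yields a 128-hex-char digest like the
    # values above. UTF-8 encoding is assumed.
    return hashlib.sha512(text.encode("utf-8")).hexdigest()

def runs_match(a: dict[str, str], b: dict[str, str]) -> bool:
    # Two runs are considered identical when every prompt maps to the
    # same digest.
    return a == b
```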
@@ -0,0 +1,15 @@
2024-08-22 12:16:19,452 - __main__ - INFO - Starting experiment vllm_llama_3_70b_instruct_awq with comment: 1x A100 SXM4 80GB
2024-08-22 12:16:19,455 - __main__ - INFO - Local log file: /home/rooter/dev/bac/deterministic-ml/tests/integration/results/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/run.local.log
2024-08-22 12:16:19,564 - paramiko.transport - INFO - Connected (version 2.0, client OpenSSH_8.9p1)
2024-08-22 12:16:19,769 - paramiko.transport - INFO - Auth banner: b'Welcome to vast.ai. If authentication fails, try again after a few seconds, and double check your ssh key.\nHave fun!\n'
2024-08-22 12:16:19,772 - paramiko.transport - INFO - Authentication (publickey) successful!
2024-08-22 12:16:19,774 - __main__ - INFO - Syncing files to remote
2024-08-22 12:16:19,961 - tools.ssh - INFO - Command: 'mkdir -p ~/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/output' stdout: '' stderr: '' status_code: 0
2024-08-22 12:16:22,432 - __main__ - INFO - Setting up remote environment
2024-08-22 12:16:25,588 - tools.ssh - INFO - Command: '\n set -exo pipefail\n \n curl -LsSf https://astral.sh/uv/install.sh | sh\n export PATH=$HOME/.cargo/bin:$PATH\n \n cd ~/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb\n uv venv -p python3.11 --python-preference managed\n source .venv/bin/activate \n uv pip install ./deterministic_ml*.whl pyyaml -r vllm_llama_3_70b_instruct_awq/requirements.txt\n ' stdout: "installing to /root/.cargo/bin\n uv\n uvx\neverything's installed!\n" stderr: "+ curl -LsSf https://astral.sh/uv/install.sh\n+ sh\ndownloading uv 0.3.1 x86_64-unknown-linux-gnu\n+ export PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n+ PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n+ cd /root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb\n+ uv venv -p python3.11 --python-preference managed\nUsing Python 3.11.9\nCreating virtualenv at: .venv\nActivate with: source .venv/bin/activate\n+ source .venv/bin/activate\n++ '[' -n x ']'\n++ SCRIPT_PATH=.venv/bin/activate\n++ '[' .venv/bin/activate = bash ']'\n++ deactivate nondestructive\n++ unset -f pydoc\n++ '[' -z '' ']'\n++ '[' -z '' ']'\n++ hash -r\n++ '[' -z '' ']'\n++ unset VIRTUAL_ENV\n++ unset VIRTUAL_ENV_PROMPT\n++ '[' '!' nondestructive = nondestructive ']'\n++ VIRTUAL_ENV=/root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/.venv\n++ '[' linux-gnu = cygwin ']'\n++ '[' linux-gnu = msys ']'\n++ export VIRTUAL_ENV\n++ _OLD_VIRTUAL_PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n++ PATH=/root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/.venv/bin:/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n++ export PATH\n++ '[' x2024-08-22_12-16-19_1x_a100_sxm4_80gb '!=' x ']'\n++ VIRTUAL_ENV_PROMPT=2024-08-22_12-16-19_1x_a100_sxm4_80gb\n++ export VIRTUAL_ENV_PROMPT\n++ '[' -z '' ']'\n++ '[' -z '' ']'\n++ _OLD_VIRTUAL_PS1=\n++ PS1='(2024-08-22_12-16-19_1x_a100_sxm4_80gb) '\n++ export PS1\n++ alias pydoc\n++ true\n++ hash -r\n+ uv pip install ./deterministic_ml-0.1.dev2+g218f083.d20240822-py3-none-any.whl pyyaml -r vllm_llama_3_70b_instruct_awq/requirements.txt\nResolved 108 packages in 57ms\nPrepared 1 package in 2ms\nInstalled 108 packages in 473ms\n + aiohappyeyeballs==2.4.0\n + aiohttp==3.10.5\n + aiosignal==1.3.1\n + annotated-types==0.7.0\n + anyio==4.4.0\n + attrs==24.2.0\n + certifi==2024.7.4\n + charset-normalizer==3.3.2\n + click==8.1.7\n + cloudpickle==3.0.0\n + cmake==3.30.2\n + datasets==2.21.0\n + deterministic-ml==0.1.dev2+g218f083.d20240822 (from file:///root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/deterministic_ml-0.1.dev2+g218f083.d20240822-py3-none-any.whl)\n + dill==0.3.8\n + diskcache==5.6.3\n + distro==1.9.0\n + fastapi==0.112.1\n + filelock==3.15.4\n + frozenlist==1.4.1\n + fsspec==2024.6.1\n + h11==0.14.0\n + httpcore==1.0.5\n + httptools==0.6.1\n + httpx==0.27.0\n + huggingface-hub==0.24.6\n + idna==3.7\n + interegular==0.3.3\n + jinja2==3.1.4\n + jiter==0.5.0\n + jsonschema==4.23.0\n + jsonschema-specifications==2023.12.1\n + lark==1.2.2\n + llvmlite==0.43.0\n + lm-format-enforcer==0.10.3\n + markupsafe==2.1.5\n + mpmath==1.3.0\n + 
msgpack==1.0.8\n + multidict==6.0.5\n + multiprocess==0.70.16\n + nest-asyncio==1.6.0\n + networkx==3.3\n + ninja==1.11.1.1\n + numba==0.60.0\n + numpy==1.26.4\n + nvidia-cublas-cu12==12.1.3.1\n + nvidia-cuda-cupti-cu12==12.1.105\n + nvidia-cuda-nvrtc-cu12==12.1.105\n + nvidia-cuda-runtime-cu12==12.1.105\n + nvidia-cudnn-cu12==9.1.0.70\n + nvidia-cufft-cu12==11.0.2.54\n + nvidia-curand-cu12==10.3.2.106\n + nvidia-cusolver-cu12==11.4.5.107\n + nvidia-cusparse-cu12==12.1.0.106\n + nvidia-ml-py==12.560.30\n + nvidia-nccl-cu12==2.20.5\n + nvidia-nvjitlink-cu12==12.6.20\n + nvidia-nvtx-cu12==12.1.105\n + openai==1.42.0\n + outlines==0.0.46\n + packaging==24.1\n + pandas==2.2.2\n + pillow==10.4.0\n + prometheus-client==0.20.0\n + prometheus-fastapi-instrumentator==7.0.0\n + protobuf==5.27.3\n + psutil==6.0.0\n + py-cpuinfo==9.0.0\n + pyairports==2.1.1\n + pyarrow==17.0.0\n + pycountry==24.6.1\n + pydantic==2.8.2\n + pydantic-core==2.20.1\n + python-dateutil==2.9.0.post0\n + python-dotenv==1.0.1\n + pytz==2024.1\n + pyyaml==6.0.2\n + pyzmq==26.2.0\n + ray==2.34.0\n + referencing==0.35.1\n + regex==2024.7.24\n + requests==2.32.3\n + rpds-py==0.20.0\n + safetensors==0.4.4\n + sentencepiece==0.2.0\n + setuptools==73.0.1\n + six==1.16.0\n + sniffio==1.3.1\n + starlette==0.38.2\n + sympy==1.13.2\n + tiktoken==0.7.0\n + tokenizers==0.19.1\n + torch==2.4.0\n + torchvision==0.19.0\n + tqdm==4.66.5\n + transformers==4.44.1\n + triton==3.0.0\n + typing-extensions==4.12.2\n + tzdata==2024.1\n + urllib3==2.2.2\n + uvicorn==0.30.6\n + uvloop==0.20.0\n + vllm==0.5.4\n + vllm-flash-attn==2.6.1\n + watchfiles==0.23.0\n + websockets==13.0\n + xformers==0.0.27.post2\n + xxhash==3.5.0\n + yarl==1.9.4\n" status_code: 0
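The `tools.ssh` helper behind the `Command: ... stdout: ... stderr: ... status_code: ...` entries is not included in this diff. A minimal paramiko sketch of the same pattern (host, port, and key path are placeholders):

```python
import os
import paramiko

# Connect to the rented GPU box; credentials here are placeholders.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("ssh4.vast.ai", port=22, username="root",
               key_filename=os.path.expanduser("~/.ssh/id_ed25519"))

# Run one setup command and report its streams and exit status,
# mirroring the log format above.
_, stdout, stderr = client.exec_command("mkdir -p ~/experiments/output")
status_code = stdout.channel.recv_exit_status()
print(f"stdout: {stdout.read().decode()!r} "
      f"stderr: {stderr.read().decode()!r} status_code: {status_code}")
client.close()
```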
2024-08-22 12:16:25,608 - __main__ - INFO - Gathering system info
2024-08-22 12:16:28,471 - tools.ssh - INFO - Command: '\n set -exo pipefail\n \n cd ~/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb\n export PATH=$HOME/.cargo/bin:$PATH\n source .venv/bin/activate;\n python -m deterministic_ml._internal.sysinfo > ~/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/output/sysinfo.yaml' stdout: '' stderr: "+ cd /root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb\n+ export PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n+ PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n+ source .venv/bin/activate\n++ '[' -n x ']'\n++ SCRIPT_PATH=.venv/bin/activate\n++ '[' .venv/bin/activate = bash ']'\n++ deactivate nondestructive\n++ unset -f pydoc\n++ '[' -z '' ']'\n++ '[' -z '' ']'\n++ hash -r\n++ '[' -z '' ']'\n++ unset VIRTUAL_ENV\n++ unset VIRTUAL_ENV_PROMPT\n++ '[' '!' nondestructive = nondestructive ']'\n++ VIRTUAL_ENV=/root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/.venv\n++ '[' linux-gnu = cygwin ']'\n++ '[' linux-gnu = msys ']'\n++ export VIRTUAL_ENV\n++ _OLD_VIRTUAL_PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n++ PATH=/root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/.venv/bin:/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n++ export PATH\n++ '[' x2024-08-22_12-16-19_1x_a100_sxm4_80gb '!=' x ']'\n++ VIRTUAL_ENV_PROMPT=2024-08-22_12-16-19_1x_a100_sxm4_80gb\n++ export VIRTUAL_ENV_PROMPT\n++ '[' -z '' ']'\n++ '[' -z '' ']'\n++ _OLD_VIRTUAL_PS1=\n++ PS1='(2024-08-22_12-16-19_1x_a100_sxm4_80gb) '\n++ export PS1\n++ alias pydoc\n++ true\n++ hash -r\n+ python -m deterministic_ml._internal.sysinfo\n" status_code: 0
2024-08-22 12:16:28,485 - __main__ - INFO - Running experiment code on remote
2024-08-22 12:20:56,768 - tools.ssh - INFO - Command: '\n set -exo pipefail\n \n cd ~/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb\n export PATH=$HOME/.cargo/bin:$PATH\n source .venv/bin/activate;\n python -m vllm_llama_3_70b_instruct_awq ~/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/output | tee ~/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/output/stdout.txt' stdout: "gpu_count=1\nStarting model loading\nINFO 08-22 10:16:34 awq_marlin.py:89] The model is convertible to awq_marlin during runtime. Using awq_marlin kernel.\nINFO 08-22 10:16:34 llm_engine.py:174] Initializing an LLM engine (v0.5.4) with config: model='casperhansen/llama-3-70b-instruct-awq', speculative_config=None, tokenizer='casperhansen/llama-3-70b-instruct-awq', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=awq_marlin, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=casperhansen/llama-3-70b-instruct-awq, use_v2_block_manager=False, enable_prefix_caching=False)\nINFO 08-22 10:16:35 model_runner.py:720] Starting to load model casperhansen/llama-3-70b-instruct-awq...\nINFO 08-22 10:16:36 weight_utils.py:225] Using model weights format ['*.safetensors']\nINFO 08-22 10:17:10 model_runner.py:732] Loading model weights took 37.0561 GB\nINFO 08-22 10:17:16 gpu_executor.py:102] # GPU blocks: 6068, # CPU blocks: 819\nmodel loading took 46.38 seconds\nStarting 8 responses generation\n8 responses generation took 213.59 seconds\n{'Count to 1000, skip unpopular numbers': '5fa4c4a18a1534b96c2eb2c5a30f63da0237b338aebf745d27d3d73dbc8dedfa2aed7070799440ac37e8610f9dd4926371f77a98e79c50a2c8b5b583cbf7c86e',\n 'Describe justice system in UK vs USA in 2000-5000 words': '83c0ec6b7f37d53b798093724f72a40195572be308b65471e8d2aae18379ef79655233858eb842ebf73967b058c38685fbea9543a3d1b3b4f41684b5fd95eede',\n 'Describe schooling system in UK vs USA in 2000-5000 words': 'f5d13dd9ee6b6b0540bd3e4adf6baec37ff5d4dc3e1158344f5ab2c690880de0ac1263d3f2691d6b904271298ba0b023adf541ba2f7fb1add50ba27f7a67d3a1',\n 'Explain me some random problem for me in 2000-5000 words': '143fc78fb373d10e8b27bdc3bcd5a5a9b5154c8a9dfeb72102d610a87cf47d5cfeb7a4be0136bf0ba275e3fa46e8b6cfcbeb63af6c45714abcd2875bb7bd577c',\n 'Tell me entire history of USA': '210fa7578650d083ad35cae251f8ef272bdc61c35daa08eb27852b3ddc59262718300971b1ac9725c9ac08f63240a1a13845d6c853d2e08520567288d54b5518',\n 'Write a ballad. 
Pick a random theme.': '21c8744c38338c8e8c4a9f0efc580b9040d51837573924ef731180e7cc2fb21cb96968c901803abad6df1b4f035096ec0fc75339144f133c754a8303a3f378e3',\n 'Write an epic story about a dragon and a knight': '81ff9b82399502e2d3b0fd8f625d3c3f6141c4c179488a247c0c0cc3ccd77828f0920c3d8c03621dfe426e401f58820a6094db5f3786ab7f12bfb13d6224ef94',\n 'Write an essay about being a Senior developer.': '0921d5c3b2e04616dbb655e6ba4648911b9461a4ecdb0d435ebf190d903a92c20cf1343d98de65b6e9690f5e6b1c8f3bfc58e720168fa54dc0e293f0f595505c'}\n" stderr: "+ cd /root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb\n+ export PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n+ PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n+ source .venv/bin/activate\n++ '[' -n x ']'\n++ SCRIPT_PATH=.venv/bin/activate\n++ '[' .venv/bin/activate = bash ']'\n++ deactivate nondestructive\n++ unset -f pydoc\n++ '[' -z '' ']'\n++ '[' -z '' ']'\n++ hash -r\n++ '[' -z '' ']'\n++ unset VIRTUAL_ENV\n++ unset VIRTUAL_ENV_PROMPT\n++ '[' '!' nondestructive = nondestructive ']'\n++ VIRTUAL_ENV=/root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/.venv\n++ '[' linux-gnu = cygwin ']'\n++ '[' linux-gnu = msys ']'\n++ export VIRTUAL_ENV\n++ _OLD_VIRTUAL_PATH=/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n++ PATH=/root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/.venv/bin:/root/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n++ export PATH\n++ '[' x2024-08-22_12-16-19_1x_a100_sxm4_80gb '!=' x ']'\n++ VIRTUAL_ENV_PROMPT=2024-08-22_12-16-19_1x_a100_sxm4_80gb\n++ export VIRTUAL_ENV_PROMPT\n++ '[' -z '' ']'\n++ '[' -z '' ']'\n++ _OLD_VIRTUAL_PS1=\n++ PS1='(2024-08-22_12-16-19_1x_a100_sxm4_80gb) '\n++ export PS1\n++ alias pydoc\n++ true\n++ hash -r\n+ python -m vllm_llama_3_70b_instruct_awq /root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/output\n+ tee /root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/output/stdout.txt\n\rLoading safetensors checkpoint shards: 0% Completed | 0/9 [00:00<?, ?it/s]\n\rLoading safetensors checkpoint shards: 11% Completed | 1/9 [00:01<00:12, 1.55s/it]\n\rLoading safetensors checkpoint shards: 22% Completed | 2/9 [00:03<00:14, 2.06s/it]\n\rLoading safetensors checkpoint shards: 33% Completed | 3/9 [00:06<00:13, 2.31s/it]\n\rLoading safetensors checkpoint shards: 44% Completed | 4/9 [00:09<00:12, 2.42s/it]\n\rLoading safetensors checkpoint shards: 56% Completed | 5/9 [00:11<00:09, 2.44s/it]\n\rLoading safetensors checkpoint shards: 67% Completed | 6/9 [00:15<00:08, 2.97s/it]\n\rLoading safetensors checkpoint shards: 78% Completed | 7/9 [00:17<00:05, 2.72s/it]\n\rLoading safetensors checkpoint shards: 89% Completed | 8/9 [00:21<00:03, 3.06s/it]\n\rLoading safetensors checkpoint shards: 100% Completed | 9/9 [00:22<00:00, 2.40s/it]\n\rLoading safetensors checkpoint shards: 100% Completed | 9/9 [00:22<00:00, 2.51s/it]\n\n/root/experiments/vllm_llama_3_70b_instruct_awq/2024-08-22_12-16-19_1x_a100_sxm4_80gb/.venv/lib/python3.11/site-packages/vllm/model_executor/layers/sampler.py:287: UserWarning: cumsum_cuda_kernel does not have a deterministic implementation, but you set 
'torch.use_deterministic_algorithms(True, warn_only=True)'. You can file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation. (Triggered internally at ../aten/src/ATen/Context.cpp:83.)\n probs_sum = probs_sort.cumsum(dim=-1)\n\rProcessed prompts: 0%| | 0/8 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]\rProcessed prompts: 12%|█▎ | 1/8 [03:33<24:55, 213.59s/it, est. speed input: 0.15 toks/s, output: 19.18 toks/s]\rProcessed prompts: 100%|██████████| 8/8 [03:33<00:00, 26.70s/it, est. speed input: 1.32 toks/s, output: 153.42 toks/s]\n" status_code: 0
2024-08-22 12:20:56,801 - __main__ - INFO - Syncing output back to local
2024-08-22 12:20:57,304 - __main__ - INFO - Done
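
The engine configuration reported in the log (seed=0, enforce_eager=True, quantization=awq_marlin, max_seq_len=8192, dtype=torch.float16) implies a generation setup along the following lines. This is a sketch of that configuration, not the experiment module itself; the sampling parameters in particular are assumptions:

```python
from vllm import LLM, SamplingParams

# Mirror the engine config from the log: fixed seed, eager mode (no CUDA
# graphs), AWQ Marlin kernels, 8k context, fp16.
llm = LLM(
    model="casperhansen/llama-3-70b-instruct-awq",
    quantization="awq_marlin",
    enforce_eager=True,
    seed=0,
    max_model_len=8192,
    dtype="float16",
)

# Sampling settings are assumptions; the experiment's own values are not
# shown in this diff.
sampling = SamplingParams(temperature=0.0, max_tokens=4096, seed=0)

outputs = llm.generate(["Write a ballad. Pick a random theme."], sampling)
print(outputs[0].outputs[0].text)
```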