Problems with all tasks execution #2


Open

JohnConnor123 opened this issue Apr 2, 2025 · 6 comments

@JohnConnor123

When running the bash run_tests.sh command from the evaluation folder, the tests start at context length 250 and then hang instead of moving on to context lengths 500, 1000, 2000, ...

Here is the traceback after interrupting the process from the keyboard:

Results saved at ./results/CL250/0408_T04_C02_twohop2/minimax-01_book_0408_T04_C02_twohop2_1743612171.json
100%|███████████████████████████████████████████████| 26/26 [00:00<00:00, 35.32it/s]
100%|███████████████████████████████████████████| 26/26 [00:00<00:00, 146378.39it/s]
Results saved at ./results/CL250/0408_T05_C02_twohop2/minimax-01_book_0408_T05_C02_twohop2_1743612175.json
100%|███████████████████████████████████████████████| 26/26 [00:00<00:00, 34.85it/s]
^[[A
^CTraceback (most recent call last):
  File "/home/calibri/experiments/RULER/LongContext/NoLiMa/evaluation/run_tests.py", line 113, in <module>
    tester.evaluate()
  File "/home/calibri/experiments/RULER/LongContext/NoLiMa/evaluation/async_evaluate.py", line 235, in evaluate
    responses = loop.run_until_complete(asyncio.gather(*async_tasks))
  File "/home/calibri/.pyenv/versions/3.10.16/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/home/calibri/.pyenv/versions/3.10.16/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/home/calibri/.pyenv/versions/3.10.16/lib/python3.10/asyncio/base_events.py", line 1871, in _run_once
    event_list = self._selector.select(timeout)
  File "/home/calibri/.pyenv/versions/3.10.16/lib/python3.10/selectors.py", line 469, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
KeyboardInterrupt
^[[A^CException ignored in: <module 'threading' from '/home/calibri/.pyenv/versions/3.10.16/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/home/calibri/.pyenv/versions/3.10.16/lib/python3.10/threading.py", line 1567, in _shutdown
    lock.acquire()
KeyboardInterrupt:
@amodaresi
Collaborator

Which model are you testing? Is it running locally (e.g. via vLLM) or served via a cloud-based API?

@JohnConnor123
Author

JohnConnor123 commented Apr 4, 2025

Which model are you testing? Is it running locally (e.g. via vLLM) or served via a cloud-based API?

Qwen-0.5B-Instruct and Llama-3.1-8B-Instruct, running locally via vLLM, and MiniMaxAI/MiniMax-Text-01, served via OpenRouter. (For MiniMax-Text-01 I specified vLLM in the config and pointed the tokenizer to the Hugging Face repository so that it loads without errors.)

@JohnConnor123
Author

Which model are you testing? Is it running locally (e.g. via vLLM) or served via a cloud-based API?

Is it possible to fix this bug?

@amodaresi
Collaborator

Have you tried lowering the timeout? (e.g. using 120 instead of 700 seconds; we opted for the larger value for longer contexts)

It is possible that the hang is just an API request failing and then being retried with long timeouts.
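
As an illustration, here is a minimal sketch of the idea (not the repo's actual code; `fetch_response` and the timeout values are hypothetical): wrapping each request in a per-request timeout means one stuck call fails fast instead of stalling `asyncio.gather` forever.

```python
import asyncio

# Hypothetical stand-in for a single API request coroutine.
async def fetch_response(prompt: str) -> str:
    await asyncio.sleep(1000)  # simulate a request that never returns
    return "response"

async def bounded_fetch(prompt: str, timeout: float = 120.0) -> str | None:
    # wait_for cancels the request once the timeout elapses, so a single
    # hung call cannot block the gather() below indefinitely.
    try:
        return await asyncio.wait_for(fetch_response(prompt), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # the caller can log this and retry with its own backoff

async def main() -> None:
    results = await asyncio.gather(*(bounded_fetch(p, timeout=2.0) for p in ("a", "b")))
    print(results)  # prints [None, None] after ~2 s instead of hanging

asyncio.run(main())
```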

@JohnConnor123
Author

Have you tried lowering the timeout? (e.g. using 120 instead of 700 seconds; we opted for the larger value for longer contexts)

NoLiMa/evaluation/model_configs/llama_3.3_70b.json, line 9 (at a02da41):

"timeout": 700,

It is possible that the hang is just an API request failing and then being retried with long timeouts.

So your solution is to change the timeout from 700 to 120, right?

@amodaresi
Collaborator

amodaresi commented Apr 10, 2025

Yes, exactly.
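
For anyone else hitting this: assuming your model config follows the same shape as llama_3.3_70b.json (all other fields omitted here), the edit is just lowering the timeout value:

```json
{
    "timeout": 120
}
```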
