```
[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/hx/Qwen/Qwen2.5-VL-7B-machine/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank3]:     launch()
[rank3]:   File "/home/hx/Qwen/Qwen2.5-VL-7B-machine/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank3]:     run_exp()
[rank3]:   File "/home/hx/Qwen/Qwen2.5-VL-7B-machine/LLaMA-Factory/src/llamafactory/train/tuner.py", line 107, in run_exp
[rank3]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank3]:   File "/home/hx/Qwen/Qwen2.5-VL-7B-machine/LLaMA-Factory/src/llamafactory/train/tuner.py", line 69, in _training_function
[rank3]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank3]:   File "/home/hx/Qwen/Qwen2.5-VL-7B-machine/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
[rank3]:     model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
[rank3]:   File "/home/hx/Qwen/Qwen2.5-VL-7B-machine/LLaMA-Factory/src/llamafactory/model/loader.py", line 135, in load_model
[rank3]:     model = load_unsloth_pretrained_model(config, model_args)
[rank3]:   File "/home/hx/Qwen/Qwen2.5-VL-7B-machine/LLaMA-Factory/src/llamafactory/model/model_utils/unsloth.py", line 55, in load_unsloth_pretrained_model
[rank3]:     model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
[rank3]:   File "/home/hx/miniconda3/envs/Qwen/lib/python3.10/site-packages/unsloth/models/loader.py", line 308, in from_pretrained
[rank3]:     return FastModel.from_pretrained(
[rank3]:   File "/home/hx/miniconda3/envs/Qwen/lib/python3.10/site-packages/unsloth/models/loader.py", line 714, in from_pretrained
[rank3]:     model, tokenizer = FastBaseModel.from_pretrained(
[rank3]:   File "/home/hx/miniconda3/envs/Qwen/lib/python3.10/site-packages/unsloth/models/vision.py", line 258, in from_pretrained
[rank3]:     model_type_arch = model_types[0]
[rank3]: TypeError: 'NoneType' object is not subscriptable
```

Ranks 0, 1, and 2 fail with the identical traceback. The run also prints `Unsloth: WARNING 'trust_remote_code' is True. Are you certain you want to do remote code execution?`, after which torchrun tears the job down:

```
[rank0]:[W325 08:29:49.570195329 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
W0325 08:29:51.881845 2541207 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2541332 closing signal SIGTERM
W0325 08:29:51.882242 2541207 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2541333 closing signal SIGTERM
W0325 08:29:51.883206 2541207 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2541335 closing signal SIGTERM
E0325 08:29:54.067641 2541207 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 2 (pid: 2541334) of binary: /home/hx/miniconda3/envs/Qwen/bin/python
Traceback (most recent call last):
  File "/home/hx/miniconda3/envs/Qwen/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/hx/miniconda3/envs/Qwen/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/home/hx/miniconda3/envs/Qwen/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
    run(args)
  File "/home/hx/miniconda3/envs/Qwen/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
    elastic_launch(
  File "/home/hx/miniconda3/envs/Qwen/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/hx/miniconda3/envs/Qwen/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/hx/Qwen/Qwen2.5-VL-7B-machine/LLaMA-Factory/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2025-03-25_08:29:51
  host       : g1a6000
  rank       : 2 (local_rank: 2)
  exitcode   : 1 (pid: 2541334)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```
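For context, the crash at `model_type_arch = model_types[0]` in `unsloth/models/vision.py` is the generic Python error raised when indexing `None`: the lookup that should map the config's `model_type` to an architecture evidently returned `None`, which plausibly means the installed Unsloth build does not recognize this model type. The sketch below reproduces that failure shape and shows a clearer guard; `resolve_model_types` and `pick_architecture` are hypothetical stand-ins for illustration, not Unsloth's actual API.

```python
# Illustrative sketch only: `resolve_model_types` mimics a lookup that
# returns None for an unrecognized model_type, which is the condition
# that turns `model_types[0]` into "'NoneType' object is not subscriptable".
SUPPORTED = {"qwen2_vl": ["qwen2_vl"], "llama": ["llama"]}  # hypothetical table

def resolve_model_types(model_type):
    # dict.get returns None when the key is missing, mirroring the observed bug.
    return SUPPORTED.get(model_type)

def pick_architecture(model_type):
    model_types = resolve_model_types(model_type)
    if model_types is None:
        # Failing loudly here is far more actionable than the bare TypeError.
        raise ValueError(
            f"Unrecognized model_type {model_type!r}; the installed unsloth "
            "may not support this model. Try upgrading unsloth or disabling it."
        )
    return model_types[0]
```

Running `pick_architecture("qwen2_vl")` returns an architecture name, while an unknown type raises the descriptive `ValueError` instead of the opaque `TypeError` seen in the logs.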
GPU memory usage was always uneven across devices during training, so I chose Unsloth for acceleration, but now I get the error above.
System Info

llamafactory version: 0.9.3.dev0