-
After I modified the machine.json file, the error changed to this:
-
Which version of dpdispatcher are you using?
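In case it helps, here is a minimal sketch for reporting the installed versions from inside the same Python environment that runs dpgen. It only uses the standard-library `importlib.metadata` module (available from Python 3.8); the helper name `installed_version` is made up for this example, and it assumes the packages are registered under the distribution names `dpdispatcher` and `dpgen`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumed; adjust them if your install differs.
    for name in ("dpdispatcher", "dpgen"):
        print(name, installed_version(name))
```

Run it with the interpreter from the environment shown in the traceback, so that the versions reported are the ones actually in use.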
-
Hello, I get the following error when I run init_bulk. How can I solve it? Thank you very much for your help.
2022-09-21 18:44:34,389 - INFO : info:check_all_finished: False
2022-09-21 18:44:34,390 - INFO : job: 7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 unsubmitted; submit it
2022-09-21 18:44:34,410 - INFO : job: 7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 submit; job_id is 33898
2022-09-21 18:45:05,517 - INFO : job: 7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 33898 terminated;fail_cout is 1; resubmitting job
2022-09-21 18:45:05,541 - INFO : job:7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 re-submit after terminated; new job_id is 33911
2022-09-21 18:45:05,613 - INFO : job:7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 job_id:33911 after re-submitting; the state now is 4
2022-09-21 18:45:35,717 - INFO : job: 7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 33911 terminated;fail_cout is 2; resubmitting job
2022-09-21 18:45:35,739 - INFO : job:7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 re-submit after terminated; new job_id is 33934
2022-09-21 18:45:35,815 - INFO : job:7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 job_id:33934 after re-submitting; the state now is 4
2022-09-21 18:46:05,919 - INFO : job: 7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 33934 terminated;fail_cout is 3; resubmitting job
Traceback (most recent call last):
File "/root/data/dpgen/lib/python3.8/site-packages/dpdispatcher/submission.py", line 215, in handle_unexpected_submission_state
job.handle_unexpected_job_state()
File "/root/data/dpgen/lib/python3.8/site-packages/dpdispatcher/submission.py", line 532, in handle_unexpected_job_state
raise RuntimeError(f"job:{self.job_hash} {self.job_id} failed {self.fail_count} times.job_detail:{self}")
RuntimeError: job:7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43 33934 failed 3 times.job_detail:{'7648d4bdfa62b5f62c3b9ea3794a2c7a97215c43': {'job_task_list': [{'command': 'mpirun -n 4 vasplws', 'task_work_path': 'sys-0108-0108', 'forward_files': ['POSCAR', 'INCAR', 'POTCAR'], 'backward_files': ['OUTCAR', 'CONTCAR'], 'outlog': 'fp.log', 'errlog': 'fp.log'}], 'resources': {'number_node': 1, 'cpu_per_node': 4, 'gpu_per_node': 0, 'queue_name': 'CPU', 'group_size': 125, 'custom_flags': [], 'strategy': {'if_cuda_multi_devices': False}, 'para_deg': 1, 'module_unload_list': [], 'module_list': [], 'source_list': [], 'envs': {}, 'kwargs': {'gpu_usage': False}}, 'job_state': <JobStatus.terminated: 4>, 'job_id': 33934, 'fail_count': 3}}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/data/dpgen/bin/dpgen", line 10, in <module>
sys.exit(main())
File "/root/data/dpgen/lib/python3.8/site-packages/dpgen/main.py", line 175, in main
args.func(args)
File "/root/data/dpgen/lib/python3.8/site-packages/dpgen/data/gen.py", line 778, in gen_init_bulk
run_vasp_relax(jdata, mdata)
File "/root/data/dpgen/lib/python3.8/site-packages/dpgen/data/gen.py", line 637, in run_vasp_relax
submission.run_submission()
File "/root/data/dpgen/lib/python3.8/site-packages/dpdispatcher/submission.py", line 182, in run_submission
self.handle_unexpected_submission_state()
File "/root/data/dpgen/lib/python3.8/site-packages/dpdispatcher/submission.py", line 219, in handle_unexpected_submission_state
f"Meet errors will handle unexpected submission state.\n"
AttributeError: 'Submission' object has no attribute 'remote_root'