Multi species #724
Conversation
Codecov Report. Attention: additional details and impacted files:

```
@@           Coverage Diff            @@
##             main     #724      +/-   ##
==========================================
- Coverage   73.71%   73.67%    -0.04%
==========================================
  Files          99       99
  Lines       26992    27180     +188
  Branches     5620     5687      +67
==========================================
+ Hits        19896    20025     +129
- Misses       5698     5757      +59
  Partials     1398     1398
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
@JintaoWu98, thanks for this first major contribution to ARC!
The PR looks good and well thought out. I added comments, most of them on improving writing style, some with questions.
arc/scheduler.py
Outdated
```python
@@ -419,7 +422,10 @@ def __init__(self,
        # conformers weren't asked for, assign initial_xyz
        species.initial_xyz = species.conformers[0]
    if species.label not in self.running_jobs:
        self.running_jobs[species.label] = list()  # initialize before running the first job
        if not species.multi_species:
```
We can change these 4 lines into a more compact form:

```python
self.running_jobs[species.label if not species.multi_species else species.multi_species] = list()
```
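As a quick, self-contained illustration of how that one-liner keys the dict (`SimpleSpecies` is a hypothetical stand-in for ARCSpecies, not ARC's actual class):

```python
# Hypothetical stand-in for ARCSpecies, for illustration only.
class SimpleSpecies:
    def __init__(self, label, multi_species=None):
        self.label = label
        self.multi_species = multi_species  # cluster name (str) or None

running_jobs = dict()
for spc in [SimpleSpecies('H2O'), SimpleSpecies('NH3', multi_species='cluster_1')]:
    # A lone species is keyed by its own label; a cluster member is keyed
    # by the shared cluster name, so all members map to a single entry.
    running_jobs[spc.label if not spc.multi_species else spc.multi_species] = list()

print(running_jobs)  # {'H2O': [], 'cluster_1': []}
```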
arc/job/adapters/gaussian.py
Outdated
```python
if self.run_multi_species == False:
    input_dict['xyz'] = xyz_to_str(self.xyz)
else:
    input_dict['xyz'] = list()
```
We can simplify these five lines as:

```python
input_dict['xyz'] = [xyz_to_str(xyz) for xyz in self.xyz] if self.run_multi_species else xyz_to_str(self.xyz)
```

To be extra careful, we can also change the above `else` to `elif isinstance(xyz, list):`.
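For context, a minimal standalone sketch of the suggested simplification, with a toy `xyz_to_str` (the real converter lives in ARC; this stand-in only mimics its role):

```python
def xyz_to_str(xyz):
    # Toy stand-in for ARC's converter: format one xyz dict as text.
    return '\n'.join(f'{symbol} {x:.4f} {y:.4f} {z:.4f}'
                     for symbol, (x, y, z) in zip(xyz['symbols'], xyz['coords']))

def fill_xyz(input_dict, xyz, run_multi_species):
    # The suggested one-liner: a list of strings for a multi-species job,
    # a single string otherwise.
    input_dict['xyz'] = [xyz_to_str(x) for x in xyz] if run_multi_species else xyz_to_str(xyz)
    return input_dict

h2 = {'symbols': ('H', 'H'), 'coords': ((0.0, 0.0, 0.0), (0.0, 0.0, 0.74))}
print(fill_xyz(dict(), [h2, h2], run_multi_species=True))
```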
```python
    species = self.species_dict[label]
elif isinstance(label, list):
    species = [spc for spc in self.species_list if spc.label in label]
run_multi_species = all([spc.multi_species is not None for spc in species]) if isinstance(species, list) else False
```
Perhaps we can strengthen this check by making sure all `multi_species` clusters are the same string:

```python
run_multi_species = all([spc.multi_species == spc.multi_species[0] for spc in species]) and species[0].multi_species is not None if isinstance(species, list) and len(species) else False
```

I hope it's not too complicated, but it should be more robust.
Thanks for the advice. I agree that this idea can make sure all `multi_species` clusters are the same string, but I have two questions here.

1. For the `all` function, should

```python
all([spc.multi_species == spc.multi_species[0] for spc in species])
```

be the following instead?

```python
all([spc.multi_species == species[0].multi_species for spc in species])
```

2. If we only focus on `run_opt_job`, where we have the following lines:

```python
if self.species_dict[label].multi_species:
    ##Some code##
    label = [species.label for species in self.species_list
             if species.multi_species == self.species_dict[label].multi_species]
self.run_job(label=label,
             xyz=self.species_dict[label].initial_xyz if isinstance(label, str) else None,
             level_of_theory=self.opt_level,
             job_type='opt',
             fine=fine)
```

The `label` argument has already been filtered to make sure all `multi_species` clusters are the same string, so I don't think we really need this augmented version there. On the other hand, for any other place where `run_job` is called, this check is necessary.
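For what it's worth, here is a hedged sketch of the check from question 1, comparing every entry against `species[0].multi_species` (the `Spc` class is a dummy, not ARCSpecies):

```python
class Spc:
    # Dummy species holding only the attributes relevant to this check.
    def __init__(self, label, multi_species=None):
        self.label, self.multi_species = label, multi_species

def same_cluster(species):
    """Return True only for a non-empty list whose members all share one non-None cluster name."""
    if not isinstance(species, list) or not len(species):
        return False
    return species[0].multi_species is not None \
        and all(spc.multi_species == species[0].multi_species for spc in species)

print(same_cluster([Spc('a', 'c1'), Spc('b', 'c1')]))  # True
print(same_cluster([Spc('a', 'c1'), Spc('b', 'c2')]))  # False: mixed clusters
print(same_cluster([Spc('a'), Spc('b')]))              # False: no cluster name
```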
Was this addressed?
arc/scheduler.py
Outdated
```python
@@ -1419,12 +1438,12 @@ def spawn_post_opt_jobs(self,
        return None

# Spawn IRC if requested and if relevant.
if self.job_types['irc'] and self.species_dict[label].is_ts:
if label in self.output and self.job_types['irc'] and self.species_dict[label].is_ts:
```
This is OK, but perhaps we can be clearer by writing `if label in self.output.keys()`, which is more "formal", though probably equivalent to what's written already. If we make the change, then we also need to modify the code below accordingly.
arc/scheduler.py
Outdated
```python
@@ -2244,62 +2266,91 @@ def parse_opt_geo(self,
    success = False
    logger.debug(f'parsing opt geo for {job.job_name}')
    if job.job_status[1]['status'] == 'done':
        opt_xyz = parser.parse_xyz_from_file(path=job.local_path_to_xyz or job.local_path_to_output_file)
        multi_species = any(spc.multi_species == label for spc in self.species_list)
        species_labels = [spc.label for spc in self.species_list if spc.multi_species == label]
```
I would add `if multi_species else list()` at the end of this line, to avoid running the loop if `multi_species` is False.
arc/job/adapters/common.py
Outdated
```python
multiplicity = obj.multiplicity if species is None else species.multiplicity
number_of_radicals = obj.species[0].number_of_radicals if species is None else species.number_of_radicals
species_label = obj.species[0].label if species is None else species.label
if (multiplicity > 1 and obj.level.method_type != 'composite') \
```
We no longer need `and obj.level.method_type != 'composite'`; this is already checked above, so we can assume here that the method is not composite.
arc/scheduler.py
Outdated
```python
@@ -13,6 +13,7 @@
import numpy as np
from IPython.display import display
from typing import TYPE_CHECKING, List, Optional, Tuple, Union
from pathlib import Path
```
The import should be added in the commit where `Path` is being used (unless we change to using `os.path.dirname()`).
Thanks, I think it is used below in `make_multi_species_output_file`.
Was this comment addressed? If so, you can mark it as "resolved"
arc/scheduler.py
Outdated
```python
@@ -597,10 +598,15 @@ def schedule_jobs(self):
if not (job.job_id in self.server_job_ids and job.job_id not in self.completed_incore_jobs):
    successful_server_termination = self.end_job(job=job, label=label, job_name=job_name)
    if successful_server_termination:
        multi_species = any(spc.multi_species == label for spc in self.species_list)
        if multi_species:
            self.multi_species_path_dict = self.make_multi_species_output_file(label=label, job=job)
```
Maybe we should call `make_multi_species_output_file()` inside `parse_opt_geo()`? This place seems too generic, or too high-level, to be calling these functions.
The reason I put it here is that the output from `make_multi_species_output_file` is used by both `parse_opt_geo` and `parse_opt_e_elect`, as seen below. This also allows us to eventually add more parsing steps between `make_multi_species_output_file` and `delete_multi_species_output_file` if needed.
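For readers following the thread, here is a rough sketch of what such a slicing helper could look like. The 'Normal termination' delimiter, the file layout, and the helper's name and signature are assumptions for illustration, not necessarily ARC's actual implementation:

```python
import os

def split_multi_species_output(path, species_labels):
    """Split a concatenated Gaussian output file into one temporary file per species.

    Assumes each species' section ends with a 'Normal termination' line and
    that sections appear in the same order as `species_labels`.
    Returns a dict mapping each species label to its sliced file's path.
    """
    with open(path, 'r') as f:
        lines = f.readlines()
    sections, start = [], 0
    for i, line in enumerate(lines):
        if 'Normal termination' in line:
            sections.append(''.join(lines[start:i + 1]))
            start = i + 1
    path_dict = dict()
    for label, section in zip(species_labels, sections):
        out_path = os.path.join(os.path.dirname(path), f'{label}.out')
        with open(out_path, 'w') as f:
            f.write(section)
        path_dict[label] = out_path
    return path_dict
```

A matching delete helper could then simply call `os.remove()` on each path in the returned dict.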
Was this addressed?
```python
@@ -605,8 +605,8 @@ def schedule_jobs(self):
if success:
    self.parse_opt_e_elect(label=label, job=job)
    self.spawn_post_opt_jobs(label=label, job_name=job_name)
    # if multi_species:
    #     self.delete_multi_species_output_file(label=label, job=job)
    if multi_species:
```
This should be squashed with the previous commit, where it was added as a comment.
Also, we need to consider where to call this. Maybe inside `parse_opt_e_elect()` as well?
arc/parser.py
Outdated
```python
    except (LogError, NotImplementedError):
        logger.warning(f'Could not read e_elect from {path}')
    return e_elect_multi
# def parse_multi_species_e_elect(path: str,
```
This isn't final, right?
Sorry for the confusion, I removed this part entirely now.
Force-pushed from 99566f1 to 658eeea.
Hi guys, I just wanted to ask: what happens if one species in the multi-species Gaussian file fails due to a convergence issue, etc.? How do we troubleshoot it? Do we rerun all the species again, or will ARC be able to recognize that, for example, 1 of the 3 species in the input.gjf failed, and only troubleshoot that particular species?
From my point of view, it can be done individually instead of rerunning the whole group. When we do the parsing, we slice the big output file into individual ones and investigate the status of each species, so when one of the species goes wrong, we can just relaunch that specific one. Though I am not sure if this is currently applied; could you please comment on that, @alongd? Thanks!
I agree, troubleshooting specific species in the future is the way to go.
Force-pushed from 46022ca to e2ec635.
Thanks, @JintaoWu98!
I added another round of comments; please take a look.
arc/job/adapter.py
Outdated
```python
@@ -375,7 +375,8 @@ def determine_job_array_parameters(self):
    ARC will allocate, e.g., 8 workers, to simultaneously get processes (one by one) from the HDF5 bank
    and execute them. On average, each worker in this example executes 125 jobs.
    """
    if self.execution_type == 'incore':
    if self.execution_type == 'incore' \
            or self.species is not None and all([spc.multi_species is not None for spc in self.species]):
```
Can we use `self.run_multi_species` here instead of `self.species is not None and all([spc.multi_species is not None for spc in self.species])`?
arc/job/adapter.py
Outdated
```python
@@ -545,7 +549,7 @@ def set_file_paths(self):

if self.server is not None:
    # Parentheses don't play well in folder names:
    species_name_remote = self.species_label.replace('(', '_').replace(')', '_')
    species_name_remote = self.species_label.replace('(', '_').replace(')', '_') if isinstance(self.species_label, str) else ''
```
This will make `species_name_remote` an empty string in case we have a multi-species cluster. Is that desirable?
arc/job/adapters/gaussian.py
Outdated
```python
@@ -245,8 +248,8 @@ def write_input_file(self) -> None:
if self.level.method_type in ['semiempirical', 'force_field']:
    self.checkfile = None

if not is_restricted(self):
    input_dict['restricted'] = 'u'
# if self.run_multi_species is False and not is_restricted(self):
```
Why did this commented-out code remain here? Can we remove it?
```python
@@ -263,8 +271,10 @@ def write_input_file(self) -> None:
    self.level.method = 'cbs-qb3'

# Job type specific options
# max_cycles 500
max_c = self.args['trsh'].split()[1] if 'max_cycles' in self.args['trsh'] else 100
```
Was this comment addressed?
arc/job/adapters/gaussian.py
Outdated
```python
if self.job_type in ['opt', 'conformers', 'optfreq', 'composite']:
    keywords = ['ts', 'calcfc', 'noeigentest', 'maxcycles=100'] if self.is_ts else ['calcfc']
    keywords = ['ts', 'calcfc', 'noeigentest', f'maxcycles={max_c}' + '}'] if self.is_ts else ['calcfc']
```
Why do we need the extra `+ '}'`?
arc/scheduler.py
Outdated
```python
@@ -3503,6 +3615,74 @@ def save_restart_dict(self):
    logger.debug(f'Dumping restart dictionary:\n{self.restart_dict}')
    save_yaml_file(path=self.restart_path, content=self.restart_dict)

def make_multi_species_output_file(self,
```
Can you try relocating this function from Scheduler to plotter? Also, please add a test for it.
Force-pushed from e2ec635 to fac6d6e.
Please let me know if there are any other comments. Thanks!
arc/parser.py
Outdated
```python
                 software: Optional[str] = 'gaussian',
                 ) -> Optional[Dict[str, tuple]]:
    """
    Parse the electronic energy from an opt job output file.
```
Thanks, and sorry for the confusion: in the current version of the code I removed `parse_multi_species_e_elect` entirely. The original code had two goals. The first was to slice the big cluster output file, which is now handled by `Scheduler.make_multi_species_output_file`. The other was to parse the `e_elect`, which is handled by `Scheduler.parse_opt_e_elect`:

```python
if multi_species:
    for spc in self.species_list:
        if spc.multi_species == label:
            multi_species_opt_xyzs[spc.label] = parser.parse_xyz_from_file(path=self.multi_species_path_dict[spc.label])
```

The new version of the branch doesn't need to modify `parser.py` at all.
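The `e_elect` side would then be analogous to the xyz loop above. A hedged sketch of that per-species loop, written as a standalone function so it runs on its own (`parse_e_elect` does exist in `arc/parser.py`, but the surrounding loop and the path dict here merely mirror the snippet above):

```python
def parse_cluster_e_elect(species_list, label, multi_species_path_dict, parse_e_elect):
    # Illustrative sketch: parse each cluster member's electronic energy
    # from its own sliced output file; returns {species_label: e_elect}.
    e_elect_dict = dict()
    for spc in species_list:
        if spc.multi_species == label:
            e_elect_dict[spc.label] = parse_e_elect(path=multi_species_path_dict[spc.label])
    return e_elect_dict
```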
arc/parser.py
Outdated
```python
    Args:
        path (str): The ESS log file to parse from.
        zpe_scale_factor (float): The ZPE scaling factor, used only for composite methods in Gaussian via Arkane.
```
Same as the above comment.
arc/parser.py
Outdated
```python
        zpe_scale_factor (float): The ZPE scaling factor, used only for composite methods in Gaussian via Arkane.
        software (str, optional): The ESS.

    Returns: Optional[float]
```
Same as the above comment.
arc/parser.py
Outdated
```python
            output_file.write(software + '\n' + line + '\n')
        e_elect_multi[species_name_list[i]] = parse_e_elect(path=output_file_path)
        os.remove(output_file_path)
    except (LogError, NotImplementedError):
```
Thanks. Same as the previous comment about removing this entire block: this is now handled by `parse_e_elect` in `Scheduler.parse_opt_e_elect` directly.
arc/scheduler.py
Outdated
```python
@@ -596,6 +596,7 @@ def schedule_jobs(self):
if successful_server_termination:
    success = self.parse_opt_geo(label=label, job=job)
    if success:
        self.parse_opt_e_elect(label=label, job=job)
```
Using `if not self.job_types['sp']:` seems to lead to functional test failures for both Kinetics and Thermo, so for now I keep the `if success`.
Thanks for this contribution, @JintaoWu98!
Looking good. Please go ahead and squash the fixup commits.
make_multi_species_output_file is created to first slice the big cluster file; delete_multi_species_output_file is created to delete the temporarily generated files.
In addition, define a new method, post_opt_geo_work, for post-processing.
Force-pushed from fac6d6e to 394e65b.
In case of given initial xyzs, we can now prevent them from running multiple times.
Currently only implemented for Gaussian: modify gaussian.py for multiple species, including rewriting the template.
In addition, add a few method types leading to restricted cases. fixup! multi_species "is_restricted" in common.py and common_test.py
Force-pushed from 394e65b to 4aa5282.
@JintaoWu98, is this PR ready to be merged? Can you run a multi-species project using the updated branch to confirm it works as expected?
Hi @alongd, a multi-species project is working as expected. I think it is ready to be merged.
Thanks for this contribution!
This branch allows us to create a cluster species that comprises multiple individual species. To be more specific, ARC currently creates a separate Gaussian input file for each species it receives; with this new feature, we can combine all the species into one Gaussian input file, followed by the relevant post-processing steps.
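As a rough illustration of the combined-input idea: Gaussian can chain several jobs in one file with its `--Link1--` directive, so a multi-species input can be built by joining single-species blocks. The template ARC actually writes may differ; this sketch only shows the concept:

```python
def combine_gaussian_inputs(single_species_blocks):
    # Join complete single-species Gaussian input blocks into one multi-job
    # file, separated by Gaussian's --Link1-- directive.
    return '\n--Link1--\n'.join(block.strip() + '\n' for block in single_species_blocks)

h2o_block = """%chk=H2O.chk
#P opt b3lyp/6-31g(d)

H2O optimization

0 1
O   0.0000   0.0000   0.1173
H   0.0000   0.7572  -0.4692
H   0.0000  -0.7572  -0.4692
"""

nh3_block = h2o_block.replace('H2O', 'NH3')  # placeholder second species
print(combine_gaussian_inputs([h2o_block, nh3_block]))
```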