-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an option to read in a list of points to scan from a .csv file #320
Conversation
Documentation added in Combine docs: cms-analysis/HiggsAnalysis-CombinedLimit#937 |
I think this should now be ready to merge, any suggestions for improvements? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @runtingt, thanks for the PR. This is useful functionality, but I've added some comments that should be addressed first before we merge. Note also that combineTool.py already has a (not well documented) --generate
option, which can be used to construct multiple jobs out of a series of options, e.g.:
> combineTool.py -M MultiDimFit --algo fixed --generate n:fixedPointPOIs::0,,r1=1,r2=2:1,,r1=2,r2=3 --dry-run
[DRY-RUN]: combine --algo fixed -M MultiDimFit --fixedPointPOIs r1=1,r2=2 -n .Test.0
[DRY-RUN]: combine --algo fixed -M MultiDimFit --fixedPointPOIs r1=2,r2=3 -n .Test.1
One alternative could be to simply append an extra argument to the internal list of generate options, e.g.
self.args.generate.append(
"n:fixedPointPOIs::" + ":".join(["{},,".format(i) + ",".join(["{}={}".format(k,v) for k,v in X.items()]) for i, X in enumerate(pd.read_csv("test.csv").to_dict(orient="records"))]))
but also fine to stick with the current implementation.
@@ -32,6 +33,8 @@ def attach_intercept_args(self, group): | |||
'-m', '--mass', help='Supports range strings for multiple masses, e.g. "120:130:5,140 will produce three combine calls with mass values of 120, 125, 130 and 140"') | |||
group.add_argument( | |||
'--points', help='For use with "-M MultiDimFit --algo grid" to split scan points into separate jobs') | |||
group.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function describes what we call intercept
args - ones that are already used by combine that we want to extract/manipulate. Since you are adding a new arg, this should go in the attach_args
method below.
@@ -88,6 +91,32 @@ def run_method(self): | |||
subbed_vars[('SEED',)] = [(sval,) for sval in seed_vals] | |||
self.passthru.extend(['-s', '%(SEED)s']) | |||
|
|||
# Handle the --fromfile option | |||
if '--algo' in self.passthru: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conversely, it seems you want to intercept the usual --algo
option used by combine. Better to add this to the attach_intercept_args
method above (and remember to put it back in the final args again)
self.passthru.extend(['--points', self.args.points]) | ||
groupby = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this declaration closer to where it used for the first time (though I'm not sure its needed)
with open(self.args.fromfile) as f: | ||
points_df = pd.read_csv(f) | ||
if not self.args.points: | ||
self.args.points = len(points_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better not to mix your new implementation with the existing one for --points
and --split-points
- these are specifically about controlling the options in --algo grid
. If you need additional options for your method you can add new arguments. Though it's not totally clear you need this. Is your intention that the argument to --points
will limit the number of lines in your csv file which is ready?
if (self.args.split_points is not None and | ||
self.args.split_points > 0 and | ||
self.args.points is not None): | ||
self.args.points is not None | ||
and self.args.fromfile is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, better not to have to modify this code
points_dict = points_df.iloc[r[0]].to_dict() | ||
points_str = ','.join(f'{k}={v}' for k, v in points_dict.items()) | ||
points_array.append((r[0], r[1], points_str)) | ||
subbed_vars[('P_START', 'P_END', 'fixedPointPOIs')] = points_array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to track start and end indexes here? If I understand correctly, you just want to create one combine call for each row of your CSV file. In that case, you only need one index (to set in args.name
), not two
# can only put the name option back now because we might have modified | ||
# it from what the user specified | ||
self.put_back_arg('name', '-n') | ||
proto = 'combine ' + (' '.join(self.passthru)) | ||
if self.args.there: | ||
proto = 'pushd %(DIR)s; combine ' + (' '.join(self.passthru))+'; popd' | ||
|
||
for it in itertools.product(*list(subbed_vars.values())): | ||
job_str = '' | ||
for idx, it in enumerate(itertools.product(*list(subbed_vars.values()))): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid changing this part of the code, I don't think it's necessary - combineTool.py already has a mechanism to group individual combine calls into single jobs (--merge X
).
Hi @ajgilbert, thanks for the review and apologies for the delay. Since the scripts have now moved to Combine, I've actioned your comments there: cms-analysis/HiggsAnalysis-CombinedLimit#937 |
Adds option
--fromfile
to EnhancedCombine that allows the reading of points from a.csv
file. Some checks for number of points and the requested algo (i.e. only 'fixed' is allowed) are performed.