New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Add an option to read in a list of points to scan from a .csv file #320

Closed

runtingt wants to merge 2 commits into cms-analysis:main from runtingt:main

runtingt commented Apr 8, 2024

Adds option --fromfile to EnhancedCombine that allows the reading of points from a .csv file. Some checks for number of points and the requested algo (i.e. only 'fixed' is allowed) are performed.

runtingt marked this pull request as draft

April 9, 2024 16:26


          Add an option to read in a list of points to scan from a .csv file

acaf745

Fix typo

runtingt force-pushed the main branch from c7c4734 to acaf745 Compare

April 9, 2024 16:36

runtingt marked this pull request as ready for review

April 9, 2024 16:37


          Handle default options

d548d0c

runtingt mentioned this pull request

Add an option to read in a list of points from a .csv file (--fromfile) cms-analysis/HiggsAnalysis-CombinedLimit#937

Merged

Author

runtingt commented Apr 11, 2024

Documentation added in Combine docs: cms-analysis/HiggsAnalysis-CombinedLimit#937

Author

runtingt commented May 1, 2024

I think this should now be ready to merge, any suggestions for improvements?

ajgilbert requested changes

View reviewed changes

Collaborator

ajgilbert left a comment

Hi @runtingt, thanks for the PR. This is useful functionality, but I've added some comments that should be addressed first before we merge. Note also that combineTool.py already has a (not well documented) --generate option, which can be used to construct multiple jobs out of a series of options, e.g.:

> combineTool.py -M MultiDimFit --algo fixed --generate n:fixedPointPOIs::0,,r1=1,r2=2:1,,r1=2,r2=3 --dry-run
[DRY-RUN]: combine --algo fixed -M MultiDimFit --fixedPointPOIs r1=1,r2=2 -n .Test.0
[DRY-RUN]: combine --algo fixed -M MultiDimFit --fixedPointPOIs r1=2,r2=3 -n .Test.1

One alternative could be to simply append an extra argument to the internal list of generate options, e.g.

self.args.generate.append(
            "n:fixedPointPOIs::" + ":".join(["{},,".format(i) + ",".join(["{}={}".format(k,v) for k,v in X.items()]) for i, X in enumerate(pd.read_csv("test.csv").to_dict(orient="records"))]))

but also fine to stick with the current implementation.

CombineTools/python/combine/EnhancedCombine.py

@@ @@ -32,6 +33,8 @@ def attach_intercept_args(self, group): @@
                           '-m', '--mass', help='Supports range strings for multiple masses, e.g. "120:130:5,140 will produce three combine calls with mass values of 120, 125, 130 and 140"')
                       group.add_argument(
                           '--points', help='For use with "-M MultiDimFit --algo grid" to split scan points into separate jobs')
+                      group.add_argument(

Collaborator

ajgilbert May 1, 2024

This function describes what we call intercept args - ones that are already used by combine that we want to extract/manipulate. Since you are adding a new arg, this should go in the attach_args method below.

CombineTools/python/combine/EnhancedCombine.py

@@ @@ -88,6 +91,32 @@ def run_method(self): @@
                           subbed_vars[('SEED',)] = [(sval,) for sval in seed_vals]
                           self.passthru.extend(['-s', '%(SEED)s'])
+                      # Handle the --fromfile option
+                      if '--algo' in self.passthru:

Collaborator

ajgilbert May 1, 2024

Conversely, it seems you want to intercept the usual --algo option used by combine. Better to add this to the attach_intercept_args method above (and remember to put it back in the final args again)

CombineTools/python/combine/EnhancedCombine.py

		self.passthru.extend(['--points', self.args.points])
		groupby = 1

Collaborator

ajgilbert May 1, 2024

move this declaration closer to where it used for the first time (though I'm not sure its needed)

CombineTools/python/combine/EnhancedCombine.py

+                          with open(self.args.fromfile) as f:
+                              points_df = pd.read_csv(f)
+                              if not self.args.points:
+                                  self.args.points = len(points_df)

Collaborator

ajgilbert May 1, 2024

it would be better not to mix your new implementation with the existing one for --points and --split-points - these are specifically about controlling the options in --algo grid. If you need additional options for your method you can add new arguments. Though it's not totally clear you need this. Is your intention that the argument to --points will limit the number of lines in your csv file which is ready?

CombineTools/python/combine/EnhancedCombine.py

                       if (self.args.split_points is not None and
                               self.args.split_points > 0 and
-                              self.args.points is not None):
+                              self.args.points is not None
+                              and self.args.fromfile is None):

Collaborator

ajgilbert May 1, 2024

as above, better not to have to modify this code

CombineTools/python/combine/EnhancedCombine.py

+                              points_dict = points_df.iloc[r[0]].to_dict()
+                              points_str = ','.join(f'{k}={v}' for k, v in points_dict.items())
+                              points_array.append((r[0], r[1], points_str))
+                          subbed_vars[('P_START', 'P_END', 'fixedPointPOIs')] = points_array

Collaborator

ajgilbert May 1, 2024

Why do you need to track start and end indexes here? If I understand correctly, you just want to create one combine call for each row of your CSV file. In that case, you only need one index (to set in args.name), not two

CombineTools/python/combine/EnhancedCombine.py

                       # can only put the name option back now because we might have modified
                       # it from what the user specified
                       self.put_back_arg('name', '-n')
                       proto = 'combine ' + (' '.join(self.passthru))
                       if self.args.there:
                           proto = 'pushd %(DIR)s; combine ' + (' '.join(self.passthru))+'; popd'
-                      for it in itertools.product(*list(subbed_vars.values())):
+                      job_str = ''
+                      for idx, it in enumerate(itertools.product(*list(subbed_vars.values()))):

Collaborator

ajgilbert May 1, 2024

Please avoid changing this part of the code, I don't think it's necessary - combineTool.py already has a mechanism to group individual combine calls into single jobs (--merge X).

Author

runtingt commented Jun 19, 2024

Hi @ajgilbert, thanks for the review and apologies for the delay. Since the scripts have now moved to Combine, I've actioned your comments there: cms-analysis/HiggsAnalysis-CombinedLimit#937

runtingt closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet