Convert workflow to use Pegasus WMS #5

spigo900 · 2020-07-10T18:15:13Z

No description provided.

spigo900 · 2020-07-27T14:48:18Z

Possible "future work"/room to improve on this:

- Currently there are hardcoded (yet simple) exceptions re: AlphaNLI and HellaSwag training parameters.
  - It might be handy to be able to specify simple exceptions like these in the Pegasus parameters file somehow.
- The parameters files that the training script gets are "polluted" with whatever extra stuff is floating around in the Pegasus script's parameters file.
  - For example, the training script gets the number of bootstrapping samples to take when calculating accuracy in ensemble.py.
  - This is because of how the model parameters are set up.
  - It might be a good idea to put the model parameters in their own space so that they can be passed to the training script without all the other junk.
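
A minimal sketch of both ideas, using plain dicts rather than the actual parameters classes. The `model` and `task_overrides` key names, the `training_params_for` helper, and all values are hypothetical and not taken from this branch:

```python
from typing import Any, Dict

# Hypothetical layout: model parameters live under their own "model" key,
# and per-task exceptions live under "task_overrides". Values are made up.
workflow_params: Dict[str, Any] = {
    "num_bootstrap_samples": 1000,  # only the ensembling step needs this
    "model": {"learning_rate": 5e-6, "batch_size": 8},
    "task_overrides": {
        "alphanli": {"batch_size": 4},
        "hellaswag": {"batch_size": 2},
    },
}


def training_params_for(params: Dict[str, Any], task: str) -> Dict[str, Any]:
    """Return only the model parameters, with any per-task overrides applied."""
    merged = dict(params["model"])  # copy so the shared defaults stay untouched
    merged.update(params.get("task_overrides", {}).get(task, {}))
    return merged


print(training_params_for(workflow_params, "hellaswag"))
print(training_params_for(workflow_params, "siqa"))
```

With this shape the training script would never see `num_bootstrap_samples`, and the AlphaNLI and HellaSwag exceptions would live in the parameters file instead of in the Pegasus script.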

spigo900 · 2020-07-30T18:03:22Z

@denizbeser This should be ready for review now.

I haven't tested the rebased version, but the original worked correctly.

I've implemented my "future work" ideas, plus a few other improvements, in a separate branch, but I'm going to hold off on merging those until I've tested that they work with the full workflow and not just the dev parameters. That may end up being a separate pull request.

spigo900 · 2020-08-04T15:30:55Z

I've merged the various improvements (mentioned above) into this branch. The updated version works on the smaller development config (pegasus-dev.params); however, I haven't finished testing it with the bigger config (pegasus-dev-full.params). I ran it yesterday, but about eight of the jobs didn't finish, I think because they hit the job time limit.

Training takes long enough for the failed models (training on AlphaNLI and HellaSwag) that I'm going to hold off on re-launching the full test run until tomorrow. For now I'm going to work on job throttling. At some point I will also add some better documentation on how to use the Pegasus script and workflows.

spigo900 · 2020-08-10T14:44:32Z

This is confirmed to work with the full workflow.

spigo900 marked this pull request as ready for review July 30, 2020 17:34

spigo900 force-pushed the base-pegasized branch from 89946bb to 2e4109d Compare July 30, 2020 17:58

spigo900 requested a review from denizbeser August 5, 2020 13:37

denizbeser approved these changes Aug 10, 2020

spigo900 force-pushed the base-pegasized branch from 9169ec3 to 83533ae Compare August 11, 2020 19:43

Joseph Cecil added 21 commits August 18, 2020 13:57

Add Pegasus wrapper requirement

da52c7c

Add parameter files

d7ca092

Add vistautils requirement

b2e862b

Initial Pegasus skeleton

e1adb5a

Update root parameters file

b01a1db

Pass real job parameters to Pegasus

57fe8b2

Clean up some

f9cd332

Store more job info

bae796e

Fix _includes in runner.params

6fcd5d1

More pegasus stuff

feb5167

Use simple loop to read in job parameters for Pegasus runner

ed3b77c

Create parameters files mirroring Hydra configs

57d8c85

Clean up SAGA code

1963323

Modify train.py to take parameters files as input

0a23da0

Don't overwrite actual parameters with combinations parameters.

fc03fb3

Fix style issues

917c595

Move Slurm configuration into configuration file

57750a8

Remove old TODO

5beadd0

Fix usage of parameters_only_entry_point()

c5bdf99

Fix runner params.

55ef547

Fix resource request creation.

b9e4cc4

spigo900 added 29 commits August 18, 2020 13:58

Ensemble: Fix try_without

69ce3b1

Fix dev ensembling params

1d1814d

Fix how eval.py gets model name

6b77a09

Fix how eval.py loads the model configuration

cbe7e6e

eval.py: Include other parameters in model configuration

b869efc

Previously, the model could not find the task data.

Fix deprecation warning in eval.py

c1814e0

np.random.random_integers is deprecated, apparently
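
For reference, a sketch of the usual replacement. `np.random.random_integers(low, high)` included `high`, whereas `np.random.randint` and `Generator.integers` exclude the upper bound by default, so the bound shifts by one. The `bootstrap_indices` helper is illustrative only and is not the code in eval.py:

```python
import numpy as np


def bootstrap_indices(n_items: int, n_samples: int, seed: int = 0) -> np.ndarray:
    """Draw n_samples resamples of row indices in [0, n_items), with replacement."""
    rng = np.random.default_rng(seed)
    # Deprecated form: np.random.random_integers(0, n_items - 1, size=...)
    # Same range with the newer API, where the upper bound is exclusive.
    return rng.integers(0, n_items, size=(n_samples, n_items))


indices = bootstrap_indices(n_items=5, n_samples=3)
print(indices.shape)
```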

Override testing: Change nonsense parameter instead of real one

0482a81

This should speed up testing in the future, although I think there's no more need for testing right now.

Rename nonsense parameter

4e35777

Fix typo

3d9128d

Use ephemeral in pegasus-dev configuration

c04cf32

Fix partition setup in Pegasus workflow

56a39b9

Add full-workflow development parameters

3d7c7b4

These still run the full workflow, but don't run anything on MICS.

Shorter time limit for AlphaNLI in pegasus-dev-full.params

976009e

This will probably not work. Once fan-out is working I can set it to use MICS without disrupting other users.

Fix how gold labels are found and frontload parameter-getting

9183aef

This should make the script fail faster when it doesn't get the right parameters.
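
A sketch of the frontloading pattern, assuming a dict-like parameters object. The `load_required` helper and the `output_file` key are hypothetical; `task_to_gold` is borrowed from a later commit title only as an example name:

```python
from typing import Any, Dict, Mapping


def load_required(params: Mapping[str, Any], *names: str) -> Dict[str, Any]:
    """Look up every required parameter up front and report all missing ones."""
    missing = [name for name in names if name not in params]
    if missing:
        raise KeyError(f"Missing required parameters: {', '.join(missing)}")
    return {name: params[name] for name in names}


def main(params: Mapping[str, Any]) -> str:
    # Read everything first, so a bad parameters file fails here rather than
    # after the expensive work below has already run.
    required = load_required(params, "task_to_gold", "output_file")
    # ... expensive ensembling work would go here ...
    return required["output_file"]
```

Collecting all missing names before raising means one failed run reports every problem in the parameters file, not just the first.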

Catch only FileNotFoundError in "couldn't find preds" exception block

d1e9e4a
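
The idea behind narrowing the exception, sketched under the assumption that predictions are read from a text file with one label per line. The `read_predictions` helper is hypothetical and is not the function in the ensemble script:

```python
from pathlib import Path
from typing import List, Optional


def read_predictions(path: Path) -> Optional[List[str]]:
    """Return the predictions, or None if the model produced no file."""
    try:
        return path.read_text().splitlines()
    except FileNotFoundError:
        # Only a missing file means "this model has no predictions".
        # Other errors (permissions, decoding) still propagate to the caller.
        return None
```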

Fix how model_without_seed is constructed

4b8f34a

Shorten line

7e20de8

Pegasus: Pass gold labels to ensemble script in new way

da14165

Ensemble: Rename variable to match parameter task_to_threshold

e7e8f39

Pegasus: Remove unnecessary line

24983ad

Ensemble: Include parameters in the dict of successful models

5a7582c

Pegasus: Fix how iteration over ensembling tasks is done

1c69984

Pegasus: Actually add task_to_gold parameters to ensembling parameters

9b192bd

Pegasus: Do job throttling

0ed048e

Pegasus: Pass imported modules instead of strings to run

bf5665f

This should prevent typo-related problems and ensure that the references get updated if the module names ever change again.
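
A sketch of why passing the module object helps, assuming jobs are launched with `python -m`. The `python_module_command` helper is hypothetical, and the standard library's `json.tool` stands in for the workflow's own modules:

```python
import json.tool
from types import ModuleType
from typing import List


def python_module_command(module: ModuleType, params_file: str) -> List[str]:
    """Build a `python -m` command line from an imported module object."""
    # module.__name__ is the dotted import path, so a renamed or mistyped
    # module fails at import time instead of when the job is submitted.
    return ["python", "-m", module.__name__, params_file]


print(python_module_command(json.tool, "train.params"))
```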

Check the list of tasks against

95510e2

Thinking about it, probably the smarter thing to do would be to simply not specify it in parameters_combinations but just use the keys from tasks_to_thresholds. But I don't want to attempt that right now, since the workflow is working now.
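
Both options can be sketched as follows. This assumes the check compares the explicit task list with the keys of `tasks_to_thresholds`, which the truncated commit title does not confirm; the function names are hypothetical:

```python
from typing import Iterable, List, Mapping


def check_tasks(tasks: Iterable[str], tasks_to_thresholds: Mapping[str, float]) -> List[str]:
    """Fail if the explicit task list and the threshold keys disagree."""
    listed = set(tasks)
    known = set(tasks_to_thresholds)
    if listed != known:
        raise ValueError(
            f"Tasks without thresholds: {sorted(listed - known)}; "
            f"thresholds without tasks: {sorted(known - listed)}"
        )
    return sorted(known)


def tasks_from_thresholds(tasks_to_thresholds: Mapping[str, float]) -> List[str]:
    """The alternative: derive the task list from the mapping itself."""
    return sorted(tasks_to_thresholds)
```

Deriving the list from the mapping removes the chance of the two going out of sync, at the cost of no longer being able to list a task without also giving it a threshold.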

Rearrange comment

0ab6443

Document how to run the ensembling workflow

8f36dc3

Change AlphaNLI, HellaSwag, and SIQA to use internal dev sets

0eb9806

spigo900 force-pushed the base-pegasized branch from 6e6d8b4 to 0eb9806 Compare August 18, 2020 18:03
