Convert workflow to use Pegasus WMS #5

Open
wants to merge 161 commits
spigo900

No description provided.

spigo900 commented Jul 27, 2020

Possible "future work"/room to improve on this:

  • Currently there are hardcoded (yet simple) exceptions re: AlphaNLI and HellaSwag training parameters.
    • It might be handy to be able to specify simple exceptions like these in the Pegasus parameters file somehow.
  • The parameters files that the training script gets are "polluted" with whatever extra stuff is floating around in the Pegasus script's parameters file.
    • For example, the training script gets the number of bootstrapping samples to take when calculating accuracy in ensemble.py.
    • This is because of how the model parameters are set up.
    • It might be a good idea to put the model parameters in their own space so that they can be passed to the training script without all the other junk.
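One way to picture both ideas is a parameters layout where the model settings live under their own key and the per-task exceptions are listed as overrides. The sketch below uses plain Python dicts rather than the actual parameters file format; the key names (`model`, `task_overrides`, `ensemble`), the helper `training_params_for`, and all values are hypothetical and not taken from the repository.

```python
from copy import deepcopy

# Hypothetical layout: model settings are namespaced; all values are placeholders.
WORKFLOW_PARAMS = {
    "ensemble": {"bootstrap_samples": 1000},  # only ensemble.py would need this
    "model": {"learning_rate": 5e-6, "batch_size": 16},
    "task_overrides": {
        "alphanli": {"batch_size": 8},
        "hellaswag": {"batch_size": 4},
    },
}


def training_params_for(task, workflow_params):
    """Return only the model parameters for `task`, with its overrides applied."""
    params = deepcopy(workflow_params["model"])
    params.update(workflow_params.get("task_overrides", {}).get(task, {}))
    return params
```

With this shape, the training script would receive only what `training_params_for` returns, so workflow-level settings such as the bootstrap sample count never reach it.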

spigo900 marked this pull request as ready for review July 30, 2020 17:34
spigo900 commented Jul 30, 2020

@denizbeser This should be ready for review now.

I haven't tested the rebased version, but the original worked correctly.

I've implemented my "future work" ideas, plus a few other improvements, in a separate branch. I'm going to hold off on merging those until I've tested that they work with the full workflow and not just the dev parameters. That may end up being a separate pull request.

spigo900 commented Aug 4, 2020

I've merged the various improvements (mentioned above) into this branch. The updated version works on the smaller development config (pegasus-dev.params), but I haven't finished testing it with the bigger config (pegasus-dev-full.params). (I ran it yesterday, but about eight of the jobs didn't finish, I think because they hit the job time limit.)

Training takes long enough for the failed models (training on AlphaNLI and HellaSwag) that I'm going to hold off on re-launching the full test run until tomorrow. For now I'm going to work on job throttling. At some point I will also add some better documentation on how to use the Pegasus script and workflows.

@spigo900

This is confirmed to work with the full workflow.

Previously, the model could not find the task data.
np.random.random_integers is deprecated, apparently
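For reference, the usual replacement is `np.random.randint`, whose upper bound is exclusive, whereas `random_integers` included it. A minimal sketch of the swap follows; the variable names are illustrative and not taken from the repository's code.

```python
import numpy as np

n_items = 10    # illustrative: size of the collection being sampled
n_samples = 5   # illustrative: number of indices to draw

# Deprecated form (upper bound inclusive):
#   indices = np.random.random_integers(0, n_items - 1, size=n_samples)
# Replacement (upper bound exclusive, so the "- 1" goes away):
indices = np.random.randint(0, n_items, size=n_samples)
```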
This should speed up testing in the future, although I think there's no more need for testing right now.
These still run the full workflow, but don't run anything on MICS.
This will probably not work.

Once fan-out is working I can set it to use MICS without disrupting other users.
This should make the script fail faster when it doesn't get the right parameters.
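The general fail-fast pattern is to check for every required parameter before any work starts, then report all missing names at once. The sketch below uses a plain dict and hypothetical parameter names; the actual script's parameter handling and required keys may differ.

```python
REQUIRED_KEYS = ("experiment_root", "task", "model_name")  # hypothetical names


def check_required(params, required=REQUIRED_KEYS):
    """Raise immediately if any required parameter is missing."""
    missing = [key for key in required if key not in params]
    if missing:
        raise KeyError(f"Missing required parameters: {', '.join(missing)}")
    return params
```

Calling a check like this at the top of the script surfaces a bad parameters file in seconds instead of partway through a long job.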
This should prevent typo-related problems and ensure that the references get updated if the module names ever change again.
Thinking about it, the smarter thing would probably be to not specify it in parameters_combinations at all and just use the keys from tasks_to_thresholds. But I don't want to attempt that right now, since the workflow is working.
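The idea in this last note could look roughly like the sketch below, where the task list is derived from the keys of `tasks_to_thresholds` instead of being repeated in `parameters_combinations`. The shapes of both structures, the helper `expand_combinations`, and all values are assumptions for illustration, not the repository's actual data.

```python
from itertools import product

# Assumed shape: task name -> threshold (placeholder values).
tasks_to_thresholds = {"alphanli": 0.7, "hellaswag": 0.6}

# Assumed shape: option name -> list of values; note there is no "task" entry.
parameters_combinations = {"random_seed": [0, 42]}


def expand_combinations(tasks_to_thresholds, parameters_combinations):
    """Yield one dict per (task, option values) combination."""
    names = list(parameters_combinations)
    for task in tasks_to_thresholds:  # tasks come from the dict's keys
        for values in product(*(parameters_combinations[n] for n in names)):
            yield {"task": task, **dict(zip(names, values))}
```

Because the tasks are read from a single place, adding or renaming a task only requires touching `tasks_to_thresholds`.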