Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add best metric support checkpoint #679

Closed
wants to merge 1 commit into from

Conversation

JKSenthil
Copy link
Contributor

Summary:

Context

This diff stack is aimed to implement best metric checkpoint support.

This Diff

This diff adds the best metric support checkpoint functionality to the base checkpointer.

  1. Adds BestCheckpointConfig struct to configure how to checkpoint best metric. It takes a metric name and mode (ie min or max) which indicates which is used to determine best value
  2. In the BaseCheckpointer constructor:
  • Adds save_every_n_eval_epochs to BaseCheckpointer, which can be used to save at end of eval epoch during fit
  • If best checkpoint config is used, will sort existing checkpoint paths by metric value, as opposed to recency
  1. Adds self._should_save_checkpoint() to check if checkpoint needs to be saved or not (is used for both normal and best checkpoint configs)
  2. Augments self._generate_checkpoint_and_upkeep() to consider best checkpoint config and calls appropriate logic

Next Diff

Forwards these args for best metric checkpoint to the derivative checkpointers (TSS and DCPSaver)

Future

Can consider case of passing lambda function instead of reading the metric value from the unit

Differential Revision: D52739597

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52739597

JKSenthil added a commit to JKSenthil/tnt that referenced this pull request Jan 17, 2024
Summary:

# Context
This diff stack is aimed to implement best metric checkpoint support. 

# This Diff
This diff adds the best metric support checkpoint functionality to the base checkpointer.

1. Adds `BestCheckpointConfig` struct to configure how to checkpoint best metric. It takes a metric name and mode (ie min or max) which indicates which is used to determine best value
2. In the BaseCheckpointer constructor:
* Adds `save_every_n_eval_epochs` to BaseCheckpointer, which can be used to save at end of eval epoch during fit
*  If best checkpoint config is used, will sort existing checkpoint paths by metric value, as opposed to recency
3. Adds `self._should_save_checkpoint()` to check if checkpoint needs to be saved or not (is used for both normal and best checkpoint configs)
4. Augments `self._generate_checkpoint_and_upkeep()` to consider best checkpoint config and calls appropriate logic

# Next Diff
Forwards these args for best metric checkpoint to the derivative checkpointers (TSS and DCPSaver)

# Future
Can consider case of passing lambda function instead of reading the metric value from the unit

Reviewed By: galrotem

Differential Revision: D52739597
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52739597

JKSenthil added a commit to JKSenthil/tnt that referenced this pull request Jan 17, 2024
Summary:

# Context
This diff stack is aimed to implement best metric checkpoint support. 

# This Diff
This diff adds the best metric support checkpoint functionality to the base checkpointer.

1. Adds `BestCheckpointConfig` struct to configure how to checkpoint best metric. It takes a metric name and mode (ie min or max) which indicates which is used to determine best value
2. In the BaseCheckpointer constructor:
* Adds `save_every_n_eval_epochs` to BaseCheckpointer, which can be used to save at end of eval epoch during fit
*  If best checkpoint config is used, will sort existing checkpoint paths by metric value, as opposed to recency
3. Adds `self._should_save_checkpoint()` to check if checkpoint needs to be saved or not (is used for both normal and best checkpoint configs)
4. Augments `self._generate_checkpoint_and_upkeep()` to consider best checkpoint config and calls appropriate logic

# Next Diff
Forwards these args for best metric checkpoint to the derivative checkpointers (TSS and DCPSaver)

# Future
Can consider case of passing lambda function instead of reading the metric value from the unit

Reviewed By: galrotem

Differential Revision: D52739597
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52739597

JKSenthil added a commit to JKSenthil/tnt that referenced this pull request Jan 17, 2024
Summary:

# Context
This diff stack is aimed to implement best metric checkpoint support. 

# This Diff
This diff adds the best metric support checkpoint functionality to the base checkpointer.

1. Adds `BestCheckpointConfig` struct to configure how to checkpoint best metric. It takes a metric name and mode (ie min or max) which indicates which is used to determine best value
2. In the BaseCheckpointer constructor:
* Adds `save_every_n_eval_epochs` to BaseCheckpointer, which can be used to save at end of eval epoch during fit
*  If best checkpoint config is used, will sort existing checkpoint paths by metric value, as opposed to recency
3. Adds `self._should_save_checkpoint()` to check if checkpoint needs to be saved or not (is used for both normal and best checkpoint configs)
4. Augments `self._generate_checkpoint_and_upkeep()` to consider best checkpoint config and calls appropriate logic

# Next Diff
Forwards these args for best metric checkpoint to the derivative checkpointers (TSS and DCPSaver)

# Future
Can consider case of passing lambda function instead of reading the metric value from the unit

Reviewed By: galrotem

Differential Revision: D52739597
JKSenthil added a commit to JKSenthil/tnt that referenced this pull request Jan 18, 2024
Summary:

# Context
This diff stack is aimed to implement best metric checkpoint support. 

# This Diff
This diff adds the best metric support checkpoint functionality to the base checkpointer.

1. Adds `BestCheckpointConfig` struct to configure how to checkpoint best metric. It takes a metric name and mode (ie min or max) which indicates which is used to determine best value
2. In the BaseCheckpointer constructor:
* Adds `save_every_n_eval_epochs` to BaseCheckpointer, which can be used to save at end of eval epoch during fit
*  If best checkpoint config is used, will sort existing checkpoint paths by metric value, as opposed to recency
3. Adds `self._should_save_checkpoint()` to check if checkpoint needs to be saved or not (is used for both normal and best checkpoint configs)
4. Augments `self._generate_checkpoint_and_upkeep()` to consider best checkpoint config and calls appropriate logic

# Next Diff
Forwards these args for best metric checkpoint to the derivative checkpointers (TSS and DCPSaver)

# Future
Can consider case of passing lambda function instead of reading the metric value from the unit

Reviewed By: galrotem

Differential Revision: D52739597
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52739597

Summary:

# Context
This diff stack is aimed to implement best metric checkpoint support. 

# This Diff
This diff adds the best metric support checkpoint functionality to the base checkpointer.

1. Adds `BestCheckpointConfig` struct to configure how to checkpoint best metric. It takes a metric name and mode (ie min or max) which indicates which is used to determine best value
2. In the BaseCheckpointer constructor:
* Adds `save_every_n_eval_epochs` to BaseCheckpointer, which can be used to save at end of eval epoch during fit
*  If best checkpoint config is used, will sort existing checkpoint paths by metric value, as opposed to recency
3. Adds `self._should_save_checkpoint()` to check if checkpoint needs to be saved or not (is used for both normal and best checkpoint configs)
4. Augments `self._generate_checkpoint_and_upkeep()` to consider best checkpoint config and calls appropriate logic

# Next Diff
Forwards these args for best metric checkpoint to the derivative checkpointers (TSS and DCPSaver)

# Future
Can consider case of passing lambda function instead of reading the metric value from the unit

Reviewed By: galrotem

Differential Revision: D52739597
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52739597

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants