
Detachment of env and proxy; avoiding copies of the environments #299

Merged
merged 73 commits into main from dont-copy-envs
May 22, 2024

Conversation

@alexhernandezgarcia (Owner) commented Mar 19, 2024

This PR refactors several important aspects of the code base:

  • Avoiding unnecessary copies of environments when sampling trajectories.
  • Detaching the environment from the proxy.
  • Dropping the convention on the range of proxy values.

Copies of environments

Each trajectory in a GFlowNet batch is constructed by evolving one environment, either forwards or backwards. Until now, each trajectory was constructed by copying and resetting a reference environment. This PR replaces that copying operation: the agent stores an "environment maker" as an attribute (which behaves like the environment class itself) and creates a new environment instance for each trajectory.
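The pattern described above can be sketched as follows. This is a minimal illustration, not the repo's actual API: the names GridEnv and GFlowNetAgent, and the use of functools.partial as the environment maker, are assumptions for the sake of the example.

```python
from functools import partial

class GridEnv:
    """Toy environment; a fresh instance starts at the source state."""

    def __init__(self, length=4):
        self.length = length
        self.state = 0

class GFlowNetAgent:
    def __init__(self, env_maker):
        # env_maker behaves like the environment class itself: calling it
        # returns a brand-new, already-reset environment instance.
        self.env_maker = env_maker

    def sample_batch(self, n_trajectories):
        # One new environment per trajectory: no deep copy and no reset
        # of a shared reference environment.
        return [self.env_maker() for _ in range(n_trajectories)]

agent = GFlowNetAgent(env_maker=partial(GridEnv, length=8))
envs = agent.sample_batch(3)
```

Storing a callable rather than a pre-built instance means environment construction arguments are bound once (here via `partial`) and each trajectory gets an independent, freshly initialised environment.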

Detachment of environment and proxy

This addresses Issue #288.

So far, the proxy has been an attribute of GFlowNet environments. The rationale was to be able to compute the reward of a trajectory directly from the environment, for example with env.reward(). While this is a nice feature, it comes at a cost: it has prevented flexibly using different proxy-to-reward functions, handling the computation of log-rewards with better numerical stability, and simply applying alternative baselines to GFlowNets.

In this PR, the proxy and the environments have been detached and are now (almost fully) independent of each other. Together with this change, I have also tried to improve the implementation that handles the conversion of proxy values to rewards or log-rewards, to make it more efficient, more numerically stable, and easier to extend.

These are some of the new features:

  • Several proxy-to-reward functions are transparently supported: identity, absolute value, power (raising to the power of β), exponential (Boltzmann), shift (adding β) and product (multiplying by β).
  • All this is now implemented in the base Proxy class.
  • One can easily create a new custom proxy-to-reward function in a custom proxy class and use it by simply assigning it as a callable to self.reward_function.
  • Since all losses are computed in the log domain, log-rewards can now be computed directly from the proxy values, without first computing rewards and then taking the log. This is particularly important for numerical stability when using exponential proxy-to-reward functions.
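The design in the bullets above can be sketched roughly as follows. This is an illustrative approximation, not the repo's exact implementation: the method names proxy2reward/proxy2logreward appear in this PR's diff, but the constructor signature and the exact functional forms of each reward function are assumptions here.

```python
import math

class Proxy:
    """Sketch of a base proxy supporting pluggable proxy-to-reward functions."""

    def __init__(self, reward_function="identity", beta=1.0):
        self.beta = beta
        functions = {
            "identity": lambda x: x,
            "absolute": abs,
            "power": lambda x: x ** self.beta,
            "exponential": lambda x: math.exp(self.beta * x),
            "shift": lambda x: x + self.beta,
            "product": lambda x: x * self.beta,
        }
        # A custom subclass could instead assign any callable here.
        self.reward_function = functions[reward_function]
        self._name = reward_function

    def proxy2reward(self, proxy_value):
        return self.reward_function(proxy_value)

    def proxy2logreward(self, proxy_value):
        # For the Boltzmann reward, log R(x) = beta * x: computing the
        # log-reward directly avoids the exp/log round trip, so a large
        # beta * x never overflows before the log is taken.
        if self._name == "exponential":
            return self.beta * proxy_value
        return math.log(self.proxy2reward(proxy_value))
```

For example, with `beta=2.0` and a proxy value of 1000, `proxy2reward` would overflow a float64 via `exp(2000)`, while `proxy2logreward` returns 2000 directly.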

Dropping convention of range of values in the proxies

So far, we (sort of) followed a convention by which lower (and negative) proxy values were better. This has created a lot of confusion, because the convention does not make sense for all proxies.

With this PR, the convention is dropped altogether. The guiding principle now is that the user is free to create a proxy with any range of values and any direction (maximisation or minimisation), but is responsible for using an appropriate proxy-to-reward function. This is now much easier given the changes described in the previous section.
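As a hypothetical illustration of this principle (the function names below are made up for the example, not part of the repo): a proxy to be maximised can use the identity, while a proxy to be minimised pairs naturally with a Boltzmann reward.

```python
import math

# A proxy to be maximised (e.g. a non-negative score): identity works as-is.
def score_to_reward(score):
    return score

# A proxy to be minimised (e.g. an energy in (-inf, inf)): a Boltzmann
# reward exp(-beta * energy) makes lower energies yield higher rewards.
BETA = 1.0

def energy_to_reward(energy):
    return math.exp(-BETA * energy)

# Lower energy -> higher reward, as intended for a minimisation proxy.
assert energy_to_reward(-1.0) > energy_to_reward(1.0)
```

The point is that the direction of optimisation lives entirely in the proxy-to-reward function, so the proxy itself no longer has to respect any sign or range convention.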

To-do list

Some of these things may be left for a separate PR.

  • Sanity check experiments
  • IMPORTANT: fix the FL loss with the exponential proxy2reward (see Tetris) - it currently optimises in the opposite direction
  • Fix sanity check experiments with tetris plotting
  • Adapt methods in environments (e.g. sample_from_reward(), compute_train_energy_proxy_and_rewards()...) that still use the old methods in the base env.
  • Consider getting rid of self.reward_norm and co. in the base env.
  • Fix broken common tests with minimal runs.
  • Update all configs...
  • Avoid re-computing proxy values for the Buffer. This is left for a future PR.
  • Update the outputs of all proxies: the output does not need to be negative anymore, so we can make identity the default.
  • Write documentation in the Proxy base class about reward_func and the rest of the attributes.

@alexhernandezgarcia alexhernandezgarcia changed the title [Small] Alternative to copying the environments in the GFN Agent [WIP] Alternative to copying the environments in the GFN Agent Mar 19, 2024
@alexhernandezgarcia alexhernandezgarcia marked this pull request as draft March 19, 2024 20:08
@alexhernandezgarcia alexhernandezgarcia changed the title [WIP] Detachment of env and proxy; avoiding copies of the environments Detachment of env and proxy; avoiding copies of the environments May 17, 2024
@alexhernandezgarcia (Owner, Author) commented:

Ok, to me this is now ready to merge. @carriepl @AlexandraVolokhova I'm tagging you as reviewers in case you want to take a quick look, and to let you know. Feel free to go ahead, approve and merge unless you have any concerns. Sanity checks look perfect: https://wandb.ai/alexhg/gfn_sanity_checks (the issue with FL was that the sanity-check config file was incorrect for the new proxy - fixed in 361cdc0)

@alexhernandezgarcia alexhernandezgarcia marked this pull request as ready for review May 17, 2024 02:47
@alexhernandezgarcia (Owner, Author) commented May 17, 2024

@ginihumer I am tagging you here because this will be merged soon into the main branch and then into the activelearning branch, which will force making changes (for the better) on the active learning repo. No need to check the code details, but it's worth reading the description at least, to understand what will change. In a nutshell: proxies no longer need to be negative and lower-is-better; they can output anything, controlled by reward_function. Also, the proxy (e.g. an acquisition function) will no longer be part of the environment.

@carriepl (Collaborator) left a comment:

There's a huge amount of work in this PR! Overall, it looks pretty good to me.

Review threads on:
  • config/logger/base.yaml (outdated, resolved)
  • config/proxy/base.yaml (resolved)
  • config/experiments/icml23/ctorus.yaml (outdated, resolved)
  • gflownet/envs/htorus.py (outdated, resolved)
format.
"""
samples_final = []
max_reward = self.proxy.proxy2reward(self.proxy.min)
@carriepl (Collaborator) commented:
This line is mysterious to me. Not sure if I'll understand it later when I read the remaining changes so... just leaving a comment here as a reminder.

@alexhernandezgarcia (Owner, Author) commented:
Oh dear, good catch! This is pretty wrong and it's gonna need a bunch of changes that will delay things a bit...

@alexhernandezgarcia (Owner, Author) commented:
How about this? a1462f1

Further review threads on tests/gflownet/proxy/test_base.py (three threads, one outdated; all resolved).
@alexhernandezgarcia (Owner, Author) commented:

@carriepl Thank you so much for the review! Would you mind taking a quick look at my comments and resolving yours if the comment is addressed? There should be only one remaining issue, if I am not mistaken.

@carriepl (Collaborator) commented:

> @carriepl Thank you so much for the review! Would you mind taking a quick look at my comments and resolving yours if the comment is addressed. There should be one remaining issue only, if I am not mistaken.

I just looked. It all looks great. Only that one issue remaining.

@alexhernandezgarcia (Owner, Author) commented May 21, 2024 via email

@carriepl (Collaborator) left a comment:

Looks good to me. Many thanks!

@carriepl carriepl merged commit bcb1443 into main May 22, 2024
1 check passed
@carriepl carriepl deleted the dont-copy-envs branch May 22, 2024 14:42
@alexhernandezgarcia alexhernandezgarcia restored the dont-copy-envs branch May 27, 2024 13:48
@alexhernandezgarcia alexhernandezgarcia deleted the dont-copy-envs branch May 28, 2024 02:59