Detachment of env and proxy; avoiding copies of the environments #299
Conversation
…ultiple instances of the env in the agent instead of copying it
…ewards() via a log parameter; fixes
…roxy config files
…ts are WIP and everything needs further testing.
…eward instead of in rewards()
…proxy2reward and beta values in configs but as proxy config
Ok, to me this is now ready to merge. @carriepl @AlexandraVolokhova I'm tagging you as reviewers to let you know, in case you want to take a quick look. Feel free to go ahead and approve and merge unless you have any concerns. Sanity checks look perfect: https://wandb.ai/alexhg/gfn_sanity_checks (the issue with FL was that the sanity checks config file was incorrect for the new proxy - fixed in 361cdc0)
@ginihumer I am tagging you here because this will be merged soon into the main branch and then into the activelearning branch, which will force making changes (for the better) on the active learning repo. No need to check the code details, but it's worth at least reading the description to understand what will change. In a nutshell: proxies no longer need to be negative with lower values being better; they can take any range of values, and the conversion to rewards can be controlled by the proxy configuration.
There's a huge amount of work in this PR! Overall, it looks pretty good to me.
gflownet/gflownet.py (Outdated)
format.
"""
samples_final = []
max_reward = self.proxy.proxy2reward(self.proxy.min)
This line is mysterious to me. Not sure if I'll understand it later when I read the remaining changes so... just leaving a comment here as a reminder.
Oh dear, good catch! This is pretty wrong and it's gonna need a bunch of changes that will delay things a bit...
How about this? a1462f1
@carriepl Thank you so much for the review! Would you mind taking a quick look at my comments and resolving yours where they have been addressed? There should be only one remaining issue, if I am not mistaken.
I just looked. It all looks great. Only that one issue remaining.
Yes this is nice! I use it more and more, for example in the tests of the new Crystal subenvs.
On 21 May 2024 12:59:01 GMT-04:00, @carriepl commented on this pull request:
+ 1.0,
+ 8.1031e03,
+ ],
+ [-11, -2, -1.5, -1.1, -1, -0.9, -0.5, 0, 9],
+ ),
+ ],
+)
+def test_reward_function_callable__behaves_as_expected(
+ proxy_callable,
+ reward_function,
+ logreward_function,
+ proxy_values,
+ rewards_exp,
+ logrewards_exp,
+):
+ proxy = proxy_callable
Ok, I wasn't aware that Pytest fixtures could implicitly use the test parameters. Good to know!
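For reference, here is a generic, minimal sketch of pytest indirect parametrization, one mechanism by which a fixture can pick up a per-test parameter via request.param; the fixture and test names are illustrative, and the actual gflownet tests may use a different mechanism.

```python
import pytest

# Generic illustration (not the exact setup of the gflownet tests): with
# indirect=True, each parametrized value is routed through the fixture,
# which receives it as request.param.
@pytest.fixture
def proxy(request):
    return request.param

@pytest.mark.parametrize("proxy", [min, max], indirect=True)
def test_proxy_is_callable(proxy):
    assert callable(proxy)
```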
Looks good to me. Many thanks!
This PR refactors several important aspects of the code base:
Copies of environments
Each trajectory in a GFlowNet batch is constructed by evolving one environment, either forwards or backwards. Until now, a reference environment was copied and reset to construct each trajectory. This PR replaces the copying operation: the agent now stores an "environment maker" as an attribute (conceptually, just the environment class) and creates a new environment instance for each trajectory, as sketched below.
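A minimal sketch of the "environment maker" pattern; the class and argument names below are illustrative placeholders, not the actual gflownet API.

```python
from functools import partial

# Toy environment standing in for a GFlowNet environment.
class GridEnv:
    def __init__(self, length=10):
        self.length = length
        self.state = None

    def reset(self):
        self.state = [0] * self.length
        return self

class GFlowNetAgent:
    def __init__(self, env_maker):
        # A callable that builds a fresh environment, replacing the old
        # pattern of copying and resetting a reference environment.
        self.env_maker = env_maker

    def new_env(self):
        return self.env_maker().reset()

agent = GFlowNetAgent(env_maker=partial(GridEnv, length=8))
env_a, env_b = agent.new_env(), agent.new_env()
assert env_a is not env_b  # each trajectory evolves its own instance
```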
Detachment of environment and proxy
This addresses Issue #288.
So far, the proxy has been part of GFlowNet environments as an attribute. The rationale was to be able to compute the reward of a trajectory directly from the environment, for example with
env.reward()
. While this is a nice feature, it comes at a cost: it has prevented extending the flexibility to use different proxy-to-reward functions, handling the computation of log-rewards in a more numerically stable way, and simply using alternative baselines to GFlowNets. In this PR, the proxy and the environments have been detached and become (almost fully) independent of each other. Together with this change, I have also tried to improve the implementation that handles the conversion of proxy values into rewards or log-rewards, to make it more efficient, numerically more stable and easier to extend.
These are some of the new features: the conversion from proxy values to rewards and log-rewards is now configurable via self.reward_function (see the sketch below).
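A minimal sketch of the detached workflow, assuming illustrative names (make_reward_functions is hypothetical, not the gflownet API): proxy outputs are converted into rewards and log-rewards through a configurable proxy-to-reward callable, without going through the environment.

```python
import torch

def make_reward_functions(name, beta=1.0):
    """Return (reward_fn, logreward_fn) for a named proxy-to-reward mapping."""
    if name == "identity":
        return lambda x: x, lambda x: torch.log(x)
    if name == "exponential":
        # Log-rewards are computed directly, avoiding an exp/log round trip.
        return lambda x: torch.exp(beta * x), lambda x: beta * x
    raise ValueError(f"Unknown reward function: {name}")

# Usage: proxy values -> rewards, with no call to env.reward().
proxy_values = torch.tensor([-2.0, -1.0, 0.0, 1.0])
reward_fn, logreward_fn = make_reward_functions("exponential", beta=1.0)
rewards = reward_fn(proxy_values)
logrewards = logreward_fn(proxy_values)
```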
Dropping the convention of range of values in the proxies
So far, we (sort of) followed a convention by which lower (and negative) proxy values were better. This has created a lot of confusion because not all proxies make much sense under this convention.
With this PR, the convention is dropped altogether. The guiding principle now is that the user is free to create any proxy, with any range of values and any direction (maximisation or minimisation), but they are responsible for using an appropriate proxy-to-reward function, as in the sketch below. This is now much easier given the changes described in the previous section.
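For example (a hypothetical proxy, not from the code base), an "energy" proxy where lower values are better can be mapped to rewards with a negative exponential, so that lower energies yield higher rewards.

```python
import torch

beta = 1.0
energy = torch.tensor([-11.0, -2.0, 0.0, 9.0])  # any sign, any range
rewards = torch.exp(-beta * energy)             # lower energy -> higher reward
logrewards = -beta * energy                     # log-rewards, numerically stable
```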
To-do list
Some of these things may be left for a separate PR.
- Update the methods (sample_from_reward(), compute_train_energy_proxy_and_rewards(), ...) that still use the old methods in the base env.
- Remove self.reward_norm and co. in the base env.
- Avoid re-computing proxy values for the Buffer: this is left for a future PR.
- Make identity the default reward function.
- Remove reward_func and the rest of the related attributes.