
[Feat] Decoding refactoring #152

Closed
fedebotu wants to merge 7 commits from the refactor-decoding branch

Conversation

@fedebotu (Member) commented Apr 5, 2024

Description

Several refactorings and new features for decoding in RL4CO.

  1. [Refactoring] Models now return logits by default (e.g. here). Logits are the raw outputs of the model, and we want to decouple the modeling part from how we sample from the resulting distributions. The function handling the conversion from logits to probabilities (what used to be the "log_p") is logit_to_probs; a minimal sketch of this contract follows the list.
  2. [Feat] New decoding strategy: we introduce nucleus sampling (i.e. top-p sampling), which restricts sampling to the smallest set of actions whose cumulative probability exceeds a threshold, discarding the low-probability tail. It can be used by simply passing top_p > 0 to the DecodingStrategy, i.e. to the model decoder (see the top-p sketch below). This is ubiquitous in LLMs and it is about time to have it!
  3. [Refactoring] For simplicity we now default to handling probabilities instead of log probabilities (example here). This is a minor change, but it makes the code more readable and avoids having to call logp.exp() when sampling. It is also more in line with recent work on e.g. LLMs.
  4. [Refactoring, breaking change] By default, every mask now has the same semantics (example here): the value 1 means keep (i.e. feasible action) while 0 means remove (i.e. infeasible action). This matches both TorchRL's action mask and, importantly, PyTorch's scaled_dot_product_attention: "A boolean mask where a value of True indicates that the element should take part in attention." (ref). Masks that previously had inconsistent naming and behavior now all follow this convention (see the masking sketch below).
  5. [Minor] Rename LogitAttention to PointerAttention (for consistency with the pointer mechanism in Vinyals et al., 2015).
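
As mentioned in item 1, the new contract is that the policy returns raw logits and a separate helper converts them to probabilities. The PR names this helper logit_to_probs; the signature and temperature argument below are only assumptions for illustration:

```python
import torch

def logit_to_probs(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Hypothetical signature: convert raw model outputs (logits) into a
    # probability distribution over actions; temperature is an assumed knob.
    return (logits / temperature).softmax(dim=-1)

logits = torch.randn(2, 10)        # [batch, num_actions], raw model output
probs = logit_to_probs(logits)     # probabilities, not log-probabilities (see item 3)
action = torch.multinomial(probs, num_samples=1)
```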
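
A minimal sketch of nucleus (top-p) filtering as described in item 2, assuming the usual PyTorch pattern; the helper name top_p_filter and the exact masking details are illustrative, not the PR's actual implementation:

```python
import torch

def top_p_filter(logits: torch.Tensor, top_p: float = 0.9) -> torch.Tensor:
    # Keep the smallest set of actions whose cumulative probability exceeds top_p;
    # everything else is set to -inf so it can never be sampled.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    sorted_probs = sorted_logits.softmax(dim=-1)
    cum_probs = sorted_probs.cumsum(dim=-1)
    # Remove an action if the cumulative mass *before* it already exceeds top_p,
    # which always keeps at least the most likely action.
    sorted_remove = (cum_probs - sorted_probs) > top_p
    remove = sorted_remove.scatter(-1, sorted_idx, sorted_remove)
    return logits.masked_fill(remove, float("-inf"))

logits = torch.randn(2, 10)                         # [batch, num_actions]
probs = top_p_filter(logits, top_p=0.9).softmax(dim=-1)
action = torch.multinomial(probs, num_samples=1)    # sample only from the nucleus
```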
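
And a small illustration of the unified mask semantics from item 4 (1/True = feasible, 0/False = infeasible), assuming masks are applied to the logits before the softmax; the variable names are made up for the example:

```python
import torch

logits = torch.randn(2, 5)
action_mask = torch.tensor([[True, True, False, True, False],
                            [False, True, True, True, True]])  # True = feasible action

# Infeasible actions get -inf logits, so they receive exactly zero probability
masked_logits = logits.masked_fill(~action_mask, float("-inf"))
probs = masked_logits.softmax(dim=-1)
action = torch.multinomial(probs, num_samples=1)
```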

Warning

Work in progress. Do not merge yet. Some checks and training runs still have bugs that need to be fixed (most probably due to the new masking convention).

Types of changes

  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

CC @LTluttmann: could you have a look and see if you spot any inefficiencies or have some ideas?
CC @Furffico @cbhua: these changes are what I was talking about yesterday (note that in this case running the softmax normalization inside the Sampling in ACO might not be needed).

@fedebotu (Member, Author) commented Apr 9, 2024

Closed in favor of #161

@fedebotu fedebotu closed this Apr 9, 2024
@fedebotu fedebotu deleted the refactor-decoding branch September 3, 2024 05:58