Description
Several refactorings and features made for decoding in RL4CO.
- `logit_to_probs`
- `top_p` > 0 to the `DecodingStrategy`, i.e. to the model decoder. This is ubiquitous in LLMs and it is about time to have it!
- `logp.exp()` when sampling. This is also more in line with recent works in e.g. LLMs
- `scaled_dot_product_attention`: "A boolean mask where a value of True indicates that the element should take part in attention." (ref). For this reason, masks that used to have inconsistent namings now have the same behavior
- `LogitAttention` to `PointerAttention` (for consistency with the Pointer mechanism in Vinyals et al., 2015)

Warning
Work in progress. Do not merge yet. Some checks and training still have some bugs that need to be fixed (most probably due to the new masking
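
For readers unfamiliar with the two conventions mentioned in the description, here is a minimal sketch in plain Python. It is not the RL4CO implementation: the helper names `masked_softmax` and `top_p_filter` are hypothetical, and the code works on simple lists rather than batched tensors. It only illustrates (a) a mask where `True` means the element takes part, and (b) top-p (nucleus) filtering, which keeps the smallest set of highest-probability actions whose cumulative mass reaches `top_p` and renormalizes.

```python
import math
import random


def masked_softmax(logits, mask, temperature=1.0):
    """Softmax over the entries whose mask is True (True = takes part).

    Assumes at least one mask entry is True (hypothetical helper).
    """
    scaled = [l / temperature if m else -math.inf for l, m in zip(logits, mask)]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest high-probability set with mass >= top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


logits = [2.0, 1.0, 0.0, 3.0]
mask = [True, True, True, False]  # last action is infeasible
probs = masked_softmax(logits, mask)
filtered = top_p_filter(probs, top_p=0.9)
action = random.choices(range(len(filtered)), weights=filtered)[0]
```

With these inputs the masked action gets zero probability, and the top-p step additionally drops the least likely feasible action, so sampling only ever returns index 0 or 1.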
Types of changes
Checklist
CC: @LTluttmann could you have a look if you spot some inefficiencies or if you have some ideas?
CC: @Furffico @cbhua these changes are what I was talking about yesterday (note that in this case running the softmax normalization inside the `Sampling` in ACO might not be needed)