Neurips24 #1970
Conversation
Fixed tiny_tournesol.zip file for testing. Added data_analysis for dataset submission. WIP: runtime error on icml24 experiments to be fixed.
…than additional term. This implies that the addition of a new user with huge uncertainties will not affect the quantile much.
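The robustness to high-uncertainty users can be illustrated with a hypothetical inverse-uncertainty weighting (a minimal sketch, not this PR's actual formula; the `weighted_quantile` helper and the 1/uncertainty weights are assumptions for illustration):

```python
import numpy as np

def weighted_quantile(scores, weights, q):
    """Hypothetical weighted quantile: the smallest score whose
    cumulative normalized weight reaches q."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cum = np.cumsum(w) / w.sum()
    return s[np.searchsorted(cum, q)]

rng = np.random.default_rng(0)
scores = rng.normal(size=100)
weights = np.ones(100)

base = weighted_quantile(scores, weights, 0.15)

# A new user with a huge uncertainty contributes a tiny weight,
# so the estimated quantile barely moves.
scores2 = np.append(scores, -10.0)
weights2 = np.append(weights, 1.0 / 100.0)  # weight ~ 1 / uncertainty
shifted = weighted_quantile(scores2, weights2, 0.15)
```

Under such a weighting, even an extreme new score with large uncertainty shifts the estimate by at most one order statistic.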
```diff
+if quantile == 0.5:
+    return regularization + forces.sum()
+
+left_strength = min(1.0, quantile / (1 - quantile))
+right_strength = min(1.0, (1 - quantile) / quantile)
+
+forces = np.where(
+    forces < 0,
+    forces * left_strength,
+    forces * right_strength,
+)
+
+return regularization + quantile_term + forces.sum()
-return regularization + forces.sum()
```
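For context, the asymmetric scaling of the forces is what turns a force-balance estimate into a quantile rather than a median. A minimal sketch of the idea, using standard pinball-loss weights rather than this PR's exact `left_strength`/`right_strength` convention (the `quantile_by_force_balance` helper is hypothetical):

```python
import numpy as np

def quantile_by_force_balance(scores, quantile, n_steps=2000, lr=0.05):
    """Find t where the asymmetrically weighted 'forces' balance.

    Each score exerts a unit pull on t toward itself; scores below t
    are weighted by (1 - quantile), scores above by quantile.  The
    equilibrium is the requested empirical quantile.
    """
    t = float(np.mean(scores))
    for _ in range(n_steps):
        forces = np.sign(scores - t)  # unit pull from each score
        forces = np.where(
            forces < 0,
            forces * (1.0 - quantile),  # downward pulls
            forces * quantile,          # upward pulls
        )
        t += lr * forces.mean()
    return t

rng = np.random.default_rng(0)
scores = rng.normal(size=10_000)
t = quantile_by_force_balance(scores, 0.15)
```

At equilibrium the weighted pulls cancel exactly when a fraction `quantile` of the scores lies below `t`, so `t` converges to the 15th percentile here.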
@lenhoanglnh This change seems to significantly alter the behaviour of the "zero shift" on current Tournesol data. Is it expected? Should we adjust the quantile parameter?
On "main", after applying the shift with `score_shift_quantile = 0.15`, about 13% of the individual scores are negative. On this branch "neurips24", that would be 37%.
As a consequence, the distribution of Tournesol scores would be modified, with fewer videos reaching the recommendability threshold (1238 instead of 3013).
(I used the "legacy2023" pipeline, currently deployed in production, but I expect it would be similar with the new pipeline.)
This is unsatisfactory indeed.
I'm a bit disturbed. It feels like the quantile is now poorly estimated.
Maybe this is because videos with lower scores have higher uncertainty? Or less trust?
OK, I looked at the data and indeed, the uncertainties for bad videos are smaller than for good videos, which explains why the quantile increased with the new quantile definition. I see two simple fixes:
- Reduce `score_shift_quantile = 0.15` to `score_shift_quantile = 0.05`.
- Remove uncertainties in quantile estimation.

The former is much more satisfactory.
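As a sanity check of the second fix: shifting by a plain empirical quantile (ignoring uncertainties) pins the fraction of negative scores to the chosen parameter, which is what makes the 13% vs 37% discrepancy above surprising. A minimal sketch on synthetic scores, not Tournesol data:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(loc=0.5, size=10_000)  # synthetic individual scores

fractions = {}
for q in (0.15, 0.05):
    # shift so that the empirical q-quantile lands at zero
    shifted = scores - np.quantile(scores, q)
    fractions[q] = float(np.mean(shifted < 0.0))
```

With an uncertainty-free quantile, roughly `q` of the scores end up negative by construction, regardless of the score distribution.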
… of neg. log likelihood by 1 (#1973) --------- Co-authored-by: Louis Faucon <lpfaucon@gmail.com>
…ournesol into solidago-pipeline-docs-1
[solidago] Update docstrings and add simple API for `Pipeline`
…nge_threshold in gbt args
…_score different from 0
…tent with existing tournesol tests
Description
Initially the goal was mostly data analysis and experiments for NeurIPS 24 submissions.
However, this has spurred two changes to Solidago source files:
Checklist
❤️ Thank you for your contribution!