Make the Validation Use TabPFN Sklearn Interface #18
@LennartPurucker is there any progress on that? I might have some time and would be happy to tackle this.
Give me a day to share the code I created for this at some point, and then you could bootstrap based on that!
@iivalchev it seems I already did this here: https://github.com/LennartPurucker/finetune_tabpfn_v2/blob/stop_ot_testing/finetuning_scripts/training_utils/validation_utils.py#L62 But it only covers binary classification and is very hacky; still, this is how I used it. (@AlexanderPfefferle also take a look at this)
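For context, a minimal sketch of what that validation boils down to. This is not the linked code, just an illustration: it assumes the TabPFN v2 package, that `TabPFNClassifier` accepts a finetuned checkpoint via `model_path` (true in recent releases), and a binary target; the function name is invented here.

```python
# Minimal sketch, not the linked validation_utils.py. Assumes the TabPFN v2
# package and that TabPFNClassifier can load a finetuned checkpoint through
# `model_path`; the function name is invented for this illustration.
from sklearn.metrics import roc_auc_score
from tabpfn import TabPFNClassifier

def validate_with_sklearn_interface(checkpoint_path, X_train, y_train,
                                    X_val, y_val, n_estimators=1):
    """Score a finetuned checkpoint on a held-out split (binary classification)."""
    clf = TabPFNClassifier(model_path=checkpoint_path, n_estimators=n_estimators)
    clf.fit(X_train, y_train)               # in-context fit: stores the support set
    proba = clf.predict_proba(X_val)[:, 1]  # probability of the positive class
    return roc_auc_score(y_val, proba)
```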
@LennartPurucker thanks, will take a look!
@LennartPurucker let's see if I read the changes right. I like the practicality of the current approach. I was thinking: is it possible to extract the sklearn preprocessing and invoke it on its own? However, that could be fragile, I guess; using the standard TabPFN model API is safer. What needs to happen next?
A few things puzzle me.
That should be possible as well. I know @MagnusBuehler did something like this.
You can ignore these things. The branch has some "secret" research code I tested. Ideally, I suggest creating a new branch and only extracting the
@LennartPurucker thank you for the clarifications. I would then cherry-pick just the portions related to validating against the full-blown TabPFN model, ignoring the rest, so no additional validation datasets will be needed. I will go for the simple solution of not trying to extract the pipeline and glue it to the fine-tuning. If that does not work well, I will dig into how to improve it.
Sounds good, let me know when I should take a look at a draft PR or similar!
I have extracted the preprocessing into a separate class so that it can be easily applied to the finetuning data. I am happy to share this code if you are interested. @LennartPurucker I have seen that in the validation code n_estimators is fixed to 1. Is there a reason for this?
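A loudly hypothetical sketch of that idea, to make the thread concrete: neither the class nor the extracted transform below exists in TabPFN; they only stand in for whatever preprocessing the sklearn interface applies before the forward pass. The actual extraction lives in the gist shared further down.

```python
# Hypothetical sketch only: TabPFN does not expose a public preprocessing
# hook with this name. It illustrates the shape of a wrapper that freezes
# the interface's preprocessing so it can be reused on the finetuning data.
from sklearn.base import BaseEstimator, TransformerMixin

class ExtractedTabPFNPreprocessing(BaseEstimator, TransformerMixin):
    """Wrap a (hypothetical) extracted preprocessing transform as a standalone step."""

    def __init__(self, preprocess_fn):
        self.preprocess_fn = preprocess_fn  # e.g. pulled out of the sklearn interface

    def fit(self, X, y=None):
        return self  # stateless wrapper; real code would fit encoders/scalers here

    def transform(self, X):
        return self.preprocess_fn(X)
```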
Mostly for the sake of speed 🤔
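In other words (an illustration, not project code; the shipped default ensemble size depends on the installed TabPFN version):

```python
# Illustration of the trade-off: n_estimators=1 skips the interface's
# ensembling, so each validation pass is a single forward pass and much
# faster; larger values average several preprocessed views of the data.
from tabpfn import TabPFNClassifier

fast_clf = TabPFNClassifier(n_estimators=1)  # cheap, as used during validation
full_clf = TabPFNClassifier(n_estimators=4)  # closer to the shipped default
```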
iivalchev#1 started porting the initial changes by @LennartPurucker; it needs to be tested. @MagnusBuehler it would be great if you can share, but is the change in the main TabPFN codebase? Also, some fine-tuning effort is being done here as well: PriorLabs/TabPFN#273 FYI
I am aware of this, but the primary focus there seems to be on being able to fine-tune the sklearn interface. Last time I checked, that code disabled preprocessing to achieve this. It is orthogonal to this code base in the sense that the PR does not focus on the training itself but on making the interface trainable, so to speak.
@iivalchev Here is a slightly modified version, which I use (https://gist.github.com/MagnusBuehler/3e62613b7f0f7556eacb16653e584533)
@MagnusBuehler thanks!
See the title; I have to push this code to the main branch at some point.
Right now, we validate without the preprocessing and ensembling of the sklearn interface.
Ideally, we want to check if finetuning improves over this baseline and not just over no preprocessing or ensembling.
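Concretely, the check could look like this: a sketch reusing the hypothetical helper from earlier in the thread, with placeholder checkpoint paths and synthetic data so it runs end to end.

```python
# Sketch of the baseline check described above. Checkpoint paths are
# placeholders and validate_with_sklearn_interface is the hypothetical
# helper sketched earlier in this thread. Both runs go through the full
# sklearn interface, so preprocessing and ensembling are identical for
# baseline and finetuned weights; only the checkpoints differ.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

baseline_auc = validate_with_sklearn_interface(
    "path/to/stock-tabpfn.ckpt", X_train, y_train, X_val, y_val, n_estimators=4)
finetuned_auc = validate_with_sklearn_interface(
    "path/to/finetuned.ckpt", X_train, y_train, X_val, y_val, n_estimators=4)
print(f"baseline={baseline_auc:.4f}  finetuned={finetuned_auc:.4f}")
```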