
Make the Validation Use TabPFN Sklearn Interface #18

Open
LennartPurucker opened this issue Mar 20, 2025 · 14 comments

@LennartPurucker
Owner

See the title; I have to push this code to the main branch at some point.

Right now, we validate without the preprocessing and ensembling of the sklearn interface.
Ideally, we want to check whether fine-tuning improves over the full-interface baseline, not just over a setup without preprocessing or ensembling.
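
For concreteness, here is a minimal sketch of that comparison, assuming the public TabPFNClassifier API and an illustrative AUC metric (the helper and variable names are not from the repo):

```python
# Illustrative sketch: evaluate both the default and the fine-tuned model
# through the full sklearn interface, so preprocessing and ensembling are
# part of the baseline. Helper name and metric are assumptions.
from sklearn.metrics import roc_auc_score
from tabpfn import TabPFNClassifier

def score(clf, X_train, y_train, X_val, y_val):
    clf.fit(X_train, y_train)  # runs the interface's own preprocessing/ensembling
    return roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])  # binary case

# baseline_auc = score(TabPFNClassifier(device="cuda"), X_tr, y_tr, X_v, y_v)
# finetuned_auc = score(TabPFNClassifier(model_path=ckpt, device="cuda"), X_tr, y_tr, X_v, y_v)
```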

@iivalchev
Contributor

@LennartPurucker is there any progress on that? I might have some time and would be happy to tackle this.

@LennartPurucker
Owner Author

Give me a day to share the code I created for this, and then you can bootstrap based on that!

@LennartPurucker
Owner Author

@iivalchev it seems I already did this here: https://github.com/LennartPurucker/finetune_tabpfn_v2/blob/stop_ot_testing/finetuning_scripts/training_utils/validation_utils.py#L62

It only covers binary classification and is very hacky, but this is how I used it.

(@AlexanderPfefferle also take a look at this)

@iivalchev
Contributor

@LennartPurucker thanks, will take a look!

@iivalchev
Contributor

@LennartPurucker let's see if I read the changes right. In validation_utils.py, if use_native_validation is True, validation skips the sklearn preprocessing and retains the current behavior. Otherwise, it fits/predicts with the full TabPFN sklearn interface, preprocessing included.
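
If that reading is right, the control flow would look roughly like the sketch below (only use_native_validation is taken from validation_utils.py; the function and the native path are hypothetical placeholders):

```python
from tabpfn import TabPFNClassifier

def validate(save_path, X_train, y_train, X_val, y_val, *, use_native_validation):
    if use_native_validation:
        # Current behavior: raw model forward pass without sklearn preprocessing.
        # Placeholder only; the real path lives in validation_utils.py.
        raise NotImplementedError("native validation path")
    # Otherwise: fit/predict with the full sklearn interface, preprocessing included.
    clf = TabPFNClassifier(model_path=save_path, device="cuda")
    clf.fit(X_train, y_train)
    return clf.predict_proba(X_val)
```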

I like the practicality of the current approach. I was wondering whether it is possible to extract the sklearn preprocessing and invoke it on its own. However, that could be fragile; I guess using the standard TabPFN model API is safer.

What needs to happen next?

  • support regression
  • make the validation method configurable from the fine-tuning entry point
  • testing

A few things puzzle me.

  1. from autogluon.core.utils.early_stopping import ESWrapperOOF: I can't find this; also, is an autogluon dependency desirable?

  2. this line looks funny: es_wrapper_oof.update(y=y_val, y_score=y_pred_proba_val, cur_round=0, y_pred_proba=y_pred_proba_val) (the same predictions are passed as both y_score and y_pred_proba)

  3. Why did you add the X_test and X_ubiased_val data sets? Shouldn't one be sufficient?

@LennartPurucker
Owner Author

> I like the practicality of the current approach. I was wondering whether it is possible to extract the sklearn preprocessing and invoke it on its own. However, that could be fragile; I guess using the standard TabPFN model API is safer.

That should be possible as well. I know @MagnusBuehler did something like this.

> A few things puzzle me.

You can ignore these things. The branch has some "secret" research code I tested. I suggest creating a new branch and extracting only the use_native_validation logic.

@iivalchev
Contributor

@LennartPurucker thank you for the clarifications. I would then cherry-pick just the portions related to validating against the full-blown TabPFN model and ignore the rest, so no additional validation data sets will be needed. I will go for the simple solution of not trying to extract the pipeline and glue it to the fine-tuning. If that happens not to work well, I will dig into how to improve it.

@LennartPurucker
Owner Author

Sounds good, let me know when I should take a look at a draft PR or similar!

@MagnusBuehler

> That should be possible as well. I know @MagnusBuehler did something like this.

I have extracted the preprocessing into a separate class so that it can be easily applied to the fine-tuning data. I am happy to share this code if you are interested.

@LennartPurucker I have seen that n_estimators is fixed to 1 in the validation code. Is there a reason for this?

clf = TabPFNClassifier(model_path=save_path, n_estimators=1, categorical_features_indices=categorical_features_indices, device="cuda")

@LennartPurucker
Owner Author

Mostly for the sake of speed 🤔

@iivalchev
Contributor

iivalchev commented Apr 14, 2025

iivalchev#1 started porting the initial changes by @LennartPurucker. Needs to be tested.

@MagnusBuehler it would be great if you could share, but is the change in the main TabPFN codebase?

Some fine-tuning effort is also being done here: PriorLabs/TabPFN#273, FYI.

@LennartPurucker
Owner Author

> Some fine-tuning effort is also being done here: PriorLabs/TabPFN#273, FYI.

I am aware of this, but the primary focus seems to be on being able to fine-tune the sklearn interface. Last time I checked, the code disabled preprocessing to achieve this.

It is orthogonal to this code base in the sense that the PR does not focus on the training but on making it trainable, so to speak.

@MagnusBuehler

@iivalchev Here is a slightly modified version, which I use: https://gist.github.com/MagnusBuehler/3e62613b7f0f7556eacb16653e584533
I replicated the default sklearn parameters to align the preprocessing with the default TabPFNClassifier. One detail: the n_estimators setting affects how many differently augmented views of the data set are generated (with n_estimators=4, four different xs, ys pairs are generated).
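
As a rough illustration of that detail via the public interface only (the view-generation behavior is paraphrased in the comments, not taken from the gist):

```python
from sklearn.datasets import make_classification
from tabpfn import TabPFNClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# n_estimators controls how many differently augmented (xs, ys) views of the
# training data the preprocessing generates; each view becomes one ensemble
# member, and predictions are averaged across them.
clf = TabPFNClassifier(n_estimators=4)
clf.fit(X, y)
proba = clf.predict_proba(X)  # averaged over the 4 augmented views
```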

@iivalchev
Contributor

@MagnusBuehler thanks!
