CUDA error #17
Heyho! As far as I can tell from my experience with CUDA VRAM issues: how many features do you have? The VRAM requirements scale quadratically (i.e., very expensively) with the number of features and samples. Moreover, for the backprop this becomes even more expensive. So, if you have either a lot of samples or a lot of features, you will quickly run into VRAM issues. If you have a small matrix (a small number of cells, where cells = number of features x number of samples), then it could be a VRAM leakage problem, but I am not aware of such a problem in my code. To reduce VRAM usage, you could try training with even lower precision or using more GPUs. However, I have not explored many options in this regard so far.
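For illustration, here is a minimal sketch of what lower-precision training could look like; the model, data, and hyperparameters are placeholders, not the actual fine-tuning loop of this repo:

```python
# Minimal sketch of lower-precision training (placeholder model and data, not
# this repo's fine-tuning loop). autocast runs the forward pass in bfloat16,
# which roughly halves activation memory compared to float32.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(100, 2).to(device)            # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(64, 100, device=device)         # dummy batch
y = torch.randint(0, 2, (64,), device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```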
Indeed, this is a known bug / intended behavior of the sklearn splitting, and I have not used more clever splitting techniques in this repo. I would recommend manually editing the splitting strategy or the number of folds for now (e.g., see the
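As a sketch of what manually adjusting the folds could look like (toy labels, not this repo's data loader), capping n_splits at the size of the smallest class avoids the sklearn error:

```python
# Possible workaround (a sketch, not code from this repo): cap the number of
# stratified folds at the size of the smallest class.
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 50 + [1] * 7)                  # toy labels with a rare class
n_splits = max(2, min(10, np.bincount(y).min()))  # 10 folds would not fit class 1
splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)

for train_idx, val_idx in splitter.split(np.zeros((len(y), 1)), y):
    print(len(train_idx), len(val_idx))
```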
Hi Lennart,
The number of features varies across datasets, but in all cases they are under 500, as I filtered them to remain within TabPFN's recommended limits. For personal learning, I created a smaller version of this repository focused specifically on classification tasks, following your code and logic (here is a reference if you want to take a peek: https://github.com/mike-bioinf/itertabpfn/tree/master/src/itertabpfn/finetune), and strangely the issue does not occur with this implementation. One difference that comes to mind is that I used the classic AdamW optimizer instead of the schedule-free one, since I am unfamiliar with it. However, I'm unsure if this has anything to do with the problem. Anyway, I'll dig further into the matter and come back to you if I find something new. Thanks for your time!
Nice, great to hear.
No, that makes sense. The implementation of schedulefree is likely not at a PyTorch-like quality level so far, so this might be a bug on their side. Or there could be a problem with the requirements or the platform, as schedulefree is more specialized.
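For reference, a rough sketch of the optimizer swap (placeholder model, not this repo's code). If I remember the schedulefree API correctly, its AdamW variant also needs explicit optimizer.train()/optimizer.eval() mode switches, which plain torch.optim.AdamW does not, and missing those is an easy source of subtle bugs:

```python
# Sketch of the optimizer difference (placeholder model, not this repo's code).
import torch
from torch import nn

model = nn.Linear(10, 2)
use_schedulefree = False                 # flip to compare the two setups

if use_schedulefree:
    import schedulefree
    # assumption: schedulefree exposes AdamWScheduleFree and requires the
    # optimizer to be switched into train mode before training steps
    optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-4)
    optimizer.train()
else:
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```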
Hello,
I tried the fine-tuning procedure on different datasets and noticed that it fails for the larger ones (starting from around 600 samples). Initially, I was using a previous commit of this package, and the error I encountered was:
Then, cloning and using the current version, I obtain the following:
I also tried very small batch sizes, but the problem remains.
In addition, I encountered a second issue, related to the batch division process for small datasets. From my understanding (sorry if this is not correct), the data loader uses an sklearn stratified splitter with a fixed number of 10 folds, regardless of the specified batch size. However, for small datasets this can be problematic, since the smallest class may not have at least 10 observations, and sklearn enforces this.
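For concreteness, here is a toy snippet (not taken from this repo) illustrating what I mean; as far as I can tell, sklearn raises a hard error when n_splits exceeds the size of every class and otherwise only warns:

```python
# Toy illustration of the stratified-split constraint (not the repo's code).
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 8 + [1] * 6)   # every class has fewer than 10 members
splitter = StratifiedKFold(n_splits=10)

try:
    list(splitter.split(np.zeros((len(y), 1)), y))
except ValueError as err:
    print(err)  # n_splits=10 cannot be greater than the number of members in each class
```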