Reproduce the results in the paper #17
Comments
Dear @HuuAnnnn, First of all, thank you for your interest in our work and for your report. Concerning the hyperparameters described in the paper, we did mention that we used 4 layers, which translates to 4 blocks in
Thank you for your reply. I will try what you suggested. I have another question: I can't find where the GLU function or the CTMixer layer is applied in the code, as described in the paper. Is the code missing the fusion model?
I have checked the code, and indeed I pushed an old experimental version that is missing some parts of the model. I will push the full model soon. I apologize for any inconvenience.
@Mouradost Is there an update on this?
Dear Mr. @Mouradost,
I tried to reproduce the results reported in the paper, but I could not. I trained the model many times using the hyperparameters described in the paper. However, my model does not converge and its performance is poor.
Following your instructions in issue #10, I trained only the STGM (w/o Estimator). My configuration and the log of the latest training run are below. Thank you for taking the time to read and reply.
I hope to hear from you soon.
HuuAn
My Configuration
Hardware
GPU: RTX 3080 Ti
RAM: 32GB
CPU: AMD Ryzen 9 5900X 12-Core Processor
OS: Linux
Adjustment: Change AdamW to Adam
config/config.yaml
config/trainers/default
config/models/stgm.yaml
Execution command
Training log