How to train a BERT model from scratch #27
How can I train a BERT model from scratch?
We don't have any example code for this, but it is possible. You'll need to do a few things:
It is worth being aware of the conclusions in section 5 of the RoBERTa paper, particularly for setting up the pre-training tasks in step 2. There's definitely a lot of work here, so I think we should keep this issue open as an enhancement to add a pre-training script.
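For example, the masked-language-modeling task boils down to corrupting input tokens and training the model to predict the originals. A minimal sketch of BERT's 80/10/10 masking rule, assuming integer token IDs (the function name and signature are illustrative, not from this repo):

```matlab
function [inputIds, labels] = maskTokens(inputIds, maskTokenId, vocabSize)
% Apply BERT's masking rule to a vector of token IDs: pick ~15% of the
% positions, then replace 80% of those with [MASK], 10% with a random
% token, and leave 10% unchanged. A real implementation would also skip
% special tokens such as [CLS] and [SEP].
labels = -ones(size(inputIds));           % -1 marks positions with no loss
idx = find(rand(size(inputIds)) < 0.15);  % choose ~15% of positions
for k = idx(:).'
    labels(k) = inputIds(k);              % the model must predict this token
    r = rand;
    if r < 0.8
        inputIds(k) = maskTokenId;        % 80%: replace with [MASK]
    elseif r < 0.9
        inputIds(k) = randi(vocabSize);   % 10%: replace with a random token
    end                                   % 10%: leave the token as-is
end
end
```

RoBERTa's "dynamic masking" simply means re-running this corruption every epoch rather than fixing the masks once during preprocessing.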
The given example uses a BERT model loaded from a pretrained parameter struct.
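That is, something like the following (based on my reading of the repo's README; treat the exact call and field names as assumptions):

```matlab
mdl = bert();                 % load the pretrained BERT-Base weights
tokenizer = mdl.Tokenizer;    % WordPiece tokenizer
parameters = mdl.Parameters;  % nested struct of dlarray weights
```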
If you can use the same initializer for every parameter then the quickest thing you can do is something like:
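For instance, a minimal sketch assuming a Gaussian initializer (the helper name is illustrative):

```matlab
function parameters = initializeAll(parameters, sigma)
% Recursively replace every dlarray leaf of a nested parameter struct
% with freshly sampled Gaussian weights of the same size.
if isstruct(parameters)
    fn = fieldnames(parameters);
    for i = 1:numel(fn)
        parameters.(fn{i}) = initializeAll(parameters.(fn{i}), sigma);
    end
elseif isa(parameters, 'dlarray')
    parameters = dlarray(sigma * randn(size(parameters), 'single'));
end
end
```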
This is a little limited if you need to do something like use different initializers for the embeddings, linear layers, layer norms, etc. For that case I would write a suite of functions to initialize the struct, that might start like:
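A hypothetical starting point for such a suite (the createParameterStruct name echoes the later comment; the field names, sizes, and the 0.02 standard deviation are assumptions, not the repo's actual code):

```matlab
function parameters = createParameterStruct(hyperparameters)
% Build the nested parameter struct from per-component initializers.
H = hyperparameters.HiddenSize;
parameters.embeddings.word = initializeGaussian([H, hyperparameters.VocabSize]);
parameters.embeddings.layernorm = initializeLayerNorm(H);
% ... and so on for position/type embeddings and each encoder layer ...
end

function weights = initializeGaussian(sz)
% Gaussian initializer with standard deviation 0.02; BERT itself uses a
% truncated normal, but plain randn is close enough for a sketch.
weights = dlarray(0.02 * randn(sz, 'single'));
end

function ln = initializeLayerNorm(hiddenSize)
% Layer norm starts as the identity transform: scale one, offset zero.
ln.scale = dlarray(ones(hiddenSize, 1, 'single'));
ln.offset = dlarray(zeros(hiddenSize, 1, 'single'));
end
```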
You have to implement these initializer functions yourself.
Thanks, that makes it clear. Since the BERT model has been trained, the model-creation scripts should exist somewhere, so I wonder why the demo does not provide a function for creating the model. More generally, if we want to create a different transformer model, do we need to implement something like createParameterStruct() ourselves? Demos that include functions for creating general transformer models would help popularize the repo.
For the record, we didn't train this ourselves; we imported the pre-trained weights for the original BERT models. That's why we didn't need to initialize the model ourselves, and why we don't have a nice pre-training demo. I agree it would be nice for us to add initializer functions for the parameters that the layers in this repo expect. Could you describe what you mean by a general transformer? I know of the BERT encoder-only type, the GPT-2 decoder-only type, and the encoder-decoder type like the original. Is there something else beyond those?
Sorry for being unclear. By "general transformer" I mean creating custom transformer models out of the basic modules, just like building the various kinds of CNN networks.
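In that spirit, the building blocks could be exposed as plain functions over dlarrays and composed freely. A generic single-head self-attention block, sketched as an illustration of that functional style (this is not this repo's API):

```matlab
function Z = selfAttention(X, Wq, Wk, Wv)
% X is an unformatted dlarray of size hiddenSize-by-seqLength; the weight
% matrices are hiddenSize-by-hiddenSize dlarrays.
Q = Wq * X;
K = Wk * X;
V = Wv * X;
% Scaled dot-product attention: softmax normalizes over the key
% dimension (rows of the score matrix) for each query column.
A = softmax((K' * Q) ./ sqrt(size(Q, 1)), 'DataFormat', 'CB');
Z = V * A;
end
```

Stacking such blocks with layer norm and feed-forward functions would give an encoder; swapping in causal masking would give a decoder.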