Neural Network, Backpropagation, and Transformer Decoder
neural-networks derivatives gardient-descent masked-attention positional-encoding- transformer-decoder-model
-
Updated
Mar 19, 2024 - Jupyter Notebook