From 1e48c13e893f6f013887787149a4d167dcae3a37 Mon Sep 17 00:00:00 2001 From: rasbt Date: Sun, 8 Sep 2024 15:49:44 -0500 Subject: [PATCH] update gpt-2 paper link --- ch04/01_main-chapter-code/ch04.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ch04/01_main-chapter-code/ch04.ipynb b/ch04/01_main-chapter-code/ch04.ipynb index e8439595..03a575bb 100644 --- a/ch04/01_main-chapter-code/ch04.ipynb +++ b/ch04/01_main-chapter-code/ch04.ipynb @@ -106,7 +106,7 @@ "source": [ "- In previous chapters, we used small embedding dimensions for token inputs and outputs for ease of illustration, ensuring they fit on a single page\n", "- In this chapter, we consider embedding and model sizes akin to a small GPT-2 model\n", - "- We'll specifically code the architecture of the smallest GPT-2 model (124 million parameters), as outlined in Radford et al.'s [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) (note that the initial report lists it as 117M parameters, but this was later corrected in the model weight repository)\n", + "- We'll specifically code the architecture of the smallest GPT-2 model (124 million parameters), as outlined in Radford et al.'s [Language Models are Unsupervised Multitask Learners](https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe) (note that the initial report lists it as 117M parameters, but this was later corrected in the model weight repository)\n", "- Chapter 6 will show how to load pretrained weights into our implementation, which will be compatible with model sizes of 345, 762, and 1542 million parameters" ] }, @@ -1271,7 +1271,7 @@ "id": "309a3be4-c20a-4657-b4e0-77c97510b47c", "metadata": {}, "source": [ - "- Exercise: you can try the following other configurations, which are referenced in the [GPT-2 paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), as well.\n", + "- Exercise: you can try the following other configurations, which are referenced in the [GPT-2 paper](https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe), as well.\n", "\n", " - **GPT2-small** (the 124M configuration we already implemented):\n", " - \"emb_dim\" = 768\n",