diff --git a/Alvaro_Menendez_CV.pdf b/Alvaro_Menendez_CV.pdf new file mode 100644 index 0000000..0ef2da0 Binary files /dev/null and b/Alvaro_Menendez_CV.pdf differ diff --git a/blogs.html b/blogs.html new file mode 100644 index 0000000..bc0354b --- /dev/null +++ b/blogs.html @@ -0,0 +1,39 @@ + + + + + + + + + Blog Posts - Álvaro Menéndez + + + + + + +
+

Blog Posts

+ +
+ +
+
+ + +
+
+ + \ No newline at end of file diff --git a/blogs/GPT/image 1.png b/blogs/GPT/image 1.png new file mode 100644 index 0000000..20c9295 Binary files /dev/null and b/blogs/GPT/image 1.png differ diff --git a/blogs/GPT/image 2.png b/blogs/GPT/image 2.png new file mode 100644 index 0000000..16c8cfb Binary files /dev/null and b/blogs/GPT/image 2.png differ diff --git a/blogs/GPT/image.png b/blogs/GPT/image.png new file mode 100644 index 0000000..9cc46c9 Binary files /dev/null and b/blogs/GPT/image.png differ diff --git a/blogs/GPT/index.html b/blogs/GPT/index.html new file mode 100644 index 0000000..cd5cd3c --- /dev/null +++ b/blogs/GPT/index.html @@ -0,0 +1,161 @@ + + + + + + + + + GPT Models and PyTorch - Álvaro Menéndez + + + + + + + + +
+

Blog Posts

+ +
+ +
+
+

A humble introduction to GPT models and PyTorch

+ +

In this article I will walk through the simplest GPT implementation, written by Andrej Karpathy; you can find all the code here. The only prerequisite is being reasonably comfortable coding in Python.

+ +

Section 1. Overall structure

+ +

When building a GPT model (or any other neural model), we will use the nn module from the PyTorch library:

+ +
import torch.nn as nn
+ +

The basic structure of our model will look like the following (meaning you can simply copy-paste this and then add things on top; later on you'll understand why the super().__init__() call is necessary):

+ +
class MyModel(nn.Module):
+    def __init__(self):
+        # Always call parent's init first
+        super().__init__()
+        # Define layers here
+        
+    def forward(self, x):
+        # Define forward pass
+        return x
+ +
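To see the skeleton in action, instantiate it and call it like a function (a toy sketch: the input shape here is arbitrary, since this forward simply returns its input):

import torch
+
+model = MyModel()
+x = torch.randn(4, 8)   # dummy batch: 4 vectors of size 8
+out = model(x)          # calling the module runs forward() under the hood
+print(out.shape)        # torch.Size([4, 8]) -- unchanged, since forward returns x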

The reason we always inherit from nn.Module is that it allows us to create many different building blocks for our model. In fact, the building blocks CAN be (and will be) more than just models: layers, activation functions, loss functions, etc. Everything that makes up a model. Then, in our main model, we include those blocks when initialising it:

+ +
class MicroGPT(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
+        self.position_embedding_table = nn.Embedding(block_size, n_embd)
+        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])
+        self.ln_f = nn.LayerNorm(n_embd) # final layer norm
+        self.lm_head = nn.Linear(n_embd, vocab_size)
+ +
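The Block used above is an entire transformer block, defined further down in the notebook. To give a flavour of what goes inside one, here is a hypothetical sketch (assuming import torch alongside the nn import above, and using PyTorch's built-in nn.MultiheadAttention where the notebook builds attention from scratch): self-attention and a small MLP, each wrapped in a residual connection with a layer norm.

class Block(nn.Module):
+    # Sketch only: the notebook implements multi-head attention by hand;
+    # here we use PyTorch's built-in nn.MultiheadAttention for brevity.
+    def __init__(self, n_embd, n_head):
+        super().__init__()
+        self.ln1 = nn.LayerNorm(n_embd)
+        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
+        self.ln2 = nn.LayerNorm(n_embd)
+        self.ffwd = nn.Sequential(
+            nn.Linear(n_embd, 4 * n_embd),
+            nn.ReLU(),
+            nn.Linear(4 * n_embd, n_embd),
+        )
+
+    def forward(self, x):
+        T = x.size(1)
+        # causal mask: position i may only attend to positions <= i
+        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
+        h = self.ln1(x)
+        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
+        x = x + attn_out                  # residual connection around attention
+        x = x + self.ffwd(self.ln2(x))    # residual connection around the MLP
+        return x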

Now, focus on the nn building blocks. Because we assign to our model components that themselves inherit from nn.Module, PyTorch can track them, which lets us retrieve the overall model parameters as follows:

+ +
for name, module in model.named_children():
+    params = sum(p.numel() for p in module.parameters())
+    print(f"{name}: {params/1e6}M parameters")
+ +

This will print something like:

+ +
token_embedding_table: 0.00416M parameters
+position_embedding_table: 0.002048M parameters
+blocks: 0.199168M parameters
+ln_f: 0.000128M parameters
+lm_head: 0.004225M parameters
+ +
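These numbers are easy to cross-check by hand. They are consistent with a config of vocab_size = 65 (the tiny-Shakespeare character vocabulary), n_embd = 64 and block_size = 32; treat these values as an assumption inferred from the printout:

vocab_size, n_embd, block_size = 65, 64, 32
+print(vocab_size * n_embd)               # 4160 -> 0.00416M   (token embedding table)
+print(block_size * n_embd)               # 2048 -> 0.002048M  (position embedding table)
+print(2 * n_embd)                        # 128  -> 0.000128M  (ln_f: weight + bias)
+print(n_embd * vocab_size + vocab_size)  # 4225 -> 0.004225M  (lm_head: weights + bias)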

So, when people say a model has 100M parameters, they mean that the parameter counts of all the smaller components of the model sum to that amount. Note that if you had forgotten the super().__init__() call, the component's parameters wouldn't be registered (in fact, recent PyTorch versions raise an error as soon as you try to assign a submodule without it).

+ +
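As a sanity check, you can also sum everything in one go (assuming the model above is instantiated as model):

total = sum(p.numel() for p in model.parameters())
+print(f"total: {total/1e6}M parameters")  # ~0.21M: the component counts above, added up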

Another benefit of this modularization is that you should now more or less be able to understand these scary diagrams:

+ +
+ Original transformer architecture +
Original transformer architecture from 'Attention Is All You Need'
+
+ +

As the caption says, the diagram above represents the original transformer model, which uses an encoder-decoder architecture. GPT models, however, do NOT use exactly this architecture, but only the right part (the decoder). Their architecture looks like this instead:

+ +
+ GPT architecture +
Retrieved from ericjwang.com/2023/01/22/transformers.html
+
+ +

The big block with 'Nx' to its right is the transformer block, and it is repeated N times. Much better! We still need to cover how to interpret the arrows ("What does it mean to have a line from Outputs to Output Embedding?") and what the diagram itself represents.

+ +

To answer these questions, let's go to the 'Full finished code, for reference' block of the notebook, inside the 'BigramLanguageModel' class (which at this point is actually a GPT model!). I will copy-paste the code here:

+ +
# Actually it's a GPT model, not a bigram!
+class BigramLanguageModel(nn.Module):
+
+    def __init__(self):
+        super().__init__()
+        # each token directly reads off the logits for the next token from a lookup table
+        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
+        self.position_embedding_table = nn.Embedding(block_size, n_embd)
+        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])
+        self.ln_f = nn.LayerNorm(n_embd) # final layer norm
+        self.lm_head = nn.Linear(n_embd, vocab_size)
+
+    def forward(self, idx, targets=None):
+        B, T = idx.shape
+
+        # idx and targets are both (B,T) tensor of integers
+        tok_emb = self.token_embedding_table(idx) # (B,T,C)
+        pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)
+        x = tok_emb + pos_emb # (B,T,C)
+        x = self.blocks(x) # (B,T,C)
+        x = self.ln_f(x) # (B,T,C)
+        logits = self.lm_head(x) # (B,T,vocab_size)
+
+        if targets is None:
+            loss = None
+        else:
+            B, T, C = logits.shape
+            logits = logits.view(B*T, C)
+            targets = targets.view(B*T)
+            loss = F.cross_entropy(logits, targets)
+
+        return logits, loss
+
+    def generate(self, idx, max_new_tokens):
+        # idx is (B, T) array of indices in the current context
+        for _ in range(max_new_tokens):
+            # crop idx to the last block_size tokens
+            idx_cond = idx[:, -block_size:]
+            # get the predictions
+            logits, loss = self(idx_cond)
+            # focus only on the last time step
+            logits = logits[:, -1, :] # becomes (B, C)
+            # apply softmax to get probabilities
+            probs = F.softmax(logits, dim=-1) # (B, C)
+            # sample from the distribution
+            idx_next = torch.multinomial(probs, num_samples=1) # (B, 1)
+            # append sampled index to the running sequence
+            idx = torch.cat((idx, idx_next), dim=1) # (B, T+1)
+        return idx
+ +

Okay, there is a lot to grasp here. But first, note how the model inherits from nn.Module and follows the format I showed you at the beginning. The 'generate' method produces new tokens from the current ones (more on that later; a quick usage sketch follows below).

+ +
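Here is a quick usage sketch in the notebook's style (device, and decode, the helper that maps token ids back to characters, are defined earlier in the notebook): start from a single zero token and let the model extend it.

model = BigramLanguageModel().to(device)
+context = torch.zeros((1, 1), dtype=torch.long, device=device)  # (B=1, T=1) start token
+out = model.generate(context, max_new_tokens=100)               # (1, 101) tensor of token ids
+print(decode(out[0].tolist()))                                  # back to text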

The 'forward' method passes an input through our model (or component). Note: self(x) is the same as self.forward(x). If we look at the forward method and the diagram more closely, we can see that they actually represent the 'path' of the input from the beginning up until generating the logits! We haven't covered logits yet, but they are used to obtain the probabilities of the next token. Note that the code doesn't follow the diagram's structure exactly, but it is still a valid implementation of the GPT architecture, just organized slightly differently for practical reasons.

+
+ GPT implementation diagram + +
+ +
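Since logits keep coming up: they are simply raw, unnormalized scores, one per token in the vocabulary, and softmax converts them into probabilities. A toy example with made-up numbers:

import torch.nn.functional as F
+
+logits = torch.tensor([2.0, 1.0, 0.1])  # one score per token in a 3-token vocabulary
+probs = F.softmax(logits, dim=-1)
+print(probs)  # tensor([0.6590, 0.2424, 0.0986]) -- sums to 1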

In the following sections, we will go through each component of the model and understand each part on its own, but by now it should already be much easier to read through similar models by yourself!

+
+
+ + \ No newline at end of file diff --git a/index.html b/index.html index 264897c..b58cc53 100644 --- a/index.html +++ b/index.html @@ -20,6 +20,9 @@

Álvaro Menéndez

+
diff --git a/styles.css b/styles.css index 7aa0950..74a7dbd 100644 --- a/styles.css +++ b/styles.css @@ -35,7 +35,7 @@ main { /* Header styles */ header { max-width: var(--container-width); - margin: 0 auto 3rem auto; + margin: 0 auto 1rem auto; text-align: center; } @@ -49,7 +49,8 @@ header h1 { nav { display: flex; justify-content: center; - gap: 2rem; + gap: 0.1rem; + margin-bottom: 2rem; } nav a { @@ -67,28 +68,69 @@ nav a:hover, nav a.active { border-bottom: 2px solid var(--accent-color); } -/* Blog preview styles */ +.nav-link { + color: var(--secondary-color); + text-decoration: none; + font-size: 0.95rem; + font-weight: 500; + padding: 0.4rem 0; + position: relative; + transition: color 0.2s ease; +} + +.nav-link::after { + content: ''; + position: absolute; + width: 100%; + height: 1px; + bottom: 0; + left: 0; + background-color: var(--accent-color); + transform: scaleX(0); + transform-origin: right; + transition: transform 0.3s ease; +} + +.nav-link:hover { + color: var(--text-color); +} + +.nav-link:hover::after { + transform: scaleX(1); + transform-origin: left; +} + +/* Blog list styles */ +.blog-list { + display: flex; + flex-direction: column; + gap: 2rem; +} + .blog-preview { - margin-bottom: 3rem; - padding-bottom: 3rem; - border-bottom: 1px solid #e1e1e1; + background: var(--card-bg); + padding: 1.5rem; + border-radius: 12px; + box-shadow: var(--card-shadow); + transition: transform 0.3s ease, box-shadow 0.3s ease; } -.blog-preview:last-child { - border-bottom: none; +.blog-preview:hover { + transform: translateY(-5px); + box-shadow: var(--hover-shadow); } -.blog-preview h2 { - margin-bottom: 0.5rem; +.blog-preview h2, +.blog-preview h2:first-child { + margin: 0 0 0.5rem 0 !important; + font-size: 1.8rem; + letter-spacing: -0.5px; } .blog-preview h2 a { - text-decoration: none; color: var(--text-color); - font-size: 1.5rem; - font-weight: 700; - letter-spacing: -0.5px; - transition: color 0.2s ease; + text-decoration: none; + transition: color 0.3s ease; } .blog-preview h2 a:hover { @@ -96,9 +138,20 @@ nav a:hover, nav a.active { } .post-meta { - color: var(--secondary-color); + color: var(--accent-color); font-size: 0.9rem; margin-bottom: 1rem; + font-weight: 500; +} + +.blog-preview p { + color: var(--secondary-color); + margin-bottom: 1.5rem; + line-height: 1.6; +} + +.read-more:hover { + transform: translateX(5px); } /* Blog post styles */ @@ -205,10 +258,30 @@ footer { transform: translateY(-2px); } +.social-icon:hover { + fill: var(--accent-color); + transform: translateY(-2px); +} + +.social-icon-text { + color: var(--secondary-color); + text-decoration: none; + font-size: 1rem; + font-weight: 500; + transition: all 0.2s ease; + text-transform: lowercase; +} + +.social-icon-text:hover { + color: var(--accent-color); + transform: translateY(-2px); + display: inline-block; +} + /* Timeline styles */ .timeline { max-width: var(--container-width); - margin: 0 auto; + margin: 3rem auto; position: relative; padding: 0 0 2rem 0; } @@ -366,4 +439,223 @@ footer { strong { color: var(--accent-color); font-weight: 600; -} \ No newline at end of file +} + +.blog-container { + max-width: 800px; + margin: 2rem auto; + padding: 0 1rem; + color: var(--text-color); +} + +.blog-container article { + margin-bottom: 4rem; +} + +.blog-container h1 { + font-size: 2.5rem; + margin-bottom: 2rem; + letter-spacing: -0.5px; +} + +.blog-container h2 { + font-size: 1.8rem; + margin: 3rem 0 1.5rem; + letter-spacing: -0.3px; +} + +.blog-container h3 { + font-size: 1.4rem; + margin: 2rem 0 1rem; 
+} + +.blog-container p { + color: var(--secondary-color); + margin-bottom: 1.5rem; + line-height: 1.8; + font-size: 1.1rem; +} + +.blog-container a { + color: var(--accent-color); + text-decoration: none; + border-bottom: 1px solid transparent; + transition: border-color 0.3s; +} + +.blog-container a:hover { + border-bottom-color: var(--accent-color); +} + +.blog-container figure { + margin: 2rem 0; +} + +.blog-container figure img { + max-width: 100%; + height: auto; + border-radius: 8px; + box-shadow: var(--card-shadow); + display: block; + margin: 0 auto; +} + +.blog-container figcaption { + text-align: center; + color: var(--secondary-color); + font-size: 0.9rem; + margin-top: 1rem; + font-style: italic; +} + +.blog-container code { + font-family: 'Fira Mono', monospace; + background: var(--card-bg); + padding: 0.2em 0.4em; + border-radius: 4px; + font-size: 0.9em; + color: var(--accent-color); +} + +.blog-container pre { + background: var(--card-bg); + padding: 1.5rem; + border-radius: 8px; + overflow-x: auto; + margin: 1.5rem 0; + box-shadow: var(--card-shadow); +} + +.blog-container pre code { + background: none; + padding: 0; + font-family: 'Fira Mono', monospace; + font-size: 0.95rem; + line-height: 1.6; +} + +/* Custom Prism theme */ +code[class*="language-"], +pre[class*="language-"] { + color: var(--text-color); + text-shadow: none; + font-family: 'Fira Mono', monospace; + font-size: 0.95rem; + line-height: 1.6; + white-space: pre; + word-spacing: normal; + word-break: normal; + word-wrap: normal; + tab-size: 4; + hyphens: none; +} + +.token.comment, +.token.prolog, +.token.doctype, +.token.cdata { + color: #636e7b; +} + +.token.punctuation { + color: var(--secondary-color); +} + +.token.property, +.token.tag, +.token.constant, +.token.symbol, +.token.deleted { + color: #ef4444; +} + +.token.boolean, +.token.number { + color: #c084fc; +} + +.token.selector, +.token.attr-name, +.token.string, +.token.char, +.token.builtin, +.token.inserted { + color: #22c55e; +} + +.token.operator, +.token.entity, +.token.url, +.language-css .token.string, +.style .token.string { + color: var(--accent-color); + background: none; +} + +.token.atrule, +.token.attr-value, +.token.keyword { + color: var(--accent-color); +} + +.token.function, +.token.class-name { + color: #f59e0b; +} + +.token.regex, +.token.important, +.token.variable { + color: #ec4899; +} + +.token.important, +.token.bold { + font-weight: bold; +} + +.token.italic { + font-style: italic; +} + +.blog-container em { + color: var(--accent-color); + font-style: italic; +} + +.blog-container strong { + color: var(--accent-color); + font-weight: 600; +} + +.download-cv { + display: inline-block; + padding: 0.75rem 1.5rem; + background-color: #333; + color: white; + text-decoration: none; + border-radius: 4px; + margin: 1rem 0; + transition: background-color 0.3s; +} + +.download-cv:hover { + background-color: #555; +} + +.blog-container pre code.language-python { + color: var(--text-color); + font-family: 'Fira Mono', monospace; + font-size: 0.95rem; + line-height: 1.6; +} + +/* Python syntax highlighting */ +.language-python .keyword { color: #ff79c6; } /* Python keywords */ +.language-python .builtin { color: #8be9fd; } /* Built-in functions */ +.language-python .string { color: #f1fa8c; } /* Strings */ +.language-python .number { color: #bd93f9; } /* Numbers */ +.language-python .comment { color: #6272a4; } /* Comments */ +.language-python .operator { color: #ff79c6; } /* Operators */ +.language-python .function { color: #50fa7b; } 
/* Function names */ +.language-python .class { color: #8be9fd; } /* Class names */ \ No newline at end of file