# DumbleLLM - Custom Large Language Model

This repository contains the code for a decoder-only transformer, similar to Llama or GPT. It was trained on an English corpus built from the seven Harry Potter books and has roughly 75M trainable parameters.

## Technical Features

- Tokenization: byte pair encoding (SentencePiece)
- FlashAttention and grouped-query attention
- Rotary position embeddings (RoPE)
- Key/value cache for fast inference
- Sampling: top-p and top-k (see the sketch below)
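
As a quick illustration of the last point, here is a minimal sketch of combined top-k / top-p (nucleus) sampling in PyTorch. The function name and signature are illustrative and may not match this repository's actual code.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0,
                      top_k: int = 50, top_p: float = 0.9) -> int:
    """Sample one token id from a (vocab_size,) logits vector.
    Hypothetical helper, not taken from this repository."""
    logits = logits / temperature

    # Top-k: drop everything below the k-th largest logit.
    if top_k > 0:
        kth_value = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))

    # Top-p: keep the smallest prefix of sorted tokens whose
    # cumulative probability exceeds p (the "nucleus").
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        cumulative = torch.cumsum(probs, dim=-1)
        # Drop tokens whose preceding mass already exceeds p;
        # the highest-probability token is always kept.
        drop = cumulative - probs > top_p
        sorted_logits = sorted_logits.masked_fill(drop, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(
            0, sorted_idx, sorted_logits)

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```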

## Training Configuration

| Parameter | Value |
| --- | --- |
| Layers | 4 |
| Model Dimension | 768 |
| Context Length | 1024 |
| Attention Heads | 8 |
| Key/Value Heads | 4 |
| Vocabulary Size | 32000 |
| RoPE Theta | 10000 |
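
For reference, these hyperparameters could be collected into a single config object. This is a sketch assuming a Python/PyTorch codebase; the field names are hypothetical and need not match the repository's actual configuration.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Values taken from the table above; names are illustrative.
    n_layers: int = 4            # transformer blocks
    dim: int = 768               # model (embedding) dimension
    max_seq_len: int = 1024      # context length
    n_heads: int = 8             # query heads
    n_kv_heads: int = 4          # key/value heads (grouped-query attention)
    vocab_size: int = 32000      # SentencePiece BPE vocabulary
    rope_theta: float = 10000.0  # RoPE base frequency
```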

## Roadmap

- [x] Grouped-query attention
- [x] Rotary position embeddings
- [x] Key/value cache
- [ ] Distributed training
- [ ] Finetuning with (Q)LoRA
- [ ] Mixture-of-experts model

## Example Prompts

TODO