This repository contains the code for a decoder-only transformer, similar to Llama or GPT. It was trained on an English corpus built from the seven Harry Potter books and has roughly 75M trainable parameters.
- Tokenization: Byte pair encoding (SentencePiece)
- FlashAttention, Grouped Query Attention
- Rotary Position Embeddings
- Key Value Cache
- Sampling: top-p, top-k (see the sketch after this list)
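The snippet below is a minimal sketch of how top-k and top-p (nucleus) filtering are typically combined when sampling the next token. The function name, default values, and use of PyTorch are illustrative assumptions, not this repository's actual API.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0,
                      top_k: int = 50, top_p: float = 0.9) -> int:
    """Sample one token id from a (vocab_size,) logits vector (illustrative only)."""
    logits = logits / max(temperature, 1e-8)

    # Top-k: keep only the k highest-scoring tokens.
    if top_k is not None and top_k > 0:
        kth = torch.topk(logits, min(top_k, logits.size(-1))).values[-1]
        logits = torch.where(logits < kth, torch.full_like(logits, float("-inf")), logits)

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability >= p.
    if top_p is not None and top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs_sorted = torch.softmax(sorted_logits, dim=-1)
        cum_probs = probs_sorted.cumsum(dim=-1)
        # Drop a token only if the mass of strictly better tokens already exceeds p,
        # so the token that crosses the threshold is still kept.
        cutoff = cum_probs - probs_sorted > top_p
        sorted_logits = sorted_logits.masked_fill(cutoff, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(0, sorted_idx, sorted_logits)

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

In a generation loop this would be applied to the logits of the last position, one step at a time, with the sampled token appended to the sequence and the key/value cache reused for the already-processed prefix.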
| Parameter | Value |
|---|---|
| Layers | 4 |
| Model Dimension | 768 |
| Context Length | 1024 |
| Attention Heads | 8 |
| Key/Value Heads | 4 |
| Vocabulary Size | 32000 |
| RoPE Theta | 10000 |
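As a reading aid, the table corresponds to a configuration object roughly like the one sketched below; the field names are assumptions for illustration and may differ from the ones used in the code.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Values from the hyperparameter table above (field names are illustrative).
    n_layers: int = 4            # transformer blocks
    dim: int = 768               # model (embedding) dimension
    context_length: int = 1024   # maximum sequence length
    n_heads: int = 8             # query heads
    n_kv_heads: int = 4          # shared key/value heads (grouped query attention)
    vocab_size: int = 32000      # SentencePiece BPE vocabulary
    rope_theta: float = 10000.0  # rotary position embedding base
```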
TODO
- Distributed training
- Finetuning with (Q)LoRA (see the sketch below)
- Add Mixture of Experts model
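For the (Q)LoRA item, the general idea is to freeze the pretrained weights and train only low-rank adapters added to the linear projections. The class below is an illustrative sketch of a plain LoRA layer, not code from this repository.

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # only the adapters are trained
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # start as an identity update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

For QLoRA the frozen base weights would additionally be stored quantized (e.g. 4-bit), while the adapters stay in higher precision.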